Gremlin python

id  gremlin-python-tutorial
owner  WolfgangFahl
title  Gremlin-Python mini tutorial
version  0.0.1
date  2019-09-17

This tutorial is intended to get you up and running using Gremlin / Apache Tinkerpop with Python. Basic knowledge of Python is assumed.

Do you already know Gremlin / Apache Tinkerpop?

If so, you can continue with the prerequisites part. Otherwise you might want to click on the Gremlin logo below.

Gremlin programming language.png

There is also an explanation of Gremlin steps based on Java in this wiki.

This mini-tutorial is inspired by this stackoverflow question.

The goal is to get access to an apache tinkerpop/gremlin graph database via Python.

The examples in this tutorial have been tested on Ubuntu 18.04 LTS and macOS with a MacPorts environment as well as in the Travis CI environment.


  1. Java
  2. Python
  3. Gremlin-Server
  4. Gremlin-Console (for debugging)

To get the prerequisites you can follow either the manual or the script based installation below. The script based installation is quicker - the manual installation gives you more insight into and control over the installation steps.

Manual Installation

Installing Java

There are many ways to install Java and your mileage may vary.

sudo apt-get install openjdk-8-jre 
java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

Installing Python and Pip

We assume you'd like to work with Python 3.7.

sudo apt install python3.7
python --version
Python 3.7.3
sudo apt install python-pip
pip --version
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.7)

Installing Gremlin-Python

sudo -H pip install -r requirements.txt
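
Once the manual steps are done, a quick check from Python can confirm that the tools are on the PATH and that the gremlin-python module can be found. The following is only a minimal sketch using the standard library; the function name check_prerequisites and the tool names "python3" and "pip" are assumptions and may need adjusting to your system.

```python
import importlib.util
import shutil


def check_prerequisites(tools=("java", "python3", "pip"), module="gremlin_python"):
    """Return a dict mapping each tool/module name to True if it was found."""
    # shutil.which returns None when the executable is not on the PATH
    found = {tool: shutil.which(tool) is not None for tool in tools}
    # find_spec returns None when the top-level module cannot be located
    found[module] = importlib.util.find_spec(module) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("%-15s %s" % (name, "found" if ok else "MISSING"))
```

The check only tells you whether something with that name exists; it does not verify versions.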

Installing Gremlin Server and Console

Download Gremlin Server and optionally Gremlin Console and unzip the downloaded files.

Starting the Gremlin Server

cd apache-tinkerpop-gremlin-server-3.4.3
bin/ conf/gremlin-server-modern.yaml

See #Gremlin-Server_start for the expected result.

Starting the Gremlin Console

cd apache-tinkerpop-gremlin-console-3.4.3

See #Gremlin-Console_start_.28for_debugging.29 for the expected result.

Script based installation

The "run" installation helper script tries to automate the necessary steps:

  1. Installation
  2. Gremlin-Server start
  3. Gremlin-Console start (for debugging)
  4. Python script start

The following commands should get you going:

git clone
./run -i
./run -s
# in another console
./run -p


./run -h
usage: ./run  [-c|-h|-i|-n|-p|-s|-t|-v]
  -c|--console: start console
  -h|--help: show this usage
  -i|--install: install prerequisites
  -n|--neo4j: start neo4j server
  -p|--python: start python trial code
  -s|--server: start server
  -t|--test: start pytest
  -v|--version: show version


./run -v
apache-tinkerpop-gremlin version 3.4.3


 run -i


  1. gremlin server
  2. gremlin console
  3. gremlin python module
checking prerequisites ...
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
Python 2.7.15+
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)
installing needed python modules
Requirement already satisfied: futures in /usr/local/lib/python2.7/dist-packages (from -r requirements.txt (line 2))
Requirement already satisfied: gremlinpython in /usr/local/lib/python2.7/dist-packages (from -r requirements.txt (line 4))
Requirement already satisfied: isodate>=0.6.0 in /usr/local/lib/python2.7/dist-packages (from gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: six>=1.10.0 in /usr/lib/python2.7/dist-packages (from gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: aenum>=1.4.5 in /usr/local/lib/python2.7/dist-packages (from gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: tornado<5.0,>=4.4.1 in /usr/local/lib/python2.7/dist-packages (from gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: certifi in /usr/local/lib/python2.7/dist-packages (from tornado<5.0,>=4.4.1->gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: singledispatch in /usr/local/lib/python2.7/dist-packages (from tornado<5.0,>=4.4.1->gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: backports-abc>=0.4 in /usr/local/lib/python2.7/dist-packages (from tornado<5.0,>=4.4.1->gremlinpython->-r requirements.txt (line 4))

Gremlin-Server start

 ./run -s

starts the gremlin server with a default yaml-file in foreground

starting gremlin-server ...
[INFO] GremlinServer - 3.4.3
         (o o)

[INFO] GremlinServer - Configuring Gremlin Server from /home/wf/source/python/gremlin-python-tutorial/apache-tinkerpop-gremlin-server-3.4.3/conf/gremlin-server-modern.yaml
[INFO] MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
[INFO] DefaultGraphManager - Graph [graph] was successfully configured via [conf/].
[INFO] ServerGremlinExecutor - Initialized Gremlin thread pool.  Threads in pool named with pattern gremlin-*
[INFO] ServerGremlinExecutor - Initialized GremlinExecutor and preparing GremlinScriptEngines instances.
[INFO] ServerGremlinExecutor - Initialized gremlin-groovy GremlinScriptEngine and registered metrics
[INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
[INFO] OpLoader - Adding the standard OpProcessor.
[INFO] OpLoader - Adding the session OpProcessor.
[INFO] OpLoader - Adding the traversal OpProcessor.
[INFO] TraversalOpProcessor - Initialized cache for TraversalOpProcessor with size 1000 and expiration time of 600000 ms
[INFO] GremlinServer - Executing start up LifeCycleHook
[INFO] Logger$info - Loading 'modern' graph data.
[INFO] GremlinServer - idleConnectionTimeout was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
[INFO] GremlinServer - keepAliveInterval was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
[WARN] AbstractChannelizer - The org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0 serialization class is deprecated.
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
[WARN] AbstractChannelizer - The org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0 serialization class is deprecated.
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/vnd.graphbinary-v1.0 with org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1
[INFO] AbstractChannelizer - Configured application/vnd.graphbinary-v1.0-stringd with org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 4 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
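
The "Channel started at port 8182" line indicates that the server is ready. If you want to check this from Python before running any traversal, a plain TCP connection attempt might be enough. This is a sketch under the assumption that the server runs on localhost with the default port 8182; the function name server_is_listening is made up for this example.

```python
import socket


def server_is_listening(host="localhost", port=8182, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # the connection is closed again as soon as the with block ends
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out or host not resolvable
        return False


if __name__ == "__main__":
    print("gremlin server reachable:", server_is_listening())
```

Note that this only shows that some process accepts connections on that port, not that it actually is a Gremlin Server.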

Quick way to stop server

If you ran the server in foreground you can stop it with "CTRL-C" in the console where you started it. Otherwise you can simply kill the corresponding process e.g. with:

pkill -9 -fl gremlin-server

Gremlin-Console start (for debugging)

 ./run -c

starts the gremlin console

starting gremlin-console ...
Sep 17, 2019 4:16:03 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.

         (o o)
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph

You can try out

gremlin>  :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/
:> g.V().values('name')
gremlin>  :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/] - type ':remote console' to return to local mode
gremlin> :exit

Python script start

./run -p

starts the python test script.

./run -p
starting python test code

Python unit tests start

./run -t

Starts the pytest unit tests. Please make sure a gremlin-server is running.

./run -t
==================================== test session starts =====================================
platform darwin -- Python 3.7.4, pytest-5.1.2, py-1.8.0, pluggy-0.12.0
rootdir: /Users/wf/source/python/gremlin-python-tutorial
collected 1 item

.                                                                          [100%]

===================================== 1 passed in 12.92s =====================================

Getting Started

The Apache Tinkerpop Getting Started tutorial assumes you are using the groovy console to try things out. We'll use these steps of the tutorial to show how the same traversals are available via gremlin-python.

The modern graph (figure: tinkerpop-modern.png) will be the basis for our first steps.

Gremlin-Python is just a Gremlin Language Variant - this means that the graph traversals are not executed in the Python environment but are instead sent as "bytecode" to a server that executes the traversal and sends back the result.
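
To make the "bytecode" idea more tangible, here is a toy sketch in plain Python. It is not how gremlin-python is implemented; the class RecordingTraversal is made up for illustration. It only shows the principle: calling steps records instructions instead of executing anything locally.

```python
class RecordingTraversal:
    """Toy stand-in for a traversal: it records steps instead of running them."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def __getattr__(self, step_name):
        # any unknown attribute is treated as the name of a Gremlin step
        def add_step(*args):
            # return a new object so that the original traversal stays unchanged
            return RecordingTraversal(self.steps + [[step_name, *args]])
        return add_step


g = RecordingTraversal()
t = g.V(1).out("knows").values("name")
print(t.steps)
# [['V', 1], ['out', 'knows'], ['values', 'name']]
```

In the real library a recorded instruction list of this kind is what gets serialized and sent to the server when you call a terminal step such as toList().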

The first five minutes is the relevant source code for this section.

g - the graph traversal

In the Python environment you need to create a remote connection to a gremlin server to get the starting point "g" - the graph traversal. That's why we have to start the gremlin server, e.g. with run -s from our automation script above. The gremlin server is configured to supply traversals for the "modern graph" example depicted above.

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))

There is a helper class "RemoteTraversal" which reads the server configuration from a yaml file. In the tutorial examples the above code is reduced to

# see
from tutorial import remote

# initialize a remote traversal
g = remote.RemoteTraversal().g()
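
The internals of the helper are not shown here. If you prefer not to depend on it, a small wrapper of your own could look like the following sketch. The names server_url and remote_g are hypothetical; only the DriverRemoteConnection and traversal() calls are taken from the connection code above. The wrapper also closes the connection when you are done.

```python
from contextlib import contextmanager


def server_url(host="localhost", port=8182):
    """Build the WebSocket URL in the form used above."""
    return "ws://%s:%d/gremlin" % (host, port)


@contextmanager
def remote_g(url, traversal_source="g"):
    """Yield a remote traversal source and close the connection afterwards."""
    # imported lazily so that server_url() works without gremlinpython installed
    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    connection = DriverRemoteConnection(url, traversal_source)
    try:
        yield traversal().withRemote(connection)
    finally:
        connection.close()


print(server_url())
# ws://localhost:8182/gremlin
```

With a running server it could be used as "with remote_g(server_url()) as g:" followed by the traversals inside the with block.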

Steps 1 to 6

The source code on GitHub is slightly different, since some gremlin-server providers do not work with a set of ids starting from 1. To keep things simple the original source code is shown here:

from gremlin_python.process.graph_traversal import GraphTraversal

#gremlin> g.V() //(1)
#    ==>v[1]
#    ==>v[2]
#    ==>v[3]
#    ==>v[4]
#    ==>v[5]
#    ==>v[6]
def test_tutorial1():
  # get the vertices
  gV=g.V()
  # we have a traversal now
  assert isinstance(gV,GraphTraversal)
  # convert it to a list to get the actual vertices
  vList=gV.toList()
  # there should be 6 vertices
  assert len(vList)==6
  # the default string representation of a vertex is showing the id
  # of a vertex
  assert str(vList)=="[v[1], v[2], v[3], v[4], v[5], v[6]]"

#gremlin> g.V(1) //(2)
#    ==>v[1]
def test_tutorial2():
   assert str(g.V(1).toList())=="[v[1]]"

#gremlin> g.V(1).values('name') //3
#  ==>marko
def test_tutorial3():
    assert str( g.V(1).values('name').toList())=="['marko']"

#     gremlin> g.V(1).outE('knows') //4
#    ==>e[7][1-knows->2]
#    ==>e[8][1-knows->4]
def test_tutorial4():
    assert str(g.V(1).outE("knows").toList()) == "[e[7][1-knows->2], e[8][1-knows->4]]"

#    gremlin> g.V(1).outE('knows').inV().values('name') //5
#    ==>vadas
#    ==>josh
def test_tutorial5():
    assert str(g.V(1).outE("knows").inV().values("name").toList())=="['vadas', 'josh']"

#     gremlin> g.V(1).out('knows').values('name') //6
#    ==>vadas
#    ==>josh
def test_tutorial6():
    assert str(g.V(1).out("knows").values("name").toList())=="['vadas', 'josh']"
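
Steps 5 and 6 return the same result because out() is a shorthand for outE() followed by inV(). The following plain Python sketch mimics this on an in-memory copy of the modern graph, so it runs without a server. The helper names out_e, in_v and out are made up for this illustration and are not part of gremlin-python.

```python
# vertex id -> name, and (out id, edge label, in id) triples of the modern graph
NAMES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
EDGES = [
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]


def out_e(vertex_id, label):
    """Like outE(label): the outgoing edges of a vertex with the given label."""
    return [e for e in EDGES if e[0] == vertex_id and e[1] == label]


def in_v(edges):
    """Like inV(): the vertex each edge points to."""
    return [e[2] for e in edges]


def out(vertex_id, label):
    """Like out(label): shorthand for outE(label) followed by inV()."""
    return in_v(out_e(vertex_id, label))


step5 = [NAMES[v] for v in in_v(out_e(1, "knows"))]
step6 = [NAMES[v] for v in out(1, "knows")]
print(step5, step6)
# ['vadas', 'josh'] ['vadas', 'josh']
```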

Loading and Saving a graph

Given that gremlin-python is a Gremlin Language Variant (GLV) and doesn't have its own traversal implementation, loading and saving graphs is a bit more tricky than in non-GLV environments.

For this tutorial we assume you only work with small, experimental, non-production graph databases. Be warned! We simply clear the whole graph when loading!

Loading the air-routes example

Kelvin Lawrence has a nice air-routes example in his tutorial - it is also available for this tutorial.

from tutorial import remote
import os

# initialize a remote traversal
g = remote.RemoteTraversal().g()

# test loading a graph
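def test_loadGraph():
   # assumed file name, adjust it to the GraphML file you want to load
   graphmlFile="air-routes-small.xml"
   # make the local file accessible to the server
   airRoutesPath=os.path.abspath(graphmlFile)
   # drop the existing content of the graph
   g.V().drop().iterate()
   # read the content from the air routes example
   g.io(airRoutesPath).read().iterate()
   vCount=g.V().count().next()
   print ("%s has %d vertices" % (graphmlFile,vCount))
   assert vCount==47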


Saving a graph

Let's create a graph containing a single node for the fish named Wanda and save it.

# test saving a graph
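def test_saveGraph():
   graphmlPath="/tmp/A-Fish-Named-Wanda.xml" # assumed output path
   # drop the existing content of the graph
   g.V().drop().iterate()
   # add a vertex for Wanda (the label "Fish" is an assumption) and write GraphML
   g.addV("Fish").property("name","Wanda").iterate()
   g.io(graphmlPath).write().iterate()
   print("wrote graph to %s" % (graphmlPath))
   # check that the graphml file exists
   assert os.path.isfile(graphmlPath)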
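
To inspect a GraphML file without going through the server, you can count its nodes and edges with the XML parser of the standard library. This is a sketch that assumes the file uses the standard GraphML namespace; the function name count_graphml and the tiny sample document are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def count_graphml(path):
    """Return (node count, edge count) of a GraphML file."""
    root = ET.parse(path).getroot()
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return len(nodes), len(edges)


# hand-written sample with a single node and no edges
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <graph id="G" edgedefault="directed">
    <node id="0"><data key="name">Wanda</data></node>
  </graph>
</graphml>
"""

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.xml")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    sample_counts = count_graphml(sample_path)
print(sample_counts)
# (1, 0)
```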

Creating a graphical representation of a graph

A simple way to visualize your graphs is using graphviz. There is a graphviz python module with documentation.

Example Graphviz Usage


# see
from tutorial import remote
from graphviz import Digraph
import os.path
from gremlin_python.process.traversal import T

# initialize a remote traversal
g = remote.RemoteTraversal().g()

# test creating a graphviz graph from the tinkerpop graph
def test_createGraphvizGraph():
    # this test expects the tinkerpop modern example to be loaded on the server
    # start a graphviz
    dot = Digraph(comment='Modern')
    # get vertex properties including id and label as dicts
    for vDict in g.V().valueMap(True).toList():
        # uncomment to debug
        # print (vDict)
        # get id and label
        vId=vDict[T.id]
        vLabel=vDict[T.label]
        # create a graphviz node label
        # name property is always there
        gvLabel=r"%s\n%s\nname=%s" % (vId,vLabel,vDict["name"][0])
        # if there is an age property add it to the label
        if "age" in vDict:
            gvLabel=gvLabel+r"\nage=%s" % (vDict["age"][0])
        # create a graphviz node
        dot.node("node%d" % (vId),gvLabel)
    # loop over all edges
    for e in g.E():
        # get the detail information with a second call per edge (what a pity to be so inefficient ...)
        eDict=g.E(e.id).valueMap(True).next()
        # uncomment if you'd like to debug
        # print (e,eDict)
        # create a graphviz label
        geLabel=r"%s\n%s\nweight=%s" % (e.id,e.label,eDict["weight"])
        # add a graphviz edge
        dot.edge("node%d" % (e.outV.id),"node%d" % (e.inV.id),label=geLabel)
    # modify the styling
    dot.edge_attr.update(arrowsize='2',penwidth='2')
    dot.node_attr.update(style='filled',fillcolor="#A8D0E4")
    # print the source code
    print (dot.source)
    # render without viewing - default is creating a pdf file
    dot.render('/tmp/modern.gv', view=False)
    # check that the pdf file exists
    assert os.path.isfile('/tmp/modern.gv.pdf')

# call the test
test_createGraphvizGraph()

Resulting graphviz dot source

// Modern
digraph {
	node [fillcolor="#A8D0E4" style=filled]
	edge [arrowsize=2 penwidth=2]
	node1 [label="1\nperson\nname=marko\nage=29"]
	node2 [label="2\nperson\nname=vadas\nage=27"]
	node3 [label="3\nsoftware\nname=lop"]
	node4 [label="4\nperson\nname=josh\nage=32"]
	node5 [label="5\nsoftware\nname=ripple"]
	node6 [label="6\nperson\nname=peter\nage=35"]
	node1 -> node2 [label="7\nknows\nweight=0.5"]
	node1 -> node4 [label="8\nknows\nweight=1.0"]
	node1 -> node3 [label="9\ncreated\nweight=0.4"]
	node4 -> node5 [label="10\ncreated\nweight=1.0"]
	node4 -> node3 [label="11\ncreated\nweight=0.4"]
	node6 -> node3 [label="12\ncreated\nweight=0.2"]
}

Resulting pdf file

If you set "view=True", the PDF viewer is started directly from the python script.
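
The label strings in the dot source follow a simple pattern: id, label and the properties, separated by a literal backslash-n that Graphviz interprets as a line break. The sketch below isolates that formatting in two plain Python helpers so it can be tried without a server or the graphviz module. The function names vertex_label and edge_statement are made up for this example.

```python
def vertex_label(v_id, v_label, props):
    """Build a node label from id, label and valueMap-style properties."""
    # valueMap() wraps vertex property values in lists, hence the [0]
    parts = [str(v_id), v_label]
    parts += ["%s=%s" % (key, values[0]) for key, values in props.items()]
    # join with a literal backslash followed by n, not with a newline character
    return r"\n".join(parts)


def edge_statement(e_id, e_label, out_id, in_id, weight):
    """Build one dot edge statement in the style of the output above."""
    label = r"\n".join([str(e_id), e_label, "weight=%s" % weight])
    return 'node%d -> node%d [label="%s"]' % (out_id, in_id, label)


print(vertex_label(1, "person", {"name": ["marko"], "age": [29]}))
# 1\nperson\nname=marko\nage=29
print(edge_statement(7, "knows", 1, 2, 0.5))
# node1 -> node2 [label="7\nknows\nweight=0.5"]
```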

Connecting to Gremlin enabled graph databases

According to the Gremlin Wiki page there are a few different graph databases out there that support Gremlin/Apache Tinkerpop. We'll try to connect to a few of these using gremlin-python.

  • ❌ means we didn't get it to work even after trying
  • ❓ we didn't test it yet
  • ✅ means we got it working

Amazon Neptune ❓

Blazegraph ❓

Cosmos ❓

DataStax ❌


docker pull $image
docker run --name datastax  -e DS_LICENSE=accept -p 8182:8182 $image

JanusGraph ❌


  1. Downloaded 275 MByte, unzipped and started bin/ (this already gave several error messages)
  2. followed getting started procedure above
  3. started bin/
graph ='conf/')
17:41:38 WARN  - Unable to determine Elasticsearch server version. Default to FIVE. Connection refused

Neo4J ❌


scripts/runNeo4j -rc
./run -n
ln -f Neo4j.yaml server.yaml 
./run -t



Unfortunately this does not show any results ...

OrientDB ❌


docker pull orientdb:3.0.23-tp3
docker run -d --name odbtp3 -p 2424:2424 -p 2480:2480 -p 8182:8182 -e ORIENTDB_ROOT_PASSWORD=rootpwd orientdb:3.0.23-tp3
ln -f OrientDB.yaml server.yaml
./run -t

Tests fail see: