Difference between revisions of "Gremlin python"
(54 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
+ | =OsProject= | ||
+ | |||
{{OsProject | {{OsProject | ||
|id=gremlin-python-tutorial | |id=gremlin-python-tutorial | ||
+ | |state=active | ||
|owner=WolfgangFahl | |owner=WolfgangFahl | ||
|title=Gremlin-Python mini tutorial | |title=Gremlin-Python mini tutorial | ||
|url=https://github.com/WolfgangFahl/gremlin-python-tutorial | |url=https://github.com/WolfgangFahl/gremlin-python-tutorial | ||
− | |version=0.0. | + | |version=0.0.6 |
− | |date=2019-09-17 | + | |date=2023-07-04 |
+ | |since=2019-09-17 | ||
|storemode=property | |storemode=property | ||
}} | }} | ||
+ | =tickets= | ||
https://www.python.org/static/community_logos/python-logo-master-v3-TM.png | https://www.python.org/static/community_logos/python-logo-master-v3-TM.png | ||
This tutorial is intended to get you up and running using Gremlin / Apache Tinkerpop with [https://www.python.org/ Python]. | This tutorial is intended to get you up and running using Gremlin / Apache Tinkerpop with [https://www.python.org/ Python]. | ||
Line 17: | Line 22: | ||
[[File:Gremlin programming language.png|300px|left|link=https://tinkerpop.apache.org/gremlin.html]] | [[File:Gremlin programming language.png|300px|left|link=https://tinkerpop.apache.org/gremlin.html]] | ||
− | There is also an explanation of {{Link|target=Gremlin}} steps based on Java in this wiki. | + | There is also an explanation of |
+ | |||
+ | =Freitext= | ||
+ | {{Link|target=Gremlin}} steps based on Java in this wiki. | ||
This mini-tutorial is inspired by [https://stackoverflow.com/questions/57936915/how-do-i-get-gremlin-python-with-gremlin-server-3-4-3-to-work this stackoverflow question]. | This mini-tutorial is inspired by [https://stackoverflow.com/questions/57936915/how-do-i-get-gremlin-python-with-gremlin-server-3-4-3-to-work this stackoverflow question]. | ||
Line 36: | Line 48: | ||
=== Installing Java === | === Installing Java === | ||
There are many ways to install Java and your mileage may vary. | There are many ways to install Java and your mileage may vary. | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1-2"> |
sudo apt-get install openjdk-8-jre | sudo apt-get install openjdk-8-jre | ||
java -version | java -version | ||
Line 45: | Line 57: | ||
=== Installing Python and Pip === | === Installing Python and Pip === | ||
− | We assume you'd like to work with python 3. | + | We assume you'd like to work with python 3.x |
− | <source lang='bash'> | + | <source lang='bash' highlight="1-2,4-5"> |
− | sudo apt install python3. | + | sudo apt install python3.10 |
python --version | python --version | ||
− | Python 3. | + | Python 3.10.8 |
− | sudo apt install | + | sudo apt install python3-pip |
pip --version | pip --version | ||
− | pip | + | pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10) |
</source> | </source> | ||
=== Installing Gremlin-Python === | === Installing Gremlin-Python === | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1-3"> |
− | + | git clone https://github.com/WolfgangFahl/gremlin-python-tutorial | |
+ | cd gremlin-python-tutorial/ | ||
+ | pip install . | ||
</source> | </source> | ||
+ | |||
=== Installing Gremlin Server and Console === | === Installing Gremlin Server and Console === | ||
− | * http://tinkerpop.apache.org/ | + | * http://tinkerpop.apache.org/download.html |
Download Gremlin Server and optionally Gremlin Console and unzip the downloaded files. | Download Gremlin Server and optionally Gremlin Console and unzip the downloaded files. | ||
+ | |||
=== Starting the Gremlin Server === | === Starting the Gremlin Server === | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1-2"> |
− | cd apache-tinkerpop-gremlin-server-3. | + | cd apache-tinkerpop-gremlin-server-3.6.3 |
bin/gremlin-server.sh conf/gremlin-server-modern.yaml | bin/gremlin-server.sh conf/gremlin-server-modern.yaml | ||
</source> | </source> | ||
Line 70: | Line 86: | ||
=== Starting the Gremlin Console === | === Starting the Gremlin Console === | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1-2"> |
− | cd apache-tinkerpop-gremlin-console-3. | + | cd apache-tinkerpop-gremlin-console-3.6.3 |
bin/gremlin.sh | bin/gremlin.sh | ||
</source> | </source> | ||
Line 84: | Line 100: | ||
The following command should get you going: | The following command should get you going: | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1-3,5"> |
git clone https://github.com/WolfgangFahl/gremlin-python-tutorial | git clone https://github.com/WolfgangFahl/gremlin-python-tutorial | ||
− | ./run -i | + | ./scripts/run -i |
− | ./run -s | + | ./scripts/run -s |
# in another console | # in another console | ||
./scripts/run -p | ./scripts/run -p | ||
Line 93: | Line 109: | ||
=== Help === | === Help === | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1"> |
− | usage: ./run [-c|-h|-i|-p|-s|-t|-v] | + | scripts/run -h |
+ | usage: ./run [-c|-h|-i|-n|-p|-s|-t|-v] | ||
-c|--console: start console | -c|--console: start console | ||
-h|--help: show this usage | -h|--help: show this usage | ||
-i|--install: install prerequisites | -i|--install: install prerequisites | ||
+ | -n|--neo4j: start neo4j server | ||
-p|--python: start python trial code | -p|--python: start python trial code | ||
-s|--server: start server | -s|--server: start server | ||
Line 105: | Line 123: | ||
=== Version === | === Version === | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1"> |
− | + | scripts/run -v | |
− | apache-tinkerpop-gremlin version 3. | + | apache-tinkerpop-gremlin version 3.6.3 |
</source> | </source> | ||
===Installation=== | ===Installation=== | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1"> |
run -i | run -i | ||
</source> | </source> | ||
Line 146: | Line 164: | ||
===Gremlin-Server start=== | ===Gremlin-Server start=== | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1"> |
− | + | scripts/run -s | |
</source> | </source> | ||
starts the gremlin server with a default yaml-file in foreground | starts the gremlin server with a default yaml-file in foreground | ||
Line 190: | Line 208: | ||
===Gremlin-Console start (for debugging)=== | ===Gremlin-Console start (for debugging)=== | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1"> |
− | + | scripts/run -c | |
</source> | </source> | ||
starts the gremlin console | starts the gremlin console | ||
Line 231: | Line 249: | ||
===Python script start=== | ===Python script start=== | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1"> |
./run -p | ./run -p | ||
</source> | </source> | ||
starts the python test script. | starts the python test script. | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1"> |
./run -p | ./run -p | ||
starting python test code | starting python test code | ||
− | + | g.V().count=6 | |
+ | g.E().count=6 | ||
</source> | </source> | ||
+ | |||
=== Python unit tests start === | === Python unit tests start === | ||
− | <source lang='bash'> | + | <source lang='bash' highlight="1"> |
./run -t | ./run -t | ||
</source> | </source> | ||
Line 256: | Line 276: | ||
===================================== 1 passed in 12.92s ===================================== | ===================================== 1 passed in 12.92s ===================================== | ||
</source> | </source> | ||
+ | |||
= Getting Started = | = Getting Started = | ||
The [http://tinkerpop.apache.org/docs/3.4.3/tutorials/getting-started/ Apache Tinkerpop Getting Started tutorial] assumes you are using the groovy console to try things out. We'll use these steps of the tutorial to show how the same traversals are available via gremlin-python. | The [http://tinkerpop.apache.org/docs/3.4.3/tutorials/getting-started/ Apache Tinkerpop Getting Started tutorial] assumes you are using the groovy console to try things out. We'll use these steps of the tutorial to show how the same traversals are available via gremlin-python. | ||
Line 268: | Line 289: | ||
In the Python environment you get the starting point "g" (the graph traversal source) by creating a remote connection to a Gremlin server. That is why the Gremlin server has to be started first, e.g. with <nowiki>run -s</nowiki> from our automation script above. The Gremlin server is configured to supply traversals for the "modern graph" example depicted above. | In the Python environment you get the starting point "g" (the graph traversal source) by creating a remote connection to a Gremlin server. That is why the Gremlin server has to be started first, e.g. with <nowiki>run -s</nowiki> from our automation script above. The Gremlin server is configured to supply traversals for the "modern graph" example depicted above. | ||
− | <source lang='python'> | + | <source lang='python' highlight="4"> |
from gremlin_python.process.anonymous_traversal import traversal | from gremlin_python.process.anonymous_traversal import traversal | ||
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection | from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection | ||
Line 276: | Line 297: | ||
In https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/tutorial/remote.py there is a helper class "RemoteTraversal" that reads the server configuration from a YAML file. | In https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/tutorial/remote.py there is a helper class "RemoteTraversal" that reads the server configuration from a YAML file. | ||
In the tutorial examples the above code is reduced to | In the tutorial examples the above code is reduced to | ||
− | <source lang='python'> | + | <source lang='python' highlight="5"> |
# see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_001.py | # see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_001.py | ||
from tutorial import remote | from tutorial import remote | ||
Line 286: | Line 307: | ||
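The idea of keeping the server settings in a configuration file and deriving the connection from them can be sketched as follows. This is a minimal sketch and not the code of the "RemoteTraversal" helper: the class `ServerConfig`, its field names and the function `connect` are assumptions made for illustration. The settings are passed as a plain dict (for example the result of parsing a YAML file), and gremlin-python is only imported when a connection is actually opened. `withRemote` is the call style of the gremlin-python versions this tutorial was written for; newer releases may prefer a different spelling.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Connection settings for a Gremlin server (field names are illustrative)."""
    host: str = "localhost"
    port: int = 8182
    alias: str = "g"

    @classmethod
    def from_dict(cls, settings):
        """Create a config from a dict, e.g. the parsed content of a YAML file."""
        defaults = cls()
        return cls(
            host=settings.get("host", defaults.host),
            port=int(settings.get("port", defaults.port)),
            alias=settings.get("alias", defaults.alias),
        )

    @property
    def url(self):
        """WebSocket URL of the Gremlin server endpoint."""
        return "ws://%s:%d/gremlin" % (self.host, self.port)


def connect(config):
    """Open a remote traversal source; needs gremlinpython and a running server."""
    # imported lazily so that the config handling works without gremlinpython
    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    return traversal().withRemote(DriverRemoteConnection(config.url, config.alias))


config = ServerConfig.from_dict({"host": "example.org", "port": "8183"})
print(config.url)
```

With a server running, `g = connect(ServerConfig())` would give the same kind of traversal source "g" that the examples below use.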
=== Steps 1 to 6 === | === Steps 1 to 6 === | ||
see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_002_tutorial.py | see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_002_tutorial.py | ||
+ | The source code on GitHub is slightly different, since some Gremlin server providers do not work with a set of ids starting from 1. To keep things simple the original source code is shown here: | ||
<source lang='python'> | <source lang='python'> | ||
# http://wiki.bitplan.com/index.php/Gremlin_python#g.V.28.29_-_the_vertices | # http://wiki.bitplan.com/index.php/Gremlin_python#g.V.28.29_-_the_vertices | ||
Line 336: | Line 358: | ||
assert str(g.V(1).out("knows").values("name").toList())=="['vadas', 'josh']" | assert str(g.V(1).out("knows").values("name").toList())=="['vadas', 'josh']" | ||
</source> | </source> | ||
+ | |||
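To see what the last assertion above computes without a running server, the traversal can be mimicked on plain Python data. This is only an illustrative sketch: the vertex names and edges are copied from the "modern graph" example, while the helper functions `out` and `values` are hypothetical stand-ins and not part of the gremlin-python API.

```python
# vertex id -> name, taken from the TinkerPop "modern" graph
NAMES = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}

# (out vertex id, edge label, in vertex id)
EDGES = [
    (1, "knows", 2),
    (1, "knows", 4),
    (1, "created", 3),
    (4, "created", 5),
    (4, "created", 3),
    (6, "created", 3),
]


def out(vertex_ids, label):
    """Follow outgoing edges with the given label, like the out() step."""
    return [target
            for vertex_id in vertex_ids
            for (source, edge_label, target) in EDGES
            if source == vertex_id and edge_label == label]


def values(vertex_ids):
    """Look up the name property of each vertex, like values("name")."""
    return [NAMES[vertex_id] for vertex_id in vertex_ids]


# corresponds to g.V(1).out("knows").values("name").toList()
print(values(out([1], "knows")))  # ['vadas', 'josh']
```

Each step takes the list produced by the previous step, which is the same chaining idea that the Gremlin steps follow.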
+ | = Loading and Saving a graph = | ||
+ | Given that gremlin-python is a Gremlin Language Variant (GLV) and doesn't have its own traversal implementation, loading and saving graphs is a bit more tricky than in non-GLV environments. | ||
+ | |||
+ | For this tutorial we assume you only work with small, experimental, non-production graph databases. Be warned! We simply clear the whole graph when loading! | ||
+ | == Loading the air-routes example == | ||
+ | Kelvin Lawrence has a nice example in his tutorial, the air-routes graph. Its small variant https://github.com/krlawrence/graph/blob/master/sample-data/air-routes-small.graphml | ||
+ | is also available for this tutorial. | ||
+ | <source lang='python'> | ||
+ | from tutorial import remote | ||
+ | import os | ||
+ | |||
+ | # initialize a remote traversal | ||
+ | g = remote.RemoteTraversal().g() | ||
+ | |||
+ | # test loading a graph | ||
+ | def test_loadGraph(): | ||
+ |     graphmlFile="air-routes-small.xml" | ||
+ |     # make the local file accessible to the server | ||
+ |     airRoutesPath=os.path.abspath(graphmlFile) | ||
+ |     # drop the existing content of the graph | ||
+ |     g.V().drop().iterate() | ||
+ |     # read the content from the air routes example | ||
+ |     g.io(airRoutesPath).read().iterate() | ||
+ |     vCount=g.V().count().next() | ||
+ |     print ("%s has %d vertices" % (graphmlFile,vCount)) | ||
+ |     assert vCount==47 | ||
+ | |||
+ | test_loadGraph() | ||
+ | </source> | ||
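The expected vertex count in the assertion above can be cross-checked locally by counting the node elements of the GraphML file before it is loaded. The following is a minimal sketch using only the Python standard library; the function name `count_graphml` and the tiny sample document are assumptions for illustration, and the sketch assumes that the file declares the standard GraphML namespace.

```python
import xml.etree.ElementTree as ET

# standard GraphML namespace in ElementTree's {uri}tag notation
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml(graphml_text):
    """Return (node count, edge count) of a GraphML document given as a string."""
    root = ET.fromstring(graphml_text)
    node_count = sum(1 for _ in root.iter(GRAPHML_NS + "node"))
    edge_count = sum(1 for _ in root.iter(GRAPHML_NS + "edge"))
    return node_count, edge_count


# a tiny hand-written sample, not the air-routes data
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="1"/>
    <node id="2"/>
    <edge id="7" source="1" target="2"/>
  </graph>
</graphml>"""

print(count_graphml(SAMPLE))  # (2, 1)
```

For a real file the text could be read with `open(path).read()` and the first number compared with the result of `g.V().count().next()`.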
+ | == Saving a graph == | ||
+ | Let's create a graph containing a single node for the fish named Wanda and save it. | ||
+ | <source lang='python'> | ||
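+ | from tutorial import remote | ||
+ | import os | ||
+ | |||
+ | # initialize a remote traversal | ||
+ | g = remote.RemoteTraversal().g() | ||
+ | |||
+ | # test saving a graph | ||
+ | def test_saveGraph(): | ||
+ |     graphmlPath="/tmp/A-Fish-Named-Wanda.xml" | ||
+ |     # drop the existing content of the graph | ||
+ |     g.V().drop().iterate() | ||
+ |     g.addV("Fish").property("name","Wanda").iterate() | ||
+ |     # the file is written by the server, so the path is a path on the server | ||
+ |     g.io(graphmlPath).write().iterate() | ||
+ |     print("wrote graph to %s" % (graphmlPath)) | ||
+ |     # check that the graphml file exists (only valid if the server runs on this machine) | ||
+ |     assert os.path.isfile(graphmlPath) | ||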
+ | </source> | ||
+ | |||
+ | = Creating a graphical representation of a graph = | ||
+ | A simple way to visualize your graphs is using [http://www.graphviz.org/ graphviz]. | ||
+ | There is a [https://pypi.org/project/graphviz/ graphviz python module] with [https://graphviz.readthedocs.io/en/stable/ documentation]. | ||
+ | |||
+ | == Example Graphviz Usage == | ||
+ | see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_005_graphviz.py | ||
+ | <source lang='python'> | ||
+ | # see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_005_graphviz.py | ||
+ | from tutorial import remote | ||
+ | from graphviz import Digraph | ||
+ | import os.path | ||
+ | from gremlin_python.process.traversal import T | ||
+ | |||
+ | # initialize a remote traversal | ||
+ | g = remote.RemoteTraversal().g() | ||
+ | |||
+ | # test creating a graphviz graph from the tinkerpop graph | ||
+ | def test_createGraphvizGraph(): | ||
+ |     # make sure we re-load the tinkerpop modern example | ||
+ |     remoteTraversal=remote.RemoteTraversal() | ||
+ |     remoteTraversal.load("tinkerpop-modern.xml") | ||
+ |     # start a graphviz graph | ||
+ |     dot = Digraph(comment='Modern') | ||
+ |     # get vertex properties including id and label as dicts | ||
+ |     for vDict in g.V().valueMap(True).toList(): | ||
+ |         # uncomment to debug | ||
+ |         # print (vDict) | ||
+ |         # get id and label | ||
+ |         vId=vDict[T.id] | ||
+ |         vLabel=vDict[T.label] | ||
+ |         # create a graphviz node label | ||
+ |         # the name property is always there | ||
+ |         gvLabel=r"%s\n%s\nname=%s" % (vId,vLabel,vDict["name"][0]) | ||
+ |         # if there is an age property add it to the label | ||
+ |         if "age" in vDict: | ||
+ |             gvLabel=gvLabel+r"\nage=%s" % (vDict["age"][0]) | ||
+ |         # create a graphviz node | ||
+ |         dot.node("node%d" % (vId),gvLabel) | ||
+ |     # loop over all edges | ||
+ |     for e in g.E(): | ||
+ |         # get the detail information with a second call per edge (what a pity to be so inefficient ...) | ||
+ |         eDict=g.E(e.id).valueMap(True).next() | ||
+ |         # uncomment if you'd like to debug | ||
+ |         # print (e,eDict) | ||
+ |         # create a graphviz label | ||
+ |         geLabel=r"%s\n%s\nweight=%s" % (e.id,e.label,eDict["weight"]) | ||
+ |         # add a graphviz edge | ||
+ |         dot.edge("node%d" % (e.outV.id),"node%d" % (e.inV.id),label=geLabel) | ||
+ |     # modify the styling see http://www.graphviz.org/doc/info/attrs.html | ||
+ |     dot.edge_attr.update(arrowsize='2',penwidth='2') | ||
+ |     dot.node_attr.update(style='filled',fillcolor="#A8D0E4") | ||
+ |     # print the source code | ||
+ |     print (dot.source) | ||
+ |     # render without viewing - default is creating a pdf file | ||
+ |     dot.render('/tmp/modern.gv', view=False) | ||
+ |     # check that the pdf file exists | ||
+ |     assert os.path.isfile('/tmp/modern.gv.pdf') | ||
+ | |||
+ | # call the test | ||
+ | test_createGraphvizGraph() | ||
+ | </source> | ||
+ | === Resulting graphviz dot source === | ||
+ | <source lang='bash'> | ||
+ | // Modern | ||
+ | digraph { | ||
+ | node [fillcolor="#A8D0E4" style=filled] | ||
+ | edge [arrowsize=2 penwidth=2] | ||
+ | node1 [label="1\nperson\nname=marko\nage=29"] | ||
+ | node2 [label="2\nperson\nname=vadas\nage=27"] | ||
+ | node3 [label="3\nsoftware\nname=lop"] | ||
+ | node4 [label="4\nperson\nname=josh\nage=32"] | ||
+ | node5 [label="5\nsoftware\nname=ripple"] | ||
+ | node6 [label="6\nperson\nname=peter\nage=35"] | ||
+ | node1 -> node2 [label="7\nknows\nweight=0.5"] | ||
+ | node1 -> node4 [label="8\nknows\nweight=1.0"] | ||
+ | node1 -> node3 [label="9\ncreated\nweight=0.4"] | ||
+ | node4 -> node5 [label="10\ncreated\nweight=1.0"] | ||
+ | node4 -> node3 [label="11\ncreated\nweight=0.4"] | ||
+ | node6 -> node3 [label="12\ncreated\nweight=0.2"] | ||
+ | } | ||
+ | </source> | ||
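How the labels in the dot source above are put together can be tried out without a Gremlin server and without the graphviz module. The following is a rough sketch that builds comparable dot text from plain tuples; the function `to_dot` and the tuple layout are assumptions for illustration, and the sketch does not reproduce the styling attributes or the exact whitespace that the graphviz module emits.

```python
def to_dot(vertices, edges, comment="Modern"):
    """Build graphviz dot source from plain vertex and edge tuples.

    vertices: (id, label, name) tuples
    edges: (id, label, out vertex id, in vertex id, weight) tuples
    """
    lines = ["// %s" % comment, "digraph {"]
    for vertex_id, label, name in vertices:
        # raw strings keep the backslash-n that dot interprets as a line break
        lines.append(r'  node%d [label="%d\n%s\nname=%s"]'
                     % (vertex_id, vertex_id, label, name))
    for edge_id, label, out_id, in_id, weight in edges:
        lines.append(r'  node%d -> node%d [label="%d\n%s\nweight=%s"]'
                     % (out_id, in_id, edge_id, label, weight))
    lines.append("}")
    return "\n".join(lines)


# two vertices and one edge of the modern graph
vertices = [(1, "person", "marko"), (2, "person", "vadas")]
edges = [(7, "knows", 1, 2, 0.5)]
print(to_dot(vertices, edges))
```

The resulting text can be saved to a file and rendered with the `dot` command line tool if graphviz is installed.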
+ | === Resulting pdf file === | ||
+ | If you set "view=True" the pdf display will be directly initiated from the python script. | ||
+ | <pdf>modern2019-09-25.pdf</pdf> | ||
= Connecting to Gremlin enabled graph databases = | = Connecting to Gremlin enabled graph databases = | ||
Line 345: | Line 496: | ||
== Amazon Neptune ❓ == | == Amazon Neptune ❓ == | ||
* https://docs.aws.amazon.com/de_de/neptune/latest/userguide/access-graph-gremlin-node-js.html | * https://docs.aws.amazon.com/de_de/neptune/latest/userguide/access-graph-gremlin-node-js.html | ||
+ | == Blazegraph ❓ == | ||
+ | |||
+ | == Cosmos ❓ == | ||
+ | * https://docs.microsoft.com/de-de/azure/cosmos-db/gremlin-support | ||
+ | |||
== DataStax ❌ == | == DataStax ❌ == | ||
=== Trial === | === Trial === | ||
Line 354: | Line 510: | ||
</source> | </source> | ||
− | == JanusGraph | + | == JanusGraph ✅ == |
* https://docs.janusgraph.org/#getting-started | * https://docs.janusgraph.org/#getting-started | ||
* https://github.com/JanusGraph/janusgraph/releases | * https://github.com/JanusGraph/janusgraph/releases | ||
* https://github.com/sunsided/janusgraph-docker | * https://github.com/sunsided/janusgraph-docker | ||
− | === Trial === | + | * https://docs.janusgraph.org/connecting/python/ |
+ | === 3. Trial === | ||
+ | <source lang='bash'> | ||
+ | docker run -it -p 8182:8182 --mount src=<path to graphdata>,target=/graphdata,type=bind janusgraph/janusgraph | ||
+ | </source> | ||
+ | see https://stackoverflow.com/a/60964495/1497139 | ||
+ | |||
+ | With a bash shell you can check which files are available: | ||
+ | <source lang='bash' highlight="1-2"> | ||
+ | docker run -it janusgraph/janusgraph /bin/bash | ||
+ | root@8542ed1b8232:/opt/janusgraph# ls data | ||
+ | grateful-dead-janusgraph-schema.groovy tinkerpop-crew-typed.json | ||
+ | grateful-dead-typed.json tinkerpop-crew-v2d0-typed.json | ||
+ | grateful-dead-v2d0-typed.json tinkerpop-crew-v2d0.json | ||
+ | grateful-dead-v2d0.json tinkerpop-crew.json | ||
+ | grateful-dead.json tinkerpop-crew.kryo | ||
+ | grateful-dead.kryo tinkerpop-modern-typed.json | ||
+ | grateful-dead.txt tinkerpop-modern-v2d0-typed.json | ||
+ | grateful-dead.xml tinkerpop-modern-v2d0.json | ||
+ | script-input-grateful-dead.groovy tinkerpop-modern.json | ||
+ | script-input-tinkerpop.groovy tinkerpop-modern.kryo | ||
+ | tinkerpop-classic-typed.json tinkerpop-modern.xml | ||
+ | tinkerpop-classic-v2d0-typed.json tinkerpop-sink-typed.json | ||
+ | tinkerpop-classic-v2d0.json tinkerpop-sink-v2d0-typed.json | ||
+ | tinkerpop-classic.json tinkerpop-sink-v2d0.json | ||
+ | tinkerpop-classic.kryo tinkerpop-sink.json | ||
+ | tinkerpop-classic.txt tinkerpop-sink.kryo | ||
+ | tinkerpop-classic.xml | ||
+ | </source> | ||
+ | |||
+ | For a test I chose tinkerpop-modern.xml: | ||
+ | <source lang='python'> | ||
+ | file="data/tinkerpop-modern.xml"; | ||
+ | g.io(file).read().iterate() | ||
+ | vCount=g.V().count().next() | ||
+ | print ("%s has %d vertices" % (file,vCount)) | ||
+ | assert vCount==6 | ||
+ | </source> | ||
+ | which works. Thanks to Kelvin Lawrence for his comment on stackoverflow! | ||
+ | |||
+ | To make "external" data available to the docker image the --mount option can be used: | ||
+ | <source lang='bash'> | ||
+ | docker run -it -p 8182:8182 --mount src=<path to graphdata>,target=/graphdata,type=bind janusgraph/janusgraph | ||
+ | </source> | ||
+ | The following helper class helps sharing files: | ||
+ | |||
+ | ==== RemoteGremlin ==== | ||
+ | see also {{Link|target=Pyjanusgraph}} | ||
+ | <source lang='python'> | ||
+ | ''' | ||
+ | Created on 2020-03-30 | ||
+ | |||
+ | @author: wf | ||
+ | ''' | ||
+ | from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection | ||
+ | from gremlin_python.structure.graph import Graph | ||
+ | from shutil import copyfile | ||
+ | import os | ||
+ | |||
+ | class RemoteGremlin(object): | ||
+ |     ''' | ||
+ |     helper for remote gremlin connections | ||
+ |     ''' | ||
+ | |||
+ |     def __init__(self, server, port=8182): | ||
+ |         ''' | ||
+ |         construct me with the given server and port | ||
+ |         ''' | ||
+ |         self.server=server | ||
+ |         self.port=port | ||
+ | |||
+ |     def sharepoint(self,sharepoint,sharepath): | ||
+ |         ''' | ||
+ |         set up the sharepoint: the local directory and the path under which the server sees it | ||
+ |         ''' | ||
+ |         # stored under private names so that the attributes do not shadow this method | ||
+ |         self._sharepoint=sharepoint | ||
+ |         self._sharepath=sharepath | ||
+ | |||
+ | |||
+ |     def share(self,file): | ||
+ |         ''' | ||
+ |         share the given file and return the path as seen by the server | ||
+ |         ''' | ||
+ |         fbase=os.path.basename(file) | ||
+ |         # both paths are expected to end with a slash | ||
+ |         copyfile(file,self._sharepoint+fbase) | ||
+ |         return self._sharepath+fbase | ||
+ | |||
+ |     def open(self): | ||
+ |         ''' | ||
+ |         open the remote connection | ||
+ |         ''' | ||
+ |         self.graph = Graph() | ||
+ |         self.url='ws://%s:%s/gremlin' % (self.server,self.port) | ||
+ |         self.connection = DriverRemoteConnection(self.url, 'g') | ||
+ |         # The connection should be closed on shut down to close open connections with connection.close() | ||
+ |         self.g = self.graph.traversal().withRemote(self.connection) | ||
+ | |||
+ |     def close(self): | ||
+ |         ''' | ||
+ |         close the remote connection | ||
+ |         ''' | ||
+ |         self.connection.close() | ||
+ | </source> | ||
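The file sharing idea of the helper class can be tried out locally without docker: a file is copied into the directory that is mounted into the container, and the path under which the server sees the copy is returned. The following is a minimal standalone sketch of that mapping, not the class above. The function `share` and the temporary directories are assumptions for illustration, and unlike the class it joins the paths so that a trailing slash is not required.

```python
import os
import shutil
import tempfile


def share(file, sharepoint, sharepath):
    """Copy file into the local sharepoint directory and return the server-side path."""
    fbase = os.path.basename(file)
    shutil.copyfile(file, os.path.join(sharepoint, fbase))
    # inside the docker container the path is always POSIX style
    return sharepath.rstrip("/") + "/" + fbase


# simulate the local source directory and the mounted directory with temp dirs
with tempfile.TemporaryDirectory() as source_dir, \
        tempfile.TemporaryDirectory() as mount_dir:
    local_file = os.path.join(source_dir, "air-routes-small.xml")
    with open(local_file, "w") as graphml:
        graphml.write("<graphml/>")
    server_path = share(local_file, mount_dir, "/graphdata")
    copied = os.path.isfile(os.path.join(mount_dir, "air-routes-small.xml"))
    print(server_path, copied)  # /graphdata/air-routes-small.xml True
```

In the docker setup above, `mount_dir` corresponds to the `src` part and `/graphdata` to the `target` part of the `--mount` option.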
+ | |||
+ | ====python unit test ==== | ||
+ | <source lang='python'> | ||
+ | |||
+ | ''' | ||
+ | Created on 2020-03-28 | ||
+ | |||
+ | @author: wf | ||
+ | ''' | ||
+ | import unittest | ||
+ | from tp.gremlin import RemoteGremlin | ||
+ | |||
+ | class JanusGraphTest(unittest.TestCase): | ||
+ |     ''' | ||
+ |     test access to a janus graph docker instance via the RemoteGremlin helper class | ||
+ |     ''' | ||
+ | |||
+ |     def setUp(self): | ||
+ |         pass | ||
+ | |||
+ | |||
+ |     def tearDown(self): | ||
+ |         pass | ||
+ | |||
+ |     def test_loadGraph(self): | ||
+ |         # change to your server | ||
+ |         rg=RemoteGremlin("capri.bitplan.com") | ||
+ |         rg.open() | ||
+ |         # change to your shared path | ||
+ |         rg.sharepoint("/Volumes/bitplan/user/wf/graphdata/","/graphdata/") | ||
+ |         g=rg.g | ||
+ |         graphmlFile="air-routes-small.xml" | ||
+ |         shared=rg.share(graphmlFile) | ||
+ |         # drop the existing content of the graph | ||
+ |         g.V().drop().iterate() | ||
+ |         # read the content from the air routes example | ||
+ |         g.io(shared).read().iterate() | ||
+ |         vCount=g.V().count().next() | ||
+ |         print ("%s has %d vertices" % (shared,vCount)) | ||
+ |         assert vCount==47 | ||
+ |         # close the remote connection when done | ||
+ |         rg.close() | ||
+ | |||
+ | |||
+ | if __name__ == "__main__": | ||
+ |     #import sys;sys.argv = ['', 'Test.testName'] | ||
+ |     unittest.main() | ||
+ | </source> | ||
+ | |||
+ | === 2. Trial === | ||
+ | * https://github.com/JanusGraph/janusgraph-docker | ||
+ | <source lang='bash' highlight='1'> | ||
+ | docker run --rm --name janusgraph-default janusgraph/janusgraph:latest | ||
+ | waiting for storage ... | ||
+ | waiting for storage ... | ||
+ | waiting for storage ... | ||
+ | waiting for storage ... | ||
+ | ... | ||
+ | GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory] | ||
+ | java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory] | ||
+ | at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:82) | ||
+ | ... | ||
+ | Caused by: javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: graph for class: Script1 | ||
+ | at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:378) | ||
+ | at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264) | ||
+ | at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.lambda$createGremlinScriptEngine$16(DefaultGremlinScriptEngineManager.java:460) | ||
+ | ... 24 more | ||
+ | 4438 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Channel started at port 8182. | ||
+ | </source> | ||
+ | When trying to connect with python via | ||
+ | <source lang='python'> | ||
+ | def testJanusGraph(self): | ||
+ |     graph = Graph() | ||
+ |     connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g') | ||
+ |     # The connection should be closed on shut down to close open connections with connection.close() | ||
+ |     g = graph.traversal().withRemote(connection) | ||
+ |     # Reuse 'g' across the application | ||
+ |     herculesAge = g.V().has('name', 'hercules').values('age').next() | ||
+ |     print('Hercules is {} years old.'.format(herculesAge)) | ||
+ | </source> | ||
+ | the result is | ||
+ | <pre> | ||
+ | File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tornado/concurrent.py", line 238, in result | ||
+ | raise_exc_info(self._exc_info) | ||
+ | File "<string>", line 4, in raise_exc_info | ||
+ | ConnectionRefusedError: [Errno 61] Connection refused | ||
+ | </pre> | ||
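A "Connection refused" error means that nothing accepted the TCP connection on the given host and port. Note that the docker command of this trial has no `-p 8182:8182` option, so the port of the container is not published on localhost, which is one plausible cause. Whether a port is reachable at all can be checked before involving gremlin-python. The following is a minimal sketch with the Python standard library; the function name `port_is_open` is an assumption for illustration.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, timeouts and unresolvable host names
        return False


if __name__ == "__main__":
    print("gremlin server reachable:", port_is_open("localhost", 8182))
```

An open port only shows that something is listening; it does not prove that the Gremlin server behind it is configured correctly.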
+ | === 1. Trial === | ||
# Downloaded 275 MByte janusgraph-0.4.0-hadoop2.zip - unzipped and started bin/gremlin-server.sh (which already gave several error messages) | # Downloaded 275 MByte janusgraph-0.4.0-hadoop2.zip - unzipped and started bin/gremlin-server.sh (which already gave several error messages) | ||
# followed getting started procedure above | # followed getting started procedure above | ||
Line 368: | Line 713: | ||
</source> | </source> | ||
− | == Neo4J | + | == Neo4J ❌ == |
* https://stackoverflow.com/questions/47843862/how-do-i-connect-to-a-remote-neo4j-database-using-gremlin-python | * https://stackoverflow.com/questions/47843862/how-do-i-connect-to-a-remote-neo4j-database-using-gremlin-python | ||
* https://community.neo4j.com/t/neo4j-gremlin-integration/8144 | * https://community.neo4j.com/t/neo4j-gremlin-integration/8144 | ||
Line 378: | Line 723: | ||
ln -f Neo4j.yaml server.yaml | ln -f Neo4j.yaml server.yaml | ||
./run -t | ./run -t | ||
− | |||
− | |||
− | |||
− | |||
− | |||
</source> | </source> | ||
==== Visualization ==== | ==== Visualization ==== | ||
Line 393: | Line 733: | ||
== OrientDB ❌ == | == OrientDB ❌ == | ||
* https://github.com/orientechnologies/orientdb-gremlin/issues/143 | * https://github.com/orientechnologies/orientdb-gremlin/issues/143 | ||
+ | * https://github.com/orientechnologies/orientdb-gremlin/issues/146 | ||
+ | * https://stackoverflow.com/questions/49646876/gremlin-server-connect-to-orient-db | ||
+ | * https://stackoverflow.com/questions/50948180/use-python-with-orientdb-and-gremlin-server | ||
+ | * https://orientdb.com/docs/3.0.x/tinkerpop3/OrientDB-TinkerPop3.html | ||
+ | * https://github.com/orientechnologies/orientdb-docker/blob/master/3.0-tp3/x86_64/alpine/gremlin-server.yaml | ||
+ | * https://github.com/orientechnologies/orientdb-gremlin | ||
see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/scripts/runOrientDB | see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/scripts/runOrientDB | ||
+ | |||
<source lang='bash'> | <source lang='bash'> | ||
# https://hub.docker.com/_/orientdb | # https://hub.docker.com/_/orientdb | ||
docker pull orientdb:3.0.23-tp3 | docker pull orientdb:3.0.23-tp3 | ||
docker run -d --name odbtp3 -p 2424:2424 -p 2480:2480 -p 8182:8182 -e ORIENTDB_ROOT_PASSWORD=rootpwd orientdb:3.0.23-tp3 | docker run -d --name odbtp3 -p 2424:2424 -p 2480:2480 -p 8182:8182 -e ORIENTDB_ROOT_PASSWORD=rootpwd orientdb:3.0.23-tp3 | ||
+ | ln -f OrientDB.yaml server.yaml | ||
+ | ./run -t | ||
</source> | </source> | ||
+ | Tests fail, see: | ||
+ | * https://github.com/orientechnologies/orientdb-gremlin/issues/167 | ||
= Links = | = Links = | ||
Line 406: | Line 757: | ||
* http://tinkerpop.apache.org/docs/3.4.3/reference/#connecting-via-console | * http://tinkerpop.apache.org/docs/3.4.3/reference/#connecting-via-console | ||
* https://gist.githubusercontent.com/okram/f193d5616563a69ad5714a42c504276f/raw/b8075410e400e18f18360015945f3760d99d044a/gremlin-python-play.py | * https://gist.githubusercontent.com/okram/f193d5616563a69ad5714a42c504276f/raw/b8075410e400e18f18360015945f3760d99d044a/gremlin-python-play.py | ||
+ | * https://github.com/nedlowe/gremlin-python-example | ||
+ | * https://groups.google.com/forum/#!topic/gremlin-users/9DoPGfx9Jnk | ||
+ | * https://github.com/kuzeko/graph-databases-testsuite | ||
+ | * https://github.com/krlawrence/graph/blob/master/sample-code/glv-client-2.py | ||
+ | [[Category:Tutorial]] |
Latest revision as of 06:10, 4 July 2023
OsProject
OsProject | |
---|---|
id | gremlin-python-tutorial |
state | active |
owner | WolfgangFahl |
title | Gremlin-Python mini tutorial |
url | https://github.com/WolfgangFahl/gremlin-python-tutorial |
version | 0.0.6 |
description | |
date | 2023-07-04 |
since | 2019-09-17 |
until |
tickets
This tutorial is intended to get you up and running using Gremlin / Apache Tinkerpop with Python. Basic knowledge of Python is assumed.
Do you already know Gremlin / Apache Tinkerpop?
If so, you can continue with the prerequisites part. Otherwise you might want to click on the Gremlin logo below.
There is also an explanation of Gremlin steps based on Java in this wiki.
This mini-tutorial is inspired by this stackoverflow question.
The goal is to get access to an apache tinkerpop/gremlin graph database via Python.
The examples in this tutorial have been tested on Ubuntu 18.04 LTS and MacOS with a MacPorts environment as well as in the Travis CI environment; see https://github.com/WolfgangFahl/gremlin-python-tutorial.
Prerequisites
- Java
- Python
- Gremlin-Server
- Gremlin-Console (for debugging)
To get the prerequisites you can follow either the manual or the script-based installation below. The script-based installation is quicker; the manual installation gives you more insight and control over the installation steps.
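Before choosing one of the two installation paths it can help to see which of the prerequisites are already available on your PATH. The following is a minimal sketch and not part of the tutorial repository; the function name `find_prerequisites` and the list of command names are assumptions for illustration.

```python
import shutil


def find_prerequisites(commands=("java", "python3", "pip")):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {command: shutil.which(command) for command in commands}


if __name__ == "__main__":
    for command, path in find_prerequisites().items():
        print("%-8s %s" % (command, path if path else "missing"))
```

Anything reported as missing can then be installed with the manual steps below or with the installation script.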
Manual Installation
Installing Java
There are many ways to install Java and your mileage may vary.
sudo apt-get install openjdk-8-jre
java -version
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
Installing Python and Pip
We assume you'd like to work with python 3.x
sudo apt install python3.10
python --version
Python 3.10.8
sudo apt install python3-pip
pip --version
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)
Installing Gremlin-Python
git clone https://github.com/WolfgangFahl/gremlin-python-tutorial
cd gremlin-python-tutorial/
pip install .
Installing Gremlin Server and Console
Download Gremlin Server and optionally Gremlin Console and unzip the downloaded files.
Starting the Gremlin Server
cd apache-tinkerpop-gremlin-server-3.6.3
bin/gremlin-server.sh conf/gremlin-server-modern.yaml
See #Gremlin-Server_start for the expected result.
Starting the Gremlin Console
cd apache-tinkerpop-gremlin-console-3.6.3
bin/gremlin.sh
See #Gremlin-Console_start_.28for_debugging.29 for the expected result.
Script based installation
The "run" installation helper script tries to automate the necessary steps
- Installation
- Gremlin-Server start
- Gremlin-Console start (for debugging)
- Python script start
The following commands should get you going:
git clone https://github.com/WolfgangFahl/gremlin-python-tutorial
cd gremlin-python-tutorial
./scripts/run -i
./scripts/run -s
# in another console
./run -p
Help
scripts/run -h
usage: ./run [-c|-h|-i|-n|-p|-s|-t|-v]
-c|--console: start console
-h|--help: show this usage
-i|--install: install prerequisites
-n|--neo4j: start neo4j server
-p|--python: start python trial code
-s|--server: start server
-t|--test: start pytest
-v|--version: show version
Version
scripts/run -v
apache-tinkerpop-gremlin version 3.6.3
Installation
run -i
installs
- gremlin server
- gremlin console
- gremlin python module
checking prerequisites ...
/usr/bin/java
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
/usr/bin/python
Python 2.7.15+
/usr/bin/pip
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)
downloading apache-tinkerpop-gremlin-server-3.4.3-bin.zip
unzipping apache-tinkerpop-gremlin-server-3.4.3-bin.zip
downloading apache-tinkerpop-gremlin-console-3.4.3-bin.zip
unzipping apache-tinkerpop-gremlin-console-3.4.3-bin.zip
installing needed python modules
Requirement already satisfied: futures in /usr/local/lib/python2.7/dist-packages (from -r requirements.txt (line 2))
Requirement already satisfied: gremlinpython in /usr/local/lib/python2.7/dist-packages (from -r requirements.txt (line 4))
Requirement already satisfied: isodate>=0.6.0 in /usr/local/lib/python2.7/dist-packages (from gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: six>=1.10.0 in /usr/lib/python2.7/dist-packages (from gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: aenum>=1.4.5 in /usr/local/lib/python2.7/dist-packages (from gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: tornado<5.0,>=4.4.1 in /usr/local/lib/python2.7/dist-packages (from gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: certifi in /usr/local/lib/python2.7/dist-packages (from tornado<5.0,>=4.4.1->gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: singledispatch in /usr/local/lib/python2.7/dist-packages (from tornado<5.0,>=4.4.1->gremlinpython->-r requirements.txt (line 4))
Requirement already satisfied: backports-abc>=0.4 in /usr/local/lib/python2.7/dist-packages (from tornado<5.0,>=4.4.1->gremlinpython->-r requirements.txt (line 4))
Gremlin-Server start
scripts/run -s
starts the gremlin server with a default yaml-file in foreground
starting gremlin-server ...
[INFO] GremlinServer - 3.4.3
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
[INFO] GremlinServer - Configuring Gremlin Server from /home/wf/source/python/gremlin-python-tutorial/apache-tinkerpop-gremlin-server-3.4.3/conf/gremlin-server-modern.yaml
[INFO] MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
[INFO] DefaultGraphManager - Graph [graph] was successfully configured via [conf/tinkergraph-empty.properties].
[INFO] ServerGremlinExecutor - Initialized Gremlin thread pool. Threads in pool named with pattern gremlin-*
[INFO] ServerGremlinExecutor - Initialized GremlinExecutor and preparing GremlinScriptEngines instances.
[INFO] ServerGremlinExecutor - Initialized gremlin-groovy GremlinScriptEngine and registered metrics
[INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
[INFO] OpLoader - Adding the standard OpProcessor.
[INFO] OpLoader - Adding the session OpProcessor.
[INFO] OpLoader - Adding the traversal OpProcessor.
[INFO] TraversalOpProcessor - Initialized cache for TraversalOpProcessor with size 1000 and expiration time of 600000 ms
[INFO] GremlinServer - Executing start up LifeCycleHook
[INFO] Logger$info - Loading 'modern' graph data.
[INFO] GremlinServer - idleConnectionTimeout was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
[INFO] GremlinServer - keepAliveInterval was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
[WARN] AbstractChannelizer - The org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0 serialization class is deprecated.
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
[WARN] AbstractChannelizer - The org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0 serialization class is deprecated.
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/vnd.graphbinary-v1.0 with org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1
[INFO] AbstractChannelizer - Configured application/vnd.graphbinary-v1.0-stringd with org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 4 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
Quick way to stop server
If you ran the server in foreground you can stop it with "CTRL-C" in the console where you started it. Otherwise you can simply kill the corresponding process e.g. with:
pkill -9 -fl gremlin-server
Gremlin-Console start (for debugging)
scripts/run -c
starts the gremlin console
starting gremlin-console ...
Sep 17, 2019 4:16:03 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
You can try out https://stackoverflow.com/a/52998299/1497139:
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
:> g.V().values('name')
==>marko
==>vadas
==>lop
==>josh
==>ripple
==>peter
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182] - type ':remote console' to return to local mode
g.V()
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
gremlin> :exit
Python script start
./run -p
starts the python test script.
./run -p
starting python test code
g.V().count=6
g.E().count=6
Python unit tests start
./run -t
Starts the pytest unit tests. Please make sure a gremlin-server is running.
./run -t
==================================== test session starts =====================================
platform darwin -- Python 3.7.4, pytest-5.1.2, py-1.8.0, pluggy-0.12.0
rootdir: /Users/wf/source/python/gremlin-python-tutorial
collected 1 item
test_001.py . [100%]
===================================== 1 passed in 12.92s =====================================
Getting Started
The Apache Tinkerpop Getting Started tutorial assumes you are using the groovy console to try things out. We'll use these steps of the tutorial to show how the same traversals are available via gremlin-python.
The modern graph will be the basis for our first steps.
Gremlin-Python is just a Gremlin Language Variant. This means that the graph traversals are not executed in the Python environment but are instead sent as "bytecode" to a server that executes the traversal and sends back the result.
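The idea of recording steps instead of executing them can be illustrated with a toy class. This is only a sketch of the principle and not how gremlin-python is implemented; the class name RecordingTraversal and the list-of-lists "bytecode" shape are hypothetical.

```python
class RecordingTraversal:
    """Toy stand-in for a traversal: it records steps and executes nothing locally."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def __getattr__(self, name):
        # leave dunder and private lookups alone so copying/printing behaves normally
        if name.startswith("_"):
            raise AttributeError(name)

        # any other unknown attribute is treated as a traversal step such as V or out
        def step(*args):
            return RecordingTraversal(self.steps + [[name, *args]])

        return step


g = RecordingTraversal()
t = g.V(1).out("knows").values("name")
# nothing has been evaluated; we only hold a description of the traversal
print(t.steps)  # [['V', 1], ['out', 'knows'], ['values', 'name']]
```

In the real setup a description like this is what travels to the server; the results only exist once the server has executed it and sent them back.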
The first five minutes
test_tutorial.py is the relevant source code for this section.
g - the graph traversal
In the Python environment you need to create a remote connection to a Gremlin server to get the starting point "g", the graph traversal. That's why we have to start the Gremlin server, e.g. with run -s from our automation script above. The Gremlin server is configured to supply traversals for the "modern graph" example depicted above.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))
In https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/tutorial/remote.py there is a helper class "RemoteTraversal" which reads the server configuration from a yaml file. In the tutorial examples the above code is reduced to
# see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_001.py
from tutorial import remote
# initialize a remote traversal
g = remote.RemoteTraversal().g()
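The yaml handling of RemoteTraversal is not shown on this page. As a rough sketch of the idea, the following assumes the settings have already been parsed into a plain dict and only builds the websocket URL in the form used above; the class name ServerConfig and its field names are hypothetical and do not reflect the actual yaml keys of the tutorial.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Hypothetical connection settings; the field names are assumptions."""

    host: str = "localhost"
    port: int = 8182
    alias: str = "g"

    @property
    def url(self):
        # websocket endpoint in the same form as the connection example above
        return "ws://%s:%d/gremlin" % (self.host, self.port)


# a dict like this could come from a parsed configuration file
config = ServerConfig(**{"host": "localhost", "port": 8182})
print(config.url)  # ws://localhost:8182/gremlin
```

The url and alias values could then be handed to DriverRemoteConnection in the same way as in the explicit connection code above.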
Steps 1 to 6
See https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_002_tutorial.py. The source code on GitHub is slightly different since some gremlin-server providers do not work with a set of ids starting from 1. To keep things simple the original source code is shown here:
# http://wiki.bitplan.com/index.php/Gremlin_python#g.V.28.29_-_the_vertices
# needed for the isinstance check in test_tutorial1
from gremlin_python.process.graph_traversal import GraphTraversal

#gremlin> g.V() //(1)
# ==>v[1]
# ==>v[2]
# ==>v[3]
# ==>v[4]
# ==>v[5]
# ==>v[6]
def test_tutorial1():
    # get the vertices
    gV=g.V()
    # we have a traversal now
    assert isinstance(gV,GraphTraversal)
    # convert it to a list to get the actual vertices
    vList=gV.toList()
    # there should be 6 vertices
    assert len(vList)==6
    # the default string representation of a vertex is showing the id
    # of a vertex
    assert str(vList)=="[v[1], v[2], v[3], v[4], v[5], v[6]]"

#gremlin> g.V(1) //(2)
# ==>v[1]
def test_tutorial2():
    assert str(g.V(1).toList())=="[v[1]]"

#gremlin> g.V(1).values('name') //(3)
# ==>marko
def test_tutorial3():
    assert str(g.V(1).values('name').toList())=="['marko']"

#gremlin> g.V(1).outE('knows') //(4)
# ==>e[7][1-knows->2]
# ==>e[8][1-knows->4]
def test_tutorial4():
    assert str(g.V(1).outE("knows").toList()) == "[e[7][1-knows->2], e[8][1-knows->4]]"

#gremlin> g.V(1).outE('knows').inV().values('name') //(5)
# ==>vadas
# ==>josh
def test_tutorial5():
    assert str(g.V(1).outE("knows").inV().values("name").toList())=="['vadas', 'josh']"

#gremlin> g.V(1).out('knows').values('name') //(6)
# ==>vadas
# ==>josh
def test_tutorial6():
    assert str(g.V(1).out("knows").values("name").toList())=="['vadas', 'josh']"
Loading and Saving a graph
Given that gremlin-python is a Gremlin Language Variant (GLV) and doesn't have its own traversal implementation, loading and saving graphs is a bit more tricky than in non-GLV environments.
For this tutorial we assume you only work with small, experimental, non-production graph databases. Be warned! We simply clear the whole graph when loading!
Loading the air-routes example
Kelvin Lawrence has a nice example in his tutorial: the file https://github.com/krlawrence/graph/blob/master/sample-data/air-routes-small.graphml, which is also available for this tutorial.
from tutorial import remote
import os
# initialize a remote traversal
g = remote.RemoteTraversal().g()
# test loading a graph
def test_loadGraph():
    graphmlFile="air-routes-small.xml"
    # make the local file accessible to the server
    airRoutesPath=os.path.abspath(graphmlFile)
    # drop the existing content of the graph
    g.V().drop().iterate()
    # read the content from the air routes example
    g.io(airRoutesPath).read().iterate()
    vCount=g.V().count().next()
    print ("%s has %d vertices" % (graphmlFile,vCount))
    assert vCount==47

test_loadGraph()
Saving a graph
Let's create a graph containing a single node for the fish named Wanda and save it.
# test saving a graph
def test_saveGraph():
    graphmlPath="/tmp/A-Fish-Named-Wanda.xml"
    # drop the existing content of the graph
    g.V().drop().iterate()
    g.addV("Fish").property("name","Wanda").iterate()
    g.io(graphmlPath).write().iterate()
    print("wrote graph to %s" % (graphmlPath))
    # check that the graphml file exists
    assert os.path.isfile(graphmlPath)
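Checking that the file exists does not tell you what ended up in it. As a hedged sketch, the standard library can count the node and edge elements of a GraphML document. The function name count_graphml_elements is an assumption, and the sample document below is hand-written for illustration; it is not actual output of the server.

```python
import xml.etree.ElementTree as ET

# XML namespace of GraphML documents
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml_elements(xml_text):
    """Return (node count, edge count) for a GraphML document given as text."""
    root = ET.fromstring(xml_text)
    nodes = root.findall(".//%snode" % GRAPHML_NS)
    edges = root.findall(".//%sedge" % GRAPHML_NS)
    return len(nodes), len(edges)


# hand-written sample for illustration, not actual server output
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <graph id="G" edgedefault="directed">
    <node id="0"><data key="name">Wanda</data></node>
  </graph>
</graphml>"""

print(count_graphml_elements(SAMPLE))  # (1, 0)
```

For a file written by the test above you would read the text from graphmlPath first, provided that path is readable from the machine running Python.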
Creating a graphical representation of a graph
A simple way to visualize your graphs is using graphviz. There is a graphviz python module with documentation.
Example Graphviz Usage
see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_005_graphviz.py
# see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/test_005_graphviz.py
from tutorial import remote
from graphviz import Digraph
import os.path
from gremlin_python.process.traversal import T
# initialize a remote traversal
g = remote.RemoteTraversal().g()
# test creating a graphviz graph from the tinkerpop graph
def test_createGraphvizGraph():
    # make sure we re-load the tinkerpop modern example
    remoteTraversal=remote.RemoteTraversal()
    remoteTraversal.load("tinkerpop-modern.xml")
    # start a graphviz
    dot = Digraph(comment='Modern')
    # get vertex properties including id and label as dicts
    for vDict in g.V().valueMap(True).toList():
        # uncomment to debug
        # print(vDict)
        # get id and label
        vId=vDict[T.id]
        vLabel=vDict[T.label]
        # create a graphviz node label
        # the name property is always there
        gvLabel=r"%s\n%s\nname=%s" % (vId,vLabel,vDict["name"][0])
        # if there is an age property add it to the label
        if "age" in vDict:
            gvLabel=gvLabel+r"\nage=%s" % (vDict["age"][0])
        # create a graphviz node
        dot.node("node%d" % (vId),gvLabel)
    # loop over all edges
    for e in g.E():
        # get the detail information with a second call per edge (what a pity to be so inefficient ...)
        eDict=g.E(e.id).valueMap(True).next()
        # uncomment if you'd like to debug
        # print (e,eDict)
        # create a graphviz label
        geLabel=r"%s\n%s\nweight=%s" % (e.id,e.label,eDict["weight"])
        # add a graphviz edge
        dot.edge("node%d" % (e.outV.id),"node%d" % (e.inV.id),label=geLabel)
    # modify the styling see http://www.graphviz.org/doc/info/attrs.html
    dot.edge_attr.update(arrowsize='2',penwidth='2')
    dot.node_attr.update(style='filled',fillcolor="#A8D0E4")
    # print the source code
    print (dot.source)
    # render without viewing - default is creating a pdf file
    dot.render('/tmp/modern.gv', view=False)
    # check that the pdf file exists
    assert os.path.isfile('/tmp/modern.gv.pdf')

# call the test
test_createGraphvizGraph()
Resulting graphviz dot source
// Modern
digraph {
node [fillcolor="#A8D0E4" style=filled]
edge [arrowsize=2 penwidth=2]
node1 [label="1\nperson\nname=marko\nage=29"]
node2 [label="2\nperson\nname=vadas\nage=27"]
node3 [label="3\nsoftware\nname=lop"]
node4 [label="4\nperson\nname=josh\nage=32"]
node5 [label="5\nsoftware\nname=ripple"]
node6 [label="6\nperson\nname=peter\nage=35"]
node1 -> node2 [label="7\nknows\nweight=0.5"]
node1 -> node4 [label="8\nknows\nweight=1.0"]
node1 -> node3 [label="9\ncreated\nweight=0.4"]
node4 -> node5 [label="10\ncreated\nweight=1.0"]
node4 -> node3 [label="11\ncreated\nweight=0.4"]
node6 -> node3 [label="12\ncreated\nweight=0.2"]
}
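The node labels in this dot source follow a simple pattern. The following sketch rebuilds two of the node lines with plain string formatting and without the graphviz module or a server. The vertex dicts are hand-written stand-ins for what valueMap(True) returns and use plain string keys "id" and "label" instead of T.id and T.label, so treat the data shape and the function name node_line as assumptions.

```python
def node_line(v):
    """Build one dot node line from a vertex dict in a simplified valueMap shape."""
    parts = [str(v["id"]), v["label"], "name=%s" % v["name"][0]]
    # the age property only exists for person vertices
    if "age" in v:
        parts.append("age=%s" % v["age"][0])
    # dot expects a literal backslash followed by n inside the label
    return 'node%d [label="%s"]' % (v["id"], r"\n".join(parts))


# hand-written sample data mirroring two vertices of the modern graph
vertices = [
    {"id": 1, "label": "person", "name": ["marko"], "age": [29]},
    {"id": 3, "label": "software", "name": ["lop"]},
]
for v in vertices:
    print(node_line(v))
```

The property values are lists because valueMap returns a list per vertex property, which is why the original code indexes them with [0].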
Resulting pdf file
If you set "view=True" the pdf display will be directly initiated from the python script.
Connecting to Gremlin enabled graph databases
According to the Gremlin Wiki page there are a few different graph databases out there that support Gremlin/Apache Tinkerpop. We'll try to connect to a few of these using gremlin-python.
- ❌ means we didn't get it to work even after trying
- ❓ we didn't test it yet
- ✅ means we got it working
Amazon Neptune ❓
Blazegraph ❓
Cosmos ❓
DataStax ❌
Trial
# https://hub.docker.com/_/datastax
image=datastax/dse-server:6.7.2
docker pull $image
docker run --name datastax -e DS_LICENSE=accept -p 8182:8182 $image
JanusGraph ✅
- https://docs.janusgraph.org/#getting-started
- https://github.com/JanusGraph/janusgraph/releases
- https://github.com/sunsided/janusgraph-docker
- https://docs.janusgraph.org/connecting/python/
3. Trial
docker run -it -p 8182:8182 --mount src=<path to graphdata>,target=/graphdata,type=bind janusgraph/janusgraph
see https://stackoverflow.com/a/60964495/1497139
With a bash shell you can check for available files
docker run -it janusgraph/janusgraph /bin/bash
root@8542ed1b8232:/opt/janusgraph# ls data
grateful-dead-janusgraph-schema.groovy tinkerpop-crew-typed.json
grateful-dead-typed.json tinkerpop-crew-v2d0-typed.json
grateful-dead-v2d0-typed.json tinkerpop-crew-v2d0.json
grateful-dead-v2d0.json tinkerpop-crew.json
grateful-dead.json tinkerpop-crew.kryo
grateful-dead.kryo tinkerpop-modern-typed.json
grateful-dead.txt tinkerpop-modern-v2d0-typed.json
grateful-dead.xml tinkerpop-modern-v2d0.json
script-input-grateful-dead.groovy tinkerpop-modern.json
script-input-tinkerpop.groovy tinkerpop-modern.kryo
tinkerpop-classic-typed.json tinkerpop-modern.xml
tinkerpop-classic-v2d0-typed.json tinkerpop-sink-typed.json
tinkerpop-classic-v2d0.json tinkerpop-sink-v2d0-typed.json
tinkerpop-classic.json tinkerpop-sink-v2d0.json
tinkerpop-classic.kryo tinkerpop-sink.json
tinkerpop-classic.txt tinkerpop-sink.kryo
tinkerpop-classic.xml
For a test I chose tinkerpop-modern.xml:
file="data/tinkerpop-modern.xml";
g.io(file).read().iterate()
vCount=g.V().count().next()
print ("%s has %d vertices" % (file,vCount))
assert vCount==6
which works. Thanks to Kelvin Lawrence for his comment on stackoverflow!
To make "external" data available to the docker image the --mount option can be used:
docker run -it -p 8182:8182 --mount src=<path to graphdata>,target=/graphdata,type=bind janusgraph/janusgraph
The following helper class helps sharing files:
RemoteGremlin
see also Pyjanusgraph
'''
Created on 2020-03-30
@author: wf
'''
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.structure.graph import Graph
from shutil import copyfile
import os
class RemoteGremlin(object):
    '''
    helper for remote gremlin connections
    '''

    def __init__(self, server, port=8182):
        '''
        construct me with the given server and port
        '''
        self.server=server
        self.port=port

    def sharepoint(self,sharepoint,sharepath):
        '''
        set up the sharepoint
        '''
        self.sharepoint=sharepoint
        self.sharepath=sharepath

    def share(self,file):
        '''
        share the given file and return the path as seen by the server
        '''
        fbase=os.path.basename(file)
        copyfile(file,self.sharepoint+fbase)
        return self.sharepath+fbase

    def open(self):
        '''
        open the remote connection
        '''
        self.graph = Graph()
        self.url='ws://%s:%s/gremlin' % (self.server,self.port)
        self.connection = DriverRemoteConnection(self.url, 'g')
        # The connection should be closed on shut down to close open connections with connection.close()
        self.g = self.graph.traversal().withRemote(self.connection)

    def close(self):
        '''
        close the remote connection
        '''
        self.connection.close()
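The comment in the open method notes that the connection should be closed on shut down. One way to make that hard to forget is contextlib.closing from the standard library, which calls close() when the with block ends, even if an exception is raised. The sketch below demonstrates this with a hypothetical FakeConnection class so that it runs without a server.

```python
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for a remote connection; only close() matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


connection = FakeConnection()
with closing(connection):
    # traversals would run here while the connection is open
    pass
print(connection.closed)  # True
```

Since RemoteGremlin has a close() method, wrapping an opened instance in closing(...) should work the same way, although that has not been tested against a server here.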
python unit test
'''
Created on 2020-03-28
@author: wf
'''
import unittest
from tp.gremlin import RemoteGremlin
class JanusGraphTest(unittest.TestCase):
    '''
    test access to a janus graph docker instance via the RemoteGremlin helper class
    '''

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def test_loadGraph(self):
        # change to your server
        rg=RemoteGremlin("capri.bitplan.com")
        rg.open()
        # change to your shared path
        rg.sharepoint("/Volumes/bitplan/user/wf/graphdata/","/graphdata/")
        g=rg.g
        graphmlFile="air-routes-small.xml"
        shared=rg.share(graphmlFile)
        # drop the existing content of the graph
        g.V().drop().iterate()
        # read the content from the air routes example
        g.io(shared).read().iterate()
        vCount=g.V().count().next()
        print ("%s has %d vertices" % (shared,vCount))
        assert vCount==47

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()
2. Trial
docker run --rm --name janusgraph-default janusgraph/janusgraph:latest
waiting for storage ...
waiting for storage ...
waiting for storage ...
waiting for storage ...
...
GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:82)
...
Caused by: javax.script.ScriptException: javax.script.ScriptException: groovy.lang.MissingPropertyException: No such property: graph for class: Script1
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:378)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
at org.apache.tinkerpop.gremlin.jsr223.DefaultGremlinScriptEngineManager.lambda$createGremlinScriptEngine$16(DefaultGremlinScriptEngineManager.java:460)
... 24 more
4438 [gremlin-server-boss-1] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Channel started at port 8182.
When trying to connect with python via
def testJanusGraph(self):
    graph = Graph()
    connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
    # The connection should be closed on shut down to close open connections with connection.close()
    g = graph.traversal().withRemote(connection)
    # Reuse 'g' across the application
    herculesAge = g.V().has('name', 'hercules').values('age').next()
    print('Hercules is {} years old.'.format(herculesAge))
    pass
the result is
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
ConnectionRefusedError: [Errno 61] Connection refused
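A ConnectionRefusedError like this means that nothing accepted the TCP connection on the given port. Note that the docker run command of this trial, unlike the one in the 3. Trial, does not publish port 8182 with -p; that may be the cause but was not verified here. A minimal standard library sketch can check whether the port is reachable before gremlin-python gets involved; the function name is_port_open is an assumption.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and unresolvable hosts
        return False


if __name__ == "__main__":
    # 8182 is the Gremlin Server port used throughout this tutorial
    print("port 8182 open: %s" % is_port_open("localhost", 8182))
```

An open port only shows that something is listening; it does not prove that the Gremlin Server behind it is configured correctly.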
1. Trial
- Downloaded the 275 MByte janusgraph-0.4.0-hadoop2.zip, unzipped it and started bin/gremlin-server.sh (which already gave several error messages)
- followed getting started procedure above
- started bin/gremlin.sh
graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties')
17:41:38 WARN org.janusgraph.diskstorage.es.rest.RestElasticSearchClient - Unable to determine Elasticsearch server version. Default to FIVE.
java.net.ConnectException: Connection refused
Neo4J ❌
- https://stackoverflow.com/questions/47843862/how-do-i-connect-to-a-remote-neo4j-database-using-gremlin-python
- https://community.neo4j.com/t/neo4j-gremlin-integration/8144
- https://stackoverflow.com/questions/44645204/gremlin-server-with-neo4j
Trial
scripts/runNeo4j -rc
./run -n
ln -f Neo4j.yaml server.yaml
./run -t
Visualization
MATCH (n) RETURN n
Unfortunately this shows no results ...
OrientDB ❌
- https://github.com/orientechnologies/orientdb-gremlin/issues/143
- https://github.com/orientechnologies/orientdb-gremlin/issues/146
- https://stackoverflow.com/questions/49646876/gremlin-server-connect-to-orient-db
- https://stackoverflow.com/questions/50948180/use-python-with-orientdb-and-gremlin-server
- https://orientdb.com/docs/3.0.x/tinkerpop3/OrientDB-TinkerPop3.html
- https://github.com/orientechnologies/orientdb-docker/blob/master/3.0-tp3/x86_64/alpine/gremlin-server.yaml
- https://github.com/orientechnologies/orientdb-gremlin
see https://github.com/WolfgangFahl/gremlin-python-tutorial/blob/master/scripts/runOrientDB
# https://hub.docker.com/_/orientdb
docker pull orientdb:3.0.23-tp3
docker run -d --name odbtp3 -p 2424:2424 -p 2480:2480 -p 8182:8182 -e ORIENTDB_ROOT_PASSWORD=rootpwd orientdb:3.0.23-tp3
ln -f OrientDB.yaml server.yaml
./run -t
Tests fail see:
Links
- https://pypi.org/project/gremlinpython/
- https://stackoverflow.com/questions/tagged/gremlinpython
- http://tinkerpop.apache.org/downloads.html
- http://tinkerpop.apache.org/docs/3.4.3/reference/#connecting-via-console
- https://gist.githubusercontent.com/okram/f193d5616563a69ad5714a42c504276f/raw/b8075410e400e18f18360015945f3760d99d044a/gremlin-python-play.py
- https://github.com/nedlowe/gremlin-python-example
- https://groups.google.com/forum/#!topic/gremlin-users/9DoPGfx9Jnk
- https://github.com/kuzeko/graph-databases-testsuite
- https://github.com/krlawrence/graph/blob/master/sample-code/glv-client-2.py