DgraphAndWeaviateTest
OsProject
OsProject | |
---|---|
id | DgraphAndWeaviateTest |
state | |
owner | Wolfgang Fahl |
title | DgraphAndWeaviateTest |
url | https://github.com/WolfgangFahl/DgraphAndWeaviateTest |
version | 0.0.1 |
description | |
date | 2020/08/05 |
since | |
until | |
This is a sample project to test Python-based storage with Dgraph, Weaviate and Apache Jena.
The motivation for this project was the choice of a database storage system for the ProceedingsTitleParser.
Installation and test
Prerequisites
- Python >= 3.6 - tested with versions 3.6/3.7/3.8
- Docker, e.g. Docker Desktop Community - tested with e.g. Docker Desktop 2.3.0.4, Docker version 19.03.12
- Java, e.g. OpenJDK - tested with Java 1.8 and Java 11
- An operating system that can run bash scripts, e.g. macOS with MacPorts or Linux - tested on Mac OS 10.13.6 with MacPorts 2.6.2 and on Ubuntu 18.04 LTS
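The prerequisites above can be checked before running the install script. The following is only a minimal sketch and not part of the project: the function name check_prerequisites is hypothetical, and it assumes that the tools are found on the PATH under the executable names docker, java and bash.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6), tools=("docker", "java", "bash")):
    """Return a dict mapping each prerequisite name to True (found) or False."""
    # compare only (major, minor) of the running interpreter
    status = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("%-8s %s" % (name, "ok" if ok else "MISSING"))
```

The check only looks for executables; it does not verify the tested versions listed above.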
Installation
git clone https://github.com/WolfgangFahl/DgraphAndWeaviateTest
cd DgraphAndWeaviateTest
scripts/install
Starting servers
see also DgraphAndWeaviateTest
# command to run Dgraph
# pull dgraph
scripts/dgraph -p
# run dgraph
scripts/dgraph
# pull and run weaviate
scripts/weaviate
# install apache jena and load example data
scripts/jena -l sampledata/example.ttl
# run apache jena fuseki server
scripts/jena -f example
Test
scripts/test
Sample Data
UML
Python
@staticmethod
def getRoyals():
    listOfDicts=[
        {'name': 'Elizabeth Alexandra Mary Windsor', 'born': Sample.dob('1926-04-21'), 'numberInLine': 0, 'wikidataurl': 'https://www.wikidata.org/wiki/Q9682' },
        {'name': 'Charles, Prince of Wales',         'born': Sample.dob('1948-11-14'), 'numberInLine': 1, 'wikidataurl': 'https://www.wikidata.org/wiki/Q43274' },
        {'name': 'George of Cambridge',              'born': Sample.dob('2013-07-22'), 'numberInLine': 3, 'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
        {'name': 'Harry Duke of Sussex',             'born': Sample.dob('1984-09-15'), 'numberInLine': 6, 'wikidataurl': 'https://www.wikidata.org/wiki/Q152316'}
    ]
    today=date.today()
    for person in listOfDicts:
        born=person['born']
        age=(today - born).days / 365.2425
        person['age']=age
        person['ofAge']=age>=18
        person['lastmodified']=datetime.now()
    return listOfDicts
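The excerpt above depends on the surrounding Sample class. A self-contained sketch of the same derivation of age and ofAge could look as follows. The helper names dob and add_derived_fields are hypothetical, dob is assumed to behave like Sample.dob (parse an ISO date string into a date), and the today parameter is added here only to make the result reproducible.

```python
from datetime import date, datetime


def dob(iso_date_string):
    """Parse an ISO date such as '1926-04-21' into a datetime.date."""
    return datetime.strptime(iso_date_string, "%Y-%m-%d").date()


def add_derived_fields(records, today=None):
    """Add 'age' in years and 'ofAge' (age >= 18) to each record in place."""
    today = today or date.today()
    for record in records:
        # 365.2425 is the mean length of a Gregorian calendar year in days
        age = (today - record["born"]).days / 365.2425
        record["age"] = age
        record["ofAge"] = age >= 18
    return records


royals = [
    {"name": "Elizabeth Alexandra Mary Windsor", "born": dob("1926-04-21")},
    {"name": "George of Cambridge", "born": dob("2013-07-22")},
]
add_derived_fields(royals, today=date(2020, 8, 5))
for royal in royals:
    print(royal["name"], round(royal["age"], 1), royal["ofAge"])
```

Passing a fixed today keeps the derived values stable in tests; without it the current date is used as in the original.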
Dgraph
Dgraph Installation
docker based pull:
scripts/dgraph -p
Dgraph start
docker based start of
- alpha
- ratel
- zero
scripts/dgraph
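After the start it can be useful to verify that the three services accept connections. The sketch below is not part of the project. It assumes the usual Dgraph default ports (zero 5080, alpha 8080, ratel 8000) and that scripts/dgraph publishes them on localhost; adjust the mapping if your setup differs. The names DGRAPH_PORTS, is_port_open and dgraph_status are hypothetical.

```python
import socket

# Assumed Dgraph default ports; scripts/dgraph may map them differently.
DGRAPH_PORTS = {"zero": 5080, "alpha": 8080, "ratel": 8000}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timeout or name resolution failure
        return False


def dgraph_status(host="localhost", ports=DGRAPH_PORTS):
    """Return a dict mapping each service name to its reachability."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in dgraph_status().items():
        print("%-6s %s" % (name, "up" if up else "down"))
```

An open port only shows that something is listening; it does not prove that the service behind it is healthy.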
Dgraph stop
scripts/dgraph -k
scripts/dgraph usage
scripts/dgraph -h
scripts/dgraph [-b|--bash|-c|--clean|-h|--help|-k|--kill|-p|--pull]
-b | --bash: start a bash terminal shell within the currently running container
-h | --help: show this usage
-k | --kill: stop the docker image
-p | --pull: pull the docker image
-c | --clean: clean start with kill and purge of all data
Apache Jena
The jena -l and jena -f options will automatically download and unpack the needed Apache Jena files.
Jena load example dataset
scripts/jena -l sampledata/example.ttl
Jena fuseki server start
scripts/jena -f example
You should now be able to browse the admin GUI at http://localhost:3030 and find the example dataset ready for use.
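Besides the admin GUI, the dataset can be queried over the SPARQL protocol. The following sketch uses only the Python standard library and is independent of the project's own Jena wrapper. It assumes that Fuseki exposes the query service of the example dataset at http://localhost:3030/example/query; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(endpoint, query):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_dicts(result):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def select(endpoint, query):
    """Run a SELECT query against the endpoint and return the rows as dicts."""
    request = build_query_request(endpoint, query)
    with urllib.request.urlopen(request) as response:
        return bindings_to_dicts(json.load(response))


# Usage, with the Fuseki server running (endpoint URL is an assumption):
# rows = select("http://localhost:3030/example/query",
#               "SELECT * WHERE { ?s ?p ?o. } LIMIT 5")
```

The flattening drops the type and datatype information of each binding, which is enough for a quick look at the data but not for typed processing.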
Jena fuseki server stop
scripts/jena -k
jena script usage
scripts/jena -h
scripts/jena [-f|--fuseki|-h|--help|-k|--kill|-l|--load]
-f | --fuseki [dataset]: download and start fuseki server with the given dataset
-h | --help: show this usage
-k | --kill: kill the running fuseki server
-l | --load [ttl file]: download jena / tdbloader and load given ttl file
Issues, Questions and Answers
Issues
- https://github.com/semi-technologies/weaviate/issues/1215
- https://discuss.dgraph.io/t/dgraph-v20-07-0-v20-03-0-unreliability-in-mac-os-environment/9376/14
- https://discuss.dgraph.io/t/input-for-predicate-location-of-type-scalar-is-uid/9381
Stack Overflow questions
- https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
- https://stackoverflow.com/questions/63435157/listofdict-to-rdf-conversion-in-python-targeting-apache-jena-fuseki
- https://stackoverflow.com/questions/63358495/how-to-delete-all-nodes-with-a-given-type
- https://stackoverflow.com/questions/63260073/starting-zero-alpha-and-ratel-in-a-single-command-e-g-in-macosx-and-other-envir
- https://stackoverflow.com/questions/63098344/weaviate-error-code-400-parsing-body-from-failed-invalid-character-g-looki
- https://stackoverflow.com/questions/63075787/translating-sidif-to-weaviate
Stack Overflow answers
- https://stackoverflow.com/questions/63435157/listofdict-to-rdf-conversion-in-python-targeting-apache-jena-fuseki/63440396#63440396
- https://stackoverflow.com/questions/63358495/how-to-delete-all-nodes-with-a-given-type/63358827#63358827
- https://stackoverflow.com/questions/63260073/starting-zero-alpha-and-ratel-in-a-single-command-e-g-in-macosx-and-other-envir/63265154#63265154
Example unit tests
Apache Jena unit test
see https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testJena.py
'''
Created on 2020-08-14

@author: wf
'''
import unittest
import getpass
from dg.jena import Jena
import time
import sys
from datetime import datetime,date

class TestJena(unittest.TestCase):
    ''' Test Apache Jena access via Wrapper'''

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def getJena(self,mode='query',debug=False,typedLiterals=False):
        '''
        get the jena endpoint for the given mode
        '''
        endpoint="http://localhost:3030/example"
        jena=Jena(endpoint,mode=mode,debug=debug,typedLiterals=typedLiterals)
        return jena

    def testJenaQuery(self):
        '''
        test Apache Jena Fuseki SPARQL endpoint with example SELECT query
        '''
        jena=self.getJena()
        queryString = "SELECT * WHERE { ?s ?p ?o. }"
        results=jena.query(queryString)
        self.assertTrue(len(results)>20)
        pass

    def testJenaInsert(self):
        '''
        test a Jena INSERT DATA
        '''
        jena=self.getJena(mode="update")
        # the first command is valid, the second one must be rejected
        insertCommands = [ """
        PREFIX cr: <http://cr.bitplan.com/>
        INSERT DATA {
          cr:version cr:author "Wolfgang Fahl".
        }
        """,'INVALID COMMAND']
        for index,insertCommand in enumerate(insertCommands):
            try:
                result=jena.insert(insertCommand)
                self.assertTrue(index==0)
                print(result)
            except Exception as ex:
                self.assertTrue(index==1)
                msg=ex.args[0]
                self.assertTrue("QueryBadFormed" in msg)
                self.assertTrue("Error 400" in msg)
                pass

    def checkErrors(self,errors):
        if len(errors)>0:
            print("ERRORS:")
            for error in errors:
                print(error)
        self.assertEqual(0,len(errors))

    def dob(self,isoDateString):
        ''' get the date of birth from the given iso date string'''
        if sys.version_info >= (3, 7):
            dt=datetime.fromisoformat(isoDateString)
        else:
            # %Y is the four digit year (%y would expect two digits)
            dt=datetime.strptime(isoDateString,"%Y-%m-%d")
        return dt.date()

    def testListOfDictInsert(self):
        '''
        test inserting a list of Dicts using FOAF example
        https://en.wikipedia.org/wiki/FOAF_(ontology)
        '''
        listofDicts=[
            {'name': 'Elizabeth Alexandra Mary Windsor', 'born': self.dob('1926-04-21'), 'numberInLine': 0, 'wikidataurl': 'https://www.wikidata.org/wiki/Q9682' },
            {'name': 'Charles, Prince of Wales',         'born': self.dob('1948-11-14'), 'numberInLine': 1, 'wikidataurl': 'https://www.wikidata.org/wiki/Q43274' },
            {'name': 'George of Cambridge',              'born': self.dob('2013-07-22'), 'numberInLine': 3, 'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
            {'name': 'Harry Duke of Sussex',             'born': self.dob('1984-09-15'), 'numberInLine': 5, 'wikidataurl': 'https://www.wikidata.org/wiki/Q152316'}
        ]
        today=date.today()
        for person in listofDicts:
            born=person['born']
            age=(today - born).days / 365.2425
            person['age']=age
            person['ofAge']=age>=18
        # insert once with typed literals and once without
        typedLiteralModes=[True,False]
        for typedLiteralMode in typedLiteralModes:
            jena=self.getJena(mode='update',typedLiterals=typedLiteralMode,debug=True)
            errors=jena.insertListOfDicts(listofDicts,'foaf:Person','name','PREFIX foaf: <http://xmlns.com/foaf/0.1/>')
            self.checkErrors(errors)

    def testListOfDictSpeed(self):
        '''
        test the speed of adding data
        '''
        listOfDicts=[]
        limit=1000
        for index in range(limit):
            listOfDicts.append({'pkey': "index%d" %index, 'index': "%d" %index})
        jena=self.getJena(mode='update',debug=True)
        entityType="ex:TestRecord"
        primaryKey='pkey'
        prefixes='PREFIX ex: <http://example.com/>'
        startTime=time.time()
        errors=jena.insertListOfDicts(listOfDicts, entityType, primaryKey, prefixes)
        self.checkErrors(errors)
        elapsed=time.time()-startTime
        print ("adding %d records took %5.3f s => %5.f records/s" % (limit,elapsed,limit/elapsed))

    def testLocalWikdata(self):
        '''
        check local wikidata
        '''
        # check we have local wikidata copy:
        if getpass.getuser()=="wf":
            # use 2018 wikidata copy
            endpoint="http://blazegraph.bitplan.com/sparql"
            jena=Jena(endpoint)
            queryString="""
            SELECT ?item ?coord
            WHERE
            {
              # instance of whisky distillery
              ?item wdt:P31 wd:Q10373548.
              # get the coordinate
              ?item wdt:P625 ?coord.
            }"""
            results=jena.query(queryString)
            self.assertEqual(238,len(results))

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()
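The tests above call insertListOfDicts with an entity type, a primary key and a prefix declaration. How such a call can be turned into a SPARQL INSERT DATA command is sketched below. This is not the project's implementation: the function names, the base IRI http://example.com/ for subjects and predicates, and the literal mapping are assumptions chosen for illustration.

```python
import urllib.parse
from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"


def to_literal(value):
    """Render a Python value as a SPARQL literal."""
    if isinstance(value, bool):  # bool must be checked before int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, datetime):  # datetime must be checked before date
        return '"%s"^^<%sdateTime>' % (value.isoformat(), XSD)
    if isinstance(value, date):
        return '"%s"^^<%sdate>' % (value.isoformat(), XSD)
    escaped = (str(value).replace("\\", "\\\\")
               .replace('"', '\\"').replace("\n", "\\n"))
    return '"%s"' % escaped


def dicts_to_insert_data(records, entity_type, primary_key, prefixes,
                         base="http://example.com/"):
    """Build one INSERT DATA command for a list of flat dicts."""
    lines = [prefixes, "INSERT DATA {"]
    for record in records:
        # hypothetical naming scheme: base IRI plus URL-quoted primary key
        key_value = urllib.parse.quote(str(record[primary_key]), safe="")
        subject = "<%s%s>" % (base, key_value)
        lines.append("  %s a %s ." % (subject, entity_type))
        for key, value in record.items():
            predicate = "<%s%s>" % (base, urllib.parse.quote(key, safe=""))
            lines.append("  %s %s %s ." % (subject, predicate, to_literal(value)))
    lines.append("}")
    return "\n".join(lines)


command = dicts_to_insert_data(
    [{"pkey": "index0", "index": 0}],
    "ex:TestRecord", "pkey", "PREFIX ex: <http://example.com/>")
print(command)
```

Sending all records in a single INSERT DATA command keeps the number of HTTP round trips low, which matters for a speed test like testListOfDictSpeed.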
Dgraph unit test
def testDgraph(self):
    '''
    test basic Dgraph operation
    '''
    dgraph=Dgraph(debug=True)
    # drop all data and schemas
    dgraph.drop_all()
    # create a schema for Pokemons
    schema='''
    name: string @index(exact) .
    weight: float .
    height: float .
    type Pokemon {
       name
       weight
       height
    }'''
    dgraph.addSchema(schema)
    # prepare a list of Pokemons to be added
    pokemonList=[{'name':'Pikachu', 'weight':  6, 'height': 0.4 },
                 {'name':'Arbok',   'weight': 65, 'height': 3.5 },
                 {'name':'Raichu',  'weight': 30, 'height': 0.8 },
                 {'name':'Sandan',  'weight': 12, 'height': 0.6 }]
    # add the list in a single transaction
    dgraph.addData(obj=pokemonList)
    # retrieve the data via GraphQL+- query
    graphQuery='''{
    # list of pokemons
      pokemons(func: has(name), orderasc: name) {
        name
        weight
        height
      }
    }'''
    queryResult=dgraph.query(graphQuery)
    # check the result
    self.assertTrue('pokemons' in queryResult)
    pokemons=queryResult['pokemons']
    self.assertEqual(len(pokemonList),len(pokemons))
    # the query orders by name, so map result positions back to the input list
    sortindex=[1,0,2,3]
    for index,pokemon in enumerate(pokemons):
        expected=pokemonList[sortindex[index]]
        self.assertEqual(expected,pokemon)
    # close the database connection
    dgraph.close()
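The sortindex list in the test reflects the orderasc: name clause of the query: results come back ordered by name, not in insertion order. The following sketch derives the same index list in plain Python instead of hard-coding it. It assumes that the ordering of these ASCII names in Dgraph matches Python's string comparison; the variable names are chosen for illustration.

```python
pokemon_list = [
    {"name": "Pikachu", "weight": 6, "height": 0.4},
    {"name": "Arbok", "weight": 65, "height": 3.5},
    {"name": "Raichu", "weight": 30, "height": 0.8},
    {"name": "Sandan", "weight": 12, "height": 0.6},
]

# positions of the input records once they are ordered by name
sort_index = sorted(range(len(pokemon_list)),
                    key=lambda i: pokemon_list[i]["name"])

# the records in the order the query is expected to return them
expected = [pokemon_list[i] for i in sort_index]

print(sort_index)  # prints [1, 0, 2, 3]
```

Deriving the index this way keeps the expectation correct when records are added to or removed from the input list.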
Example test session
see https://travis-ci.org/github/WolfgangFahl/DgraphAndWeaviateTest/jobs/715131236