Difference between revisions of "DgraphAndWeaviateTest"

 
}}
 
}}
  
This is a sample project to test Python-based storage with
* {{Link|target=Dgraph}}
* {{Link|target=Weaviate}}
* {{Link|target=Apache Jena}}
* [https://www.sqlite.org/index.html sqlite]

The motivation for this project was the [[ProceedingsTitleParser#Choice_of_Database.2FStorage_system|Choice of a Database storage system]] for the {{Link|target=ProceedingsTitleParser}}.
  
 
= Installation and test =
 
== Test ==
<source lang='bash'>
scripts/test
</source>
= Sample Data =
== UML ==
<uml>
package Royals {
  entity Person {
    name : TEXT <<PK>>
    born : DATE
    numberInLine : INTEGER
    wikidataurl : TEXT
    age : FLOAT
    ofAge : BOOLEAN
    lastmodified : TIMESTAMP
  }
}
</uml>
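The entity above can be mirrored in Python as a dataclass. This is only a sketch: the <code>Person</code> class below is hypothetical and not part of the project, which works with plain dicts. The field names and types are taken from the UML diagram, and the age value is purely illustrative.

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class Person:
    ''' hypothetical counterpart of the Person entity in the Royals package '''
    name: str               # TEXT <<PK>>
    born: date              # DATE
    numberInLine: int       # INTEGER
    wikidataurl: str        # TEXT
    age: float              # FLOAT
    ofAge: bool             # BOOLEAN
    lastmodified: datetime  # TIMESTAMP

george = Person(
    name='George of Cambridge',
    born=date(2013, 7, 22),
    numberInLine=3,
    wikidataurl='https://www.wikidata.org/wiki/Q1359041',
    age=7.1,  # illustrative value, not computed
    ofAge=False,
    lastmodified=datetime.now(),
)
print(george.name, george.ofAge)
```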
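== Python ==
<source lang='python'>
from datetime import date, datetime

class Sample:
    ''' sample data (excerpt) '''

    @staticmethod
    def dob(isoDateString):
        ''' get the date of birth from the given iso date string '''
        # assumption: Sample.dob is not shown in the original excerpt;
        # this version mirrors the dob helper of the Jena unit test
        return datetime.strptime(isoDateString,"%Y-%m-%d").date()

    @staticmethod
    def getRoyals():
        ''' get a list of dicts with sample records about the British royal family '''
        listOfDicts=[
            {'name': 'Elizabeth Alexandra Mary Windsor', 'born': Sample.dob('1926-04-21'), 'numberInLine': 0, 'wikidataurl': 'https://www.wikidata.org/wiki/Q9682' },
            {'name': 'Charles, Prince of Wales',         'born': Sample.dob('1948-11-14'), 'numberInLine': 1, 'wikidataurl': 'https://www.wikidata.org/wiki/Q43274' },
            {'name': 'George of Cambridge',              'born': Sample.dob('2013-07-22'), 'numberInLine': 3, 'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
            {'name': 'Harry Duke of Sussex',             'born': Sample.dob('1984-09-15'), 'numberInLine': 6, 'wikidataurl': 'https://www.wikidata.org/wiki/Q152316'}
        ]
        today=date.today()
        for person in listOfDicts:
            born=person['born']
            # age in years, based on the mean length of a Gregorian year
            age=(today - born).days / 365.2425
            person['age']=age
            person['ofAge']=age>=18
            person['lastmodified']=datetime.now()
        return listOfDicts
</source>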
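Since sqlite is one of the storage targets, records of this shape can be stored with Python's built-in sqlite3 module. The sketch below is an assumption about how that could look, not the project's own storage code: the table layout follows the UML diagram, the helpers <code>sampleRecords</code> and <code>storeRoyals</code> are hypothetical, and a reduced two-record sample with a fixed reference date stands in for <code>getRoyals()</code>.

```python
import sqlite3
from datetime import date, datetime

def sampleRecords(today):
    ''' reduced stand-in for getRoyals() with a fixed reference date '''
    records=[
        {'name': 'Charles, Prince of Wales', 'born': date(1948,11,14), 'numberInLine': 1,
         'wikidataurl': 'https://www.wikidata.org/wiki/Q43274'},
        {'name': 'George of Cambridge', 'born': date(2013,7,22), 'numberInLine': 3,
         'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
    ]
    for person in records:
        age=(today-person['born']).days/365.2425
        person['age']=age
        person['ofAge']=age>=18
        person['lastmodified']=datetime(2020,9,22,8,5)
    return records

def storeRoyals(connection,records):
    ''' hypothetical helper: create the Person table and insert the records '''
    connection.execute("""CREATE TABLE Person(
        name TEXT PRIMARY KEY, born DATE, numberInLine INTEGER,
        wikidataurl TEXT, age FLOAT, ofAge BOOLEAN, lastmodified TIMESTAMP)""")
    # dates and timestamps are stored as ISO 8601 text, booleans as 0/1
    rows=[(p['name'],p['born'].isoformat(),p['numberInLine'],p['wikidataurl'],
           p['age'],int(p['ofAge']),p['lastmodified'].isoformat()) for p in records]
    connection.executemany("INSERT INTO Person VALUES (?,?,?,?,?,?,?)",rows)
    connection.commit()

connection=sqlite3.connect(':memory:')
storeRoyals(connection,sampleRecords(today=date(2020,9,22)))
adults=connection.execute("SELECT name FROM Person WHERE ofAge=1").fetchall()
print(adults)
```

Converting dates and booleans explicitly before the insert avoids relying on sqlite3's default type adapters.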

= Weaviate =
== Weaviate start ==
<source lang='bash'>
scripts/weaviate
</source>
 
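= Dgraph =
== Dgraph Installation ==
docker based pull:
<source lang='bash'>
scripts/dgraph -p
</source>
== Dgraph start ==
docker based start of
* alpha
* ratel
* zero
<source lang='bash'>
scripts/dgraph
</source>
== Dgraph stop ==
<source lang='bash'>
scripts/dgraph -k
</source>
== scripts/dgraph usage ==
<source lang='bash'>
scripts/dgraph -h
scripts/dgraph [-b|--bash|-c|--clean|-h|--help|-k|--kill|-p|--pull]

-b | --bash: start a bash terminal shell within the currently running container
-h | --help: show this usage
-k | --kill: stop the docker image
-p | --pull: pull the docker image
-c | --clean: clean start with kill and purge of all data
</source>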
 
 
= Apache Jena =
The jena -l and jena -f options automatically download and unpack the needed Apache Jena files.
== Jena load example dataset ==
<source lang='bash'>
scripts/jena -l sampledata/example.ttl
</source>
== Jena fuseki server start ==
<source lang='bash'>
scripts/jena -f example
</source>
You should now be able to browse the admin GUI at http://localhost:3030 and find the example dataset ready for use.
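
Once the server is up, the dataset can also be queried over HTTP. The following sketch only builds the request and does not send it, so it runs without a server. It assumes Fuseki's default service layout, in which the dataset <code>example</code> accepts SPARQL queries at <code>/example/query</code>; the helper name <code>sparqlRequest</code> is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

def sparqlRequest(queryString, endpoint="http://localhost:3030/example/query"):
    ''' build (but do not send) a SPARQL protocol GET request '''
    url = endpoint + "?" + urlencode({"query": queryString})
    # ask for the result in the SPARQL JSON results format
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = sparqlRequest("SELECT * WHERE { ?s ?p ?o. } LIMIT 5")
print(request.full_url)
# with a running server the request could be sent via urllib.request.urlopen(request)
```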
 
== Jena fuseki server stop ==
<source lang='bash'>
scripts/jena -k
</source>

== jena script usage ==
<source lang='bash'>
scripts/jena -h
scripts/jena [-f|--fuseki|-h|--help|-k|--kill|-l|--load]

-f | --fuseki [dataset]: download and start fuseki server with the given dataset
-h | --help: show this usage
-k | --kill: kill the running fuseki server
-l | --load [ttl file]: download jena / tdbloader and load given ttl file
</source>

= Example unit tests =
== Apache Jena unit test ==
 
see https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testJena.py
<source lang='python'>
'''
Created on 2020-08-14
@author: wf
'''
import unittest
import getpass
from dg.jena import Jena
import time
import sys
from datetime import datetime,date

class TestJena(unittest.TestCase):
    ''' Test Apache Jena access via Wrapper'''

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def getJena(self,mode='query',debug=False,typedLiterals=False):
        '''
        get the jena endpoint for the given mode
        '''
        endpoint="http://localhost:3030/example"
        jena=Jena(endpoint,mode=mode,debug=debug,typedLiterals=typedLiterals)
        return jena

    def testJenaQuery(self):
        '''
        test Apache Jena Fuseki SPARQL endpoint with example SELECT query
        '''
        jena=self.getJena()
        queryString = "SELECT * WHERE { ?s ?p ?o. }"
        results=jena.query(queryString)
        self.assertTrue(len(results)>20)

    def testJenaInsert(self):
        '''
        test a Jena INSERT DATA
        '''
        jena=self.getJena(mode="update")
        # the first command is valid, the second one is expected to fail
        insertCommands = [ """
        PREFIX cr: <http://cr.bitplan.com/>
        INSERT DATA {
          cr:version cr:author "Wolfgang Fahl".
        }
        """,'INVALID COMMAND']
        for index,insertCommand in enumerate(insertCommands):
            try:
                result=jena.insert(insertCommand)
                self.assertTrue(index==0)
                print(result)
            except Exception as ex:
                # only the invalid command may raise an exception
                self.assertTrue(index==1)
                msg=ex.args[0]
                self.assertTrue("QueryBadFormed" in msg)
                self.assertTrue("Error 400" in msg)

    def checkErrors(self,errors):
        ''' print the given errors and make sure there are none '''
        if len(errors)>0:
            print("ERRORS:")
            for error in errors:
                print(error)
        self.assertEqual(0,len(errors))

    def dob(self,isoDateString):
        ''' get the date of birth from the given iso date string'''
        if sys.version_info >= (3, 7):
            dt=datetime.fromisoformat(isoDateString)
        else:
            # %Y is the four digit year (%y would expect two digits)
            dt=datetime.strptime(isoDateString,"%Y-%m-%d")
        return dt.date()

    def testListOfDictInsert(self):
        '''
        test inserting a list of Dicts using FOAF example
        https://en.wikipedia.org/wiki/FOAF_(ontology)
        '''
        listofDicts=[
            {'name': 'Elizabeth Alexandra Mary Windsor', 'born': self.dob('1926-04-21'), 'numberInLine': 0, 'wikidataurl': 'https://www.wikidata.org/wiki/Q9682' },
            {'name': 'Charles, Prince of Wales',         'born': self.dob('1948-11-14'), 'numberInLine': 1, 'wikidataurl': 'https://www.wikidata.org/wiki/Q43274' },
            {'name': 'George of Cambridge',              'born': self.dob('2013-07-22'), 'numberInLine': 3, 'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
            {'name': 'Harry Duke of Sussex',             'born': self.dob('1984-09-15'), 'numberInLine': 5, 'wikidataurl': 'https://www.wikidata.org/wiki/Q152316'}
        ]
        today=date.today()
        for person in listofDicts:
            born=person['born']
            age=(today - born).days / 365.2425
            person['age']=age
            person['ofAge']=age>=18
        # run the insert once with and once without typed literals
        typedLiteralModes=[True,False]
        for typedLiteralMode in typedLiteralModes:
            jena=self.getJena(mode='update',typedLiterals=typedLiteralMode,debug=True)
            errors=jena.insertListOfDicts(listofDicts,'foaf:Person','name','PREFIX foaf: <http://xmlns.com/foaf/0.1/>')
            self.checkErrors(errors)

    def testListOfDictSpeed(self):
        '''
        test the speed of adding data
        '''
        listOfDicts=[]
        limit=1000
        for index in range(limit):
            listOfDicts.append({'pkey': "index%d" %index, 'index': "%d" %index})
        jena=self.getJena(mode='update',debug=True)
        entityType="ex:TestRecord"
        primaryKey='pkey'
        prefixes='PREFIX ex: <http://example.com/>'
        startTime=time.time()
        errors=jena.insertListOfDicts(listOfDicts, entityType, primaryKey, prefixes)
        self.checkErrors(errors)
        elapsed=time.time()-startTime
        print ("adding %d records took %5.3f s => %5.f records/s" % (limit,elapsed,limit/elapsed))

    def testLocalWikdata(self):
        '''
        check local wikidata
        '''
        # check we have local wikidata copy:
        if getpass.getuser()=="wf":
            # use 2018 wikidata copy
            endpoint="http://blazegraph.bitplan.com/sparql"
            jena=Jena(endpoint)
            queryString="""
            SELECT ?item ?coord
WHERE
{
  # instance of whisky distillery
  ?item wdt:P31 wd:Q10373548.
  # get the coordinate
  ?item wdt:P625 ?coord.
}"""
            results=jena.query(queryString)
            self.assertEqual(238,len(results))


if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()
</source>
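
To make the idea behind <code>insertListOfDicts</code> more concrete, the following sketch turns a list of dicts into a single SPARQL <code>INSERT DATA</code> command with typed literals. It is not the code of the <code>dg.jena</code> wrapper: the function names <code>literal</code> and <code>listOfDictsToInsert</code>, the way subjects are derived from the primary key and the simplified escaping are assumptions made for illustration. It also assumes that the primary key values are valid local names, as in the speed test above.

```python
from datetime import date, datetime

def literal(value):
    ''' map a Python value to a SPARQL literal (simplified) '''
    # bool must be checked before int since bool is a subclass of int
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, date) and not isinstance(value, datetime):
        return '"%s"^^<http://www.w3.org/2001/XMLSchema#date>' % value.isoformat()
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped

def listOfDictsToInsert(listOfDicts, entityType, primaryKey, prefixes):
    ''' hypothetical helper: build one INSERT DATA command for all records '''
    prefix = entityType.split(":")[0]
    lines = [prefixes, "INSERT DATA {"]
    for record in listOfDicts:
        # assumption: the primary key value is usable as a local name
        subject = "%s:%s" % (prefix, record[primaryKey])
        lines.append("  %s a %s ." % (subject, entityType))
        for key, value in record.items():
            lines.append("  %s %s:%s %s ." % (subject, prefix, key, literal(value)))
    lines.append("}")
    return "\n".join(lines)

records = [{'pkey': 'index%d' % index, 'index': index, 'even': index % 2 == 0}
           for index in range(2)]
command = listOfDictsToInsert(records, 'ex:TestRecord', 'pkey',
                              'PREFIX ex: <http://example.com/>')
print(command)
```

Sending all records in one command keeps the number of HTTP round trips low, which is one plausible reason why a bulk insert is much faster than one request per record.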
 
  
== Dgraph unit test ==
<source lang='python'>
    # excerpt from the Dgraph test case; Dgraph is the project's wrapper class
    def testDgraph(self):
        '''
        test basic Dgraph operation
        '''
        dgraph=Dgraph(debug=True)
        # drop all data and schemas
        dgraph.drop_all()
        # create a schema for Pokemons
        schema='''
        name: string @index(exact) .
        weight: float .
        height: float .
type Pokemon {
  name
  weight
  height
}'''
        dgraph.addSchema(schema)
        # prepare a list of Pokemons to be added
        pokemonList=[{'name':'Pikachu', 'weight':  6, 'height': 0.4 },
                  {'name':'Arbok',  'weight': 65, 'height': 3.5 },
                  {'name':'Raichu',  'weight': 30, 'height': 0.8 },
                  {'name':'Sandan',  'weight': 12, 'height': 0.6 }]
        # add the list in a single transaction
        dgraph.addData(obj=pokemonList)
        # retrieve the data via GraphQL+- query
        graphQuery='''{
# list of pokemons
  pokemons(func: has(name), orderasc: name) {
    name
    weight
    height
  }
}'''
        queryResult=dgraph.query(graphQuery)
        # check the result
        self.assertTrue('pokemons' in queryResult)
        pokemons=queryResult['pokemons']
        self.assertEqual(len(pokemonList),len(pokemons))
        # the result is ordered by name, so map result positions to list positions
        sortindex=[1,0,2,3]
        for index,pokemon in enumerate(pokemons):
            expected=pokemonList[sortindex[index]]
            self.assertEqual(expected,pokemon)
        # close the database connection
        dgraph.close()
</source>
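
The hard-coded <code>sortindex=[1,0,2,3]</code> in the test reflects that the query orders the result by name (<code>orderasc: name</code>), while <code>pokemonList</code> is given in insertion order. As a small illustration in plain Python, independent of Dgraph, the same index list can be derived by sorting the list positions by name:

```python
pokemonList=[{'name':'Pikachu', 'weight':  6, 'height': 0.4 },
             {'name':'Arbok',   'weight': 65, 'height': 3.5 },
             {'name':'Raichu',  'weight': 30, 'height': 0.8 },
             {'name':'Sandan',  'weight': 12, 'height': 0.6 }]
# positions of the list entries in ascending order of their names
sortindex=sorted(range(len(pokemonList)), key=lambda index: pokemonList[index]['name'])
print(sortindex)  # [1, 0, 2, 3]
```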
 
  
 
== Example test session ==
see https://travis-ci.org/github/WolfgangFahl/DgraphAndWeaviateTest/jobs/715131236

Latest revision as of 08:05, 22 September 2020

OsProject

id  DgraphAndWeaviateTest
state  
owner  Wolfgang Fahl
title  DgraphAndWeaviateTest
url  https://github.com/WolfgangFahl/DgraphAndWeaviateTest
version  0.0.1
description  
date  2020/08/05
since  
until  

This is a sample project to test Python-based storage with

  • Dgraph
  • Weaviate
  • Apache Jena
  • sqlite

The motivation for this project was the choice of a database storage system for the ProceedingsTitleParser.

Installation and test

Prerequisites

  • python >= 3.6 - tested with versions 3.6, 3.7 and 3.8
  • docker, e.g. docker desktop community - tested with docker desktop 2.3.0.4 (Docker version 19.03.12)
  • java, e.g. openjdk - tested with Java 1.8 and Java 11
  • an operating system that can run bash scripts, e.g. macOS with MacPorts or Linux - tested on Mac OS 10.13.6 with MacPorts 2.6.2 and on Ubuntu 18.04 LTS

Installation

git clone https://github.com/WolfgangFahl/DgraphAndWeaviateTest
cd DgraphAndWeaviateTest
scripts/install

Starting servers

see also DgraphAndWeaviateTest

# command to run Dgraph
# pull dgraph
scripts/dgraph -p
# run dgraph
scripts/dgraph
# pull and run weaviate
scripts/weaviate
# install apache jena and load example data
scripts/jena -l sampledata/example.ttl
# run apache jena fuseki server
scripts/jena -f example

Test

scripts/test

Dgraph

Dgraph Installation

docker based pull:

scripts/dgraph -p

Dgraph start

docker based start of

  • alpha
  • ratel
  • zero
scripts/dgraph

Dgraph stop

scripts/dgraph -k

scripts/dgraph usage

scripts/dgraph -h
scripts/dgraph [-b|--bash|-c|--clean|-h|--help|-k|--kill|-p|--pull]

-b | --bash: start a bash terminal shell within the currently running container
-h | --help: show this usage
-k | --kill: stop the docker image
-p | --pull: pull the docker image
-c | --clean: clean start with kill and purge of all data
