DgraphAndWeaviateTest

From BITPlan Wiki
Revision as of 11:49, 2 September 2020 by Wf (talk | contribs) (→‎Test)
Jump to navigation Jump to search

OsProject

OsProject
edit
id  DgraphAndWeaviateTest
state  
owner  Wolfgang Fahl
title  DgraphAndWeaviateTest
url  https://github.com/WolfgangFahl/DgraphAndWeaviateTest
version  0.0.1
description  
date  2020/08/05
since  
until  

This is sample project to test Python based storage with

The motivation for this project was the Choice of a Database storage system for the ProceedingsTitleParser

Installation and test

Prerequisites

  • python > version 3.6 - tested with version 3.6/3.7/3.8
  • docker e.g. docker desktop community - tested with e.g. docker desktop 2.3.0.4 Docker version 19.03.12
  • java e.g. openjdk - tested with Java 1.8 and Java 11
  • Operating system that can run bash scripts e.g. macports, linux - tested on Mac OS 10.13.6 Macports 2.6.2, Ubuntu 18.04 LTS

Installation

https://github.com/WolfgangFahl/DgraphAndWeaviateTest
cd DgraphAndWeaviateTest
scripts/install

Starting servers

see also DgraphAndWeaviateTest

# command to run Dgraph
# pull dgraph
scripts/dgraph -p
# run dgraph
scripts/dgraph
# pull and run weaviate
scripts/weaviate
# install apache jena and load example data
scripts/jena -l sampledata/example.ttl
# run apache jena fuseki server
scripts/jena -f example

Test

scripts/test

Sample Data

UML

Python

@staticmethod
    def getRoyals():
        listOfDicts=[
            {'name': 'Elizabeth Alexandra Mary Windsor', 'born': Sample.dob('1926-04-21'), 'numberInLine': 0, 'wikidataurl': 'https://www.wikidata.org/wiki/Q9682' },
            {'name': 'Charles, Prince of Wales',         'born': Sample.dob('1948-11-14'), 'numberInLine': 1, 'wikidataurl': 'https://www.wikidata.org/wiki/Q43274' },
            {'name': 'George of Cambridge',              'born': Sample.dob('2013-07-22'), 'numberInLine': 3, 'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
            {'name': 'Harry Duke of Sussex',             'born': Sample.dob('1984-09-15'), 'numberInLine': 6, 'wikidataurl': 'https://www.wikidata.org/wiki/Q152316'}
        ]
        today=date.today()
        for person in listOfDicts:
            born=person['born']
            age=(today - born).days / 365.2425
            person['age']=age
            person['ofAge']=age>=18
            person['lastmodified']=datetime.now()
        return listOfDicts

Dgraph

Dgraph Installation

docker based pull:

scripts/dgraph -p

Dgraph start

docker based start of

  • alpha
  • ratel
  • zero
scripts/dgraph

Dgraph stop

scripts/dgraph -k

scripts/dgraph usage

scripts/dgraph -h
scripts/dgraph [-b|--bash|-c|--clean|-h|--help|-k|--kill|-p|--pull]

-b | --bash: start a bash terminal shell within the currently running container
-h | --help: show this usage
-k | --kill: stop the docker image
-p | --pull: pull the docker image
-c | --clean: clean start with kill and purge of all data

Apache Jena

The jena -l and jena -f options will automatically download and unpack the needed Apache jena files.

Jena load example dataset

scripts/jena -l sampledata/example.ttl

Jena fuseki server start

scripts/jena -f example

You should be able to browse the admin GUI at http://localhost:3030 and have the example dataset ready for you

Jena fuseki server stop

scripts/jena -k

jena script usage

scripts/jena -h
scripts/jena [-f|--fuseki|-h|--help|-k|--kill|-l|--load]

-f | --fuseki [dataset]: download and start fuseki server with the given dataset
-h | --help: show this usage
-k | --kill: kill the running fuseki server
-l | --load [ttl file]: download jena / tdbloader and load given ttl file

Issues, Questions and Answers

Issues

Stackoverflow questions

Stackoverflow answers

Example unit tests

Apache unit test

see https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testJena.py

'''
Created on 2020-08-14
@author: wf
'''
import unittest
import getpass
from dg.jena import Jena
import time
import sys
from datetime import datetime,date

class TestJena(unittest.TestCase):
    ''' Test Apache Jena access via Wrapper'''

    def setUp(self):
        pass


    def tearDown(self):
        pass

    def getJena(self,mode='query',debug=False,typedLiterals=False):
        '''
        get the jena endpoint for the given mode
        '''
        endpoint="http://localhost:3030/example"
        jena=Jena(endpoint,mode=mode,debug=debug,typedLiterals=typedLiterals)
        return jena

    def testJenaQuery(self):
        '''
        test Apache Jena Fuseki SPARQL endpoint with example SELECT query 
        '''
        jena=self.getJena()
        queryString = "SELECT * WHERE { ?s ?p ?o. }"
        results=jena.query(queryString)
        self.assertTrue(len(results)>20)
        pass
    
    def testJenaInsert(self):
        '''
        test a Jena INSERT DATA
        '''
        jena=self.getJena(mode="update")
        insertCommands = [ """
        PREFIX cr: <http://cr.bitplan.com/>
        INSERT DATA { 
          cr:version cr:author "Wolfgang Fahl". 
        }
        """,'INVALID COMMAND']
        for index,insertCommand in enumerate(insertCommands):
            try:
                result=jena.insert(insertCommand)
                self.assertTrue(index==0)
                print(result)
            except Exception as ex:
                self.assertTrue(index==1)
                msg=ex.args[0]
                self.assertTrue("QueryBadFormed" in msg)
                self.assertTrue("Error 400" in msg)
                pass
            
    def checkErrors(self,errors):      
        if len(errors)>0:
            print("ERRORS:")
            for error in errors:
                print(error)
        self.assertEquals(0,len(errors))    
        
        
    def dob(self,isoDateString):
        ''' get the date of birth from the given iso date state'''
        if sys.version_info >= (3, 7):
            dt=datetime.fromisoformat(isoDateString)
        else:
            dt=datetime.strptime(isoDateString,"%y-%m-%d")  
        return dt.date()    
            
    def testListOfDictInsert(self):
        '''
        test inserting a list of Dicts using FOAF example
        https://en.wikipedia.org/wiki/FOAF_(ontology)
        '''
        listofDicts=[
            {'name': 'Elizabeth Alexandra Mary Windsor', 'born': self.dob('1926-04-21'), 'numberInLine': 0, 'wikidataurl': 'https://www.wikidata.org/wiki/Q9682' },
            {'name': 'Charles, Prince of Wales',         'born': self.dob('1948-11-14'), 'numberInLine': 1, 'wikidataurl': 'https://www.wikidata.org/wiki/Q43274' },
            {'name': 'George of Cambridge',              'born': self.dob('2013-07-22'), 'numberInLine': 3, 'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
            {'name': 'Harry Duke of Sussex',             'born': self.dob('1984-09-15'), 'numberInLine': 5, 'wikidataurl': 'https://www.wikidata.org/wiki/Q152316'}
        ]
        today=date.today()
        for person in listofDicts:
            born=person['born']
            age=(today - born).days / 365.2425
            person['age']=age
            person['ofAge']=age>=18
        typedLiteralModes=[True,False]
        for typedLiteralMode in typedLiteralModes:
            jena=self.getJena(mode='update',typedLiterals=typedLiteralMode,debug=True)
            errors=jena.insertListOfDicts(listofDicts,'foaf:Person','name','PREFIX foaf: <http://xmlns.com/foaf/0.1/>')
            self.checkErrors(errors)
             
        
    def testListOfDictSpeed(self):
        '''
        test the speed of adding data
        ''' 
        listOfDicts=[]
        limit=1000
        for index in range(limit):
            listOfDicts.append({'pkey': "index%d" %index, 'index': "%d" %index})
        jena=self.getJena(mode='update',debug=True)
        entityType="ex:TestRecord"
        primaryKey='pkey'
        prefixes='PREFIX ex: <http://example.com/>'
        startTime=time.time()
        errors=jena.insertListOfDicts(listOfDicts, entityType, primaryKey, prefixes)   
        self.checkErrors(errors)
        elapsed=time.time()-startTime
        print ("adding %d records took %5.3f s => %5.f records/s" % (limit,elapsed,limit/elapsed))
    
    def testLocalWikdata(self):
        '''
        check local wikidata
        '''
        # check we have local wikidata copy:
        if getpass.getuser()=="wf":
            # use 2018 wikidata copy
            endpoint="http://blazegraph.bitplan.com/sparql"
            jena=Jena(endpoint)
            queryString="""
            SELECT ?item ?coord 
WHERE 
{
  # instance of whisky distillery
  ?item wdt:P31 wd:Q10373548.
  # get the coordindate
  ?item wdt:P625 ?coord.
}"""
            results=jena.query(queryString)
            self.assertEqual(238,len(results))


if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()

Dgraph unit test

def testDgraph(self):
        ''' 
        test basic Dgraph operation
        '''
        dgraph=Dgraph(debug=True)
        # drop all data and schemas
        dgraph.drop_all()
        # create a schema for Pokemons
        schema='''
        name: string @index(exact) .
        weight: float .
        height: float .
type Pokemon {
   name
   weight
   height
}'''
        dgraph.addSchema(schema)
        # prepare a list of Pokemons to be added
        pokemonList=[{'name':'Pikachu', 'weight':  6, 'height': 0.4 },
                  {'name':'Arbok',   'weight': 65, 'height': 3.5 }, 
                  {'name':'Raichu',  'weight': 30, 'height': 0.8 }, 
                  {'name':'Sandan',  'weight': 12, 'height': 0.6 }]
        # add the list in a single transaction
        dgraph.addData(obj=pokemonList)
        # retrieve the data via GraphQL+ query
        graphQuery='''{
# list of pokemons
  pokemons(func: has(name), orderasc: name) {
    name
    weight
    height
  }
}'''
        queryResult=dgraph.query(graphQuery)
        # check the result
        self.assertTrue('pokemons' in queryResult)
        pokemons=queryResult['pokemons']
        self.assertEqual(len(pokemonList),len(pokemons))
        sortindex=[1,0,2,3]
        for index,pokemon in enumerate(pokemons):
            expected=pokemonList[sortindex[index]]
            self.assertEquals(expected,pokemon)
        # close the database connection
        dgraph.close()

Example test session

see https://travis-ci.org/github/WolfgangFahl/DgraphAndWeaviateTest/jobs/715131236