SPARQL
What is SPARQL
SPARQL is a query language for semantic databases that use the Resource Description Framework (RDF) format.
Tutorial
There are quite a few tutorials out there for SPARQL e.g.
This tutorial is for people who are new to semantic concepts and would like to work through an example with a fair amount of data but not too much complexity in the structure of the data.
Semantic Concepts
Personally I learned Semantic Concepts using Semantic MediaWiki see
When using SPARQL, a tutorial needs a slightly different touch, so for those who know the talk above I'll explain some key concepts based on an example using:
- Countries
- Towns
- Municipal Units
Triples
A semantic statement has the form
<subject> <predicate> <object>
e.g.
Dubai is-located-in AE
is such a semantic statement, which is also called a triple.
The natural language statement "Dubai is located in United Arab Emirates" has purposely been modified slightly into a more "computer-ready" form. The predicate is written as is-located-in to make it a proper identifier, and the country name "United Arab Emirates" has been replaced by AE, the two-letter country code used in the United Nations Location Code (UN/LOCODE). A triple has a natural graph representation:
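As a rough illustration of that graph view, the following Python sketch treats a triple as a labeled edge that leads from the subject node to the object node. The `Triple` type and the `to_graph` helper are hypothetical names chosen for this example; they are not part of any RDF library.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One semantic statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


def to_graph(triples):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.object))
    return dict(graph)


statement = Triple("Dubai", "is-located-in", "AE")
graph = to_graph([statement])
print(graph)  # {'Dubai': [('is-located-in', 'AE')]}
```

The nodes are Dubai and AE, and is-located-in is the label of the edge between them.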
TripleStore
A Triplestore is a database that can store and query triples. In fact for educational purposes I have written a simple Triplestore myself:
For that simple triplestore the triples are supplied in Simple Data Interchange Format (SiDiF). Again, that format is mostly for educational purposes, although it can also be used for small use cases with just a few thousand triples. Please also note that there is no SPARQL support in that project.
For anything beyond educational use you need a triplestore that can handle larger amounts of data and supports SPARQL. The Wikipedia list of subject-predicate-object databases shows you some options. For this tutorial we'll use Blazegraph.
Setting up the Blazegraph Triple Store
You need Java to be installed on your machine.
Download the blazegraph.jar file from https://www.blazegraph.com/download/ and start it with
java -jar blazegraph.jar
In fact it's better to start the jar file with an option that allows bigger XML files to be handled:
java -Djdk.xml.entityExpansionLimit=0 -jar blazegraph.jar
otherwise you might run into the error:
org.openrdf.rio.RDFParseException: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK
After startup you should see
Welcome to the Blazegraph(tm) Database. Go to http://localhost:9999/blazegraph/ to get started.
And you might want to do just that and click that link.
Where Blazegraph stores its data
The default setting for Blazegraph's journal file is to use blazegraph.jnl in the directory where you started the jar file. On my macOS laptop the initial file size is some 200 MBytes.
ls -l blazegraph.jnl
-rw-r--r-- 1 wf staff 209715200 4 Jan 11:50 blazegraph.jnl
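The size in the listing can be checked by hand: 209715200 bytes is exactly 200 × 1024 × 1024 bytes. The short Python sketch below does that conversion and also reads the size of a journal file if one exists in the current directory; `journal_size_mib` is a made-up helper name for this example.

```python
import os


def journal_size_mib(path="blazegraph.jnl"):
    """Return the journal file size in MiB, or None if the file is missing."""
    if not os.path.exists(path):
        return None
    return os.path.getsize(path) / (1024 * 1024)


# The size from the listing above, converted by hand:
print(209715200 / (1024 * 1024))  # 200.0
# The size of the journal in the current directory, if there is one:
print(journal_size_mib())
```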
The Blazegraph Web UI
The Web UI shows the following tabs:
- WELCOME
- QUERY
- UPDATE
- EXPLORE
- NAMESPACES
- STATUS
- PERFORMANCE
Let's start with the UPDATE tab to load some sample data.
The sample Data
The human-readable form of some of our sample data and its description are available at:
- https://en.wikipedia.org/wiki/UN/LOCODE
- http://www.unece.org/cefact/locode/service/location
- https://www.unece.org/fileadmin/DAM/cefact/locode/ae.htm
RDF Version of the data
- https://old.datahub.io/dataset/rkb-explorer-unlocode
- http://unlocode.rkbexplorer.com/models/dump.tgz
You might want to download and unpack http://unlocode.rkbexplorer.com/models/dump.tgz. The result should be a directory with the following content:
pan:models wf$ls -l
total 31832
-rw-r--r-- 1 wf staff 265 4 Jan 07:27 catalog-v001.xml
-rw-r--r--@ 1 wf staff 42194 18 Feb 2009 unlocode-countries.rdf
-rw-r--r--@ 1 wf staff 228389 18 Feb 2009 unlocode-municipalunits.rdf
-rw-r--r--@ 1 wf staff 16017733 18 Feb 2009 unlocode-towns.rdf
Now drag and drop the three files:
- unlocode-countries.rdf
- unlocode-municipalunits.rdf
- unlocode-towns.rdf
one after another into the field with the text
(Type in or drag a file containing RDF data, ...
and click the Update button below the field after each drag-and-drop operation. The output will be
Modified: 484 Milliseconds: 430 Modified: 1917 Milliseconds: ... Running update: 287 Modified: 239567 Milliseconds: 2260
The milliseconds will vary on your machine. If you run into the 64000 entity limit, restart Blazegraph with the Java VM option outlined above.
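Drag and drop works well for three files, but the upload can probably be scripted too. The Python sketch below assumes that Blazegraph accepts RDF/XML in the body of an HTTP POST to its SPARQL endpoint, and that the endpoint of the default namespace is http://localhost:9999/blazegraph/sparql. Neither assumption is covered by this tutorial, so please check both against your installation. The function names are made up for this example.

```python
import urllib.request

# Assumed endpoint of the default namespace; adjust to your setup.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

RDF_FILES = (
    "unlocode-countries.rdf",
    "unlocode-municipalunits.rdf",
    "unlocode-towns.rdf",
)


def build_upload_request(rdf_xml: bytes, endpoint: str = ENDPOINT):
    """Prepare (but do not send) a POST request carrying RDF/XML data."""
    return urllib.request.Request(
        endpoint,
        data=rdf_xml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def upload(path: str, endpoint: str = ENDPOINT) -> str:
    """Send one RDF/XML file and return the server's response text."""
    with open(path, "rb") as rdf_file:
        request = build_upload_request(rdf_file.read(), endpoint)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def upload_all(paths=RDF_FILES):
    """Upload the sample files one after another, like the manual steps."""
    for path in paths:
        print(path, upload(path))


# Needs a running Blazegraph and the unpacked files in the current directory:
# upload_all()
```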
SPARQL Queries
Select all Triples
Now our environment should be ready to hit the "QUERY" tab and enter our first SPARQL query:
SPARQL Query
SELECT *
WHERE {
?subject ?predicate ?object
}
Result
The query returns a total of 242375 triples, of which the first 50 are displayed. The result starts like this:
subject predicate object
<http://unlocode.rkbexplorer.com/id/AEDHF> <http://www.aktors.org/ontology/portal#has-longitude> 54.5333333
<http://unlocode.rkbexplorer.com/id/AEDHF> <http://www.aktors.org/ontology/portal#is-located-in> <http://unlocode.rkbexplorer.com/id/AE>
<http://unlocode.rkbexplorer.com/id/AEDHF> <http://www.aktors.org/ontology/support#has-pretty-name> Al Dhafra
<http://unlocode.rkbexplorer.com/id/AEDHF> rdf:type <http://www.aktors.org/ontology/portal#Town>
<http://unlocode.rkbexplorer.com/id/AEDUY> <http://www.aktors.org/ontology/portal#has-latitude> 25.7780637
<http://unlocode.rkbexplorer.com/id/AEDUY> <http://www.aktors.org/ontology/portal#has-longitude> 55.9310912
<http://unlocode.rkbexplorer.com/id/AEDUY> <http://www.aktors.org/ontology/portal#is-located-in> <http://unlocode.rkbexplorer.com/id/AE>
<http://unlocode.rkbexplorer.com/id/AEDUY> <http://www.aktors.org/ontology/support#has-pretty-name> Ras Zubbaya (Ras Dubayyah)
<http://unlocode.rkbexplorer.com/id/AEDUY> rdf:type <http://www.aktors.org/ontology/portal#Town>
<http://unlocode.rkbexplorer.com/id/AEDXB> <http://www.aktors.org/ontology/portal#has-latitude> 25.2500000
<http://unlocode.rkbexplorer.com/id/AEDXB> <http://www.aktors.org/ontology/portal#has-longitude> 55.2666666
<http://unlocode.rkbexplorer.com/id/AEDXB> <http://www.aktors.org/ontology/portal#is-located-in> <http://unlocode.rkbexplorer.com/id/AE>
<http://unlocode.rkbexplorer.com/id/AEDXB> <http://www.aktors.org/ontology/support#has-pretty-name> Dubai
<http://unlocode.rkbexplorer.com/id/AEDXB> rdf:type <http://www.aktors.org/ontology/portal#Town>
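The same query can also be sent from a program instead of the QUERY tab. The Python sketch below uses the standard SPARQL protocol: the query travels in the `query` URL parameter and the JSON result format is requested with an Accept header. The endpoint http://localhost:9999/blazegraph/sparql is an assumption that this tutorial has not verified, the function names are made up for this example, and the LIMIT clause is only there to keep the answer small.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:9999/blazegraph/sparql"  # assumed endpoint
QUERY = "SELECT * WHERE { ?subject ?predicate ?object } LIMIT 3"


def build_query_request(query, endpoint=ENDPOINT):
    """Prepare (but do not send) a GET request for a SPARQL query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def rows_from_json(text):
    """Turn a SPARQL JSON result document into a list of plain dicts."""
    document = json.loads(text)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in document["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return its rows as dicts keyed by variable name."""
    request = build_query_request(query, endpoint)
    with urllib.request.urlopen(request) as response:
        return rows_from_json(response.read().decode("utf-8"))


# Needs a running Blazegraph with the sample data loaded:
# for row in run_query(QUERY):
#     print(row["subject"], row["predicate"], row["object"])
```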
Explanation
SELECT *
asks for a selection; the asterisk selects all variables used in the query.
WHERE {
?subject ?predicate ?object
}
specifies a condition. Since we use question marks for all three parts of the triple, all three parts are variables, so every triple matches.
The query shows all triples you uploaded from the RDF files "as is".
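How such a pattern with variables selects triples can be sketched in a few lines of Python. This is only a toy model of the idea, not how Blazegraph evaluates queries; the `match` function is a made-up name, and the three triples are a small excerpt of the result above.

```python
def match(pattern, triples):
    """Return one binding dict per triple that fits the pattern.

    Pattern parts starting with '?' are variables; all others must be equal.
    """
    results = []
    for triple in triples:
        binding = {}
        for part, value in zip(pattern, triple):
            if part.startswith("?"):
                # A variable used twice must stand for the same value.
                if binding.setdefault(part, value) != value:
                    break
            elif part != value:
                break
        else:
            # No part of the pattern disagreed with this triple.
            results.append(binding)
    return results


ID = "http://unlocode.rkbexplorer.com/id/"
PORTAL = "http://www.aktors.org/ontology/portal#"
triples = [
    (ID + "AEDXB", PORTAL + "has-latitude", "25.2500000"),
    (ID + "AEDXB", PORTAL + "is-located-in", ID + "AE"),
    (ID + "AEDUY", PORTAL + "is-located-in", ID + "AE"),
]

# Three variables: every triple matches.
print(len(match(("?subject", "?predicate", "?object"), triples)))  # 3
# A fixed subject narrows the result to that subject's triples.
print(len(match((ID + "AEDXB", "?predicate", "?object"), triples)))  # 2
```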
Now you can see that RDF, unlike SiDiF, mostly uses lengthy URLs to express things. So the triple for Dubai being in AE becomes:
<http://unlocode.rkbexplorer.com/id/AEDXB> <http://www.aktors.org/ontology/portal#is-located-in> <http://unlocode.rkbexplorer.com/id/AE>
And there are multiple triples for the subject <http://unlocode.rkbexplorer.com/id/AEDXB>. So let's select only those.
Select by subject
SPARQL Query
SELECT *
WHERE {
<http://unlocode.rkbexplorer.com/id/AEDXB> ?predicate ?object
}
Result
predicate object
<http://www.aktors.org/ontology/portal#has-latitude> 25.2500000
<http://www.aktors.org/ontology/portal#has-longitude> 55.2666666
<http://www.aktors.org/ontology/portal#is-located-in> <http://unlocode.rkbexplorer.com/id/AE>
<http://www.aktors.org/ontology/support#has-pretty-name> Dubai
rdf:type <http://www.aktors.org/ontology/portal#Town>
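The long URLs make a result like this hard to read. SPARQL itself offers PREFIX declarations to abbreviate them; as a plain illustration of the idea, the following Python sketch shortens URLs with a prefix table. The prefix names `unlocode`, `portal` and `support` are made up for this example and are not defined by the sample data.

```python
PREFIXES = {
    "unlocode": "http://unlocode.rkbexplorer.com/id/",
    "portal": "http://www.aktors.org/ontology/portal#",
    "support": "http://www.aktors.org/ontology/support#",
}


def shorten(term, prefixes=PREFIXES):
    """Replace a known namespace at the start of a URL by 'prefix:'."""
    for prefix, namespace in prefixes.items():
        if term.startswith(namespace):
            return prefix + ":" + term[len(namespace):]
    return term  # literals and unknown URLs stay unchanged


print(shorten("http://unlocode.rkbexplorer.com/id/AEDXB"))
# unlocode:AEDXB
print(shorten("http://www.aktors.org/ontology/portal#is-located-in"))
# portal:is-located-in
print(shorten("Dubai"))
# Dubai
```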
Select multiple predicates of one subject in one query
SPARQL Query
SELECT ?lat ?lon
WHERE {
<http://unlocode.rkbexplorer.com/id/AEDXB> <http://www.aktors.org/ontology/portal#has-latitude> ?lat.
<http://unlocode.rkbexplorer.com/id/AEDXB> <http://www.aktors.org/ontology/portal#has-longitude> ?lon.
}
Result
lat lon
25.2500000 55.2666666
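Each variable gets its value from the predicate in its own triple pattern, so ?lat needs has-latitude and ?lon needs has-longitude. The small Python sketch below mimics this with a dictionary keyed by subject and predicate. The two values are taken from the "Select by subject" result; the `lookup` dictionary is just an illustration and not how a triplestore works internally.

```python
ID = "http://unlocode.rkbexplorer.com/id/"
PORTAL = "http://www.aktors.org/ontology/portal#"

triples = [
    (ID + "AEDXB", PORTAL + "has-latitude", "25.2500000"),
    (ID + "AEDXB", PORTAL + "has-longitude", "55.2666666"),
]

# One entry per (subject, predicate) pair, pointing to the object.
lookup = {(s, p): o for s, p, o in triples}

lat = lookup[(ID + "AEDXB", PORTAL + "has-latitude")]
lon = lookup[(ID + "AEDXB", PORTAL + "has-longitude")]
print(lat, lon)  # 25.2500000 55.2666666
```

Using has-latitude in both lookups would return 25.2500000 twice.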