Difference between revisions of "Truly Tabular RDF"
(44 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
+ | = Tutorial = | ||
+ | [http://wikidata.bitplan.com/tt/Q44 try Beer!] | ||
+ | If you set min% to 40 and then Generate the SPARQL Query the naive query is something like: | ||
+ | <source lang='sparql'> | ||
+ | # truly tabular naive query for | ||
+ | # Q44:beer | ||
+ | # generated by trulytabular.py version 0.4.6 on 2022-08-08T15:06:07.716686 | ||
+ | PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> | ||
+ | PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> | ||
+ | PREFIX schema: <http://schema.org/> | ||
+ | PREFIX wd: <http://www.wikidata.org/entity/> | ||
+ | PREFIX wdt: <http://www.wikidata.org/prop/direct/> | ||
+ | PREFIX wikibase: <http://wikiba.se/ontology#> | ||
+ | PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> | ||
+ | |||
+ | SELECT ?beerItem ?beer | ||
+ | ?image | ||
+ | ?manufacturerItem | ||
+ | ?Google_Knowledge_Graph_ID | ||
+ | WHERE { | ||
+ | # instanceof Q44:beer | ||
+ | ?beerItem wdt:P31 wd:Q44. | ||
+ | # label | ||
+ | ?beerItem rdfs:label ?beer. | ||
+ | FILTER (LANG(?beer) = "en"). | ||
+ | # image (P18) | ||
+ | OPTIONAL { | ||
+ | ?beerItem wdt:P18 ?image. | ||
+ | } | ||
+ | # manufacturer (P176) | ||
+ | OPTIONAL { | ||
+ | ?beerItem wdt:P176 ?manufacturerItem. | ||
+ | ?manufacturerItem rdfs:label ?manufacturer. | ||
+ | FILTER (LANG(?manufacturer) = "en"). | ||
+ | } | ||
+ | # Google Knowledge Graph ID (P2671) | ||
+ | OPTIONAL { | ||
+ | ?beerItem wdt:P2671 ?Google_Knowledge_Graph_ID. | ||
+ | } | ||
+ | } | ||
+ | </source> | ||
+ | |||
+ | = Info = | ||
+ | The columns of the truly tabular tool at http://wikidata.bitplan.com are explained at [[Truly Tabular RDF/Info]] | ||
+ | = Motivation = | ||
+ | In the project to create the ConferenceCorpus as part of the ConfIDent project Scientific Event / Proceedings data needed to be imported | ||
+ | from the german national libraries GND Dataset. | ||
+ | The query needed to be modified after extensive analysis of the data. The schema definition as specified in the GND ontology alone was not sufficient. | ||
+ | {{Link|target=Truly Tabular RDF/GND}} | ||
+ | |||
+ | = Querying tabular data from Wikidata = | ||
+ | {{:GOTExample2022}} | ||
+ | |||
= Naive SPARQL Query = | = Naive SPARQL Query = | ||
− | # Start with a wikidata item your are intested in e.g. [https://www.wikidata.org/wiki/Q109296593 International Semantic Web Conference 2022] | + | # Start with a wikidata item your are intested in e.g. [https://www.wikidata.org/wiki/Q109296593 International Semantic Web Conference ISWC 2022] |
− | # use the [https://www.wikidata.org/wiki/Property:P31 instance of property] to find similar items of the same class [https://www.wikidata.org/wiki/Q2020153 academic conference] <source lang='sparql'> | + | # use the [https://www.wikidata.org/wiki/Property:P31 instance of property] to find similar items of the same class [https://www.wikidata.org/wiki/Q2020153 academic conference] |
+ | # straight-forward select further properties by adding statements similar to <source lang='sparql'>OPTIONAL { ?conference wdt:P1813 ?short_name }</source> to the WHERE clause. | ||
+ | ## [https://www.wikidata.org/wiki/Property:P1813 P1813 short name] | ||
+ | ## [https://www.wikidata.org/wiki/Property:P17 P17 country] | ||
+ | ## [https://www.wikidata.org/wiki/Property:P1476 P1476 title] | ||
+ | |||
+ | This naive approach will lead to more results for Step 3 (e.g. 7730) than for step 2 (e.g. 7695) which is a surprise for most novices since this effect would not happen with a similar SQL query <source lang='SQL'>SELECT short_name,country,title from academic_conference</source> | ||
+ | |||
+ | === Result of Step #2 === | ||
+ | <source lang='sparql'> | ||
# Academic conference wikidata query | # Academic conference wikidata query | ||
# WF 2021-01-30 | # WF 2021-01-30 | ||
Line 15: | Line 77: | ||
?conference rdfs:label ?conferenceLabel filter (lang(?conferenceLabel) = "en"). | ?conference rdfs:label ?conferenceLabel filter (lang(?conferenceLabel) = "en"). | ||
}</source> [https://query.wikidata.org/#%23%20Academic%20conference%20wikidata%20query%0A%23%20WF%202021-01-30%0APREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0ASELECT%20%3Fconference%20%3FconferenceLabel%20%0AWHERE%0A%7B%0A%20%20%23%20%20academic%20conference%20series%20%28Q2020153%29%0A%20%20%3Fconference%20wdt%3AP31%20wd%3AQ2020153.%0A%20%20%23%20label%0A%20%20%3Fconference%20rdfs%3Alabel%20%3FconferenceLabel%20filter%20%28lang%28%3FconferenceLabel%29%20%3D%20%22en%22%29.%0A%7D%0A try it!] | }</source> [https://query.wikidata.org/#%23%20Academic%20conference%20wikidata%20query%0A%23%20WF%202021-01-30%0APREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0ASELECT%20%3Fconference%20%3FconferenceLabel%20%0AWHERE%0A%7B%0A%20%20%23%20%20academic%20conference%20series%20%28Q2020153%29%0A%20%20%3Fconference%20wdt%3AP31%20wd%3AQ2020153.%0A%20%20%23%20label%0A%20%20%3Fconference%20rdfs%3Alabel%20%3FconferenceLabel%20filter%20%28lang%28%3FconferenceLabel%29%20%3D%20%22en%22%29.%0A%7D%0A try it!] | ||
+ | {| class="wikitable" style="text-align: left;" | ||
+ | |+ <!-- caption --> | ||
+ | |- | ||
+ | ! conference !! conferenceLabel | ||
+ | |- | ||
+ | | http://www.wikidata.org/entity/Q75698988 || The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies | ||
+ | |- | ||
+ | | http://www.wikidata.org/entity/Q75707991 || Digital Humanities 2020 | ||
+ | |- | ||
+ | | http://www.wikidata.org/entity/Q75709854 || Digital Humanities 2018 | ||
+ | |- | ||
+ | | ... | ||
+ | |} | ||
+ | === Result of Step 3 === | ||
+ | <source lang='SPARQL'> | ||
+ | # Academic conference wikidata query | ||
+ | # WF 2021-01-30 | ||
+ | PREFIX wd: <http://www.wikidata.org/entity/> | ||
+ | PREFIX wdt: <http://www.wikidata.org/prop/direct/> | ||
+ | PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> | ||
+ | SELECT | ||
+ | ?conference ?conferenceLabel | ||
+ | ?short_name | ||
+ | ?country | ||
+ | ?title | ||
+ | WHERE | ||
+ | { | ||
+ | # academic conference series (Q2020153) | ||
+ | ?conference wdt:P31 wd:Q2020153. | ||
+ | # label | ||
+ | ?conference rdfs:label ?conferenceLabel filter (lang(?conferenceLabel) = "en"). | ||
+ | # short name | ||
+ | OPTIONAL { ?conference wdt:P1813 ?short_name } | ||
+ | # country | ||
+ | OPTIONAL { ?conference wdt:P17 ?country } | ||
+ | # title | ||
+ | OPTIONAL { ?conference wdt:P1476 ?title } | ||
+ | } | ||
+ | </source> | ||
+ | [https://query.wikidata.org/#%23%20Academic%20conference%20wikidata%20query%0A%23%20WF%202021-01-30%0APREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0ASELECT%20%0A%20%20%3Fconference%20%3FconferenceLabel%0A%20%20%3Fshort_name%0A%20%20%3Fcountry%0A%20%20%3Ftitle%0AWHERE%0A%7B%0A%20%20%23%20%20academic%20conference%20series%20%28Q2020153%29%0A%20%20%3Fconference%20wdt%3AP31%20wd%3AQ2020153.%0A%20%20%23%20label%0A%20%20%3Fconference%20rdfs%3Alabel%20%3FconferenceLabel%20filter%20%28lang%28%3FconferenceLabel%29%20%3D%20%22en%22%29.%0A%20%20%23%20short%20name%0A%20%20OPTIONAL%20%7B%20%3Fconference%20wdt%3AP1813%20%3Fshort_name%20%7D%0A%20%20%23%20country%0A%20%20OPTIONAL%20%7B%20%3Fconference%20wdt%3AP17%20%3Fcountry%20%7D%0A%20%20%23%20title%0A%20%20OPTIONAL%20%7B%20%3Fconference%20wdt%3AP1476%20%3Ftitle%20%7D%0A%7D%0A try it!] | ||
+ | == More elaborate example: novel series == | ||
+ | # start with [https://www.wikidata.org/wiki/Q15228 Lord of the Rings] | ||
+ | # find similar [https://www.wikidata.org/wiki/Q1667921 Novel Series] | ||
+ | # | ||
+ | === Naive SPARQL Query === | ||
+ | <source lang='sparql'> | ||
+ | # truly tabular query for | ||
+ | # Q1667921:novel series | ||
+ | # generated by trulytabular.py on 2022-07-27T17:33:43.681991 | ||
+ | PREFIX wd: <http://www.wikidata.org/entity/> | ||
+ | PREFIX wdt: <http://www.wikidata.org/prop/direct/> | ||
+ | PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> | ||
+ | SELECT ?novel_series ?novel_seriesLabel | ||
+ | ?instance_of | ||
+ | ?language_of_work_or_name | ||
+ | ?genre | ||
+ | ?author | ||
+ | ?country_of_origin | ||
+ | ?has_part_s_ | ||
+ | ?publication_date | ||
+ | ?Freebase_ID | ||
+ | ?ISFDB_series_ID | ||
+ | ?title | ||
+ | ?Google_Knowledge_Graph_ID | ||
+ | WHERE { | ||
+ | # instanceof Q1667921:novel series | ||
+ | ?novel_series wdt:P31 wd:Q1667921. | ||
+ | # label | ||
+ | ?novel_series rdfs:label ?novel_seriesLabel | ||
+ | FILTER (LANG(?novel_seriesLabel) = "en"). | ||
+ | # instance of (P31) | ||
+ | OPTIONAL { ?novel_series wdt:P31 ?instance_of. } | ||
+ | # language of work or name (P407) | ||
+ | OPTIONAL { ?novel_series wdt:P407 ?language_of_work_or_name. } | ||
+ | # genre (P136) | ||
+ | OPTIONAL { ?novel_series wdt:P136 ?genre. } | ||
+ | # author (P50) | ||
+ | OPTIONAL { ?novel_series wdt:P50 ?author. } | ||
+ | # country of origin (P495) | ||
+ | OPTIONAL { ?novel_series wdt:P495 ?country_of_origin. } | ||
+ | # has part(s) (P527) | ||
+ | OPTIONAL { ?novel_series wdt:P527 ?has_part_s_. } | ||
+ | # publication date (P577) | ||
+ | OPTIONAL { ?novel_series wdt:P577 ?publication_date. } | ||
+ | # Freebase ID (P646) | ||
+ | OPTIONAL { ?novel_series wdt:P646 ?Freebase_ID. } | ||
+ | # ISFDB series ID (P1235) | ||
+ | OPTIONAL { ?novel_series wdt:P1235 ?ISFDB_series_ID. } | ||
+ | # title (P1476) | ||
+ | OPTIONAL { ?novel_series wdt:P1476 ?title. } | ||
+ | # Google Knowledge Graph ID (P2671) | ||
+ | OPTIONAL { ?novel_series wdt:P2671 ?Google_Knowledge_Graph_ID. } | ||
+ | } | ||
+ | </source> | ||
+ | [https://query.wikidata.org//#%0A%23%20truly%20tabular%20query%20for%20%0A%23%20Q1667921%3Anovel%20series%0A%23%20generated%20by%20trulytabular.py%20on%202022-07-27T17%3A33%3A43.681991%0APREFIX%20wd%3A%20%3Chttp%3A//www.wikidata.org/entity/%3E%0APREFIX%20wdt%3A%20%3Chttp%3A//www.wikidata.org/prop/direct/%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A//www.w3.org/2000/01/rdf-schema%23%3E%0ASELECT%20%3Fnovel_series%20%3Fnovel_seriesLabel%0A%20%20%3Finstance_of%0A%20%20%3Flanguage_of_work_or_name%0A%20%20%3Fgenre%0A%20%20%3Fauthor%0A%20%20%3Fcountry_of_origin%0A%20%20%3Fhas_part_s_%0A%20%20%3Fpublication_date%0A%20%20%3FFreebase_ID%0A%20%20%3FISFDB_series_ID%0A%20%20%3Ftitle%0A%20%20%3FGoogle_Knowledge_Graph_ID%0AWHERE%20%7B%0A%20%20%23%20instanceof%20Q1667921%3Anovel%20series%0A%20%20%3Fnovel_series%20wdt%3AP31%20wd%3AQ1667921.%0A%20%20%23%20label%0A%20%20%3Fnovel_series%20rdfs%3Alabel%20%3Fnovel_seriesLabel%20%20%0A%20%20FILTER%20%28LANG%28%3Fnovel_seriesLabel%29%20%3D%20%22en%22%29.%0A%20%20%23%20instance%20of%20%28P31%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP31%20%3Finstance_of.%20%7D%0A%20%20%23%20language%20of%20work%20or%20name%20%28P407%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP407%20%3Flanguage_of_work_or_name.%20%7D%0A%20%20%23%20genre%20%28P136%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP136%20%3Fgenre.%20%7D%0A%20%20%23%20author%20%28P50%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP50%20%3Fauthor.%20%7D%0A%20%20%23%20country%20of%20origin%20%28P495%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP495%20%3Fcountry_of_origin.%20%7D%0A%20%20%23%20has%20part%28s%29%20%28P527%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP527%20%3Fhas_part_s_.%20%7D%0A%20%20%23%20publication%20date%20%28P577%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP577%20%3Fpublication_date.%20%7D%0A%20%20%23%20Freebase%20ID%20%28P646%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP646%20%3FFreebase_ID.%20%7D%0A%20%20%23%20ISFDB%20series%20ID%20%28P1235%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP1235%20%3FISFDB_series_ID.%20%7D%0A%20%20%23%20title%20%28P1476%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP1476%20%3Ftitle.%20%7D%0A%20%20%23%20Google%20Knowledge%20Graph%20ID%20%28P2671%29%0A%20%20OPTIONAL%20%7B%20%3Fnovel_series%20wdt%3AP2671%20%3FGoogle_Knowledge_Graph_ID.%20%7D%0A%7D%0A try it!] | ||
+ | == Aggregate SPARQL Query with SAMPLE == | ||
+ | <source lang='sparql'> | ||
+ | # truly tabular query for | ||
+ | # Q1667921:novel series | ||
+ | # generated by trulytabular.py on 2022-07-27T17:33:43.681991 | ||
+ | PREFIX wd: <http://www.wikidata.org/entity/> | ||
+ | PREFIX wdt: <http://www.wikidata.org/prop/direct/> | ||
+ | PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> | ||
+ | SELECT DISTINCT ?novel_series ?novel_seriesLabel | ||
+ | (SAMPLE (?instance_of) AS ?instance_of ) | ||
+ | (SAMPLE (?language_of_work_or_name) AS ?language_of_work_or_name) | ||
+ | (SAMPLE (?genre) AS ?genre) | ||
+ | (SAMPLE (?author) AS ?author) | ||
+ | (SAMPLE (?country_of_origin) AS ?country_of_origin) | ||
+ | (SAMPLE (?has_part_s_) AS ?has_part_s_) | ||
+ | (SAMPLE (?publication_date) AS ?publication_date) | ||
+ | (SAMPLE (?Freebase_ID) AS ?Freebase_ID) | ||
+ | (SAMPLE (?ISFDB_series_ID) AS ?ISFDB_series_ID) | ||
+ | (SAMPLE (?title) AS ?title ) | ||
+ | (SAMPLE (?Google_Knowledge_Graph_ID) AS ?Google_Knowledge_Graph_ID) | ||
+ | WHERE { | ||
+ | # instanceof Q1667921:novel series | ||
+ | ?novel_series wdt:P31 wd:Q1667921. | ||
+ | # label | ||
+ | ?novel_series rdfs:label ?novel_seriesLabel | ||
+ | FILTER (LANG(?novel_seriesLabel) = "en"). | ||
+ | # instance of (P31) | ||
+ | OPTIONAL { ?novel_series wdt:P31 ?instance_of. } | ||
+ | # language of work or name (P407) | ||
+ | OPTIONAL { ?novel_series wdt:P407 ?language_of_work_or_name. } | ||
+ | # genre (P136) | ||
+ | OPTIONAL { ?novel_series wdt:P136 ?genre. } | ||
+ | # author (P50) | ||
+ | OPTIONAL { ?novel_series wdt:P50 ?author. } | ||
+ | # country of origin (P495) | ||
+ | OPTIONAL { ?novel_series wdt:P495 ?country_of_origin. } | ||
+ | # has part(s) (P527) | ||
+ | OPTIONAL { ?novel_series wdt:P527 ?has_part_s_. } | ||
+ | # publication date (P577) | ||
+ | OPTIONAL { ?novel_series wdt:P577 ?publication_date. } | ||
+ | # Freebase ID (P646) | ||
+ | OPTIONAL { ?novel_series wdt:P646 ?Freebase_ID. } | ||
+ | # ISFDB series ID (P1235) | ||
+ | OPTIONAL { ?novel_series wdt:P1235 ?ISFDB_series_ID. } | ||
+ | # title (P1476) | ||
+ | OPTIONAL { ?novel_series wdt:P1476 ?title. } | ||
+ | # Google Knowledge Graph ID (P2671) | ||
+ | OPTIONAL { ?novel_series wdt:P2671 ?Google_Knowledge_Graph_ID. } | ||
+ | } GROUP BY ?novel_series ?novel_seriesLabe | ||
+ | </source> | ||
+ | |||
= How tabular are the Academic Conference entries in wikidata? = | = How tabular are the Academic Conference entries in wikidata? = | ||
+ | Result as of 2022-03 | ||
{| class="wikitable" style="text-align: left;" | {| class="wikitable" style="text-align: left;" | ||
|+ <!-- caption --> | |+ <!-- caption --> | ||
Line 22: | Line 231: | ||
|- | |- | ||
| ∑ || align="right"| 7518 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | | ∑ || align="right"| 7518 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | ||
+ | |- | ||
+ | | short name || align="right"| 6750 || align="right"| 6741 || align="right"| 89.8 || align="right"| 9 || align="right"| 0.1 || align="right"| 9 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | ||
|- | |- | ||
| country || align="right"| 7077 || align="right"| 7077 || align="right"| 94.1 || align="right"| 0 || align="right"| 0 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | | country || align="right"| 7077 || align="right"| 7077 || align="right"| 94.1 || align="right"| 0 || align="right"| 0 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | ||
+ | |- | ||
+ | | title || align="right"| 6718 || align="right"| 6700 || align="right"| 89.4 || align="right"| 18 || align="right"| 0.3 || align="right"| 10 || align="right"| 8 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | ||
|- | |- | ||
| part of the series || align="right"| 7139 || align="right"| 7120 || align="right"| 95 || align="right"| 19 || align="right"| 0.3 || align="right"| 15 || align="right"| 4 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | | part of the series || align="right"| 7139 || align="right"| 7120 || align="right"| 95 || align="right"| 19 || align="right"| 0.3 || align="right"| 15 || align="right"| 4 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | ||
Line 42: | Line 255: | ||
|- | |- | ||
| described at URL || align="right"| 6512 || align="right"| 6510 || align="right"| 86.6 || align="right"| 2 || align="right"| 0 || align="right"| 1 || align="right"| 1 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | | described at URL || align="right"| 6512 || align="right"| 6510 || align="right"| 86.6 || align="right"| 2 || align="right"| 0 || align="right"| 1 || align="right"| 1 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | ||
− | |||
− | |||
− | |||
− | |||
|- | |- | ||
| language used || align="right"| 87 || align="right"| 84 || align="right"| 1.2 || align="right"| 3 || align="right"| 3.4 || align="right"| 3 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | | language used || align="right"| 87 || align="right"| 84 || align="right"| 1.2 || align="right"| 3 || align="right"| 3.4 || align="right"| 3 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | ||
Line 53: | Line 262: | ||
| WikiCFP event ID || align="right"| 98 || align="right"| 98 || align="right"| 1.3 || align="right"| 0 || align="right"| 0 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | | WikiCFP event ID || align="right"| 98 || align="right"| 98 || align="right"| 1.3 || align="right"| 0 || align="right"| 0 || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| || align="right"| | ||
|} | |} | ||
+ | = Truly tabular examples = | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q2020153 academic conference] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q47258130 scientific conference series] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q644371 international airport] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q3624078 sovereign state] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q131734 brewery] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q5107 continent] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q7075 library] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q1667921 novel series] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q1637706 million city] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q33506 museum] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q44613 monastery] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q2133344 space mission] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q24748034 gravitational wave events] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q3937 supernova] | ||
+ | * [https://confident.dbis.rwth-aachen.de/pyose/tt/Q3918 university] | ||
+ | |||
[[Category:Truly Tabular RDF]] | [[Category:Truly Tabular RDF]] |
Latest revision as of 14:07, 8 August 2022
Tutorial
try Beer! If you set min% to 40 and then Generate the SPARQL Query the naive query is something like:
# truly tabular naive query for
# Q44:beer
# generated by trulytabular.py version 0.4.6 on 2022-08-08T15:06:07.716686
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?beerItem ?beer
?image
?manufacturerItem
?Google_Knowledge_Graph_ID
WHERE {
# instanceof Q44:beer
?beerItem wdt:P31 wd:Q44.
# label
?beerItem rdfs:label ?beer.
FILTER (LANG(?beer) = "en").
# image (P18)
OPTIONAL {
?beerItem wdt:P18 ?image.
}
# manufacturer (P176)
OPTIONAL {
?beerItem wdt:P176 ?manufacturerItem.
?manufacturerItem rdfs:label ?manufacturer.
FILTER (LANG(?manufacturer) = "en").
}
# Google Knowledge Graph ID (P2671)
OPTIONAL {
?beerItem wdt:P2671 ?Google_Knowledge_Graph_ID.
}
}
Info
The columns of the truly tabular tool at http://wikidata.bitplan.com are explained at Truly Tabular RDF/Info
Motivation
In the project to create the ConferenceCorpus as part of the ConfIDent project Scientific Event / Proceedings data needed to be imported from the german national libraries GND Dataset. The query needed to be modified after extensive analysis of the data. The schema definition as specified in the GND ontology alone was not sufficient. Truly Tabular RDF/GND
Querying tabular data from Wikidata
Game of Thrones / Jon Snow Example
A starting point for analyzing the data in a triplestore such as wikidata might be a single item of interest such as the character Jon Snow instance ofGame of thrones character
GOTCountQuery
SPARQL Query to count Game of Thrones character items in Wikidata
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE
{
# Q20086263:Game of Thrones character
?item wdt:P31 wd:Q20086263;
}
try it! The above SPARQL query will let us know the total amount of similar items we might find in Wikidata. which is 117 at the time of writing this work.
A quite natural idea is to create a query now to get a table of all Game of thrones characters with columns for the most interesting aspects.
As a starting point we might use the Wikidata Query Service and select the sample "Cats" and refine it for our purposes with some properties we find on Jon Snows page to create a "naive" query:
NaiveGOTQuery1
# Game of Thrones Characters
SELECT
?item ?itemLabel
?genderLabel
?countryLabel
?placeOfBirthLabel
?placeOfDeathLabel
WHERE
{
?item wdt:P31 wd:Q20086263. # must be a game of thrones character
?item wdt:P21 ?gender.
?item wdt:P27 ?country.
?item wdt:P19 ?placeOfBirth.
?item wdt:P20 ?placeOfDeath.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
result
item | itemLabel | genderLabel | countryLabel | placeOfBirthLabel | placeOfDeathLabel |
---|---|---|---|---|---|
http://www.wikidata.org/entity/Q259818 | Eddard Stark | male | Seven Kingdoms | Winterfell | King's Landing |
http://www.wikidata.org/entity/Q1120793 | Theon Greyjoy | male | Seven Kingdoms | Pyke | Winterfell |
http://www.wikidata.org/entity/Q2941743 | Catelyn Stark | female | Seven Kingdoms | Riverrun | Twins |
http://www.wikidata.org/entity/Q3183235 | Jon Snow | male | Seven Kingdoms | Tower of Joy | Castle Black |
http://www.wikidata.org/entity/Q3665163 | Cersei Lannister | female | Seven Kingdoms | Casterly Rock | Red Keep |
http://www.wikidata.org/entity/Q3806180 | Jaime Lannister | male | Seven Kingdoms | Casterly Rock | Red Keep |
http://www.wikidata.org/entity/Q11680202 | Beric Dondarrion | male | Seven Kingdoms | The Stormlands | Winterfell |
http://www.wikidata.org/entity/Q12056060 | Stannis Baratheon | male | Seven Kingdoms | Storm's End | The North |
http://www.wikidata.org/entity/Q12815803 | Rickon Stark | male | Seven Kingdoms | Winterfell | The North |
http://www.wikidata.org/entity/Q12900597 | Joffrey Baratheon | male | Seven Kingdoms | King's Landing | King's Landing |
http://www.wikidata.org/entity/Q12900933 | Margaery Tyrell | female | Seven Kingdoms | Highgarden | Great Sept of Baelor |
http://www.wikidata.org/entity/Q12902445 | Tywin Lannister | male | Seven Kingdoms | Casterly Rock | King's Landing |
http://www.wikidata.org/entity/Q12902594 | Viserys Targaryen | male | Seven Kingdoms | King's Landing | Vaes Dothrak |
http://www.wikidata.org/entity/Q13634884 | Robb Stark | male | Seven Kingdoms | Riverrun | Twins |
http://www.wikidata.org/entity/Q13634885 | Robert Baratheon | male | Seven Kingdoms | Storm's End | King's Landing |
http://www.wikidata.org/entity/Q17355556 | Janos Slynt | male | Seven Kingdoms | King's Landing | The North |
http://www.wikidata.org/entity/Q18920137 | Loras Tyrell | male | Seven Kingdoms | Highgarden | Great Sept of Baelor |
http://www.wikidata.org/entity/Q19791067 | Lysa Arryn | female | Seven Kingdoms | Riverrun | Eyrie |
http://www.wikidata.org/entity/Q23730371 | Hoster Tully | male | Seven Kingdoms | Riverrun | Winterfell |
http://www.wikidata.org/entity/Q23747242 | Shireen Baratheon | female | Seven Kingdoms | Dragonstone | The North |
http://www.wikidata.org/entity/Q23759020 | Rickard Stark | male | Seven Kingdoms | Winterfell | King's Landing |
http://www.wikidata.org/entity/Q23759067 | Benjen Stark | male | Seven Kingdoms | Winterfell | Beyond the Wall |
http://www.wikidata.org/entity/Q23759071 | Lyanna Stark | female | Seven Kingdoms | Winterfell | Tower of Joy |
http://www.wikidata.org/entity/Q23782360 | Brynden Tully | male | Seven Kingdoms | Riverrun | Riverrun |
http://www.wikidata.org/entity/Q23782368 | Walder Frey | male | Seven Kingdoms | Twins | Twins |
http://www.wikidata.org/entity/Q24814689 | Aemon Targaryen | male | Seven Kingdoms | King's Landing | Castle Black |
http://www.wikidata.org/entity/Q28468107 | Doran Martell | male | Seven Kingdoms | Sunspear | Dorne |
http://www.wikidata.org/entity/Q55433701 | Jojen Reed | male | Seven Kingdoms | Greywater Watch | Beyond the Wall |
The result of this query is nicely tabular and can be exported to CSV format to be imported to our favorite relational database or spreadsheet - but it does not contain 117 entries any more but only 28. SPARQL queries will not return "NULL" values as our relational database query language SQL would. The query needs to be amended with the "OPTIONAL" keyword as shown below.
NaiveGOTQuery2
Naive Game of Thrones Character tabular query with OPTIONAL
# Game of Thrones Characters
SELECT
?item ?itemLabel
?genderLabel
?countryLabel
?placeOfBirthLabel
?placeOfDeathLabel
WHERE
{
?item wdt:P31 wd:Q20086263. # must be a game of thrones character
OPTIONAL { ?item wdt:P21 ?gender}.
OPTIONAL { ?item wdt:P27 ?country}.
OPTIONAL { ?item wdt:P19 ?placeOfBirth}.
OPTIONAL { ?item wdt:P20 ?placeOfDeath}.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This table looks promising it has 117 entries and the columns are filled as expected.
The real surprise for a novice will be the result of the Query below when three more columns have been added. All over sudden the query result has 304 rows for our 117 characters. There are duplicate rows for each character and strange combinations of entries. A notorious example seems to be Jorah Mormont having 8 different values for the "occupation" column.
NaiveGOTQuery3
Naive Game of Thrones Character tabular query with duplicate rows for some characters
# Game of Thrones Characters
SELECT
?item ?itemLabel
?genderLabel
?countryLabel
?placeOfBirthLabel
?placeOfDeathLabel
?killedByLabel
?familyLabel
?occupationLabel
WHERE
{
?item wdt:P31 wd:Q20086263. # must be a game of thrones character
OPTIONAL { ?item wdt:P21 ?gender}.
OPTIONAL { ?item wdt:P27 ?country}.
OPTIONAL { ?item wdt:P19 ?placeOfBirth}.
OPTIONAL { ?item wdt:P20 ?placeOfDeath}.
OPTIONAL { ?item wdt:P157 ?killedBy}.
OPTIONAL { ?item wdt:P53 ?family}.
OPTIONAL { ?item wdt:P106 ?occupation}.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Naive SPARQL Query
- Start with a wikidata item your are intested in e.g. International Semantic Web Conference ISWC 2022
- use the instance of property to find similar items of the same class academic conference
- straight-forward select further properties by adding statements similar to to the WHERE clause.
OPTIONAL { ?conference wdt:P1813 ?short_name }
This naive approach will lead to more results for Step 3 (e.g. 7730) than for step 2 (e.g. 7695) which is a surprise for most novices since this effect would not happen with a similar SQL query
SELECT short_name,country,title from academic_conference
Result of Step #2
# Academic conference wikidata query
# WF 2021-01-30
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?conference ?conferenceLabel
WHERE
{
# academic conference series (Q2020153)
?conference wdt:P31 wd:Q2020153.
# label
?conference rdfs:label ?conferenceLabel filter (lang(?conferenceLabel) = "en").
}
conference | conferenceLabel |
---|---|
http://www.wikidata.org/entity/Q75698988 | The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
http://www.wikidata.org/entity/Q75707991 | Digital Humanities 2020 |
http://www.wikidata.org/entity/Q75709854 | Digital Humanities 2018 |
... |
Result of Step 3
# Academic conference wikidata query
# WF 2021-01-30
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT
?conference ?conferenceLabel
?short_name
?country
?title
WHERE
{
# academic conference series (Q2020153)
?conference wdt:P31 wd:Q2020153.
# label
?conference rdfs:label ?conferenceLabel filter (lang(?conferenceLabel) = "en").
# short name
OPTIONAL { ?conference wdt:P1813 ?short_name }
# country
OPTIONAL { ?conference wdt:P17 ?country }
# title
OPTIONAL { ?conference wdt:P1476 ?title }
}
More elaborate example: novel series
- start with Lord of the Rings
- find similar Novel Series
Naive SPARQL Query
# truly tabular query for
# Q1667921:novel series
# generated by trulytabular.py on 2022-07-27T17:33:43.681991
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?novel_series ?novel_seriesLabel
?instance_of
?language_of_work_or_name
?genre
?author
?country_of_origin
?has_part_s_
?publication_date
?Freebase_ID
?ISFDB_series_ID
?title
?Google_Knowledge_Graph_ID
WHERE {
# instanceof Q1667921:novel series
?novel_series wdt:P31 wd:Q1667921.
# label
?novel_series rdfs:label ?novel_seriesLabel
FILTER (LANG(?novel_seriesLabel) = "en").
# instance of (P31)
OPTIONAL { ?novel_series wdt:P31 ?instance_of. }
# language of work or name (P407)
OPTIONAL { ?novel_series wdt:P407 ?language_of_work_or_name. }
# genre (P136)
OPTIONAL { ?novel_series wdt:P136 ?genre. }
# author (P50)
OPTIONAL { ?novel_series wdt:P50 ?author. }
# country of origin (P495)
OPTIONAL { ?novel_series wdt:P495 ?country_of_origin. }
# has part(s) (P527)
OPTIONAL { ?novel_series wdt:P527 ?has_part_s_. }
# publication date (P577)
OPTIONAL { ?novel_series wdt:P577 ?publication_date. }
# Freebase ID (P646)
OPTIONAL { ?novel_series wdt:P646 ?Freebase_ID. }
# ISFDB series ID (P1235)
OPTIONAL { ?novel_series wdt:P1235 ?ISFDB_series_ID. }
# title (P1476)
OPTIONAL { ?novel_series wdt:P1476 ?title. }
# Google Knowledge Graph ID (P2671)
OPTIONAL { ?novel_series wdt:P2671 ?Google_Knowledge_Graph_ID. }
}
Aggregate SPARQL Query with SAMPLE
# truly tabular query for
# Q1667921:novel series
# generated by trulytabular.py on 2022-07-27T17:33:43.681991
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?novel_series ?novel_seriesLabel
(SAMPLE (?instance_of) AS ?instance_of )
(SAMPLE (?language_of_work_or_name) AS ?language_of_work_or_name)
(SAMPLE (?genre) AS ?genre)
(SAMPLE (?author) AS ?author)
(SAMPLE (?country_of_origin) AS ?country_of_origin)
(SAMPLE (?has_part_s_) AS ?has_part_s_)
(SAMPLE (?publication_date) AS ?publication_date)
(SAMPLE (?Freebase_ID) AS ?Freebase_ID)
(SAMPLE (?ISFDB_series_ID) AS ?ISFDB_series_ID)
(SAMPLE (?title) AS ?title )
(SAMPLE (?Google_Knowledge_Graph_ID) AS ?Google_Knowledge_Graph_ID)
WHERE {
# instanceof Q1667921:novel series
?novel_series wdt:P31 wd:Q1667921.
# label
?novel_series rdfs:label ?novel_seriesLabel
FILTER (LANG(?novel_seriesLabel) = "en").
# instance of (P31)
OPTIONAL { ?novel_series wdt:P31 ?instance_of. }
# language of work or name (P407)
OPTIONAL { ?novel_series wdt:P407 ?language_of_work_or_name. }
# genre (P136)
OPTIONAL { ?novel_series wdt:P136 ?genre. }
# author (P50)
OPTIONAL { ?novel_series wdt:P50 ?author. }
# country of origin (P495)
OPTIONAL { ?novel_series wdt:P495 ?country_of_origin. }
# has part(s) (P527)
OPTIONAL { ?novel_series wdt:P527 ?has_part_s_. }
# publication date (P577)
OPTIONAL { ?novel_series wdt:P577 ?publication_date. }
# Freebase ID (P646)
OPTIONAL { ?novel_series wdt:P646 ?Freebase_ID. }
# ISFDB series ID (P1235)
OPTIONAL { ?novel_series wdt:P1235 ?ISFDB_series_ID. }
# title (P1476)
OPTIONAL { ?novel_series wdt:P1476 ?title. }
# Google Knowledge Graph ID (P2671)
OPTIONAL { ?novel_series wdt:P2671 ?Google_Knowledge_Graph_ID. }
} GROUP BY ?novel_series ?novel_seriesLabe
How tabular are the Academic Conference entries in wikidata?
Result as of 2022-03
property | total | f1 | total% | non tabular | non tabular% | f2 | f3 | f14 | f4 | f7 | f5 | f9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
∑ | 7518 | |||||||||||
short name | 6750 | 6741 | 89.8 | 9 | 0.1 | 9 | ||||||
country | 7077 | 7077 | 94.1 | 0 | 0 | |||||||
title | 6718 | 6700 | 89.4 | 18 | 0.3 | 10 | 8 | |||||
part of the series | 7139 | 7120 | 95 | 19 | 0.3 | 15 | 4 | |||||
VIAF ID | 2096 | 2092 | 27.9 | 4 | 0.2 | 3 | 1 | |||||
GND ID | 3049 | 3043 | 40.6 | 6 | 0.2 | 4 | 2 | |||||
location | 7209 | 7180 | 95.9 | 29 | 0.4 | 24 | 4 | 1 | ||||
start time | 6916 | 6914 | 92 | 2 | 0 | 2 | ||||||
end time | 6912 | 6909 | 91.9 | 3 | 0 | 3 | ||||||
official website | 596 | 586 | 7.9 | 10 | 1.7 | 9 | 1 | |||||
main subject | 1882 | 1722 | 25 | 160 | 8.5 | 131 | 23 | 2 | 2 | 1 | 1 | |
described at URL | 6512 | 6510 | 86.6 | 2 | 0 | 1 | 1 | |||||
language used | 87 | 84 | 1.2 | 3 | 3.4 | 3 | ||||||
is proceedings from | 921 | 901 | 12.3 | 20 | 2.2 | 16 | 3 | 1 | ||||
WikiCFP event ID | 98 | 98 | 1.3 | 0 | 0 |