Difference between revisions of "Truly Tabular RDF/Info"

From BITPlan Wiki

Latest revision as of 05:34, 5 August 2022

This page explains the table columns used in the Truly Tabular RDF analysis tool at http://wikidata.bitplan.com

Property columns

#

Rank of the property, ordered by the percentage of instances for which at least one value is available for the property.

%

The percentage of instances where at least one value is available for the property
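As a rough illustration of how the # and % columns relate, the following sketch derives both from raw counts. The function name, its parameters and the counts are made up for this example and are not taken from the tool itself.

```python
def coverage_table(total_instances, instances_with_value):
    """Return (rank, property_id, percentage) rows, highest coverage first.

    total_instances: number of instances of the class (hypothetical input).
    instances_with_value: dict mapping a property id to the number of
    instances that have at least one value for that property.
    """
    rows = []
    for prop, count in instances_with_value.items():
        percentage = 100.0 * count / total_instances
        rows.append((prop, percentage))
    # Sort by descending percentage; ties keep their input order.
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(rank, prop, pct) for rank, (prop, pct) in enumerate(rows, start=1)]


# Made-up counts for illustration only.
table = coverage_table(200, {"P856": 150, "P17": 190, "P571": 100})
for rank, prop, pct in table:
    print(f"{rank} {prop} {pct:.1f}%")
```

The property with the highest coverage gets rank 1, so sorting a table by # is the same as sorting by % in descending order.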

pareto

The Pareto level according to the Pareto principle 80:20 (1 out of 5), expressed on a logarithmic scale to base 5.

Pareto levels

level  ratio              1 out of
1      80:20              5
2      96:4               25
3      99.2:0.8           125
4      99.84:0.16         625
5      99.97:0.03         3125
6      99.994:0.006       15625
7      99.9987:0.0013     78125
8      99.99974:0.00026   390625
9      99.99995:0.00005   1953125
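The Pareto levels follow directly from powers of 5: level n stands for "1 out of 5^n". A minimal sketch that recomputes the rows is shown here; the function name is hypothetical, and exact fractions are used so that no rounding is involved until the values are printed.

```python
from fractions import Fraction


def pareto_row(level):
    """Return (one_out_of, majority_percent, minority_percent) for a level."""
    one_out_of = 5 ** level
    # Share of the minority in percent, e.g. 100/5 = 20 for level 1.
    minority = Fraction(100, one_out_of)
    majority = 100 - minority
    return one_out_of, majority, minority


for level in range(1, 10):
    one_out_of, majority, minority = pareto_row(level)
    print(level, float(majority), float(minority), one_out_of)
```

The printed values are unrounded, so from level 5 onwards they show more digits than the rounded ratios listed for the Pareto levels.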

property

A Wikidata property, e.g. P31/instance of

propertyId

The property identifier of a property, e.g. P31 for P31/instance of

type

A Wikibase data type; see Supported data types

Statistics

1

Number of truly tabular entries, i.e. entries with a cardinality of 1

maxf

Maximum frequency / cardinality of the property, i.e. the largest number of values that a single instance has for it

nt

Number of non-tabular entries, i.e. entries having a cardinality > 1

nt%

Percentage of non-tabular entries.
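The four statistics columns can be read off a frequency histogram that maps each cardinality to the number of items having it, which is the shape of result that the ?f query described below returns. The following sketch is an illustration only: the function name and sample numbers are invented, and it assumes that nt% is taken relative to all items that have at least one value for the property.

```python
def tabular_statistics(histogram):
    """histogram maps cardinality -> number of items with that cardinality."""
    total = sum(histogram.values())
    # Items with exactly one value are "truly tabular".
    one = histogram.get(1, 0)
    # Items with more than one value are "non-tabular".
    nt = sum(freq for card, freq in histogram.items() if card > 1)
    # Highest cardinality that occurs at all.
    maxf = max(histogram) if histogram else 0
    # Assumption: the denominator is every item that has a value.
    nt_percent = 100.0 * nt / total if total else 0.0
    return {"1": one, "maxf": maxf, "nt": nt, "nt%": nt_percent}


# Made-up histogram: 90 items with one value, 8 with two, 2 with three.
stats = tabular_statistics({1: 90, 2: 8, 3: 2})
print(stats)
```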

?f

"try it!" link to a query that retrieves the frequency histogram for this property. E.g. for the property official website (P856), as queried for instances of the class Q3918 university, the query used is:

# This query was generated by Truly Tabular
# Count all Q3918:university items
# with the given official website(P856) https://www.wikidata.org/wiki/Property:P856 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?count (COUNT(?count) AS ?frequency) WHERE {{
SELECT ?item ?itemLabel (COUNT (?value) AS ?count)
WHERE
{
  # instance of university
  ?item wdt:P31 wd:Q3918.
  ?item rdfs:label ?itemLabel.
  FILTER (LANG(?itemLabel) = "en").
  # official website
  ?item wdt:P856 ?value.
} GROUP BY ?item ?itemLabel

}}
GROUP BY ?count
ORDER BY DESC (?frequency)

try it!
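A "try it!" link of this kind can be produced by percent-encoding the query text and appending it as the fragment of the Wikidata Query Service address. The sketch below shows one way to do that; the helper name is hypothetical, and the query is a shortened variant of the one above that relies on the wd: and wdt: prefixes the service predefines.

```python
from urllib.parse import quote

WDQS_UI = "https://query.wikidata.org/"


def try_it_url(sparql_query):
    """Build a link that opens the query in the Wikidata Query Service UI."""
    # safe="" percent-encodes every reserved character, including "/".
    return WDQS_UI + "#" + quote(sparql_query, safe="")


# Shortened variant of the frequency query, for illustration only.
query = """SELECT ?count (COUNT(?count) AS ?frequency) WHERE {{
SELECT ?item (COUNT(?value) AS ?count)
WHERE
{
  ?item wdt:P31 wd:Q3918.
  ?item wdt:P856 ?value.
} GROUP BY ?item
}}
GROUP BY ?count
ORDER BY DESC(?frequency)"""

print(try_it_url(query))
```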

?ex

"try it!" link to examples of "non-tabular" entries. E.g. for the property "manufacturer" of the class "beer", the query

# This query was generated by Truly Tabular
# Count all Q44:beer items
# with the given manufacturer(P176) https://www.wikidata.org/wiki/Property:P176 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?item ?itemLabel (COUNT (?value) AS ?count)
WHERE
{
  # instance of beer
  ?item wdt:P31 wd:Q44.
  ?item rdfs:label ?itemLabel.
  FILTER (LANG(?itemLabel) = "en").
  # manufacturer
  ?item wdt:P176 ?value.
} GROUP BY ?item ?itemLabel

HAVING (COUNT (?value) > 1)
ORDER BY DESC(?count)

try it!

will be generated, which reveals that there are two kinds of beer that have two manufacturers: Žatecký Gus, which is manufactured by Carlsberg Ukraine and Baltika Breweries, and Balatoni Világos, which is manufactured by Nagykanizsai Sörgyár Rt. (until 1999) and Dreher Breweries.

✔

Check mark indicating that the property statistics for this property have been calculated successfully.

Aggregates

count

Apply the SPARQL COUNT() aggregate.

min

Apply the SPARQL MIN() aggregate.

max

Apply the SPARQL MAX() aggregate.

avg

Apply the SPARQL AVG() aggregate.

sample

Apply the SPARQL SAMPLE() aggregate.

list

Apply the SPARQL GROUP_CONCAT() aggregate to avoid multiple solutions for the same instance.
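To make the effect of these aggregates concrete, the sketch below emulates them in plain Python on a small list of (item, value) rows. It is not the query the tool generates; the function name and sample rows are invented, and "sample" simply takes the first value although SPARQL may return any one of them. The "list" case joins values with a single space, which is the default separator of GROUP_CONCAT.

```python
from statistics import mean


def aggregate(rows, how):
    """Collapse (item, value) rows into one value per item."""
    grouped = {}
    for item, value in rows:
        grouped.setdefault(item, []).append(value)
    functions = {
        "count": len,
        "min": min,
        "max": max,
        "avg": mean,
        # SPARQL SAMPLE() may pick any value; the first one is used here.
        "sample": lambda values: values[0],
        # GROUP_CONCAT() joins the values with a space by default.
        "list": lambda values: " ".join(str(v) for v in values),
    }
    return {item: functions[how](values) for item, values in grouped.items()}


# Made-up rows: item Q1 has two values, item Q2 has one.
rows = [("Q1", 1900), ("Q1", 1950), ("Q2", 2000)]
for how in ("count", "min", "max", "avg", "sample", "list"):
    print(how, aggregate(rows, how))
```

Whichever aggregate is chosen, each item ends up with exactly one row, which is what makes the result tabular.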

ignore

Ignore SPARQL solutions that have multiple values for the given property by adding a HAVING (COUNT(?value) <= 1) aggregate condition to the generated query.
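The effect of this option can be pictured as a filter that drops every item with more than one value. The following sketch emulates that in Python; the function name is hypothetical, and the third row is an invented single-manufacturer beer added next to the Žatecký Gus example from above.

```python
def ignore_multi_valued(rows):
    """Keep only rows of items that have at most one value."""
    counts = {}
    for item, _value in rows:
        counts[item] = counts.get(item, 0) + 1
    # Mirrors a HAVING condition of "count <= 1" on the grouped items.
    return [(item, value) for item, value in rows if counts[item] <= 1]


rows = [
    ("Žatecký Gus", "Carlsberg Ukraine"),
    ("Žatecký Gus", "Baltika Breweries"),
    ("Example Lager", "Example Brewery"),  # invented single-valued entry
]
print(ignore_multi_valued(rows))
```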

label

Show the label of the property result in the generated SPARQL query.

select

If a property is selected, it will be included in the generated SPARQL query.