Truly Tabular RDF/Info
Latest revision as of 05:34, 5 August 2022
This page explains the table columns being used in the Truly Tabular RDF analysis tool at http://wikidata.bitplan.com
Property columns
#
Rank of the property, ordered by the percentage of instances for which at least one value is available for the property.
%
The percentage of instances where at least one value is available for the property
pareto
The Pareto level according to the Pareto principle (https://en.wikipedia.org/wiki/Pareto_principle) of 80:20 (1 out of 5), expressed on a logarithmic scale with base 5: level n corresponds to 1 out of 5^n.
level | ratio | 1 out of |
---|---|---|
1 | 80:20 | 5 |
2 | 96:4 | 25 |
3 | 99.2:0.8 | 125 |
4 | 99.84:0.16 | 625 |
5 | 99.97:0.03 | 3125 |
6 | 99.994:0.006 | 15625 |
7 | 99.9987:0.0013 | 78125 |
8 | 99.99974:0.00026 | 390625 |
9 | 99.99995:0.00005 | 1953125 |
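The rows of the table above follow from powers of 5. The following Python sketch reproduces them and shows one way counts could be mapped to a level. The function names and the rule used in `pareto_level` (the highest level whose "1 out of" bound the minority still fits) are assumptions for illustration and are not taken from the tool.

```python
def pareto_row(level: int) -> tuple[float, float, int]:
    """Return (majority %, minority %, '1 out of' value) for a Pareto level."""
    one_out_of = 5 ** level
    minority = 100.0 / one_out_of
    return 100.0 - minority, minority, one_out_of


def pareto_level(total: int, minority: int, max_level: int = 9) -> int:
    """Hypothetical mapping: highest level n with minority <= total / 5**n.

    Integer arithmetic is used to avoid floating-point logarithm errors.
    The result is capped at max_level (the last row of the table).
    """
    if total <= 0 or not 0 <= minority <= total:
        raise ValueError("need total > 0 and 0 <= minority <= total")
    level = 0
    while level < max_level and minority * 5 ** (level + 1) <= total:
        level += 1
    return level


if __name__ == "__main__":
    for n in range(1, 10):
        majority, minority, one_out_of = pareto_row(n)
        print(f"{n} | {majority:.6g}:{minority:.2g} | {one_out_of}")
```

For example, `pareto_level(100, 20)` yields level 1 (80:20) and `pareto_level(100, 4)` yields level 2 (96:4).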
property
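A Wikidata property (see https://www.wikidata.org/wiki/Wikidata:List_of_properties), e.g. P31/instance of (https://www.wikidata.org/wiki/Property:P31).
propertyId
The identifier of a property, e.g. P31 for P31/instance of.
type
A Wikibase data type; see Supported data types (https://www.wikidata.org/wiki/Help:Data_type#Supported_data_types).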
Statistics
1
Number of truly tabular entries, i.e. entries with a cardinality of 1.
maxf
Maximum frequency / cardinality of the property.
nt
Number of non-tabular entries, i.e. entries with a cardinality > 1.
nt%
Percentage of non tabular entries.
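The statistics columns can be derived from a frequency histogram that maps a cardinality (number of values per entry) to the number of entries with that cardinality. The sketch below is illustrative only: the function name and dictionary layout are hypothetical, `maxf` is interpreted as the largest cardinality observed, and `nt%` is computed relative to all entries that have at least one value. Both interpretations are assumptions, not confirmed by the tool.

```python
def tabular_stats(histogram: dict[int, int]) -> dict[str, float]:
    """Derive the statistics columns from {cardinality: number of entries}."""
    total = sum(histogram.values())
    # "1": entries with exactly one value (truly tabular)
    one = histogram.get(1, 0)
    # "nt": entries with more than one value (non-tabular)
    nt = sum(freq for cardinality, freq in histogram.items() if cardinality > 1)
    # "maxf": assumed to be the largest cardinality that occurs
    maxf = max(histogram, default=0)
    # "nt%": assumed to be relative to all entries with at least one value
    nt_percent = 100.0 * nt / total if total else 0.0
    return {"1": one, "maxf": maxf, "nt": nt, "nt%": nt_percent}
```

With the made-up histogram `{1: 90, 2: 8, 3: 2}` this gives 90 tabular entries, a maximum cardinality of 3, and 10 non-tabular entries (10%).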
?f
A "try it" link to the query that retrieves the frequency histogram for this property. For example, for the property official website (P856), queried for instances of the class university (Q3918), the query used is:
# This query was generated by Truly Tabular
# Count all Q3918:university items
# with the given official website(P856) https://www.wikidata.org/wiki/Property:P856
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?count (COUNT(?count) AS ?frequency) WHERE {{
  SELECT ?item ?itemLabel (COUNT (?value) AS ?count)
  WHERE
  {
    # instance of university
    ?item wdt:P31 wd:Q3918.
    ?item rdfs:label ?itemLabel.
    FILTER (LANG(?itemLabel) = "en").
    # official website
    ?item wdt:P856 ?value.
  } GROUP BY ?item ?itemLabel
}}
GROUP BY ?count
ORDER BY DESC (?frequency)
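The nested query above works in two steps: the inner SELECT counts the values per item, and the outer SELECT counts how many items share each value count. A small Python sketch of the same two-step logic on in-memory data may make this easier to follow. The function name and the input layout (item ID mapped to a list of values) are hypothetical.

```python
from collections import Counter


def frequency_histogram(values_by_item: dict[str, list[str]]) -> list[tuple[int, int]]:
    """Return (count, frequency) pairs, most frequent count first."""
    # Inner query: number of values per item. Items without any value
    # do not match the triple pattern and are therefore skipped.
    counts = [len(values) for values in values_by_item.values() if values]
    # Outer query: GROUP BY ?count and count the items in each group.
    histogram = Counter(counts)
    # ORDER BY DESC(?frequency)
    return sorted(histogram.items(), key=lambda pair: pair[1], reverse=True)
```

For the made-up input `{"Q1": ["a"], "Q2": ["b"], "Q3": ["c", "d"], "Q4": []}` the result is `[(1, 2), (2, 1)]`: two items with one value and one item with two values.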
?ex
A "try it!" link to a query that lists examples of "non-tabular" entries. For example, for the property "manufacturer" (P176) of the class "beer" (Q44), the following query is generated:
# This query was generated by Truly Tabular
# Count all Q44:beer items
# with the given manufacturer(P176) https://www.wikidata.org/wiki/Property:P176
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?item ?itemLabel (COUNT (?value) AS ?count)
WHERE
{
  # instance of beer
  ?item wdt:P31 wd:Q44.
  ?item rdfs:label ?itemLabel.
  FILTER (LANG(?itemLabel) = "en").
  # manufacturer
  ?item wdt:P176 ?value.
} GROUP BY ?item ?itemLabel
HAVING (COUNT (?value) > 1)
ORDER BY DESC(?count)
The query reveals two beers that each have two manufacturers: Žatecký Gus (http://www.wikidata.org/entity/Q15980473), which is manufactured by Carlsberg Ukraine and Baltika Breweries, and Balatoni Világos (http://www.wikidata.org/entity/Q789278), which is manufactured by Nagykanizsai Sörgyár Rt. (until 1999) and Dreher Breweries.
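The effect of the HAVING clause can be mimicked in Python: keep only the items with more than one value and sort them by value count in descending order. This is a sketch with a hypothetical function name and input layout. The two multi-valued items use the entity IDs named above; the single-valued item `Q_single` is a made-up placeholder.

```python
def non_tabular_examples(values_by_item: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Return (item, count) for items with more than one value, largest first."""
    # HAVING (COUNT(?value) > 1)
    rows = [
        (item, len(values))
        for item, values in values_by_item.items()
        if len(values) > 1
    ]
    # ORDER BY DESC(?count); Python's sort is stable, so ties keep input order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


manufacturers = {
    "Q15980473": ["Q4035888", "Q805734"],  # Žatecký Gus
    "Q789278": ["Q1214899", "Q909817"],    # Balatoni Világos
    "Q_single": ["Q_brewery"],             # hypothetical single-manufacturer beer
}
print(non_tabular_examples(manufacturers))
```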
✔
Check mark indicating that the statistics for this property have been calculated successfully.
Aggregates
count
Apply the SPARQL COUNT() aggregate.
min
Apply the SPARQL MIN() aggregate.
max
Apply the SPARQL MAX() aggregate.
avg
Apply the SPARQL AVG() aggregate.
sample
Apply the SPARQL SAMPLE() aggregate.
list
Apply the SPARQL GROUP_CONCAT() aggregate to avoid multiple solutions for the same instance.
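To illustrate how the aggregate columns might translate into SELECT expressions, here is a minimal Python sketch. The mapping from column names to SPARQL functions follows the descriptions above; the helper name, the result variable naming scheme (`?<var>_<column>`) and the `|` separator are assumptions and may differ from what the tool actually generates.

```python
# Column name in the table -> SPARQL 1.1 aggregate function
AGGREGATES = {
    "count": "COUNT",
    "min": "MIN",
    "max": "MAX",
    "avg": "AVG",
    "sample": "SAMPLE",
    "list": "GROUP_CONCAT",
}


def aggregate_expression(column: str, var: str) -> str:
    """Build a SELECT expression such as (COUNT(?website) AS ?website_count)."""
    func = AGGREGATES[column]  # raises KeyError for unknown columns
    if column == "list":
        # GROUP_CONCAT takes an optional separator; "|" is an assumed choice.
        return f'({func}(?{var}; SEPARATOR="|") AS ?{var}_list)'
    return f"({func}(?{var}) AS ?{var}_{column})"
```

For instance, `aggregate_expression("list", "website")` produces `(GROUP_CONCAT(?website; SEPARATOR="|") AS ?website_list)`, which collapses all websites of one instance into a single solution.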
ignore
Ignore SPARQL solutions that have multiple values for the given property by using a HAVING COUNT<=1 aggregate condition in the generated query.
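As a rough sketch of what such a condition could look like in a complete query, the following Python helper assembles a query string for a class and a property. The helper name, the variable names and the overall query shape are assumptions; only the idea of a HAVING condition on the value count comes from the description above.

```python
def ignore_multi_valued_query(class_id: str, property_id: str) -> str:
    """Build a query that keeps only items with at most one value."""
    return "\n".join([
        "PREFIX wd: <http://www.wikidata.org/entity/>",
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
        "SELECT ?item (SAMPLE(?value) AS ?singleValue)",
        "WHERE {",
        f"  ?item wdt:P31 wd:{class_id}.",
        f"  ?item wdt:{property_id} ?value.",
        "}",
        "GROUP BY ?item",
        # Solutions with more than one value per item are dropped here.
        "HAVING (COUNT(?value) <= 1)",
    ])


print(ignore_multi_valued_query("Q3918", "P856"))
```

Because the triple pattern in this sketch is not optional, every remaining item has exactly one value.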
label
Show the label of the property result in the generated SPARQL query.
select
If a property is selected, it is included in the generated SPARQL query.