Truly Tabular RDF/Info

From BITPlan Wiki

This page explains the table columns being used in the Truly Tabular RDF analysis tool at http://wikidata.bitplan.com

Property columns

#

Rank of the property, ordered by the percentage of instances for which at least one value of the property is available

%

The percentage of instances where at least one value is available for the property

pareto

The Pareto level according to the Pareto principle 80:20 (1 out of 5), expressed on a logarithmic scale to base 5.

Pareto levels
level  ratio              1 out of
1      80:20              5
2      96:4               25
3      99.2:0.8           125
4      99.84:0.16         625
5      99.97:0.03         3125
6      99.994:0.006       15625
7      99.9987:0.0013     78125
8      99.99974:0.00026   390625
9      99.99995:0.00005   1953125
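
The rows of the table follow directly from the base-5 scale: at level n, one instance out of 5^n falls on the minority side of the ratio. A minimal Python sketch of that arithmetic is given here; the function names `pareto_row` and `level_for` are made up for illustration and are not taken from the Truly Tabular code base.

```python
import math


def pareto_row(level: int) -> tuple[float, float, int]:
    """Return (majority %, minority %, one_out_of) for a Pareto level."""
    one_out_of = 5 ** level          # 5, 25, 125, ...
    minority = 100 / one_out_of      # e.g. 20 % at level 1
    return 100 - minority, minority, one_out_of


def level_for(one_out_of: int) -> int:
    """Invert the scale: the level is the base-5 logarithm of 'one out of'."""
    # round() absorbs floating point noise in math.log
    return round(math.log(one_out_of, 5))


# Print the nine levels listed in the table (ratios are not rounded here)
for level in range(1, 10):
    majority, minority, one_out_of = pareto_row(level)
    print(level, f"{majority}:{minority}", one_out_of)
```

The table above rounds the ratios for readability; the sketch prints the unrounded values.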

property

A Wikidata Property e.g. P31/instance of

propertyId

The property Identifier for a Property e.g. P31 for P31/instance of

type

A Wikibase data type; see Supported data types.

Statistics

1

Number of truly tabular entries, i.e. those with a cardinality of 1

maxf

maximum frequency / cardinality of the property

nt

Number of non-tabular entries, i.e. those with a cardinality > 1

nt%

Percentage of non-tabular entries.
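
The four statistics columns can all be derived from a frequency histogram that maps each cardinality to the number of items having that many values. The following Python sketch shows one way to do this. It assumes that nt% is taken relative to all items that have at least one value for the property; this page does not state the denominator. The function name and the sample histogram are made up for illustration.

```python
def property_statistics(histogram: dict[int, int]) -> dict[str, float]:
    """Derive the statistics columns from a cardinality -> frequency histogram."""
    total = sum(histogram.values())
    tabular = histogram.get(1, 0)
    non_tabular = sum(freq for card, freq in histogram.items() if card > 1)
    return {
        "1": tabular,                                   # cardinality exactly 1
        "maxf": max(histogram) if histogram else 0,     # highest cardinality seen
        "nt": non_tabular,                              # cardinality > 1
        # Assumption: percentage relative to items with at least one value
        "nt%": 100 * non_tabular / total if total else 0.0,
    }


# Made-up histogram: 90 items with one value, 8 with two, 2 with three
stats = property_statistics({1: 90, 2: 8, 3: 2})
print(stats)
```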

?f

A "try it" link to the query that retrieves the frequency histogram for this property. E.g. for the property official website (P856), queried for instances of the class Q3918 (university), the query used is:

# This query was generated by Truly Tabular
# Count all Q3918:university items
# with the given official website(P856) https://www.wikidata.org/wiki/Property:P856 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?count (COUNT(?count) AS ?frequency) WHERE {{
SELECT ?item ?itemLabel (COUNT (?value) AS ?count)
WHERE
{
  # instance of university
  ?item wdt:P31 wd:Q3918.
  ?item rdfs:label ?itemLabel.
  FILTER (LANG(?itemLabel) = "en").
  # official website
  ?item wdt:P856 ?value.
} GROUP BY ?item ?itemLabel

}}
GROUP BY ?count
ORDER BY DESC (?frequency)


?ex

A "try it!" link to examples of "non-tabular" entries. E.g. for the property "manufacturer" of the class "beer", the query

# This query was generated by Truly Tabular
# Count all Q44:beer items
# with the given manufacturer(P176) https://www.wikidata.org/wiki/Property:P176 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?item ?itemLabel (COUNT (?value) AS ?count)
WHERE
{
  # instance of beer
  ?item wdt:P31 wd:Q44.
  ?item rdfs:label ?itemLabel.
  FILTER (LANG(?itemLabel) = "en").
  # manufacturer
  ?item wdt:P176 ?value.
} GROUP BY ?item ?itemLabel

HAVING (COUNT (?value) > 1)
ORDER BY DESC(?count)

will be generated, which reveals that two beers have two manufacturers each: Žatecký Gus, which is manufactured by Carlsberg Ukraine and Baltika Breweries, and Balatoni Világos, which is manufactured by Nagykanizsai Sörgyár Rt. (until 1999) and Dreher Breweries.
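
The two queries shown for ?f and ?ex share the same inner per-item count and differ only in the class ID, the property ID, and the surrounding clauses. The Python sketch below assembles both from those parameters. It is a simplified illustration, not the generator used by Truly Tabular: the function names are hypothetical, and only the prefixes the queries actually need are emitted.

```python
PREFIXES = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>"""


def count_select(class_id: str, property_id: str) -> str:
    """Per-item value count for one property of one class (no prefixes)."""
    return (
        "SELECT ?item ?itemLabel (COUNT(?value) AS ?count)\n"
        "WHERE {\n"
        f"  ?item wdt:P31 wd:{class_id}.\n"
        "  ?item rdfs:label ?itemLabel.\n"
        '  FILTER (LANG(?itemLabel) = "en").\n'
        f"  ?item wdt:{property_id} ?value.\n"
        "} GROUP BY ?item ?itemLabel"
    )


def histogram_query(class_id: str, property_id: str) -> str:
    """?f: wrap the per-item count in an outer query that counts the counts."""
    inner = count_select(class_id, property_id)
    return (
        PREFIXES + "\n"
        "SELECT ?count (COUNT(?count) AS ?frequency) WHERE {\n"
        "{\n" + inner + "\n}\n"
        "}\n"
        "GROUP BY ?count\n"
        "ORDER BY DESC(?frequency)"
    )


def examples_query(class_id: str, property_id: str) -> str:
    """?ex: keep only items with more than one value (non-tabular entries)."""
    return (
        PREFIXES + "\n"
        + count_select(class_id, property_id) + "\n"
        "HAVING (COUNT(?value) > 1)\n"
        "ORDER BY DESC(?count)"
    )


print(examples_query("Q44", "P176"))       # beer / manufacturer
print(histogram_query("Q3918", "P856"))    # university / official website
```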

A check mark indicates that the property statistics for this property have been calculated successfully.

Aggregates

count

Apply the SPARQL COUNT() aggregate.

min

Apply the SPARQL MIN() aggregate.

max

Apply the SPARQL MAX() aggregate.

avg

Apply the SPARQL AVG() aggregate.

sample

Apply the SPARQL SAMPLE() aggregate.

list

Apply the SPARQL GROUP_CONCAT() aggregate to avoid multiple solutions for the same instance.

ignore

Ignore SPARQL solutions that have multiple values for the given property by using a HAVING COUNT<=1 aggregate condition in the generated query.
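
As an illustration of how these column choices could translate into SPARQL fragments, the Python sketch below maps each aggregate column to its SPARQL function and builds the corresponding projection and HAVING clause. The naming scheme for the result variables (`?website_list` and so on) is an assumption made for this example and may differ from what Truly Tabular actually generates.

```python
# Column name on this page -> SPARQL 1.1 aggregate function
AGGREGATES = {
    "count": "COUNT",
    "min": "MIN",
    "max": "MAX",
    "avg": "AVG",
    "sample": "SAMPLE",
    "list": "GROUP_CONCAT",
}


def aggregate_expression(column: str, variable: str) -> str:
    """Projection for one aggregate column, e.g. (MIN(?x) AS ?x_min)."""
    func = AGGREGATES[column]
    # Hypothetical naming: result variable is <variable>_<column>
    return f"({func}(?{variable}) AS ?{variable}_{column})"


def ignore_condition(variable: str) -> str:
    """HAVING clause that drops solutions with more than one value."""
    return f"HAVING (COUNT(?{variable}) <= 1)"


print(aggregate_expression("list", "website"))
print(ignore_condition("website"))
```

Both fragments only make sense inside a query that groups by the item, as the generated queries on this page do with GROUP BY ?item ?itemLabel.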

label

Show the label of the property result in the generated SPARQL query.

select

If a property is selected, it will be included in the generated SPARQL query.