Wikidata Import 2025-06-06

From BITPlan Wiki
Revision as of 19:46, 7 June 2025 by Wf (talk | contribs)

Import

state  ✅
url  https://wiki.bitplan.com/index.php/Wikidata_Import_2025-06-06
target  blazegraph
start  2025-06-06
end  2025-06-07
days  1
os  Ubuntu 22.04.5 LTS
cpu  AMD Ryzen 9 5900X 12-Core Processor
ram  128
triples  16842771273
comment  seeded with 1.3 TB data.jnl file originally provided by James Hare


This "import" does not load an RDF dump and build the indexes from scratch. Instead, it copies an existing Blazegraph journal file directly.

Steps

Copy journal file

md5sum data.jnl
6ebe0cced1a22c6cf3fecb56afcf1c10  data.jnl
blockdownload --name wikidata --blocksize 512  --boost 8 --progress https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl . 
Blocks ∅: 100%|████████████████████████████████████████████| 1.39T/1.39T [7:31:03<00:00, 51.5MB/s]
blockdownload --name wikidata --output data.jnl https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl 2025-06-05 --progress
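
After a transfer of this size it is worth confirming that the local copy matches the checksum shown in the md5sum output above. The following is a minimal Python sketch of such a check, not part of the original procedure. The function names and the chunk size are illustrative assumptions; only the expected digest is taken from this page.

```python
import hashlib
from pathlib import Path

# Expected digest, copied from the md5sum output on this page.
EXPECTED_MD5 = "6ebe0cced1a22c6cf3fecb56afcf1c10"


def md5_of_file(path, chunk_size=64 * 1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Chunked reading keeps memory use constant, which matters for a
    journal file of more than a terabyte.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: data.jnl is in the current directory.
    print("OK" if matches_expected("data.jnl") else "MISMATCH")
```

On the command line, `md5sum -c` with a checksum file performs the same comparison.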

Set up wdqs environment

# see also README.md at https://github.com/scatter-llc/private-wikidata-query/blob/main/README.md
git clone https://github.com/scatter-llc/private-wikidata-query wdqs
mkdir wdqs/data
mv data.jnl wdqs/data
cd wdqs
docker compose up -d
[+] Running 28/28
 ✔ wdqs-frontend Pulled                                                                      5.4s 
 ✔ wdqs Pulled                                                                              17.2s 
 ✔ wdqs-proxy Pulled                                                                        11.7s 
[+] Running 4/4
 ✔ Network wdqs_default            Created                                                   0.1s 
 ✔ Container wdqs-wdqs-1           Started                                                   3.6s 
 ✔ Container wdqs-wdqs-proxy-1     Started                                                   0.4s 
 ✔ Container wdqs-wdqs-frontend-1  Started                                                   0.6s
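
"Started" only means the containers are running. Blazegraph may still need some time before it answers queries on a journal of this size. A minimal Python sketch for polling the SPARQL endpoint until it responds is given here. The endpoint URL, the timeout values and the function names are assumptions for illustration and are not taken from this page; check the compose file or the README for the actual port and path.

```python
import time
import urllib.parse
import urllib.request

# Assumption: placeholder endpoint URL, adjust to the port and path of your setup.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"


def http_probe(url=ENDPOINT, timeout_s=10):
    """Return True if a trivial ASK query gets an HTTP 200 response."""
    query = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    request = urllib.request.Request(
        f"{url}?{query}",
        headers={"Accept": "application/sparql-results+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts derive from OSError
        return False


def wait_until_ready(probe, timeout_s=600, interval_s=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    print("ready" if wait_until_ready(http_probe) else "timed out")
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.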

Wikidata state

The following query returns the total triple count and the schema:dateModified timestamp of the Wikidata root node (<http://www.wikidata.org>).

query

# show the number of triples and the timestamp of the last modification
PREFIX schema: <http://schema.org/>

SELECT
  (?count as ?tripleCount)
  ?dateModified
  (STR(?dateModified) as ?timestamp)
WHERE {
  {
    SELECT (COUNT(*) AS ?count) {
      ?s ?p ?o
    }
  }
  OPTIONAL {
    <http://www.wikidata.org> schema:dateModified ?dateModified
  }
}

result

tripleCount   dateModified          timestamp
16842771273   2025-06-07 17:45:08   2025-06-07T17:45:08Z
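
The state query can also be run from a script, for example to record the result after each import. The following Python sketch sends a shortened form of the query over HTTP and reads the standard SPARQL JSON result format. The endpoint URL, the timeout and the function names are assumptions for illustration; only the query and the sample values come from this page.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder endpoint URL, adjust to the port and path of your setup.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"

# Same query as above, without the STR() convenience column.
QUERY = """
PREFIX schema: <http://schema.org/>
SELECT (?count AS ?tripleCount) ?dateModified
WHERE {
  { SELECT (COUNT(*) AS ?count) { ?s ?p ?o } }
  OPTIONAL { <http://www.wikidata.org> schema:dateModified ?dateModified }
}
"""


def parse_state(results):
    """Extract (triple count, dateModified or None) from SPARQL JSON results."""
    row = results["results"]["bindings"][0]
    count = int(row["tripleCount"]["value"])
    # dateModified sits in an OPTIONAL block, so the binding may be absent
    modified = row.get("dateModified", {}).get("value")
    return count, modified


def fetch_state(endpoint=ENDPOINT, timeout_s=600):
    """Run the state query against the endpoint and return the parsed result."""
    url = f"{endpoint}?{urllib.parse.urlencode({'query': QUERY})}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    # Counting all triples can take a while, hence the generous timeout
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return parse_state(json.load(response))


if __name__ == "__main__":
    triple_count, date_modified = fetch_state()
    print(f"{triple_count:,} triples, last modified {date_modified}")
```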