Difference between revisions of "Wikidata Import 2025-06-06"

(Created page with "{{PageSequence|prev=Wikidata Import 2025-06-02|next=|category=Wikidata|categoryIcon=cloud-download}} =Import= {{Import |state=✅ |url=https://wiki.bitplan.com/index.php/Wi...")
 
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
-{{PageSequence|prev=Wikidata Import 2025-06-02|next=|category=Wikidata|categoryIcon=cloud-download}}
+{{PageSequence|prev=Wikidata Import 2025-06-02|next=Wikidata Import 2025-06-07|category=Wikidata|categoryIcon=cloud-download}}
 
 =Import=
Line 8: Line 8:
 |target=blazegraph
 |start=2025-06-06
-|end=
+|end=2025-06-07
-|days=
+|days=1
-|os=Ubuntu 22.04.3 LTS
+|os=Ubuntu 22.04.5 LTS
-|cpu=
+|cpu=AMD Ryzen 9 5900X 12-Core Processor
 |ram=128
 |storemode=property
Line 23: Line 23:
 
 == Copy journal file ==
+<source lang='bash' highlight='1,3'>
+md5sum data.jnl
+6ebe0cced1a22c6cf3fecb56afcf1c10  data.jnl
+blockdownload --name wikidata --blocksize 512  --boost 8 --progress https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl .
+Blocks ∅: 100%|████████████████████████████████████████████| 1.39T/1.39T [7:31:03<00:00, 51.5MB/s]
+blockdownload --name wikidata --output data.jnl https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl 2025-06-05 --progress
+</source>
+== setup wdqs environment ==
+# see also  [https://github.com/scatter-llc/private-wikidata-query/blob/main/README.md README.md] at https://github.com/scatter-llc/private-wikidata-query/blob/main/README.md
+<source lang='bash' highlight='1-4'>
+git clone https://github.com/scatter-llc/private-wikidata-query wdqs
+mkdir wdqs/data
+mv data.jnl wdqs/data
+docker compose up -d
+[+] Running 28/28
+✔ wdqs-frontend Pulled                                                                      5.4s
+✔ wdqs Pulled                                                                              17.2s
+✔ wdqs-proxy Pulled                                                                        11.7s
+[+] Running 4/4
+✔ Network wdqs_default            Created                                                  0.1s
+✔ Container wdqs-wdqs-1          Started                                                  3.6s
+✔ Container wdqs-wdqs-proxy-1    Started                                                  0.4s
+✔ Container wdqs-wdqs-frontend-1  Started                                                  0.6s
+</source>
+
+== Wikidata state ==
+Returns total triple count and dateModified of the Wikidata root node
+=== query ===
+<source lang='sparql'>
+# show the number of triples and the timestamp of the last modification
+PREFIX schema: <http://schema.org/>
+
+SELECT
+  (?count as ?tripleCount)
+  ?dateModified
+  (STR(?dateModified) as ?timestamp)
+WHERE {
+  {
+    SELECT (COUNT(*) AS ?count) {
+      ?s ?p ?o
+    }
+  }
+  OPTIONAL {
+    <http://www.wikidata.org> schema:dateModified ?dateModified
+  }
+}
+
+</source>
+
+[https://wdqs.wikidata.dbis.rwth-aachen.de/#%23%20show%20the%20number%20of%20triples%20and%20the%20timestamp%20of%20the%20last%20modification%0APREFIX%20schema%3A%20%3Chttp%3A//schema.org/%3E%0A%0ASELECT%0A%20%20%28%3Fcount%20as%20%3FtripleCount%29%0A%20%20%3FdateModified%0A%20%20%28STR%28%3FdateModified%29%20as%20%3Ftimestamp%29%0AWHERE%20%7B%0A%20%20%7B%0A%20%20%20%20SELECT%20%28COUNT%28%2A%29%20AS%20%3Fcount%29%20%7B%0A%20%20%20%20%20%20%3Fs%20%3Fp%20%3Fo%0A%20%20%20%20%7D%0A%20%20%7D%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%3Chttp%3A//www.wikidata.org%3E%20schema%3AdateModified%20%3FdateModified%0A%20%20%7D%0A%7D%0A try it!]
+=== result ===
+{| class="wikitable" style="text-align: left;"
+|+ <!-- caption -->
+|-
+! align="right"|  tripleCount !! dateModified        !! timestamp
+|-
+| align="right"|  16842771273 || 2025-06-07 17:45:08 || 2025-06-07T17:45:08Z
+|}

Latest revision as of 19:55, 7 June 2025

Import

state  ✅
url  https://wiki.bitplan.com/index.php/Wikidata_Import_2025-06-06
target  blazegraph
start  2025-06-06
end  2025-06-07
days  1
os  Ubuntu 22.04.5 LTS
cpu  AMD Ryzen 9 5900X 12-Core Processor
ram  128
triples  
comment  seeded with 1.3 TB data.jnl file originally provided by James Hare


This "import" does not load a dump and index it; instead, it copies an existing Blazegraph journal file directly.

Steps

Copy journal file

md5sum data.jnl
6ebe0cced1a22c6cf3fecb56afcf1c10  data.jnl
blockdownload --name wikidata --blocksize 512  --boost 8 --progress https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl . 
Blocks ∅: 100%|████████████████████████████████████████████| 1.39T/1.39T [7:31:03<00:00, 51.5MB/s]
blockdownload --name wikidata --output data.jnl https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl 2025-06-05 --progress
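
The checksum listed above can be compared against the downloaded copy before the journal is moved into place. The following is a minimal sketch, assuming GNU coreutils `md5sum` is installed and that the listed hash is the expected checksum of the complete file. `verify_md5` is a hypothetical helper name introduced here for illustration; it is not part of blockdownload.

```shell
#!/usr/bin/env bash
# verify_md5 FILE EXPECTED_MD5  (hypothetical helper, not from the original page)
# Returns 0 if the MD5 checksum of FILE equals EXPECTED_MD5, 1 otherwise.
verify_md5() {
  local file="$1" expected="$2" actual
  # Fail early if the file is missing or unreadable
  [ -r "$file" ] || return 1
  # md5sum prints "<hash>  <filename>"; keep only the hash
  actual="$(md5sum "$file" | cut -d' ' -f1)"
  [ "$actual" = "$expected" ]
}
```

Usage: `verify_md5 data.jnl 6ebe0cced1a22c6cf3fecb56afcf1c10 && echo "checksum ok"`. Hashing a file of this size reads the whole journal once, so the check can take a long time.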

setup wdqs environment

See also README.md at https://github.com/scatter-llc/private-wikidata-query/blob/main/README.md
git clone https://github.com/scatter-llc/private-wikidata-query wdqs
mkdir wdqs/data
mv data.jnl wdqs/data
cd wdqs
docker compose up -d
[+] Running 28/28
 ✔ wdqs-frontend Pulled                                                                      5.4s 
 ✔ wdqs Pulled                                                                              17.2s 
 ✔ wdqs-proxy Pulled                                                                        11.7s 
[+] Running 4/4
 ✔ Network wdqs_default            Created                                                   0.1s 
 ✔ Container wdqs-wdqs-1           Started                                                   3.6s 
 ✔ Container wdqs-wdqs-proxy-1     Started                                                   0.4s 
 ✔ Container wdqs-wdqs-frontend-1  Started                                                   0.6s
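
After startup, it can be useful to confirm that all three services from the log above are still running. The following is a hedged sketch: the service names `wdqs`, `wdqs-proxy`, and `wdqs-frontend` are taken from the pull output, while `missing_services` is a hypothetical helper introduced here. It reads the running service names from standard input so that it does not depend on Docker itself.

```shell
#!/usr/bin/env bash
# missing_services EXPECTED...  (hypothetical helper, not from the original page)
# Reads running service names from stdin (one per line) and prints every
# expected service that is not among them. Returns 1 if any is missing.
missing_services() {
  local running missing=0 svc
  running="$(cat)"
  for svc in "$@"; do
    # -x matches whole lines only, so "wdqs" does not match "wdqs-proxy"
    if ! printf '%s\n' "$running" | grep -qxF -- "$svc"; then
      echo "$svc"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, assuming Docker Compose v2 and the compose project in the `wdqs` directory: `(cd wdqs && docker compose ps --services --status running) | missing_services wdqs wdqs-proxy wdqs-frontend`. No output and exit status 0 mean that all three services are running.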

Wikidata state

Returns total triple count and dateModified of the Wikidata root node

query

# show the number of triples and the timestamp of the last modification
PREFIX schema: <http://schema.org/>

SELECT
  (?count as ?tripleCount)
  ?dateModified
  (STR(?dateModified) as ?timestamp)
WHERE {
  {
    SELECT (COUNT(*) AS ?count) {
      ?s ?p ?o
    }
  }
  OPTIONAL {
    <http://www.wikidata.org> schema:dateModified ?dateModified
  }
}

try it!
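
The same query can also be sent from the command line. This is a minimal sketch, assuming the deployment exposes a SPARQL endpoint that follows the SPARQL 1.1 Protocol and accepts `text/csv` as a result format. The endpoint URL is not given on this page, so it is passed in as a parameter; `STATE_QUERY` and `run_state_query` are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# The state query from above, stored in a variable (name is an assumption).
STATE_QUERY="$(cat <<'EOF'
PREFIX schema: <http://schema.org/>
SELECT (?count AS ?tripleCount) ?dateModified (STR(?dateModified) AS ?timestamp)
WHERE {
  { SELECT (COUNT(*) AS ?count) { ?s ?p ?o } }
  OPTIONAL { <http://www.wikidata.org> schema:dateModified ?dateModified }
}
EOF
)"

# run_state_query ENDPOINT_URL  (hypothetical helper)
# Sends the query as a URL-encoded GET request and prints the CSV result.
run_state_query() {
  local endpoint="$1"
  curl -sS -G "$endpoint" \
    -H 'Accept: text/csv' \
    --data-urlencode "query=${STATE_QUERY}"
}
```

Usage: `run_state_query "$SPARQL_ENDPOINT"`, where `SPARQL_ENDPOINT` must be set to the endpoint URL of the local setup. Counting all triples scans the full store, so the request may run for a long time or hit a server-side timeout.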

result

tripleCount  dateModified         timestamp
16842771273  2025-06-07 17:45:08  2025-06-07T17:45:08Z