Difference between revisions of "Wikidata Import 2025-06-06"
(Created page with "{{PageSequence|prev=Wikidata Import 2025-06-02|next=|category=Wikidata|categoryIcon=cloud-download}} =Import= {{Import |state=✅ |url=https://wiki.bitplan.com/index.php/Wi...") |
(8 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | {{PageSequence|prev=Wikidata Import 2025-06-02|next=|category=Wikidata|categoryIcon=cloud-download}} | + | {{PageSequence|prev=Wikidata Import 2025-06-02|next=Wikidata Import 2025-06-07|category=Wikidata|categoryIcon=cloud-download}} |
=Import= | =Import= | ||
Line 8: | Line 8: | ||
|target=blazegraph | |target=blazegraph | ||
|start=2025-06-06 | |start=2025-06-06 | ||
− | |end= | + | |end=2025-06-07 |
− | |days= | + | |days=1 |
− | |os=Ubuntu 22.04. | + | |os=Ubuntu 22.04.5 LTS |
− | |cpu= | + | |cpu=AMD Ryzen 9 5900X 12-Core Processor |
|ram=128 | |ram=128 | ||
|storemode=property | |storemode=property | ||
Line 23: | Line 23: | ||
== Copy journal file == | == Copy journal file == | ||
+ | <source lang='bash' highlight='1,3'> | ||
+ | md5sum data.jnl | ||
+ | 6ebe0cced1a22c6cf3fecb56afcf1c10 data.jnl | ||
+ | blockdownload --name wikidata --blocksize 512 --boost 8 --progress https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl . | ||
+ | Blocks ∅: 100%|████████████████████████████████████████████| 1.39T/1.39T [7:31:03<00:00, 51.5MB/s] | ||
+ | blockdownload --name wikidata --output data.jnl https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl 2025-06-05 --progress | ||
+ | </source> | ||
+ | == setup wdqs environment == | ||
+ | # see also [https://github.com/scatter-llc/private-wikidata-query/blob/main/README.md README.md] at https://github.com/scatter-llc/private-wikidata-query/blob/main/README.md | ||
+ | <source lang='bash' highlight='1-4'> | ||
+ | git clone https://github.com/scatter-llc/private-wikidata-query wdqs | ||
+ | mkdir wdqs/data | ||
+ | mv data.jnl wdqs/data | ||
+ | docker compose up -d | ||
+ | [+] Running 28/28 | ||
+ | ✔ wdqs-frontend Pulled 5.4s | ||
+ | ✔ wdqs Pulled 17.2s | ||
+ | ✔ wdqs-proxy Pulled 11.7s | ||
+ | [+] Running 4/4 | ||
+ | ✔ Network wdqs_default Created 0.1s | ||
+ | ✔ Container wdqs-wdqs-1 Started 3.6s | ||
+ | ✔ Container wdqs-wdqs-proxy-1 Started 0.4s | ||
+ | ✔ Container wdqs-wdqs-frontend-1 Started 0.6s | ||
+ | </source> | ||
+ | |||
+ | == Wikidata state == | ||
+ | Returns total triple count and dateModified of the Wikidata root node | ||
+ | === query === | ||
+ | <source lang='sparql'> | ||
+ | # show the number of triples and the timestamp of the last modification | ||
+ | PREFIX schema: <http://schema.org/> | ||
+ | |||
+ | SELECT | ||
+ | (?count as ?tripleCount) | ||
+ | ?dateModified | ||
+ | (STR(?dateModified) as ?timestamp) | ||
+ | WHERE { | ||
+ | { | ||
+ | SELECT (COUNT(*) AS ?count) { | ||
+ | ?s ?p ?o | ||
+ | } | ||
+ | } | ||
+ | OPTIONAL { | ||
+ | <http://www.wikidata.org> schema:dateModified ?dateModified | ||
+ | } | ||
+ | } | ||
+ | |||
+ | </source> | ||
+ | |||
+ | [https://wdqs.wikidata.dbis.rwth-aachen.de/#%23%20show%20the%20number%20of%20triples%20and%20the%20timestamp%20of%20the%20last%20modification%0APREFIX%20schema%3A%20%3Chttp%3A//schema.org/%3E%0A%0ASELECT%0A%20%20%28%3Fcount%20as%20%3FtripleCount%29%0A%20%20%3FdateModified%0A%20%20%28STR%28%3FdateModified%29%20as%20%3Ftimestamp%29%0AWHERE%20%7B%0A%20%20%7B%0A%20%20%20%20SELECT%20%28COUNT%28%2A%29%20AS%20%3Fcount%29%20%7B%0A%20%20%20%20%20%20%3Fs%20%3Fp%20%3Fo%0A%20%20%20%20%7D%0A%20%20%7D%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%3Chttp%3A//www.wikidata.org%3E%20schema%3AdateModified%20%3FdateModified%0A%20%20%7D%0A%7D%0A try it!] | ||
+ | === result === | ||
+ | {| class="wikitable" style="text-align: left;" | ||
+ | |+ <!-- caption --> | ||
+ | |- | ||
+ | ! align="right"| tripleCount !! dateModified !! timestamp | ||
+ | |- | ||
+ | | align="right"| 16842771273 || 2025-06-07 17:45:08 || 2025-06-07T17:45:08Z | ||
+ | |} |
Latest revision as of 19:55, 7 June 2025
Import
Import | |
---|---|
state | ✅ |
url | https://wiki.bitplan.com/index.php/Wikidata_Import_2025-06-06 |
target | blazegraph |
start | 2025-06-06 |
end | 2025-06-07 |
days | 1 |
os | Ubuntu 22.04.5 LTS |
cpu | AMD Ryzen 9 5900X 12-Core Processor |
ram | 128 |
triples | |
comment | seeded with 1.3 TB data.jnl file originally provided by James Hare |
This "import" does not use the dump-and-index approach; it copies a Blazegraph journal file directly.
Steps
Copy journal file
md5sum data.jnl
6ebe0cced1a22c6cf3fecb56afcf1c10 data.jnl
blockdownload --name wikidata --blocksize 512 --boost 8 --progress https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl .
Blocks ∅: 100%|████████████████████████████████████████████| 1.39T/1.39T [7:31:03<00:00, 51.5MB/s]
blockdownload --name wikidata --output data.jnl https://wikidata-dump.wikidata.dbis.rwth-aachen.de/data.jnl 2025-06-05 --progress
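A downloaded copy can be compared against the MD5 digest recorded above before it is moved into place. The following is a minimal sketch, assuming GNU `md5sum` (as shipped with Ubuntu); the helper name `verify_md5` is hypothetical and not part of `blockdownload`.

```shell
# Verify a file against an expected MD5 digest using md5sum's check mode.
# Usage: verify_md5 <expected-hex-digest> <file>
# (verify_md5 is an illustrative helper, not an existing tool.)
verify_md5() {
  local expected="$1"
  local file="$2"
  # md5sum -c expects lines of the form "<digest>  <file>" (two spaces);
  # "-" makes it read that line from stdin, --quiet suppresses the OK line.
  printf '%s  %s\n' "$expected" "$file" | md5sum -c --quiet -
}

# Example with the digest shown above:
# verify_md5 6ebe0cced1a22c6cf3fecb56afcf1c10 data.jnl && echo "data.jnl matches"
```

Hashing a file of this size reads all of it once, so the check may take a while on slow storage.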
Set up wdqs environment
git clone https://github.com/scatter-llc/private-wikidata-query wdqs
mkdir wdqs/data
mv data.jnl wdqs/data
cd wdqs
docker compose up -d
[+] Running 28/28
✔ wdqs-frontend Pulled 5.4s
✔ wdqs Pulled 17.2s
✔ wdqs-proxy Pulled 11.7s
[+] Running 4/4
✔ Network wdqs_default Created 0.1s
✔ Container wdqs-wdqs-1 Started 3.6s
✔ Container wdqs-wdqs-proxy-1 Started 0.4s
✔ Container wdqs-wdqs-frontend-1 Started 0.6s
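A container reported as "Started" may still need some time before the service inside it answers requests, in particular with a journal file of this size. One way to wait for it is a small retry loop. This is a sketch under assumptions: the helper `retry` is hypothetical, and the URL to probe (`WDQS_URL` below) is a placeholder, since this page does not state the address the frontend is served on.

```shell
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: retry <max-attempts> <delay-seconds> <command> [args...]
# (retry is an illustrative helper, not part of docker or wdqs.)
retry() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the given command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (WDQS_URL is a placeholder you have to set yourself):
# retry 30 10 curl -fsS -o /dev/null "$WDQS_URL" && echo "service is answering"
```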
Wikidata state
The query returns the total triple count and the schema:dateModified value of the Wikidata root node (http://www.wikidata.org).
query
# show the number of triples and the timestamp of the last modification
PREFIX schema: <http://schema.org/>
SELECT
(?count as ?tripleCount)
?dateModified
(STR(?dateModified) as ?timestamp)
WHERE {
{
SELECT (COUNT(*) AS ?count) {
?s ?p ?o
}
}
OPTIONAL {
<http://www.wikidata.org> schema:dateModified ?dateModified
}
}
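The same query can also be sent from the command line. The following is a minimal sketch with `curl`, assuming the service exposes a standard SPARQL HTTP endpoint that accepts the query as a `query` parameter. The endpoint URL is not given on this page, so it is passed in as an argument; the function names `state_query` and `run_state_query` are hypothetical.

```shell
# Print the state query (triple count plus dateModified of the root node).
state_query() {
  cat <<'EOF'
PREFIX schema: <http://schema.org/>

SELECT
  (?count AS ?tripleCount)
  ?dateModified
  (STR(?dateModified) AS ?timestamp)
WHERE {
  {
    SELECT (COUNT(*) AS ?count) {
      ?s ?p ?o
    }
  }
  OPTIONAL {
    <http://www.wikidata.org> schema:dateModified ?dateModified
  }
}
EOF
}

# Send the query to a SPARQL endpoint and print the JSON result.
# Usage: run_state_query <endpoint-url>
# The endpoint URL is an assumption you must supply; it is not stated here.
run_state_query() {
  local endpoint="$1"
  # -G appends the url-encoded data to the URL as a GET query string.
  curl -fsS -G "$endpoint" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$(state_query)"
}

# Example: run_state_query "$SPARQL_ENDPOINT"
```

Keeping the query text in its own function makes it easy to reuse the same text for the frontend and for scripted checks after each import.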
result
tripleCount | dateModified | timestamp |
---|---|---|
16842771273 | 2025-06-07 17:45:08 | 2025-06-07T17:45:08Z |