Wikidata Import 2023-01-24

From BITPlan Wiki

Download the latest Wikidata dump (~10 hours)

https://dumps.wikimedia.org/wikidatawiki/entities

latest-all.json.bz2                                18-Jan-2023 17:40         79779054481
latest-all.json.gz                                 18-Jan-2023 10:51        121027823223
latest-all.nt.bz2                                  19-Jan-2023 17:00        155239026614
latest-all.nt.gz                                   18-Jan-2023 23:55        200917826250
latest-all.ttl.bz2                                 19-Jan-2023 04:34         99583991786
latest-all.ttl.gz                                  18-Jan-2023 19:25        121477047220
latest-lexemes.json.bz2                            18-Jan-2023 03:47           270280878
latest-lexemes.json.gz                             18-Jan-2023 03:46           369955852
latest-lexemes.nt.bz2                              20-Jan-2023 23:32           717929951
latest-lexemes.nt.gz                               20-Jan-2023 23:27           947996669
latest-lexemes.ttl.bz2                             20-Jan-2023 23:28           402494804
latest-lexemes.ttl.gz                              20-Jan-2023 23:25           503140103
latest-truthy.nt.bz2                               20-Jan-2023 19:30         35434201681
latest-truthy.nt.gz                                20-Jan-2023 16:19         58740712185
wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.nt.bz2
--2023-01-25 11:41:45--  https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.nt.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 208.80.154.142, 2620:0:861:2:208:80:154:142
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 717929951 (685M) [application/octet-stream]
Saving to: ‘latest-lexemes.nt.bz2’

...

sudo nohup wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.nt.bz2&
tail -f nohup.out
--2023-01-24 10:33:23--  https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.nt.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 208.80.154.142, 2620:0:861:2:208:80:154:142
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 155239026614 (145G) [application/octet-stream]
Saving to: ‘latest-all.nt.bz2’

        0K .......... .......... .......... .......... ..........  0%  337K 5d4h
       50K .......... .......... .......... .......... ..........  0%  245K 6d4h
      100K .......... .......... .......... .......... ..........  0%  425K 5d11h
  ...
     1000K .......... .......... .......... .......... ..........  0% 58.4M 47h38m
  ...
    10000K .......... .......... .......... .......... ..........  0% 3.66M 11h20m
  ...
   100000K .......... .......... .......... .......... ..........  0% 23.6M 12h37m
  ...
  1000000K .......... .......... .......... .......... ..........  0% 3.40M 9h1m
  ...
 10000000K .......... .......... .......... .......... ..........  6% 3.05M 8h39m
...
100000000K .......... .......... .......... .......... .......... 65% 87.7M 3h18m
...
130000000K .......... .......... .......... .......... .......... 85%  101M 82m47s
...
140000000K .......... .......... .......... .......... .......... 92% 9.05M 44m30s
...
150000000K .......... .......... .......... .......... .......... 98% 3.94M 6m8s
...
151600600K .......... .                                          100%  623K=9h41m

2023-01-24 20:14:37 (4.25 MB/s) - ‘latest-all.nt.bz2’ saved [155239026614/155239026614]
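
The byte count in wget's final line can be compared with the size shown in the dump directory listing. A minimal sketch, assuming a bash shell; the helper name `check_dump_size` is made up for illustration and is not part of wget:

```shell
#!/bin/bash
# check_dump_size FILE EXPECTED_BYTES
# Hypothetical helper: succeeds only if FILE has exactly EXPECTED_BYTES bytes.
check_dump_size() {
  local file="$1" expected="$2" actual
  actual=$(wc -c < "$file") || return 2   # status 2: file could not be read
  actual=$((actual))                      # strip padding some wc versions add
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $file has $actual bytes"
  else
    echo "MISMATCH: $file has $actual bytes, expected $expected" >&2
    return 1
  fi
}

# Usage with the size from the listing above:
#   check_dump_size latest-all.nt.bz2 155239026614
```

A size match only shows that the transfer was complete; it does not detect corrupted content.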

decompress

Running bunzip2 under plain nohup does not work reliably. In the run below, bunzip2 reported an interrupt, quit, and deleted its partial output file:

nohup bunzip2 latest-all.nt.bz2 &
bunzip2: Control-C or similar caught, quitting.
bunzip2: Deleting output file latest-all.nt, if it exists.

redirect bunzip2 output and show progress

progress script

cat progress 
#!/bin/bash
# show decompression progress
while :
do
  date
  du -sm *.nt
  sleep 10
done
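
The loop above runs until it is interrupted. A possible variant, sketched here under the assumption that the process ID of the decompression job is known (for example from `$!` right after starting it in the background), stops by itself once that process is gone. The names `print_sample` and `watch_until_done` are illustrative only.

```shell
#!/bin/bash
# Hypothetical variant of the progress script.

# Print a timestamp and the size (in MB, as reported by du -sm) of all .nt files.
print_sample() {
  date
  du -sm -- *.nt 2>/dev/null || true   # stay quiet if no .nt file exists yet
}

# watch_until_done PID [INTERVAL_SECONDS]
# Report progress while process PID is alive, then print one final sample.
watch_until_done() {
  local pid="$1" interval="${2:-10}"
  # kill -0 sends no signal; it only tests whether the process still exists
  while kill -0 "$pid" 2>/dev/null; do
    print_sample
    sleep "$interval"
  done
  print_sample
  echo "process $pid has finished"
}

# Usage (assumption: started from the same shell as the background job):
#   nohup bunzip2 latest-all.nt.bz2 &> /dev/null &
#   watch_until_done $!
```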

uncompress

nohup bunzip2 latest-all.nt.bz2  &> /dev/null &
./progress 
Wed Jan 25 06:15:32 AM CET 2023
5714	latest-all.nt
...
Wed Jan 25 07:36:32 AM CET 2023
300255	latest-all.nt
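
The two samples above give a rough decompression rate: the output grew from 5714 MB to 300255 MB between 06:15:32 and 07:36:32, that is in 81 minutes or 4860 seconds. A small sketch of that arithmetic, using awk for the floating-point division; `rate_mb_per_s` is an illustrative helper name.

```shell
#!/bin/bash
# rate_mb_per_s START_MB END_MB SECONDS
# Average growth in MB/s between two `du -sm` samples (hypothetical helper).
rate_mb_per_s() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Samples from the log: (300255 - 5714) MB / 4860 s
rate_mb_per_s 5714 300255 4860   # prints 60.6
```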

QLever control

https://github.com/ad-freiburg/qlever-control

mkdir qlever
cd qlever
git clone https://github.com/ad-freiburg/qlever-control
Cloning into 'qlever-control'...
remote: Enumerating objects: 426, done.
remote: Counting objects: 100% (266/266), done.
remote: Compressing objects: 100% (170/170), done.
remote: Total 426 (delta 108), reused 231 (delta 95), pack-reused 160
Receiving objects: 100% (426/426), 131.00 KiB | 585.00 KiB/s, done.
Resolving deltas: 100% (163/163), done.

setup wikidata

mkdir wikidata
cd wikidata/
. ../qlever-control/qlever wikidata

QLEVER CONFIG

Checking your PATH ...
The directory "/hd/eneco/qlever/qlever-control" is already contained in your PATH

Setting up bash autocompletion ...
Done, number of completions: 35

Creating new Qleverfile ...
Copied pre-configured Qleverfile for "wikidata" into current directory.

Setup is complete
Type qlever and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
qlever index show). Edit your local Qleverfile to change settings. A typical
sequence of actions if you have used a preconfigured Qleverfile is:

qlever get-data
qlever index
qlever start
qlever example-query
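
The sequence above can be chained so that a failing step stops the run. A minimal sketch, assuming `qlever` is available in the current shell as set up above; the function name `run_qlever_steps` and the `QLEVER_CMD` override are hypothetical additions for illustration and not part of qlever-control.

```shell
#!/bin/bash
# run_qlever_steps [EXTRA_ARG...]
# Run the four actions in order and stop at the first one that fails.
# QLEVER_CMD (hypothetical) lets the command be swapped out, e.g. for a dry test.
run_qlever_steps() {
  local qlever_cmd="${QLEVER_CMD:-qlever}"
  local step
  for step in get-data index start example-query; do
    echo "== qlever ${step} $*"
    "$qlever_cmd" "$step" "$@" || {
      echo "qlever ${step} failed, stopping" >&2
      return 1
    }
  done
}

# Usage:
#   run_qlever_steps         # execute all four actions
#   run_qlever_steps show    # append "show" to each action, which per the
#                            # setup message prints it without executing it
```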