WikiData Import 2022-06-25
retry with more disk space
Context
See QLever/script, as discussed in QLever Issue #562, for the script that makes this attempt easier to reproduce.
See QLever Discussions for more details on this series of attempts.
Since https://github.com/ad-freiburg/qlever-control now has an official "qlever" script, we have renamed our own script, whose purpose is to make the import attempts reproducible, to qleverauto.
Beware of https://github.com/ad-freiburg/qlever-control/issues/4: make sure ulimit -n is set! This attempt had to be restarted because setting the value within a script did not work.
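The ulimit warning above can be turned into a small pre-flight check that runs in the shell from which the import is started. The following is a minimal sketch, not part of qleverauto: the function name check_nofile and the variable REQUIRED_NOFILE are hypothetical, and the threshold 1048576 is only an assumption taken from the soft limit reported by the environment check on this page. A limit changed inside a script applies only to that script's process and its children, which may be why setting it there did not help, so the sketch only reads and reports the value.

```shell
#!/bin/sh
# Hypothetical helper, not part of qleverauto: verify the soft limit for
# open files in the shell that will start the import.
# REQUIRED_NOFILE is an assumption; 1048576 is the soft limit this page reports.
REQUIRED_NOFILE=1048576

check_nofile() {
  # $1: minimum acceptable soft limit for open files
  nofile_current=$(ulimit -n)
  if [ "$nofile_current" = "unlimited" ] || [ "$nofile_current" -ge "$1" ]; then
    echo "soft ulimit -n is $nofile_current (ok)"
    return 0
  fi
  echo "soft ulimit -n is $nofile_current, expected at least $1" >&2
  return 1
}

# Report only; the limit itself has to be raised in the calling shell.
check_nofile "$REQUIRED_NOFILE" ||
  echo "raise the limit in this shell first, e.g. ulimit -n $REQUIRED_NOFILE" >&2
```

If the check fails, raise the limit interactively in the same shell and only then start the import.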
Preparations
See WikiData_Import_2022-06-24#Preparations
Wikidata data download
df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 3844660232 97397140 3551895876 3% /hd/seel
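The df output above reports sizes in 1K-blocks, which are hard to read at this scale. As a rough sketch, the available space can be extracted and converted to TiB as follows; the helper names avail_kb and kb_to_tib are hypothetical and only serve this illustration.

```shell
#!/bin/sh
# Hypothetical helpers for reading df output; the names are illustrative.

# Print the available space of the filesystem holding $1, in 1K-blocks.
avail_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Convert 1K-blocks to TiB (1 TiB = 1024^3 KiB), rounded to two decimals.
kb_to_tib() {
  LC_ALL=C awk -v kb="$1" 'BEGIN {printf "%.2f\n", kb / (1024 * 1024 * 1024)}'
}

kb_to_tib 3551895876        # "Available" value from the df output above: 3.31
kb_to_tib "$(avail_kb .)"   # same calculation for the current directory
```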
./qleverauto -wd
downloading wikidata lexemes:latest-lexemes.ttl.bz2 ... please wait typically 3min ...
wikidata lexemes download started at Sa 25. Jun 19:44:32 CEST 2022
--2022-06-25 19:44:32-- https://dumps.wikimedia.org/wikidatawiki/entities//latest-lexemes.ttl.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 2620:0:861:1:208:80:154:7, 208.80.154.7
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|2620:0:861:1:208:80:154:7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 329141448 (314M) [application/octet-stream]
Saving to: ‘latest-lexemes.ttl.bz2’
latest-lexemes.ttl.bz2 10%[===> ] 34,29M 4,60MB/s eta 64s
...
latest-lexemes.ttl.bz2 100%[===========================================>] 313,89M 4,56MB/s in 70s
2022-06-25 19:45:43 (4,49 MB/s) - ‘latest-lexemes.ttl.bz2’ saved [329141448/329141448]
wikidata lexemes download finished at Sa 25. Jun 19:45:43 CEST 2022 after 71 seconds
downloading wikidata dump:latest-all.ttl.bz2 ... please wait typically 6hours ...
wikidata dump download started at Sa 25. Jun 19:45:43 CEST 2022
--2022-06-25 19:45:43-- https://dumps.wikimedia.org/wikidatawiki/entities//latest-all.ttl.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 2620:0:861:1:208:80:154:7, 208.80.154.7
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|2620:0:861:1:208:80:154:7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 95427655992 (89G) [application/octet-stream]
Saving to: ‘latest-all.ttl.bz2’
latest-all.ttl.bz2 0%[ ] 50,66M 4,42MB/s eta 5h 38m
...
latest-all.ttl.bz2 100%[===========================================>] 88,87G 5,01MB/s in 5h 38m
2022-06-26 01:24:35 (4,48 MB/s) - ‘latest-all.ttl.bz2’ saved [95427655992/95427655992]
wikidata dump download finished at So 26. Jun 01:24:35 CEST 2022 after 20332 seconds
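The reported 20332 seconds and the dump size of 95427655992 bytes can be cross-checked against the wget summary (5h 38m, 4,48 MB/s). A small sketch of the arithmetic follows; format_duration and mib_per_second are hypothetical helper names, not functions of qleverauto.

```shell
#!/bin/sh
# Hypothetical helpers to cross-check the download log figures.

# Format a number of seconds as hours, minutes and seconds.
format_duration() {
  secs=$1
  printf '%dh %dm %ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Average throughput in MiB/s for $1 bytes transferred in $2 seconds.
mib_per_second() {
  LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN {printf "%.2f\n", b / s / (1024 * 1024)}'
}

format_duration 20332              # 5h 38m 52s
mib_per_second 95427655992 20332   # 4.48
```

Both results agree with the log: 19:45:43 plus 5h 38m 52s is 01:24:35, and the average rate matches the figure printed by wget.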
qleverauto environment checks
./qleverauto -v
qleverauto version : 1.29 $ : 2022/05/23 06:15:28 $
./qleverauto -e
needed software
docker → /usr/bin/docker ✅
top → /usr/bin/top ✅
df → /usr/bin/df ✅
jq → /usr/bin/jq ✅
lsb_release → /usr/bin/lsb_release ✅
free → /usr/bin/free ✅
operating system
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
docker version
Docker version 20.10.16, build aa7e414
memory
              total        used        free      shared  buff/cache   available
Mem:          125Gi       1,8Gi        30Gi        27Mi        93Gi       122Gi
Swap:         2,0Gi        57Mi       1,9Gi
diskspace
/dev/sdb5 116G 25G 86G 23% /
tmpfs 63G 16K 63G 1% /dev/shm
/dev/sda1 3,6T 183G 3,3T 6% /hd/seel
/dev/sdb1 511M 4,0K 511M 1% /boot/efi
soft ulimit for files
1048576
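A check like the "needed software" part of the output above can be approximated with the POSIX command -v builtin. This is only a sketch of the idea and makes no claim about how qleverauto -e is actually implemented; check_tool is a hypothetical name, and the tool list is copied from the output above.

```shell
#!/bin/sh
# Hypothetical re-creation of a "needed software" check; this is not the
# qleverauto implementation.

# Print where $1 is found, or report it as missing.
check_tool() {
  if tool_path=$(command -v "$1"); then
    echo "$1 → $tool_path ✅"
    return 0
  fi
  echo "$1 → not found ❌"
  return 1
}

# Tool list taken from the environment check output above.
missing=0
for tool in docker top df jq lsb_release free; do
  check_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```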