WikiData Import 2022-05-22

From BITPlan Wiki

See QLever/script, as discussed in QLever Issue #562, for the script that makes this attempt easier to reproduce.

Since https://github.com/ad-freiburg/qlever-control now provides an official "qlever" script, we have renamed our own script, whose purpose is to make the import attempts reproducible, to qleverauto.

Preparations

The steps in WikiData_Import_2022-05-21#Build_code still apply to this attempt, which uses the native (compiled) version of QLever.


qleverauto environment checks

./qleverauto -v
qleverauto version : 1.28 $ : 2022/05/23 05:59:46 $
# some changes were made to the script during this attempt
./qleverauto -v
qleverauto version : 1.29 $ : 2022/05/23 06:15:28 $
./qleverauto -e
needed software
docker → /usr/bin/docker ✅
top → /usr/bin/top ✅
df → /usr/bin/df ✅
jq → /usr/bin/jq ✅
lsb_release → /usr/bin/lsb_release ✅
free → /usr/bin/free ✅
operating system
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal
docker version
Docker version 20.10.16, build aa7e414
memory
              total        used        free      shared  buff/cache   available
Mem:          125Gi       5,3Gi        22Gi        43Mi        98Gi       119Gi
Swap:         2,0Gi       8,0Mi       2,0Gi
diskspace
/dev/sdb5       116G   25G   86G  23% /
tmpfs            63G   16K   63G   1% /dev/shm
/dev/sda1       3,6T  2,3T  1,2T  66% /hd/seel
/dev/sdb1       511M  4,0K  511M   1% /boot/efi
soft ulimit for files
1048576
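
The "needed software" part of the check above can be approximated with a few lines of shell. This is a minimal sketch and not the actual qleverauto implementation; the function name check_tools is hypothetical, and only the tool list is taken from the output above.

```shell
#!/usr/bin/env bash
# Sketch only: not the actual qleverauto code.
# check_tools is a hypothetical helper; it reports for each tool
# whether it can be found on the PATH and returns the number of
# missing tools as its exit status.
check_tools() {
  local missing=0 tool path
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      echo "$tool → $path ✅"
    else
      echo "$tool → not found ❌"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Tool list as shown in the "needed software" output above.
check_tools docker top df jq lsb_release free || echo "some tools are missing"
```

Returning the count of missing tools lets a calling script stop before the long-running download and indexing steps begin.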

Wikidata dump download

The dump from the previous attempt is still quite recent, so it is reused rather than downloaded again.

./qleverauto -wd
wikidata lexemes:latest-lexemes.ttl.bz2 already downloaded
wikidata dump:latest-all.ttl.bz2 already downloaded
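
The "already downloaded" behaviour shown above could be sketched as follows. This is an illustration under assumptions, not the qleverauto source: the function name download_if_missing is hypothetical, the use of wget is an assumption, and the dump URL in the usage comment is assumed to be the usual Wikimedia location.

```shell
#!/usr/bin/env bash
# Sketch only: not the actual qleverauto code.
# download_if_missing is a hypothetical helper; it skips the
# download when a non-empty local copy of the file already exists.
download_if_missing() {
  local label=$1 url=$2
  local file=${url##*/}   # file name = last component of the URL path
  if [ -s "$file" ]; then
    echo "$label:$file already downloaded"
  else
    echo "$label:$file downloading from $url"
    wget --continue "$url"   # --continue resumes a partial download
  fi
}

# Usage (URL is an assumption, not taken from this page):
# download_if_missing "wikidata dump" \
#   "https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2"
```

Testing for a non-empty file rather than mere existence avoids treating an empty placeholder as a finished download. It does not detect a truncated archive.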

Wikidata indexing

ulimit -n 1000000  
ulimit -a | grep '(-n)'
open files                      (-n) 1000000
nohup ./qleverauto -wi&
head -n 200 nohup.out
creating wikidata index started at Mo 23 Mai 2022 14:49:22 CEST

Checking your PATH ...

The directory "/hd/seel/qlever/qlever-control" is already contained in your PATH
The directory "/local/data/qlever/qlever-code/build" is already contained in your PATH

Setting up bash autocompletion ...

Done, the following completions are now available:

autocompletion-warmup cache-stats cat-files clear-cache clear-cache-complete 
disk-usage docker-off docker-on download-data help help-install index 
index-stats log log-until-server-up memory-usage pin-INTERNAL rdf-files 
remove-data remove-index restart server-settings start status stop 
text-input-from-nt-literals ui update wait where

Checking Qleverfile ...

There is alreay a QLeverfile in this directory. If you want a freshly generated
basic Qleverfile, remove or move the existing one and run ". qlever" again.

Setup is complete

Type "qlever" and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
"qlever index show"). Typing "qlever" without arguments gives some basic help
and pointers for further help. Edit your local "Qleverfile" to change settings.

IndexBuilderMain → /home/wf/bin/IndexBuilderMain ✅

This is the "qlever" script, call without argument for help

Executing "index":

bzcat latest-all.ttl.bz2 latest-lexemes.ttl.bz2 | IndexBuilderMain -F ttl -K wikidata -f - -i wikidata -s wikidata.settings.json | tee wikidata.index-log.txt

2022-05-23 14:52:17.915	- INFO:  QLever IndexBuilder, compiled on May 21 2022 08:50:52
2022-05-23 14:52:17.915	- INFO:  You specified the input format: TTL
2022-05-23 14:52:17.915	- INFO:  Locale was not specified in settings file, default is en_US
2022-05-23 14:52:17.916	- INFO:  You specified "locale = en_US" and "ignore-punctuation = 0"
2022-05-23 14:52:17.916	- INFO:  You specified "num-triples-per-batch = 10,000,000", choose a lower value if the index builder runs out of memory
2022-05-23 14:52:17.916	- INFO:  Integers that cannot be represented by QLever will throw an exception (this is the default behavior)
2022-05-23 14:52:17.916	- INFO:  Processing input triples from /dev/stdin ...
...
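
The run above raises the open-file limit with ulimit before starting the index build. A guard like the following could make that precondition explicit in a script. It is a minimal sketch: the function name require_open_files is hypothetical, and the threshold of 1000000 is simply the value used in the transcript above.

```shell
#!/usr/bin/env bash
# Sketch only: require_open_files is a hypothetical helper that
# fails when the soft limit for open files is below the given value.
require_open_files() {
  local required=$1 current
  current=$(ulimit -S -n)   # soft limit for open file descriptors
  if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
    echo "open files limit $current is sufficient"
  else
    echo "open files limit $current is below $required" >&2
    return 1
  fi
}

# Usage, with the value from the transcript above:
# require_open_files 1000000 && nohup ./qleverauto -wi &
```

Checking the limit inside the same shell that launches the build matters because ulimit settings apply per shell session and are inherited by child processes.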

Progress