WikiData Import 2022-05-21

From BITPlan Wiki

Revision as of 08:44, 21 May 2022

QLever trial

Prerequisites: at least 64 GB RAM, a Docker environment (e.g. on Ubuntu), and more than 1 TB of disk space (SSD preferred for speed).
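
The RAM and disk space minimums can be checked by hand before starting. The following is a minimal sketch and not part of the qlever script: the function names `check_minimum`, `total_ram_gib` and `avail_disk_gib` are hypothetical, and reading `/proc/meminfo` assumes Linux.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the qlever script.

# Compare a measured value with a required minimum and report the result.
# usage: check_minimum <label> <actual> <required> <unit>
check_minimum() {
  if [ "$2" -ge "$3" ]; then
    echo "$1: $2 $4 (>= $3 $4) ok"
  else
    echo "$1: $2 $4 (< $3 $4) insufficient"
    return 1
  fi
}

# Total RAM in GiB (Linux only; MemTotal in /proc/meminfo is given in kB).
total_ram_gib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Available space in GiB on the filesystem that holds the given directory.
# df -P -k prints one line per filesystem in 1024-byte blocks; column 4 is "Available".
avail_disk_gib() {
  df -P -k "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}
```

A possible call would be `check_minimum RAM "$(total_ram_gib)" 64 GiB` followed by `check_minimum disk "$(avail_disk_gib /hd/seel)" 1024 GiB`. Treating 1 TB as 1024 GiB is an assumption of this sketch.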

./qlever -v -e
qlever version : 1.27 $ : 2022/03/16 08:54:18 $
needed software
docker → /usr/bin/docker ✅
top → /usr/bin/top ✅
df → /usr/bin/df ✅
jq → /usr/bin/jq ✅
lsb_release → /usr/bin/lsb_release ✅
free → /usr/bin/free ✅
operating system
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal
docker version
Docker version 20.10.13, build a224086
memory
              total        used        free      shared  buff/cache   available
Mem:          125Gi       1,1Gi       121Gi        31Mi       2,9Gi       123Gi
Swap:         2,0Gi          0B       2,0Gi
diskspace
/dev/sdb5       116G   23G   88G  21% /
tmpfs            63G     0   63G   0% /dev/shm
/dev/sda1       3,6T  987G  2,5T  29% /hd/seel
/dev/sdb1       511M  4,0K  511M   1% /boot/efi
soft ulimit for files
1048576
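
The "needed software" part of this report can be approximated with `command -v`. The sketch below is an assumption about how such a check might look, not the actual code of the qlever script. `check_tool` and `check_tools` are hypothetical names and the output format only imitates the listing above.

```shell
#!/bin/sh
# Hypothetical re-creation of the "needed software" listing.

# Print "<tool> → <path> ✅" if the tool is found on PATH, "<tool> ❌" otherwise.
check_tool() {
  tool_path=$(command -v "$1") || { echo "$1 ❌"; return 1; }
  echo "$1 → $tool_path ✅"
}

# Check every tool named on the command line.
# The return status is the number of missing tools (0 means all were found).
check_tools() {
  missing=0
  for tool in "$@"; do
    check_tool "$tool" || missing=$((missing + 1))
  done
  return "$missing"
}
```

With the tools from the report this would be called as `check_tools docker top df jq lsb_release free`.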

QLever clone

./qlever -c
cloning qlever - please wait typically 1 min ...
cloning qlever started at Sa 21. Mai 08:33:35 CEST 2022
Cloning into 'qlever-code'...
remote: Enumerating objects: 13828, done.
remote: Counting objects: 100% (973/973), done.
remote: Compressing objects: 100% (705/705), done.
remote: Total 13828 (delta 574), reused 451 (delta 267), pack-reused 12855
Receiving objects: 100% (13828/13828), 111.72 MiB | 6.86 MiB/s, done.
Resolving deltas: 100% (10707/10707), done.
Submodule 'third_party/abseil-cpp' (https://github.com/abseil/abseil-cpp.git) registered for path 'third_party/abseil-cpp'
Submodule 'third_party/antlr4' (https://github.com/antlr/antlr4.git) registered for path 'third_party/antlr4'
Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
Submodule 'third_party/re2' (https://github.com/google/re2.git) registered for path 'third_party/re2'
Submodule 'third_party/stxxl' (https://github.com/ad-freiburg/stxxl) registered for path 'third_party/stxxl'
Cloning into '/hd/seel/qlever/qlever-code/third_party/abseil-cpp'...
remote: Enumerating objects: 16841, done.        
remote: Counting objects: 100% (149/149), done.        
remote: Compressing objects: 100% (78/78), done.        
remote: Total 16841 (delta 83), reused 112 (delta 71), pack-reused 16692        
Receiving objects: 100% (16841/16841), 10.55 MiB | 6.78 MiB/s, done.
Resolving deltas: 100% (13078/13078), done.
Cloning into '/hd/seel/qlever/qlever-code/third_party/antlr4'...
remote: Enumerating objects: 128025, done.        
remote: Counting objects: 100% (13/13), done.        
remote: Compressing objects: 100% (11/11), done.        
remote: Total 128025 (delta 3), reused 3 (delta 1), pack-reused 128012        
Receiving objects: 100% (128025/128025), 65.33 MiB | 6.76 MiB/s, done.
Resolving deltas: 100% (75484/75484), done.
Cloning into '/hd/seel/qlever/qlever-code/third_party/googletest'...
remote: Enumerating objects: 24402, done.        
remote: Counting objects: 100% (67/67), done.        
remote: Compressing objects: 100% (32/32), done.        
remote: Total 24402 (delta 31), reused 53 (delta 28), pack-reused 24335        
Receiving objects: 100% (24402/24402), 10.27 MiB | 6.87 MiB/s, done.
Resolving deltas: 100% (18049/18049), done.
Cloning into '/hd/seel/qlever/qlever-code/third_party/re2'...
remote: Enumerating objects: 7130, done.        
remote: Counting objects: 100% (961/961), done.        
remote: Compressing objects: 100% (86/86), done.        
remote: Total 7130 (delta 891), reused 878 (delta 875), pack-reused 6169        
Receiving objects: 100% (7130/7130), 3.18 MiB | 6.86 MiB/s, done.
Resolving deltas: 100% (5485/5485), done.
Cloning into '/hd/seel/qlever/qlever-code/third_party/stxxl'...
remote: Enumerating objects: 40997, done.        
remote: Counting objects: 100% (60/60), done.        
remote: Compressing objects: 100% (40/40), done.        
remote: Total 40997 (delta 22), reused 39 (delta 12), pack-reused 40937        
Receiving objects: 100% (40997/40997), 14.15 MiB | 6.80 MiB/s, done.
Resolving deltas: 100% (30921/30921), done.
Submodule path 'third_party/abseil-cpp': checked out 'b9b925341f9e90f5e7aa0cf23f036c29c7e454eb'
Submodule path 'third_party/antlr4': checked out 'e4c1a74c66bd5290364ea2b36c97cd724b247357'
Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
Submodule path 'third_party/re2': checked out '13ebb377c6ad763ca61d12dd6f88b1126bd0b911'
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
Unpacking objects: 100% (1/1), 215 bytes | 215.00 KiB/s, done.
From https://github.com/ad-freiburg/stxxl
 * branch              4f368a8eacc965a775f208df0c2d3a0721f4bdf1 -> FETCH_HEAD
Submodule path 'third_party/stxxl': checked out '4f368a8eacc965a775f208df0c2d3a0721f4bdf1'
Submodule 'extlib/foxxll' (https://github.com/ad-freiburg/foxxll.git) registered for path 'third_party/stxxl/extlib/foxxll'
Cloning into '/hd/seel/qlever/qlever-code/third_party/stxxl/extlib/foxxll'...
remote: Enumerating objects: 21414, done.        
remote: Counting objects: 100% (28/28), done.        
remote: Compressing objects: 100% (22/22), done.        
remote: Total 21414 (delta 9), reused 13 (delta 4), pack-reused 21386        
Receiving objects: 100% (21414/21414), 4.60 MiB | 2.12 MiB/s, done.
Resolving deltas: 100% (15789/15789), done.
Submodule path 'third_party/stxxl/extlib/foxxll': checked out '8cbca7bedcdb0b84a6de99e927c5fa27a4bbbfb2'
Submodule 'extlib/tlx' (https://github.com/joka921/tlx.git) registered for path 'third_party/stxxl/extlib/foxxll/extlib/tlx'
Cloning into '/hd/seel/qlever/qlever-code/third_party/stxxl/extlib/foxxll/extlib/tlx'...
remote: Enumerating objects: 3418, done.        
remote: Counting objects: 100% (53/53), done.        
remote: Compressing objects: 100% (33/33), done.        
remote: Total 3418 (delta 25), reused 39 (delta 20), pack-reused 3365        
Receiving objects: 100% (3418/3418), 1.11 MiB | 6.59 MiB/s, done.
Resolving deltas: 100% (2612/2612), done.
Submodule path 'third_party/stxxl/extlib/foxxll/extlib/tlx': checked out 'ef81a598d9880cc7d242afc47de7328634f07f1d'
cloning qlever finished at Sa 21. Mai 08:34:20 CEST 2022 after 45 seconds
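
The "started at" and "finished at ... after N seconds" lines that frame the clone can be reproduced with a small wrapper. This is a sketch under assumptions: `timed` is a hypothetical name, and how the qlever script measures the duration is not shown in the log.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and report start time, finish time and
# elapsed seconds, imitating the framing lines of the log above.
# usage: timed <title> <command> [args...]
timed() {
  timed_title=$1
  shift
  timed_start=$(date +%s)
  echo "$timed_title started at $(date)"
  # Run the command and remember its exit status without aborting the wrapper.
  timed_status=0
  "$@" || timed_status=$?
  timed_end=$(date +%s)
  echo "$timed_title finished at $(date) after $((timed_end - timed_start)) seconds"
  return "$timed_status"
}
```

A clone with all nested submodules could then be timed with `timed "cloning qlever" git clone --recurse-submodules https://github.com/ad-freiburg/qlever qlever-code`. The repository URL is an assumption here, since the log only shows the target directory `qlever-code`.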

Wikidata dump download

./qlever --wikidata_download
qlever-indices/wikidata already exists
wikidata.settings.json already copied to qlever-indices/wikidata
downloading wikidata lexemes:latest-lexemes.ttl.bz2 ... please wait typically 3min ...
wikidata lexemes download started at Sa 21. Mai 08:38:56 CEST 2022
--2022-05-21 08:38:56--  https://dumps.wikimedia.org/wikidatawiki/entities//latest-lexemes.ttl.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 2620:0:861:1:208:80:154:7, 208.80.154.7
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|2620:0:861:1:208:80:154:7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 327629685 (312M) [application/octet-stream]
Saving to: ‘latest-lexemes.ttl.bz2’

latest-lexemes.ttl. 100%[===================>] 312,45M  4,17MB/s    in 77s     

2022-05-21 08:40:14 (4,08 MB/s) - ‘latest-lexemes.ttl.bz2’ saved [327629685/327629685]

wikidata lexemes download finished at Sa 21. Mai 08:40:14 CEST 2022 after 78 seconds
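
wget reports the expected size in its `Length:` line and repeats it in the final `saved [327629685/327629685]` line. The same comparison can be repeated later on the stored file. The following is a minimal sketch; `verify_size` is a hypothetical helper and not part of the qlever script.

```shell
#!/bin/sh
# Hypothetical check: compare a file's size in bytes with an expected value,
# e.g. the "Length:" reported by wget.
# usage: verify_size <file> <expected_bytes>
verify_size() {
  [ -f "$1" ] || { echo "$1: not found"; return 1; }
  # wc -c may pad its output with spaces on some systems, so strip them.
  actual=$(wc -c < "$1" | tr -d ' ')
  if [ "$actual" -eq "$2" ]; then
    echo "$1: $actual bytes ok"
  else
    echo "$1: expected $2 bytes, found $actual"
    return 1
  fi
}
```

For the download above the call would be `verify_size latest-lexemes.ttl.bz2 327629685`. A size match does not prove that the archive is intact; `bzip2 -t latest-lexemes.ttl.bz2` additionally tests the compressed data.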