Difference between revisions of "WikiData Import 2022-01-29"
Revision as of 08:44, 7 February 2022
QLever trial
see https://github.com/ad-freiburg/qlever/blob/master/docs/quickstart.md
See QLever/script, as discussed in QLever Issue #562, for a script that makes this attempt easier to reproduce.
Environment
Operating System
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
docker version
docker --version
Docker version 19.03.13, build 4484c46d9d
Memory
wf@merkur:~$ free -h
total used free shared buff/cache available
Mem: 62Gi 1,3Gi 60Gi 30Mi 1,4Gi 60Gi
Swap: 18Gi 0B 18Gi
Disk space
vendor                | model              | size      | days  | years |
----------------------+--------------------+-----------+-------+-------+
ST6000VX0023-2EF110   |                    | 6,00TB    | 300   | 0.8   |
SanDisk               | SSD PLUS 120GB     | 120GB     | 237   | 0.6   |
WDC                   | WD20EARS-00MVWB1   | 2,00TB    | 2232  | 6.1   |
ST8000DM004-2U9188    |                    | 8,00TB    | 8     | 0.0   |
Model Family: Seagate Barracuda Compute
Device Model: ST8000DM004-2U9188
Firmware Version: 0001
User Capacity: 8.001.563.222.016 bytes [8,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
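Before starting the download it can help to confirm that the target filesystem has room for the dump. The following is a minimal sketch, not part of QLever or its script: `has_free_bytes` is a hypothetical helper name, and it assumes GNU `df` as shipped with Ubuntu. The byte count is the size of latest-all.ttl.bz2 reported by wget in the download step; the index itself needs additional space that this page does not quantify.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the filesystem holding directory $1
# has at least $2 bytes available. Assumes GNU df (--output and -B1).
has_free_bytes() {
  local dir=$1 needed=$2 avail
  # df prints a header line followed by the value; keep only the value
  avail=$(df --output=avail -B1 "$dir" | tail -n 1 | tr -d ' ') || return 2
  [ "$avail" -ge "$needed" ]
}

# 92422961847 is the size in bytes of latest-all.ttl.bz2 as reported by wget
if has_free_bytes . 92422961847; then
  echo "enough space for the compressed dump"
else
  echo "not enough space for the compressed dump" >&2
fi
```

Run it from the directory that will hold the download, for example $QLEVER_HOME.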
Steps
The steps below may be automated with the QLever/script.
git clone (1 min)
export QLEVER_HOME=$(pwd)
date;git clone --recursive https://github.com/ad-freiburg/qlever qlever-code;date
Sa 29. Jan 17:12:45 CET 2022
Cloning into 'qlever-code'...
remote: Enumerating objects: 12917, done.
remote: Counting objects: 100% (622/622), done.
remote: Compressing objects: 100% (514/514), done.
remote: Total 12917 (delta 382), reused 206 (delta 103), pack-reused 12295
Receiving objects: 100% (12917/12917), 111.10 MiB | 6.81 MiB/s, done.
Resolving deltas: 100% (10067/10067), done.
Submodule 'third_party/abseil-cpp' (https://github.com/abseil/abseil-cpp.git) registered for path 'third_party/abseil-cpp'
Submodule 'third_party/antlr4' (https://github.com/antlr/antlr4.git) registered for path 'third_party/antlr4'
Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
Submodule 'third_party/re2' (https://github.com/google/re2.git) registered for path 'third_party/re2'
Submodule 'third_party/stxxl' (https://github.com/ad-freiburg/stxxl) registered for path 'third_party/stxxl'
Cloning into '/hd/jurob/qlever/qlever-code/third_party/abseil-cpp'...
remote: Enumerating objects: 16122, done.
remote: Counting objects: 100% (248/248), done.
remote: Compressing objects: 100% (193/193), done.
remote: Total 16122 (delta 116), reused 106 (delta 55), pack-reused 15874
Receiving objects: 100% (16122/16122), 10.39 MiB | 6.85 MiB/s, done.
Resolving deltas: 100% (12398/12398), done.
Cloning into '/hd/jurob/qlever/qlever-code/third_party/antlr4'...
remote: Enumerating objects: 125326, done.
remote: Counting objects: 100% (2734/2734), done.
remote: Compressing objects: 100% (1022/1022), done.
remote: Total 125326 (delta 1488), reused 2292 (delta 1246), pack-reused 122592
Receiving objects: 100% (125326/125326), 64.11 MiB | 6.72 MiB/s, done.
Resolving deltas: 100% (73647/73647), done.
Cloning into '/hd/jurob/qlever/qlever-code/third_party/googletest'...
remote: Enumerating objects: 23795, done.
remote: Counting objects: 100% (259/259), done.
remote: Compressing objects: 100% (146/146), done.
remote: Total 23795 (delta 134), reused 200 (delta 105), pack-reused 23536
Receiving objects: 100% (23795/23795), 9.90 MiB | 6.89 MiB/s, done.
Resolving deltas: 100% (17517/17517), done.
Cloning into '/hd/jurob/qlever/qlever-code/third_party/re2'...
remote: Enumerating objects: 6996, done.
remote: Counting objects: 100% (617/617), done.
remote: Compressing objects: 100% (388/388), done.
remote: Total 6996 (delta 392), reused 399 (delta 223), pack-reused 6379
Receiving objects: 100% (6996/6996), 3.49 MiB | 6.94 MiB/s, done.
Resolving deltas: 100% (5287/5287), done.
Cloning into '/hd/jurob/qlever/qlever-code/third_party/stxxl'...
remote: Enumerating objects: 40982, done.
remote: Counting objects: 100% (45/45), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 40982 (delta 15), reused 28 (delta 7), pack-reused 40937
Receiving objects: 100% (40982/40982), 14.13 MiB | 6.77 MiB/s, done.
Resolving deltas: 100% (30914/30914), done.
Submodule path 'third_party/abseil-cpp': checked out 'b9b925341f9e90f5e7aa0cf23f036c29c7e454eb'
Submodule path 'third_party/antlr4': checked out 'e4c1a74c66bd5290364ea2b36c97cd724b247357'
Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
Submodule path 'third_party/re2': checked out '13ebb377c6ad763ca61d12dd6f88b1126bd0b911'
Submodule path 'third_party/stxxl': checked out 'a4f884f2a2b4ea078c34c48e1e9a0003f4619f00'
Submodule 'extlib/foxxll' (https://github.com/ad-freiburg/foxxll.git) registered for path 'third_party/stxxl/extlib/foxxll'
Cloning into '/hd/jurob/qlever/qlever-code/third_party/stxxl/extlib/foxxll'...
remote: Enumerating objects: 21414, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 21414 (delta 9), reused 13 (delta 4), pack-reused 21386
Receiving objects: 100% (21414/21414), 4.60 MiB | 2.48 MiB/s, done.
Resolving deltas: 100% (15789/15789), done.
Submodule path 'third_party/stxxl/extlib/foxxll': checked out '8cbca7bedcdb0b84a6de99e927c5fa27a4bbbfb2'
Submodule 'extlib/tlx' (https://github.com/joka921/tlx.git) registered for path 'third_party/stxxl/extlib/foxxll/extlib/tlx'
Cloning into '/hd/jurob/qlever/qlever-code/third_party/stxxl/extlib/foxxll/extlib/tlx'...
remote: Enumerating objects: 3418, done.
remote: Counting objects: 100% (53/53), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 3418 (delta 24), reused 22 (delta 9), pack-reused 3365
Receiving objects: 100% (3418/3418), 1.11 MiB | 2.75 MiB/s, done.
Resolving deltas: 100% (2611/2611), done.
Submodule path 'third_party/stxxl/extlib/foxxll/extlib/tlx': checked out 'ef81a598d9880cc7d242afc47de7328634f07f1d'
Sa 29. Jan 17:13:49 CET 2022
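The `date;…;date` pattern used here brackets a command with two timestamps and leaves the subtraction to the reader. A minimal sketch of doing that arithmetic in the shell follows. It assumes GNU `date` (as on the Ubuntu host above), and `elapsed_seconds` is a hypothetical helper name, not part of QLever.

```bash
#!/bin/bash
# Hypothetical helper: seconds between two timestamps (assumes GNU date's -d option).
# Both timestamps are parsed as UTC so a daylight-saving change cannot skew the difference.
elapsed_seconds() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# Start and end timestamps copied from the clone log above
elapsed_seconds "2022-01-29 17:12:45" "2022-01-29 17:13:49"
```

For the clone this gives 64 seconds, which matches the "(1 min)" in the step title.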
docker build (15 mins)
cd qlever-code/
wf@merkur:/hd/jurob/qlever/qlever-code$ date;sudo docker build --file Dockerfiles/Dockerfile.Ubuntu20.04 -t qlever .;date
Sa 29. Jan 17:14:51 CET 2022
Sending build context to Docker daemon 126.1MB
Step 1/43 : FROM ubuntu:20.04 as base
...
Successfully tagged qlever:latest
Sa 29. Jan 17:29:27 CET 2022
Wikidata dump Download (6h)
This was done on another machine two days earlier, on 27 January 2022.
date;wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2;date
Thu Jan 27 11:08:10 CET 2022
--2022-01-27 11:08:10-- https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 2620::861:1:208:80:154:7, 208.80.154.7
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|2620::861:1:208:80:154:7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 92422961847 (86G) [application/octet-stream]
Saving to: ‘latest-all.ttl.bz2’
latest-all.ttl.bz2 100%[===================>] 86.08G 4.60MB/s in 5h 46m
2022-01-27 16:55:06 (4.23 MB/s) - ‘latest-all.ttl.bz2’ saved [92422961847/92422961847]
Thu Jan 27 16:55:06 CET 2022
date;wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2;date
Thu Jan 27 17:36:38 CET 2022
--2022-01-27 17:36:38-- https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 2620::861:1:208:80:154:7, 208.80.154.7
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|2620::861:1:208:80:154:7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 315591211 (301M) [application/octet-stream]
Saving to: ‘latest-lexemes.ttl.bz2’
latest-lexemes.ttl. 100%[===================>] 300.97M 4.95MB/s in 60s
2022-01-27 17:37:39 (5.00 MB/s) - ‘latest-lexemes.ttl.bz2’ saved [315591211/315591211]
Thu Jan 27 17:37:39 CET 2022
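Since the files were downloaded on one machine and used on another, it may be worth checking that the copies still have the sizes wget reported. The sketch below compares the on-disk size with the "Length:" values from the logs above. `check_size` is a hypothetical helper name, and GNU `stat` is assumed.

```bash
#!/bin/bash
# Hypothetical helper: compare a file's size on disk with an expected byte count
# (for example the "Length:" value printed by wget). Assumes GNU stat (-c %s).
check_size() {
  local file=$1 expected=$2 actual
  actual=$(stat -c %s "$file") || return 2
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $file is $actual bytes"
  else
    echo "MISMATCH: $file is $actual bytes, expected $expected" >&2
    return 1
  fi
}

# Byte counts taken from the wget logs above; files that are absent are skipped
for pair in latest-all.ttl.bz2:92422961847 latest-lexemes.ttl.bz2:315591211; do
  file=${pair%%:*}
  bytes=${pair##*:}
  if [ -f "$file" ]; then check_size "$file" "$bytes"; fi
done
```

A matching size does not prove the content is intact. `bzip2 -t <file>` tests the compressed stream itself, at the cost of reading the whole file.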
Indexing
./qlever --wikidata_index
Issues
see