WikiData Import 2022-03-16

From BITPlan Wiki
Latest revision as of 15:47, 22 July 2022

QLever trial

❌ This attempt failed; see https://github.com/ad-freiburg/qlever/issues/636

See also the QLever quickstart: https://github.com/ad-freiburg/qlever/blob/master/docs/quickstart.md

See QLever/script, as discussed in QLever Issue #562 (https://github.com/ad-freiburg/qlever/issues/562), for the script that makes this attempt easier to reproduce.

Environment/prerequisites

>=64 GB RAM and a Docker environment (e.g. Ubuntu)
>1 TB disk space (SSD preferred for speed)
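
The two thresholds above can be checked before starting. The following is a minimal sketch, not part of the qlever script: the function names `ram_gib`, `free_disk_gb` and `check_min` are hypothetical, and `free_disk_gb` assumes GNU `df` as shipped with Ubuntu. Note that the kernel reports somewhat less than the installed RAM, so a nominal 64 GB machine may need a slightly lower minimum than 64.

```shell
# ram_gib [FILE]: MemTotal from a /proc/meminfo-style file, rounded to the nearest GiB.
ram_gib() {
  LC_ALL=C awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# free_disk_gb [DIR]: free space on the filesystem holding DIR, in GB (10^9 bytes).
# Assumes GNU df (--output and -B1 are GNU coreutils options).
free_disk_gb() {
  df --output=avail -B1 "${1:-.}" | LC_ALL=C awk 'NR == 2 { printf "%d\n", $1 / 1e9 }'
}

# check_min LABEL ACTUAL MINIMUM: print a status line, return non-zero if ACTUAL < MINIMUM.
check_min() {
  if [ "$2" -ge "$3" ]; then
    echo "$1: $2 ✅"
  else
    echo "$1: $2 ❌ (need >= $3)"
    return 1
  fi
}

# Example usage (thresholds taken from the prerequisites above):
#   check_min "RAM in GiB" "$(ram_gib)" 64
#   check_min "free disk in GB" "$(free_disk_gb /hd/seel)" 1000
```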

./qlever -v -e
qlever version : 1.27 $ : 2022/03/16 08:54:18 $
needed software
docker → /usr/bin/docker ✅
top → /usr/bin/top ✅
df → /usr/bin/df ✅
jq → /usr/bin/jq ✅
lsb_release → /usr/bin/lsb_release ✅
free → /usr/bin/free ✅
operating system
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal
docker version
Docker version 20.10.13, build a224086
memory
              total        used        free      shared  buff/cache   available
Mem:          125Gi       1,1Gi       121Gi        31Mi       2,9Gi       123Gi
Swap:         2,0Gi          0B       2,0Gi
diskspace
/dev/sdb5       116G   23G   88G  21% /
tmpfs            63G     0   63G   0% /dev/shm
/dev/sda1       3,6T  987G  2,5T  29% /hd/seel
/dev/sdb1       511M  4,0K  511M   1% /boot/efi
soft ulimit for files
1048576
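
The "needed software" part of this output can be approximated with a few lines of shell. This is a sketch under assumptions: `check_tools` is a hypothetical name, and the actual qlever script may implement its check differently.

```shell
# check_tools NAME...: print one line per tool and return non-zero if any is missing.
check_tools() {
  local missing=0 tool path
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      echo "$tool → $path ✅"
    else
      echo "$tool → not found ❌"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage with the tools listed in the output above:
#   check_tools docker top df jq lsb_release free
# The soft limit for open files can be read with the shell builtin:
#   ulimit -Sn
```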

Wikidata dump download

./qlever --wikidata_download
qlever-indices/wikidata already exists
wikidata.settings.json already copied to qlever-indices/wikidata
downloading wikidata lexemes:latest-lexemes.ttl.bz2 ... please wait typically 3min ...
wikidata lexemes download started at Mi 16. Mär 09:55:07 CET 2022
--2022-03-16 09:55:07--  https://dumps.wikimedia.org/wikidatawiki/entities//latest-lexemes.ttl.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 2620:0:861:1:208:80:154:7, 208.80.154.7
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|2620:0:861:1:208:80:154:7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 319665811 (305M) [application/octet-stream]
Saving to: ‘latest-lexemes.ttl.bz2’

latest-lexemes.ttl.bz2                     100%[========================================================================================>] 304,86M  4,41MB/s    in 70s     

2022-03-16 09:56:17 (4,37 MB/s) - ‘latest-lexemes.ttl.bz2’ saved [319665811/319665811]

wikidata lexemes download finished at Mi 16. Mär 09:56:17 CET 2022 after 70 seconds
downloading wikidata dump:latest-all.ttl.bz2 ... please wait typically 6hours ...
wikidata dump download started at Mi 16. Mär 09:56:17 CET 2022
--2022-03-16 09:56:17--  https://dumps.wikimedia.org/wikidatawiki/entities//latest-all.ttl.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 2620:0:861:1:208:80:154:7, 208.80.154.7
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|2620:0:861:1:208:80:154:7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 93072933618 (87G) [application/octet-stream]
Saving to: ‘latest-all.ttl.bz2’

latest-all.ttl.bz2                           1%[>                                                                                        ]   1,02G  4,08MB/s    eta 6h 0m 
2022-03-16 15:39:52 (4,31 MB/s) - ‘latest-all.ttl.bz2’ saved [93072933618/93072933618]

wikidata dump download finished at Mi 16. Mär 15:39:52 CET 2022 after 20615 seconds
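
The figures in this log are consistent with each other: 93072933618 bytes in 20615 seconds is about 4.31 MiB/s (wget counts its "MB" in units of 1024), and 20615 seconds is 5h 43m 35s, in line with the "typically 6hours" estimate. A minimal sketch of that arithmetic, plus a size check against the `Length:` value that wget printed, follows; the function names are hypothetical and not part of the qlever script.

```shell
# avg_mib_per_s BYTES SECONDS: average throughput in MiB/s with two decimals.
avg_mib_per_s() {
  LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1048576 }'
}

# hms SECONDS: format a duration as hours, minutes and seconds.
hms() {
  printf '%dh %dm %ds\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# size_matches FILE EXPECTED_BYTES: succeed if FILE has exactly the expected size.
size_matches() {
  [ "$(wc -c < "$1" | tr -d ' ')" = "$2" ]
}

# Example usage with the values from the log above:
#   avg_mib_per_s 93072933618 20615
#   hms 20615
#   size_matches latest-all.ttl.bz2 93072933618 && echo "download complete"
```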

Native approach

Follow the steps of https://github.com/ad-freiburg/qlever/blob/master/Dockerfiles/Dockerfile.Ubuntu20.04

# Run from the root of a QLever checkout (assumed; the clone step is not shown here).
sudo apt-get install -y build-essential cmake libicu-dev tzdata pkg-config uuid-runtime uuid-dev git
# gcc-11 is not in the default Ubuntu 20.04 repositories, so install it from the toolchain PPA;
# see https://stackoverflow.com/a/67453352/1497139
sudo apt install build-essential manpages-dev software-properties-common
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update && sudo apt install gcc-11 g++-11
sudo apt install -y libjemalloc-dev ninja-build libzstd-dev
# libboost1.74-dev does not install without an external PPA:
# sudo apt install -y libboost1.74-dev
# with the distribution's libboost-all-dev the build won't compile, so it was removed again:
# sudo apt-get install libboost-all-dev
# sudo apt-get remove libboost-all-dev
# sudo apt autoremove
sudo add-apt-repository -y ppa:mhier/libboost-latest
sudo apt install -y libboost1.74-dev
# configure an out-of-source Release build with Ninja and g++-11
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER="g++-11" -DLOGLEVEL=INFO -DUSE_PARALLEL=true -GNinja ..

See https://github.com/ad-freiburg/qlever/issues/636 for the result.
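
Since the steps above depend on a specific Boost version (1.74) from an external PPA, it can help to confirm which Boost headers are actually installed before configuring. The sketch below decodes the `BOOST_VERSION` macro, which encodes the version as major * 100000 + minor * 100 + patch. The function name `boost_version` is hypothetical, and the default header path is an assumption about where the Ubuntu package installs its headers.

```shell
# boost_version [HEADER]: print major.minor.patch decoded from the BOOST_VERSION macro.
# The default path is an assumption; pass the real location of version.hpp if it differs.
boost_version() {
  LC_ALL=C awk '$1 == "#define" && $2 == "BOOST_VERSION" {
    v = $3
    printf "%d.%d.%d\n", int(v / 100000), int(v / 100) % 1000, v % 100
  }' "${1:-/usr/include/boost/version.hpp}"
}

# Example usage before running cmake:
#   boost_version            # expected to start with 1.74 after the PPA install
#   g++-11 -dumpversion      # confirms that the gcc-11 toolchain is on the PATH
```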