WikiData Import 2022-05-21
Revision as of 08:00, 21 May 2022
QLever trial
Requirements: at least 64 GB RAM, a Docker environment (e.g. on Ubuntu), and more than 1 TB of disk space (an SSD is preferred for speed).
./qlever -v -e
qlever version : 1.27 $ : 2022/03/16 08:54:18 $
needed software
docker → /usr/bin/docker ✅
top → /usr/bin/top ✅
df → /usr/bin/df ✅
jq → /usr/bin/jq ✅
lsb_release → /usr/bin/lsb_release ✅
free → /usr/bin/free ✅
operating system
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
docker version
Docker version 20.10.13, build a224086
memory
              total        used        free      shared  buff/cache   available
Mem:          125Gi       1,1Gi       121Gi        31Mi       2,9Gi       123Gi
Swap:         2,0Gi          0B       2,0Gi
diskspace
/dev/sdb5 116G 23G 88G 21% /
tmpfs 63G 0 63G 0% /dev/shm
/dev/sda1 3,6T 987G 2,5T 29% /hd/seel
/dev/sdb1 511M 4,0K 511M 1% /boot/efi
soft ulimit for files
1048576
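The checks reported by `./qlever -v -e` above can be approximated by hand. The following is a minimal sketch and not the qlever script itself: the function names `check_tools`, `at_least`, `total_ram_gib` and `free_disk_gib` are hypothetical, the output format only imitates the listing above, and it assumes a Linux system with GNU coreutils.

```shell
# Hypothetical helpers (not part of the qlever script) for checking the
# requirements stated for this trial.

# Report for each named command whether it is found in PATH.
# Returns non-zero if at least one command is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool → $(command -v "$tool") ✅"
    else
      echo "$tool → not found ❌"
      missing=1
    fi
  done
  return "$missing"
}

# Succeeds if the first integer is greater than or equal to the second.
at_least() {
  [ "$1" -ge "$2" ]
}

# Total RAM in GiB (rounded down), read from /proc/meminfo (Linux only).
total_ram_gib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Free disk space in GiB for the file system containing the given directory
# (uses the GNU df option --output).
free_disk_gib() {
  df -BG --output=avail "$1" | tail -n 1 | tr -dc '0-9'
}

# Example calls (run manually):
#   check_tools docker top df jq lsb_release free
#   at_least "$(total_ram_gib)" 64 || echo "less than 64 GiB RAM"
#   at_least "$(free_disk_gib /hd/seel)" 1000 || echo "less than about 1 TB free"
```

A machine sold with 64 GB RAM typically reports slightly less than 64 GiB in `MemTotal`, so the RAM threshold may need some slack.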
QLever clone
./qlever -c
cloning qlever - please wait typically 1 min ...
cloning qlever started at Sa 21. Mai 08:33:35 CEST 2022
Cloning into 'qlever-code'...
remote: Enumerating objects: 13828, done.
remote: Counting objects: 100% (973/973), done.
remote: Compressing objects: 100% (705/705), done.
remote: Total 13828 (delta 574), reused 451 (delta 267), pack-reused 12855
Receiving objects: 100% (13828/13828), 111.72 MiB | 6.86 MiB/s, done.
Resolving deltas: 100% (10707/10707), done.
Submodule 'third_party/abseil-cpp' (https://github.com/abseil/abseil-cpp.git) registered for path 'third_party/abseil-cpp'
Submodule 'third_party/antlr4' (https://github.com/antlr/antlr4.git) registered for path 'third_party/antlr4'
Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
Submodule 'third_party/re2' (https://github.com/google/re2.git) registered for path 'third_party/re2'
Submodule 'third_party/stxxl' (https://github.com/ad-freiburg/stxxl) registered for path 'third_party/stxxl'
Cloning into '/hd/seel/qlever/qlever-code/third_party/abseil-cpp'...
remote: Enumerating objects: 16841, done.
remote: Counting objects: 100% (149/149), done.
remote: Compressing objects: 100% (78/78), done.
remote: Total 16841 (delta 83), reused 112 (delta 71), pack-reused 16692
Receiving objects: 100% (16841/16841), 10.55 MiB | 6.78 MiB/s, done.
Resolving deltas: 100% (13078/13078), done.
Cloning into '/hd/seel/qlever/qlever-code/third_party/antlr4'...
remote: Enumerating objects: 128025, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 128025 (delta 3), reused 3 (delta 1), pack-reused 128012
Receiving objects: 100% (128025/128025), 65.33 MiB | 6.76 MiB/s, done.
Resolving deltas: 100% (75484/75484), done.
Cloning into '/hd/seel/qlever/qlever-code/third_party/googletest'...
remote: Enumerating objects: 24402, done.
remote: Counting objects: 100% (67/67), done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 24402 (delta 31), reused 53 (delta 28), pack-reused 24335
Receiving objects: 100% (24402/24402), 10.27 MiB | 6.87 MiB/s, done.
Resolving deltas: 100% (18049/18049), done.
Cloning into '/hd/seel/qlever/qlever-code/third_party/re2'...
remote: Enumerating objects: 7130, done.
remote: Counting objects: 100% (961/961), done.
remote: Compressing objects: 100% (86/86), done.
remote: Total 7130 (delta 891), reused 878 (delta 875), pack-reused 6169
Receiving objects: 100% (7130/7130), 3.18 MiB | 6.86 MiB/s, done.
Resolving deltas: 100% (5485/5485), done.
Cloning into '/hd/seel/qlever/qlever-code/third_party/stxxl'...
remote: Enumerating objects: 40997, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 40997 (delta 22), reused 39 (delta 12), pack-reused 40937
Receiving objects: 100% (40997/40997), 14.15 MiB | 6.80 MiB/s, done.
Resolving deltas: 100% (30921/30921), done.
Submodule path 'third_party/abseil-cpp': checked out 'b9b925341f9e90f5e7aa0cf23f036c29c7e454eb'
Submodule path 'third_party/antlr4': checked out 'e4c1a74c66bd5290364ea2b36c97cd724b247357'
Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
Submodule path 'third_party/re2': checked out '13ebb377c6ad763ca61d12dd6f88b1126bd0b911'
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
Unpacking objects: 100% (1/1), 215 bytes | 215.00 KiB/s, done.
From https://github.com/ad-freiburg/stxxl
* branch 4f368a8eacc965a775f208df0c2d3a0721f4bdf1 -> FETCH_HEAD
Submodule path 'third_party/stxxl': checked out '4f368a8eacc965a775f208df0c2d3a0721f4bdf1'
Submodule 'extlib/foxxll' (https://github.com/ad-freiburg/foxxll.git) registered for path 'third_party/stxxl/extlib/foxxll'
Cloning into '/hd/seel/qlever/qlever-code/third_party/stxxl/extlib/foxxll'...
remote: Enumerating objects: 21414, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 21414 (delta 9), reused 13 (delta 4), pack-reused 21386
Receiving objects: 100% (21414/21414), 4.60 MiB | 2.12 MiB/s, done.
Resolving deltas: 100% (15789/15789), done.
Submodule path 'third_party/stxxl/extlib/foxxll': checked out '8cbca7bedcdb0b84a6de99e927c5fa27a4bbbfb2'
Submodule 'extlib/tlx' (https://github.com/joka921/tlx.git) registered for path 'third_party/stxxl/extlib/foxxll/extlib/tlx'
Cloning into '/hd/seel/qlever/qlever-code/third_party/stxxl/extlib/foxxll/extlib/tlx'...
remote: Enumerating objects: 3418, done.
remote: Counting objects: 100% (53/53), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 3418 (delta 25), reused 39 (delta 20), pack-reused 3365
Receiving objects: 100% (3418/3418), 1.11 MiB | 6.59 MiB/s, done.
Resolving deltas: 100% (2612/2612), done.
Submodule path 'third_party/stxxl/extlib/foxxll/extlib/tlx': checked out 'ef81a598d9880cc7d242afc47de7328634f07f1d'
cloning qlever finished at Sa 21. Mai 08:34:20 CEST 2022 after 45 seconds
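The log above is consistent with a recursive clone of the QLever repository including its nested submodules. As a sketch of how the same step could be done by hand, assuming the repository URL https://github.com/ad-freiburg/qlever and the target directory `qlever-code` seen in the log; the helper names `timed` and `clone_qlever` are hypothetical, and the exact git options used by the qlever script are not shown on this page:

```shell
# Run a command and report start time, end time and duration,
# similar in spirit to the "started at" / "finished at" lines in the log.
# Returns the exit status of the command.
timed() {
  local label="$1"
  shift
  local start end status=0
  start=$(date +%s)
  echo "$label started at $(date)"
  "$@" || status=$?
  end=$(date +%s)
  echo "$label finished at $(date) after $((end - start)) seconds"
  return "$status"
}

# Clone QLever with all (nested) submodules into ./qlever-code.
# Assumption: --recurse-submodules matches what the qlever script does.
clone_qlever() {
  timed "cloning qlever" \
    git clone --recurse-submodules https://github.com/ad-freiburg/qlever qlever-code
}

# Example call (run manually, needs network access):
#   clone_qlever
```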
Wikidata dump download
./qlever --wikidata_download
qlever-indices/wikidata already exists
wikidata.settings.json already copied to qlever-indices/wikidata
downloading wikidata lexemes:latest-lexemes.ttl.bz2 ... please wait typically 3min ...
wikidata lexemes download started at Sa 21. Mai 08:38:56 CEST 2022
--2022-05-21 08:38:56-- https://dumps.wikimedia.org/wikidatawiki/entities//latest-lexemes.ttl.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 2620:0:861:1:208:80:154:7, 208.80.154.7
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|2620:0:861:1:208:80:154:7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 327629685 (312M) [application/octet-stream]
Saving to: ‘latest-lexemes.ttl.bz2’
latest-lexemes.ttl. 100%[===================>] 312,45M 4,17MB/s in 77s
2022-05-21 08:40:14 (4,08 MB/s) - ‘latest-lexemes.ttl.bz2’ saved [327629685/327629685]
wikidata lexemes download finished at Sa 21. Mai 08:40:14 CEST 2022 after 78 seconds
downloading wikidata dump:latest-all.ttl.bz2 ... please wait typically 6hours ...
wikidata dump download started at Sa 21. Mai 08:42:59 CEST 2022
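Since the full dump takes hours to download, a resumable download may be worth considering. The sketch below is not what the qlever script does; it assumes that `latest-all.ttl.bz2` is served from the same directory as the lexemes file in the log, and the names `BASE_URL`, `dump_url` and `download_dump` are hypothetical.

```shell
# Base directory of the Wikidata entity dumps, taken from the wget log above.
BASE_URL="https://dumps.wikimedia.org/wikidatawiki/entities"

# Print the download URL for a dump file name such as latest-lexemes.ttl.bz2.
dump_url() {
  echo "$BASE_URL/$1"
}

# Download a dump file into the index directory.
# wget -c continues a partially downloaded file instead of starting over,
# -P sets the directory the file is saved to.
download_dump() {
  local file="$1"
  local target_dir="${2:-qlever-indices/wikidata}"
  mkdir -p "$target_dir" &&
    wget -c -P "$target_dir" "$(dump_url "$file")"
}

# Example calls (run manually, needs network access and a lot of disk space):
#   download_dump latest-lexemes.ttl.bz2
#   download_dump latest-all.ttl.bz2
```

If the connection drops, calling `download_dump` again with the same file name picks up where the previous attempt stopped.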
Build code
See WikiData_Import_2022-03-16#Native_approach for the preparation steps, then follow the steps in https://github.com/ad-freiburg/qlever/blob/master/Dockerfiles/Dockerfile.Ubuntu20.04
wf@sun:/hd/seel/qlever/qlever-code$ mkdir build
wf@sun:/hd/seel/qlever/qlever-code$ cd build
wf@sun:/hd/seel/qlever/qlever-code/build$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER="g++-11" -DLOGLEVEL=INFO -DUSE_PARALLEL=true -GNinja ..
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 11.1.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/g++-11
-- Check for working CXX compiler: /usr/bin/g++-11 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test HAS_COROUTINES
-- Performing Test HAS_COROUTINES - Success
-- Building without demo. To enable demo build use: -DWITH_DEMO=True
CMake Deprecation Warning at third_party/antlr4/runtime/Cpp/CMakeLists.txt:31 (CMAKE_POLICY):
The OLD behavior for policy CMP0054 will be removed from a future version
of CMake.
The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.
CMake Deprecation Warning at third_party/antlr4/runtime/Cpp/CMakeLists.txt:32 (CMAKE_POLICY):
The OLD behavior for policy CMP0045 will be removed from a future version
of CMake.
The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.
CMake Deprecation Warning at third_party/antlr4/runtime/Cpp/CMakeLists.txt:33 (CMAKE_POLICY):
The OLD behavior for policy CMP0042 will be removed from a future version
of CMake.
The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.
CMake Deprecation Warning at third_party/antlr4/runtime/Cpp/CMakeLists.txt:38 (CMAKE_POLICY):
The OLD behavior for policy CMP0059 will be removed from a future version
of CMake.
The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.
CMake Deprecation Warning at third_party/antlr4/runtime/Cpp/CMakeLists.txt:39 (CMAKE_POLICY):
The OLD behavior for policy CMP0054 will be removed from a future version
of CMake.
The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1")
-- Checking for module 'uuid'
-- Found uuid, version 2.34.0
-- Output libraries to /hd/seel/qlever/qlever-code/dist
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found the following ICU libraries:
-- uc (required)
-- i18n (required)
-- Found ICU: /usr/include (found suitable version "66.1", minimum required is "60")
-- Checking for module 'jemalloc'
-- Found jemalloc, version 5.2.1_0
CMake Warning at /usr/share/cmake-3.16/Modules/FindBoost.cmake:1161 (message):
New Boost version may have incorrect or missing dependencies and imported
targets
Call Stack (most recent call first):
/usr/share/cmake-3.16/Modules/FindBoost.cmake:1283 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.16/Modules/FindBoost.cmake:1921 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:88 (find_package)
CMake Warning at /usr/share/cmake-3.16/Modules/FindBoost.cmake:1161 (message):
New Boost version may have incorrect or missing dependencies and imported
targets
Call Stack (most recent call first):
/usr/share/cmake-3.16/Modules/FindBoost.cmake:1283 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.16/Modules/FindBoost.cmake:1921 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:88 (find_package)
CMake Warning at /usr/share/cmake-3.16/Modules/FindBoost.cmake:1161 (message):
New Boost version may have incorrect or missing dependencies and imported
targets
Call Stack (most recent call first):
/usr/share/cmake-3.16/Modules/FindBoost.cmake:1283 (_Boost_COMPONENT_DEPENDENCIES)
/usr/share/cmake-3.16/Modules/FindBoost.cmake:1921 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:88 (find_package)
-- Found Boost: /usr/include (found suitable version "1.74.0", minimum required is "1.74") found components: iostreams program_options regex
-- Found Python: /usr/bin/python3.8 (found version "3.8.10") found components: Interpreter
CMake Warning at third_party/abseil-cpp/CMakeLists.txt:74 (message):
A future Abseil release will default ABSL_PROPAGATE_CXX_STD to ON for CMake
3.8 and up. We recommend enabling this option to ensure your project still
builds correctly.
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found Git: /usr/bin/git (found version "2.25.1")
-- Detected git refspec 1.4.1-825-g4f368a8e sha 4f368a8eacc965a775f208df0c2d3a0721f4bdf1
-- Performing Test CXX_HAS_FLAGS_WEXTRA
-- Performing Test CXX_HAS_FLAGS_WEXTRA - Success
-- Performing Test CXX_HAS_TEMPLATE_DEPTH
-- Performing Test CXX_HAS_TEMPLATE_DEPTH - Success
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test HAVE_STD_MUTEX
-- Performing Test HAVE_STD_MUTEX - Success
-- Performing Test HAVE_STD_THREAD
-- Performing Test HAVE_STD_THREAD - Success
-- Looking for C++ include random
-- Looking for C++ include random - found
-- Checking for 64-bit off_t
-- Checking for 64-bit off_t - present
-- Checking for fseeko/ftello
-- Checking for fseeko/ftello - present
-- Performing Test STXXL_HAVE_O_DIRECT
-- Performing Test STXXL_HAVE_O_DIRECT - Success
-- Looking for mmap
-- Looking for mmap - found
-- Performing Test STXXL_HAVE_LINUXAIO_FILE
-- Performing Test STXXL_HAVE_LINUXAIO_FILE - Success
-- Performing Test STXXL_HAVE_SYNC_ADD_AND_FETCH
-- Performing Test STXXL_HAVE_SYNC_ADD_AND_FETCH - Success
-- OpenMP disabled in STXXL (no parallelism is used).
-- Looking for mallinfo
-- Looking for mallinfo - found
-- Looking for mlock
-- Looking for mlock - found
-- Detected git refspec 1.4.1-460-g8cbca7be sha 8cbca7bedcdb0b84a6de99e927c5fa27a4bbbfb2
-- Performing Test FOXXLL_HAVE_O_DIRECT
-- Performing Test FOXXLL_HAVE_O_DIRECT - Success
-- Looking for mmap
-- Looking for mmap - found
-- Performing Test FOXXLL_HAVE_LINUXAIO_FILE
-- Performing Test FOXXLL_HAVE_LINUXAIO_FILE - Success
-- Performing Test TLX_CXX_HAS_CXX17
-- Performing Test TLX_CXX_HAS_CXX17 - Success
-- TLX CMAKE_CXX_FLAGS: -Wshadow -Wold-style-cast -std=c++17 -g -W -Wall -Wextra -fPIC -Wall -Wextra -fopenmp -W -Wall -pedantic -Wno-long-long -Wextra -ftemplate-depth=1024 -W -Wall -pedantic -Wno-long-long -Wextra -ftemplate-depth=1024 -Wcast-qual -Winit-self -Wnoexcept -Woverloaded-virtual -Wredundant-decls
-- ---
-- CXX_FLAGS are : -Wall -Wextra -fopenmp
-- CXX_FLAGS_RELEASE are : -O3 -DNDEBUG -O3
-- CXX_FLAGS_DEBUG are : -g
-- IMPORTANT: Make sure you have selected the desired CMAKE_BUILD_TYPE
-- CMAKE_BUILD_TYPE is Release
-- ---
-- Configuring done
-- Generating done
-- Build files have been written to: /hd/seel/qlever/qlever-code/build
Ninja build
See https://ninja-build.org/manual.html for the Ninja manual.
ninja
...
[611/611] Linking CXX executable test/SparqlAntlrParserTest
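The configure and build steps above can be combined into one repeatable function. This is a minimal sketch using exactly the cmake options from the session above; the function name `build_qlever` and the early check for `CMakeLists.txt` are additions for illustration.

```shell
# Configure and build QLever in <source dir>/build.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_qlever() (
  src_dir="${1:-qlever-code}"
  # Fail early if the source tree is missing.
  if [ ! -f "$src_dir/CMakeLists.txt" ]; then
    echo "no CMakeLists.txt in $src_dir, clone QLever first" >&2
    exit 1
  fi
  mkdir -p "$src_dir/build" && cd "$src_dir/build" || exit 1
  # Same options as in the session above.
  cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER="g++-11" \
        -DLOGLEVEL=INFO -DUSE_PARALLEL=true -GNinja .. &&
    ninja
)

# Example call (run manually from the directory containing qlever-code):
#   build_qlever qlever-code
```

Because `mkdir -p` is used, the function can be rerun on an existing build directory, in which case Ninja only rebuilds what has changed.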
Install needed packages
sudo apt-get install -y wget python3-yaml unzip curl bzip2 pkg-config libicu-dev python3-icu libgomp1 uuid-runtime
sudo apt install -y lbzip2 libjemalloc-dev libzstd-dev
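Before installing, it can be useful to see which of these packages are still missing. A small sketch, assuming a Debian or Ubuntu system where `dpkg-query` is available; the function name `missing_packages` is hypothetical.

```shell
# Print the names of all given packages that are not installed.
# A package counts as installed if dpkg reports "install ok installed".
missing_packages() {
  local pkg
  for pkg in "$@"; do
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null |
        grep -q "install ok installed"; then
      echo "$pkg"
    fi
  done
}

# Example call (run manually) with the packages listed above:
#   missing_packages wget python3-yaml unzip curl bzip2 pkg-config libicu-dev \
#     python3-icu libgomp1 uuid-runtime lbzip2 libjemalloc-dev libzstd-dev
```

An empty result means that all listed packages are already installed.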