Difference between revisions of "WikiData Import 2022-06-24"
Revision as of 08:58, 24 June 2022
See QLever/script, as discussed in QLever Issue #562, for the script that makes this attempt easier to reproduce.
See QLever Discussions for more details on this series of attempts.
Since https://github.com/ad-freiburg/qlever-control now has an official "qlever" script, we have renamed our own script, whose purpose is to make the import attempts reproducible, to qleverauto.
Beware of https://github.com/ad-freiburg/qlever-control/issues/4: make sure ulimit -n is set! This attempt had to be restarted because setting the value within a script did not work.
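Since setting the limit inside a script did not take effect in this attempt, one option is to verify the limit in the calling shell before starting the import. The following is a minimal sketch: the function name require_nofile_limit is hypothetical and not part of qlever-control, and the default threshold of 1048576 is an assumption taken from the soft limit reported by the environment check on this machine.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of qlever-control):
# verify the soft limit for open files in the *current* shell
# before starting a long-running import.
# The default threshold is an assumption; adjust it to your setup.
require_nofile_limit() {
  local required="${1:-1048576}"
  local current
  current="$(ulimit -Sn)"
  # "unlimited" always satisfies the requirement
  if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
    echo "ok: soft limit for open files is $current"
    return 0
  fi
  echo "error: soft limit for open files is $current, need at least $required" >&2
  echo "run 'ulimit -Sn $required' in this shell and retry" >&2
  return 1
}
```

Calling require_nofile_limit in the interactive shell right before starting the import reports a limit that is too low immediately rather than during the index build.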
Preparations
Native build
The steps in WikiData_Import_2022-05-21#Build_code still apply for this attempt, which uses the native (compiled) version of QLever.
Update qlever-code
qlever-code$ git pull
remote: Enumerating objects: 301, done.
remote: Counting objects: 100% (301/301), done.
remote: Compressing objects: 100% (223/223), done.
Receiving objects: 31% (94/301), 53.42 MiB | 3.75 MiB/s
...
create mode 100644 src/util/antlr/ANTLRErrorHandling.h
delete mode 100644 src/util/antlr/ThrowingErrorStrategy.h
create mode 100644 toolchains/gcc12.cmake
Update submodules
git submodule update --init --recursive
Submodule path 'third_party/abseil-cpp': checked out '2617970857c46e6ec971865d54f00445c260f682'
Submodule path 'third_party/googletest': checked out '0320f517fd920866d918e564105d68fd4362040a'
From https://github.com/ad-freiburg/stxxl
* branch 70cc597f3f76f96f036db4ffdd84a5cd7b224c7c -> FETCH_HEAD
Fetching submodule extlib/foxxll
Submodule path 'third_party/stxxl': checked out '70cc597f3f76f96f036db4ffdd84a5cd7b224c7c'
From https://github.com/ad-freiburg/foxxll
* branch 784859bc09a3982d6545fbf1d7b698e273401703 -> FETCH_HEAD
Submodule path 'third_party/stxxl/extlib/foxxll': checked out '784859bc09a3982d6545fbf1d7b698e273401703'
Build
wf@sun:/hd/seel/qlever/qlever-code$ rm -rf build/
wf@sun:/hd/seel/qlever/qlever-code$ mkdir build
wf@sun:/hd/seel/qlever/qlever-code$ cd build
wf@sun:/hd/seel/qlever/qlever-code/build$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER="g++-11" -DLOGLEVEL=INFO -DUSE_PARALLEL=true -GNinja ..
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 11.1.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
...
-- ---
-- Configuring done
-- Generating done
-- Build files have been written to: /hd/seel/qlever/qlever-code/build
Ninja
See https://ninja-build.org/manual.html for the Ninja manual.
ninja
...
[613/613] Linking CXX executable test/SparqlExpressionTest
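The update and build steps above can be collected into one function so that they are easy to repeat. This is only a sketch: the commands are the ones shown on this page, while the names rebuild_qlever, run, QLEVER_CODE and DRY_RUN are assumptions introduced for this example.

```shell
#!/bin/bash
# Sketch: the update and build steps of this section as one function.
# QLEVER_CODE defaults to the path used on this page; DRY_RUN=1 only
# prints the commands. Both variable names are assumptions.
QLEVER_CODE="${QLEVER_CODE:-/hd/seel/qlever/qlever-code}"

run() {
  # print the command; execute it unless DRY_RUN=1
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

rebuild_qlever() {
  run cd "$QLEVER_CODE" || return 1
  run git pull || return 1
  run git submodule update --init --recursive || return 1
  # start from a clean build directory
  run rm -rf build || return 1
  run mkdir build || return 1
  run cd build || return 1
  run cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=g++-11 \
    -DLOGLEVEL=INFO -DUSE_PARALLEL=true -GNinja .. || return 1
  run ninja
}
```

Setting DRY_RUN=1 before calling rebuild_qlever lists the commands without executing them, which helps to review the steps before removing the existing build directory.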
qleverauto environment checks
./qleverauto -v
qleverauto version : 1.29 $ : 2022/05/23 06:15:28 $
./qleverauto -e
needed software
docker → /usr/bin/docker ✅
top → /usr/bin/top ✅
df → /usr/bin/df ✅
jq → /usr/bin/jq ✅
lsb_release → /usr/bin/lsb_release ✅
free → /usr/bin/free ✅
operating system
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
docker version
Docker version 20.10.16, build aa7e414
memory
total used free shared buff/cache available
Mem: 125Gi 1,3Gi 120Gi 27Mi 4,2Gi 123Gi
Swap: 2,0Gi 0B 2,0Gi
diskspace
/dev/sdb5 116G 25G 86G 23% /
tmpfs 63G 16K 63G 1% /dev/shm
/dev/sda1 3,6T 2,7T 716G 80% /hd/seel
/dev/sdb1 511M 4,0K 511M 1% /boot/efi
soft ulimit for files
1048576
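The "needed software" part of such an environment check can be approximated with command -v. The sketch below is a hypothetical re-implementation for illustration only. It is not taken from qleverauto, and the function name check_software is an assumption.

```shell
#!/bin/bash
# Hypothetical re-implementation of a "needed software" check;
# not taken from qleverauto itself.
check_software() {
  local missing=0 tool path
  for tool in "$@"; do
    if path="$(command -v "$tool")"; then
      echo "$tool → $path ✅"
    else
      echo "$tool → not found ❌"
      missing=$((missing + 1))
    fi
  done
  # non-zero exit status if any tool is missing
  [ "$missing" -eq 0 ]
}
```

A call such as check_software docker top df jq lsb_release free covers the tools listed above and returns a non-zero status if any of them is missing.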
wikidata files and index settings
The files of the latest attempt from last month are reused.
wikidata$ ls -l
total 92754792
-rw-rw-r-- 1 wf wf 94653250500 Mai 19 07:35 latest-all.ttl.bz2
-rw-rw-r-- 1 wf wf 327629685 Mai 21 01:28 latest-lexemes.ttl.bz2
-rw-r--r-- 1 wf wf 911 Mai 22 17:47 Qleverfile
drwxrwxr-x 2 wf wf 4096 Mai 23 08:07 RCS
-rw-rw-r-- 1 wf wf 40 Mai 23 14:52 wikidata.settings.json
cat Qleverfile
# Qleverfile for folder /hd/seel/qlever
# Automatically created on Sa 21. Mai 09:09:41 CEST 2022.
# Modify or expand as you see fit.
# Indexer settings
DB = wikidata
RDF_FILES = "latest-all.ttl.bz2 latest-lexemes.ttl.bz2"
CAT_FILES = "bzcat ${RDF_FILES}"
WITH_TEXT = false
SETTINGS_JSON = '{ "num-triples-per-batch": 10000000 }'
# Server settings
HOSTNAME = sun.bitplan.com
SERVER_PORT = 7001
MEMORY_FOR_QUERIES = 10
CACHE_MAX_SIZE_GB = 5
CACHE_MAX_SIZE_GB_SINGLE_ENTRY = 1
CACHE_MAX_NUM_ENTRIES = 100
# QLever binaries
QLEVER_BIN_DIR = /hd/seel/qlever/qlever-code/build/
USE_DOCKER = false
QLEVER_DOCKER_IMAGE = adfreiburg/qlever
QLEVER_DOCKER_CONTAINER = qlever.must_specify
# QLever UI
QLEVERUI_PORT = 7000
QLEVERUI_DIR = qlever-ui
QLEVERUI_CONFIG = default
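To look up a single setting from this file in a script, a small helper can extract the value of a KEY = value line. This is a minimal sketch that is independent of how the qlever script itself reads the Qleverfile; the function name qleverfile_get is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the value of a "KEY = value" line from a
# Qleverfile. Comment lines do not match; surrounding single or double
# quotes are stripped. Not part of qlever-control.
qleverfile_get() {
  local key="$1" file="${2:-Qleverfile}"
  sed -n "s/^${key}[[:space:]]*=[[:space:]]*//p" "$file" \
    | head -n 1 \
    | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"
}
```

For example, qleverfile_get SERVER_PORT prints the port configured under the server settings, and qleverfile_get SETTINGS_JSON prints the JSON string without the enclosing single quotes.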