Difference between revisions of "Wikidata Import 2023-05-15"
Revision as of 06:10, 16 May 2023
Import
Import | |
---|---|
state | |
url | https://wiki.bitplan.com/index.php/Wikidata_Import_2023-05-15 |
target | QLever |
start | 2023-05-15 |
end | |
days | |
os | Ubuntu 22.04.2 LTS |
cpu | Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz |
ram | 256 |
triples | |
comment |
see Wikidata_Import_2023-01-24
QLever control
https://github.com/ad-freiburg/qlever-control
mkdir qlever
cd qlever
git clone https://github.com/ad-freiburg/qlever-control
Cloning into 'qlever-control'...
remote: Enumerating objects: 426, done.
remote: Counting objects: 100% (266/266), done.
remote: Compressing objects: 100% (170/170), done.
remote: Total 426 (delta 108), reused 231 (delta 95), pack-reused 160
Receiving objects: 100% (426/426), 131.00 KiB | 585.00 KiB/s, done.
Resolving deltas: 100% (163/163), done.
setup wikidata
mkdir wikidata
cd wikidata/
. ../qlever-control/qlever wikidata
QLEVER CONFIG
Checking your PATH ...
Added the directory "/hd/mantax/qlever/qlever-control" to your PATH
Setting up bash autocompletion ...
Done, number of completions: 35
Creating new Qleverfile ...
Copied pre-configured Qleverfile for "wikidata" into current directory.
Setup is complete
Type qlever and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
qlever index show). Edit your local Qleverfile to change settings. A typical
sequence of actions if you have used a preconfigured Qleverfile is:
qlever get-data
qlever index
qlever start
qlever example-query
get-data (~7 h 30 min)
nohup qlever get-data &
tail nohup.out
440650K ... 100% 6.46T=2m17s
2023-05-15 18:20:15 (3.13 MB/s) - ‘latest-lexemes.ttl.bz2’ saved [451229154/451229154]
FINISHED --2023-05-15 18:20:15--
Total wall clock time: 7h 30m 9s
Downloaded: 2 files, 95G in 7h 30m 9s (3.61 MB/s)
ls -l
-rw-rw-r-- 1 wf wf 101738463320 May 11 13:38 latest-all.ttl.bz2
-rw-rw-r-- 1 wf wf 451229154 May 13 01:33 latest-lexemes.ttl.bz2
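The byte count that wget reports as saved (451229154 for latest-lexemes.ttl.bz2) matches the size shown by ls -l. A minimal sketch of automating that comparison follows; the helper name check_size is hypothetical and not part of qlever-control.

```shell
# check_size FILE EXPECTED_BYTES
# Hypothetical helper: succeeds only if FILE is exactly EXPECTED_BYTES bytes long.
check_size() {
  local actual
  # wc -c prints the byte count; tr strips padding that some wc variants add
  actual=$(wc -c < "$1" | tr -d ' ')
  if [ "$actual" -eq "$2" ]; then
    echo "OK $1 ($actual bytes)"
  else
    echo "MISMATCH $1: expected $2, got $actual" >&2
    return 1
  fi
}
```

Example call for the lexemes dump: check_size latest-lexemes.ttl.bz2 451229154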
index
update qlever docker image
docker pull adfreiburg/qlever
doindex
#!/bin/bash
# Collect the @prefix declarations from the head of both dumps (deduplicated).
for F in latest-lexemes.ttl.bz2 latest-all.ttl.bz2
do
  bzcat "$F" | head -1000 | grep '^@prefix'
done | sort -u > wikidata-latest.prefix-definitions
# Build the index inside the QLever container; the log is also written to wikidata-latest.index-log.txt.
docker run --rm -u 10000:10000 -v /etc/localtime:/etc/localtime:ro -v /hd/mantax/qlever/wikidata:/index -w /index --entrypoint bash --name qlever.wikidata-latest.index-build adfreiburg/qlever -c "ulimit -Sn 1048576; bzcat -f wikidata-latest.prefix-definitions latest-lexemes.ttl.bz2 latest-all.ttl.bz2 | IndexBuilderMain -F ttl -f - -i wikidata-latest -s wikidata-latest.settings.json --stxxl-memory-gb 10 | tee wikidata-latest.index-log.txt"
nohup ./doindex &
Log
2023-05-15 20:38:48.787 - INFO: QLever IndexBuilder, compiled on Mon May 1 10:21:29 UTC 2023 using git hash 83f1e8
2023-05-15 20:38:48.787 - INFO: You specified the input format: TTL
2023-05-15 20:38:48.788 - INFO: You specified "locale = en_US" and "ignore-punctuation = 1"
2023-05-15 20:38:48.788 - INFO: You specified "num-triples-per-batch = 5,000,000", choose a lower value if the index builder runs out of memory
2023-05-15 20:38:48.788 - INFO: Integers that cannot be represented by QLever will throw an exception (this is the default behavior)
2023-05-15 20:38:48.788 - INFO: Processing input triples from /dev/stdin ...
2023-05-15 20:40:34.263 - INFO: Input triples processed: 100,000,000
...
2023-05-16 02:44:35.983 - INFO: Input triples processed: 18,500,000,000
2023-05-16 02:45:56.492 - INFO: Done, total number of triples read: 18,572,955,199 [may contain duplicates]
2023-05-16 02:45:56.492 - INFO: Number of QLever-internal triples created: 11,318,825,076 [may contain duplicates]
2023-05-16 02:45:56.492 - INFO: Merging partial vocabularies in byte order (internal only) ...
2023-05-16 02:47:12.954 - INFO: Words merged: 100,000,000
...
2023-05-16 03:01:05.155 - INFO: Words merged: 800,000,000
2023-05-16 03:01:41.669 - INFO: Number of words in internal vocabulary: 861,507,414
2023-05-16 03:01:41.669 - INFO: Building prefix tree from internal vocabulary ...
2023-05-16 03:02:00.943 - INFO: Words processed: 100,000,000
...
2023-05-16 03:06:50.205 - INFO: Words processed: 800,000,000
2023-05-16 03:07:15.366 - INFO: Computing maximally compressing prefixes (greedy algorithm) ...
2023-05-16 03:19:52.577 - INFO: Reduction of size of internal vocabulary: 45%
2023-05-16 03:20:17.322 - INFO: Merging partial vocabularies in Unicode order (internal and external) ...
2023-05-16 03:23:39.976 - INFO: Words merged: 100,000,000
...
2023-05-16 05:05:07.691 - INFO: Words merged: 3,300,000,000
2023-05-16 05:08:47.811 - INFO: Number of words in external vocabulary: 2,529,812,916
2023-05-16 05:08:47.811 - INFO: Removing temporary files ...
2023-05-16 05:08:58.512 - INFO: Converting external vocabulary to binary format ...
2023-05-16 05:29:57.049 - INFO: Converting triples from local IDs to global IDs ...
2023-05-16 05:30:04.051 - INFO: Triples converted: 100,000,000
...
2023-05-16 06:43:46.833 - INFO: Triples converted: 29,800,000,000
2023-05-16 06:43:59.778 - INFO: Done, total number of triples converted: 29,891,780,275
2023-05-16 06:43:59.816 - INFO: Writing compressed vocabulary to disk ...
2023-05-16 06:48:30.007 - INFO: Creating a pair of index permutations ...
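The timestamps in the log can be used to work out how long each phase took. The following is a minimal sketch, assuming GNU date (for the -d option) and timestamps cut down to whole seconds; the function name log_elapsed is hypothetical.

```shell
# log_elapsed START END
# Hypothetical helper: prints the time between two "YYYY-MM-DD HH:MM:SS" timestamps.
# Assumes GNU date; -u avoids daylight-saving jumps in the arithmetic.
log_elapsed() {
  local start end secs
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  secs=$((end - start))
  printf '%dh %02dm %02ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Reading and parsing the input triples (first log line to "Done, total number of triples read")
log_elapsed "2023-05-15 20:38:48" "2023-05-16 02:45:56"   # prints 6h 07m 08s
# From the first log line to the last line recorded so far
log_elapsed "2023-05-15 20:38:48" "2023-05-16 06:48:30"   # prints 10h 09m 42s
```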