Wikidata Import 2023-05-15

From BITPlan Wiki

Revision as of 20:35, 15 May 2023

Import

state:
url: https://wiki.bitplan.com/index.php/Wikidata_Import_2023-05-15
target: QLever
start: 2023-05-15
end:
days:
os: Ubuntu 22.04.2 LTS
cpu: Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz
ram: 256 GB
triples:
comment:

See Wikidata_Import_2023-01-24 for the earlier import.

QLever control

https://github.com/ad-freiburg/qlever-control

mkdir qlever
cd qlever
git clone https://github.com/ad-freiburg/qlever-control
Cloning into 'qlever-control'...
remote: Enumerating objects: 426, done.
remote: Counting objects: 100% (266/266), done.
remote: Compressing objects: 100% (170/170), done.
remote: Total 426 (delta 108), reused 231 (delta 95), pack-reused 160
Receiving objects: 100% (426/426), 131.00 KiB | 585.00 KiB/s, done.
Resolving deltas: 100% (163/163), done.

setup wikidata

mkdir wikidata
cd wikidata/
. ../qlever-control/qlever wikidata

QLEVER CONFIG

Checking your PATH ...
Added the directory "/hd/mantax/qlever/qlever-control" to your PATH

Setting up bash autocompletion ...
Done, number of completions: 35

Creating new Qleverfile ...
Copied pre-configured Qleverfile for "wikidata" into current directory.

Setup is complete
Type qlever and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
qlever index show). Edit your local Qleverfile to change settings. A typical
sequence of actions if you have used a preconfigured Qleverfile is:

qlever get-data
qlever index
qlever start
qlever example-query
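The action sequence above can also be scripted. This is a minimal sketch, assuming it runs in the wikidata directory after ../qlever-control/qlever has been sourced. The function name `qlever_pipeline` and the variable `QLEVER_DRY_RUN` are hypothetical and belong to this sketch, not to qlever-control. With `QLEVER_DRY_RUN=1` the commands are only printed. Otherwise they are executed in order, and the run stops at the first failure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the typical qlever action sequence in order,
# stopping at the first action that fails.
# QLEVER_DRY_RUN=1 (an assumption of this sketch) only prints the commands.
qlever_pipeline() {
  local action
  if [ "${QLEVER_DRY_RUN:-0}" != 1 ] && ! command -v qlever >/dev/null 2>&1; then
    echo "qlever is not on the PATH; source ../qlever-control/qlever first" >&2
    return 1
  fi
  for action in get-data index start example-query; do
    if [ "${QLEVER_DRY_RUN:-0}" = 1 ]; then
      echo "qlever $action"
    else
      qlever "$action" || return 1
    fi
  done
}
```

On this machine get-data alone took about 7.5 hours, so in practice the steps were started one at a time with nohup, as shown below.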

get-data (~7h 30min)

nohup qlever get-data &
tail nohup.out
440650K ...                                                   100% 6.46T=2m17s

2023-05-15 18:20:15 (3.13 MB/s) - ‘latest-lexemes.ttl.bz2’ saved [451229154/451229154]

FINISHED --2023-05-15 18:20:15--
Total wall clock time: 7h 30m 9s
Downloaded: 2 files, 95G in 7h 30m 9s (3.61 MB/s)
ls -l
-rw-rw-r-- 1 wf wf 101738463320 May 11 13:38 latest-all.ttl.bz2
-rw-rw-r-- 1 wf wf    451229154 May 13 01:33 latest-lexemes.ttl.bz2
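Before indexing, it can be worth confirming that both downloads are complete. This is a minimal sketch that assumes standard GNU/Linux tools (wc, awk). The function names are hypothetical. `check_size` compares a file's byte count with an expected size, such as the bracketed byte counts in the wget log or the sizes in the ls -l listing above. `avg_rate_mib` recomputes the average download rate from the total size and the wall clock time.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking the downloaded dumps.

# check_size FILE EXPECTED_BYTES: succeed only if FILE has exactly that many bytes.
check_size() {
  local file=$1 expected=$2 actual
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  actual=$(wc -c < "$file")
  if (( actual != expected )); then
    echo "size mismatch: $file has $actual bytes, expected $expected" >&2
    return 1
  fi
}

# avg_rate_mib BYTES SECONDS: average rate in MiB/s (1 MiB = 1048576 bytes),
# rounded to two decimals.
avg_rate_mib() {
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1048576 }'
}
```

For this download, `check_size latest-all.ttl.bz2 101738463320` and `check_size latest-lexemes.ttl.bz2 451229154` should both succeed. `avg_rate_mib 102189692474 27009` combines the two file sizes with 7h 30m 9s expressed in seconds, and it prints 3.61, which matches the wget summary. A full integrity check with `bzip2 -t` is also possible, but it has to decompress roughly 95G of data.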

index

doindex

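#!/bin/bash
# doindex: build the QLever index; run it in /hd/mantax/qlever/wikidata (mounted as /index below)
# collect the distinct @prefix lines from the head of each dump
for F in latest-lexemes.ttl.bz2 latest-all.ttl.bz2
do
  bzcat "$F" | head -n 1000 | \grep '^@prefix'
done | sort -u > wikidata-latest.prefix-definitions
# stream prefixes, lexemes and the full dump into IndexBuilderMain inside the adfreiburg/qlever container; stdout is also written to wikidata-latest.index-log.txt
docker run --rm -u 10000:10000 -v /etc/localtime:/etc/localtime:ro -v /hd/mantax/qlever/wikidata:/index -w /index --entrypoint bash --name qlever.wikidata-latest.index-build adfreiburg/qlever -c "ulimit -Sn 1048576; bzcat -f wikidata-latest.prefix-definitions latest-lexemes.ttl.bz2 latest-all.ttl.bz2 | IndexBuilderMain -F ttl -f - -i wikidata-latest -s wikidata-latest.settings.json --stxxl-memory-gb 10 | tee wikidata-latest.index-log.txt"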
chmod +x doindex
nohup ./doindex &
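The first loop in doindex gathers the distinct @prefix lines from the first 1000 lines of each dump and writes them to wikidata-latest.prefix-definitions. `bzcat -f` then passes that file through uncompressed, ahead of the data. Below is a minimal sketch of that step as a reusable function, so it can be tried on small sample files first. The name `collect_prefixes` and the `DECOMPRESS` override are hypothetical. As in doindex, the decompressor defaults to `bzcat`.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the prefix step of doindex:
# print the sorted, de-duplicated @prefix lines found in the first
# 1000 lines of each given Turtle file.
# DECOMPRESS (an assumption of this sketch) defaults to bzcat; set it to cat
# to try the function on uncompressed samples.
collect_prefixes() {
  local f
  for f in "$@"; do
    "${DECOMPRESS:-bzcat}" "$f" | head -n 1000 | grep '^@prefix'
  done | sort -u
}
```

Applied to this import, `collect_prefixes latest-lexemes.ttl.bz2 latest-all.ttl.bz2 > wikidata-latest.prefix-definitions` has the same effect as the loop in doindex.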