Wikidata Import 2023-05-10

From BITPlan Wiki
Revision as of 19:14, 10 May 2023 by Wf (talk | contribs)

Download

Download Options

https://dumps.wikimedia.org/wikidatawiki/entities

dcatap.rdf                                         06-May-2023 02:08               84753
latest-all.json.bz2                                03-May-2023 21:06         81640390615
latest-all.json.gz                                 10-May-2023 13:49        124070020402
latest-all.nt.bz2                                  04-May-2023 16:07        158382342866
latest-all.nt.gz                                   03-May-2023 22:23        205171447838
latest-all.ttl.bz2                                 04-May-2023 03:24        101606862077
latest-all.ttl.gz                                  03-May-2023 17:08        124093922794
latest-lexemes.json.bz2                            10-May-2023 03:57           306901617
latest-lexemes.json.gz                             10-May-2023 03:55           418171562
latest-lexemes.nt.bz2                              05-May-2023 23:36           793805750
latest-lexemes.nt.gz                               05-May-2023 23:30          1035632811
latest-lexemes.ttl.bz2                             05-May-2023 23:31           450346788
latest-lexemes.ttl.gz                              05-May-2023 23:27           559471601
latest-truthy.nt.bz2                               06-May-2023 01:38         36065028020
latest-truthy.nt.gz                                05-May-2023 22:20         59829390689
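
The sizes in the listing are given in bytes. As a rough aid for planning disk space, the following minimal sketch converts a byte count to GiB. The helper name to_gib is hypothetical and not part of the original scripts; it only assumes awk is available.

```shell
#!/bin/bash
# to_gib is a hypothetical helper, not part of the original scripts
# convert a byte count to GiB (2^30 bytes), rounded to one decimal place
to_gib() {
  awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / (1024 * 1024 * 1024) }'
}

# sizes taken from the listing above
to_gib 101606862077   # latest-all.ttl.bz2     -> 94.6
to_gib 450346788      # latest-lexemes.ttl.bz2 -> 0.4
```

So the two ttl.bz2 files fetched by the download script need roughly 95 GiB of disk space before munging.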

Download script

cat download.sh 
#!/bin/bash
# WF 2023-04-26
# download wikidata dumps
baseurl=https://dumps.wikimedia.org/wikidatawiki/entities
for file in latest-all latest-lexemes
do
  for ext in ttl.bz2
  do
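    url="$baseurl/$file.$ext"
    log="$file-$ext.log"
    # wget logs to stderr; redirect it explicitly so the log is complete even without a terminal
    nohup wget "$url" >> "$log" 2>&1 &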
  done
done
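
Because the downloads run in the background for hours, it can be worth checking that each archive arrived intact before munging. The following is a minimal sketch, assuming bzip2 is installed. The helper name check_bz2 is hypothetical and not part of the original download.sh. Note that bzip2 -t decompresses the whole file in memory, so on the full dump it will take a long time.

```shell
#!/bin/bash
# check_bz2 is a hypothetical helper, not part of the original download.sh
# test the integrity of a bzip2 archive without writing decompressed output
# returns 0 if the archive is intact, 1 if bzip2 rejects it, 2 if the file is missing
check_bz2() {
  if [ ! -f "$1" ]; then
    echo "$1: missing"
    return 2
  fi
  if bzip2 -t "$1" 2>/dev/null; then
    echo "$1: ok"
  else
    echo "$1: corrupt or incomplete"
    return 1
  fi
}
```

For example, check_bz2 latest-lexemes.ttl.bz2 prints "latest-lexemes.ttl.bz2: ok" once the download has completed successfully.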

Munging ~29 h

Preparation

See Wikidata_Import_2023-04-26#Preparation_.7E20-30_min (preparation takes about 20-30 min).

Calling munge.sh

domunge.sh

#!/bin/bash
# WF 2023-04-29
# run munge on the full dump (start this script with nohup to keep it in the background)
bzcat latest-all.ttl.bz2 | service/munge.sh -f - -d data -- --skolemize

Start domunge.sh and show the nohup.out log