Difference between revisions of "Wikidata Import 2023-01-24"
Revision as of 06:04, 25 January 2023
Download the latest Wikidata dump (~10 hours)
https://dumps.wikimedia.org/wikidatawiki/entities
File                      Date         Time   Size (bytes)
latest-all.json.bz2       18-Jan-2023  17:40   79779054481
latest-all.json.gz        18-Jan-2023  10:51  121027823223
latest-all.nt.bz2         19-Jan-2023  17:00  155239026614
latest-all.nt.gz          18-Jan-2023  23:55  200917826250
latest-all.ttl.bz2        19-Jan-2023  04:34   99583991786
latest-all.ttl.gz         18-Jan-2023  19:25  121477047220
latest-lexemes.json.bz2   18-Jan-2023  03:47     270280878
latest-lexemes.json.gz    18-Jan-2023  03:46     369955852
latest-lexemes.nt.bz2     20-Jan-2023  23:32     717929951
latest-lexemes.nt.gz      20-Jan-2023  23:27     947996669
latest-lexemes.ttl.bz2    20-Jan-2023  23:28     402494804
latest-lexemes.ttl.gz     20-Jan-2023  23:25     503140103
latest-truthy.nt.bz2      20-Jan-2023  19:30   35434201681
latest-truthy.nt.gz       20-Jan-2023  16:19   58740712185
sudo nohup wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.nt.bz2&
tail -f nohup.out
--2023-01-24 10:33:23-- https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.nt.bz2
Resolving dumps.wikimedia.org (dumps.wikimedia.org)... 208.80.154.142, 2620:0:861:2:208:80:154:142
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 155239026614 (145G) [application/octet-stream]
Saving to: ‘latest-all.nt.bz2’
0K .......... .......... .......... .......... .......... 0% 337K 5d4h
50K .......... .......... .......... .......... .......... 0% 245K 6d4h
100K .......... .......... .......... .......... .......... 0% 425K 5d11h
...
1000K .......... .......... .......... .......... .......... 0% 58.4M 47h38m
...
10000K .......... .......... .......... .......... .......... 0% 3.66M 11h20m
...
100000K .......... .......... .......... .......... .......... 0% 23.6M 12h37m
...
1000000K .......... .......... .......... .......... .......... 0% 3.40M 9h1m
...
10000000K .......... .......... .......... .......... .......... 6% 3.05M 8h39m
...
100000000K .......... .......... .......... .......... .......... 65% 87.7M 3h18m
...
130000000K .......... .......... .......... .......... .......... 85% 101M 82m47s
...
140000000K .......... .......... .......... .......... .......... 92% 9.05M 44m30s
...
150000000K .......... .......... .......... .......... .......... 98% 3.94M 6m8s
...
151600600K .......... . 100% 623K=9h41m
2023-01-24 20:14:37 (4.25 MB/s) - ‘latest-all.nt.bz2’ saved [155239026614/155239026614]
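Before spending hours on decompression it may be worth confirming that the file on disk has the byte count reported in the wget log. A minimal sketch, assuming a POSIX shell; the helper name check_size is hypothetical and not part of the original notes:

```shell
# Hypothetical helper: succeed only if FILE exists and has exactly EXPECTED bytes.
check_size() {
    file=$1
    expected=$2
    # Fail early with a distinct status if the file is missing.
    [ -f "$file" ] || return 2
    # Some wc implementations pad the count with spaces, so strip them.
    actual=$(wc -c < "$file" | tr -d ' ')
    [ "$actual" -eq "$expected" ]
}

# Usage, with the file name and size taken from the wget log above:
# check_size latest-all.nt.bz2 155239026614 && echo "size OK"
```

A matching size only shows that the transfer was complete; it does not prove that the archive is uncorrupted.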
unzip
bunzip2 with nohup does not work properly!
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616002
nohup bunzip2 latest-all.nt.bz2 &
bunzip2: Control-C or similar caught, quitting.
bunzip2: Deleting output file latest-all.nt, if it exists.
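Alternative: stream the archive through pv (for a progress display) into bunzip2 and redirect the output. Because bunzip2 reads from stdin here, latest-all.nt.bz2 is kept.
nohup pv latest-all.nt.bz2 | bunzip2 > latest-all.nt &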
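Note that in the piped command nohup wraps only pv, the first stage of the pipeline, and not bunzip2. One option that may be worth trying is to run the whole decompression as a single shell command under nohup. This is a sketch that has not been tested against the full dump; the helper name decompress_keep and the log file name bunzip2.log are hypothetical:

```shell
# Hypothetical helper: decompress SRC (a .bz2 file) into DST and keep SRC.
decompress_keep() {
    src=$1
    dst=$2
    # -d decompresses and -c writes to stdout, so the archive is not deleted.
    bzip2 -dc "$src" > "$dst"
}

# Assumed usage for the dump above. Here nohup wraps the whole shell command:
# nohup sh -c 'bzip2 -dc latest-all.nt.bz2 > latest-all.nt' > bunzip2.log 2>&1 &
```

This variant gives no progress display; pv can be put back in front of the decompression step if one is wanted.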