Wikidata Import 2025-05-02
Import

| Import | |
|---|---|
| state | ✅ |
| url | https://wiki.bitplan.com/index.php/Wikidata_Import_2025-05-02 |
| target | blazegraph |
| start | 2025-05-02 |
| end | 2025-06-05 |
| days | 34 |
| os | Ubuntu 22.04.3 LTS |
| cpu | Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (16 cores) |
| ram | 512 GB |
| triples | 16,836,266,114 |
| comment | seeded with 1.2 TB data.jnl file provided by James Hare |
This "import" is not using a dump and indexing approach but directly copying a blazegraph journal file.
Steps
Copy journal file
Source: the wikidata installation at https://scatter.red/. Using aria2c with 16 connections, the copy initially took some 5 hours but was interrupted. Since aria2c was used in preallocation mode and the script's final message was "download finished", the file looked complete, which it was not.
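One way to tell whether an aria2c download actually finished is to look for the .aria2 control file, which aria2c keeps next to the target while a transfer is incomplete and removes on success (a minimal sketch):

#!/bin/bash
# sketch: an incomplete aria2c download leaves a data.jnl.aria2 control file behind
if [ -e /hd/delta/blazegraph/data.jnl.aria2 ]; then
  echo "download incomplete: control file still present"
else
  echo "no control file: aria2c considers the download complete"
fi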
git clone the private-wikidata-query repository
git clone https://github.com/scatter-llc/private-wikidata-query
mkdir data
mv data.jnl private-wikidata-query/data
cd private-wikidata-query/data
# use proper uid and gid as per the container's preferences
chown 666:66 data.jnl
jh@wikidata:/hd/delta/blazegraph/private-wikidata-query/data$ ls -l
total 346081076
-rw-rw-r-- 1 666 66 1328514809856 May 2 22:07 data.jnl
start docker
docker compose up -d
WARN[0000] /hd/delta/blazegraph/private-wikidata-query/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
[+] Running 3/3
✔ Container private-wikidata-query-wdqs-1 Started 0.4s
✔ Container private-wikidata-query-wdqs-proxy-1 Started 0.7s
✔ Container private-wikidata-query-wdqs-frontend-1 Started 1.1s
docker ps | grep wdqs
36dad88ebfdc wikibase/wdqs-frontend:wmde.11 "/entrypoint.sh ngin…" About an hour ago Up 3 minutes 0.0.0.0:8099->80/tcp, [::]:8099->80/tcp private-wikidata-query-wdqs-frontend-1
f0d273cca376 caddy "caddy run --config …" About an hour ago Up 3 minutes 80/tcp, 443/tcp, 2019/tcp, 443/udp private-wikidata-query-wdqs-proxy-1
d86124984e0f wikibase/wdqs:0.3.97-wmde.8 "/entrypoint.sh /run…" About an hour ago Up 3 minutes 0.0.0.0:9999->9999/tcp, [::]:9999->9999/tcp private-wikidata-query-wdqs-1
6011f5c1cc03 caddy "caddy run --config …" 12 months ago Up 3 days 80/tcp, 443/tcp, 2019/tcp, 443/udp wdqs-wdqs-proxy-1
Incompatible RWStore header version
docker logs private-wikidata-query-wdqs-1 2>&1 | grep -m 1 "Incompatible RWStore header version"
java.lang.RuntimeException: java.lang.IllegalStateException: Incompatible RWStore header version: storeVersion=0, cVersion=1024, demispace: true
docker exec -it private-wikidata-query-wdqs-1 /bin/bash
diff RWStore.properties RWStore.properties.bak-20250503
--- RWStore.properties
+++ RWStore.properties.bak-20250503
@@ -56,6 +56,3 @@
{"valueType":"DOUBLE","multiplier":"1000000000","serviceMapping":"LATITUDE"},\
{"valueType":"LONG","multiplier":"1","minValue":"0","serviceMapping":"COORD_SYSTEM"}\
]}}
-
-# Added to fix Incompatible RWStore header version error
-com.bigdata.rwstore.RWStore.readBlobsAsync=false
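The same workaround can be applied by appending the property shown in the diff, run inside the container before restarting (a sketch):

# append the workaround property that the diff above shows was added
echo '' >> RWStore.properties
echo '# Added to fix Incompatible RWStore header version error' >> RWStore.properties
echo 'com.bigdata.rwstore.RWStore.readBlobsAsync=false' >> RWStore.properties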
docker compose restart wdqs
WARN[0000] /hd/delta/blazegraph/private-wikidata-query/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
[+] Restarting 1/1
✔ Container private-wikidata-query-wdqs-1 Started 11.1s
Download multiple copies in parallel
The script below, named `get`, was used to copy the file to /hd/delta and /hd/beta in parallel, totalling 32 connections.
#!/bin/bash
# Corrected aria2 download script (no prealloc, true progress)
ISO_DATE=$(date +%Y%m%d)
DISK=beta
SESSION="bg_${DISK}_${ISO_DATE}"
FILENAME="data.jnl"
DOWNLOAD_DIR="/hd/$DISK/blazegraph"
URL="https://datasets.orbopengraph.com/blazegraph/data.jnl"
CONNECTIONS=16
# Kill stale session
if screen -list | grep -q "$SESSION"; then
echo "⚠️ Killing existing session '$SESSION'..."
screen -S "$SESSION" -X quit
fi
# Launch corrected aria2 download
screen -dmS "$SESSION" -L bash -c "
echo '[INFO] Starting corrected download...';
cd \"$DOWNLOAD_DIR\";
aria2c -c -x $CONNECTIONS -s $CONNECTIONS \
--file-allocation=none \
--auto-file-renaming=false \
--dir=\"$DOWNLOAD_DIR\" \
--out=\"$FILENAME\" \
\"$URL\";
EXIT_CODE=\$?;
if [ \$EXIT_CODE -eq 0 ]; then
echo '[INFO] Download finished.';
md5sum \"$FILENAME\" > \"$FILENAME.md5\";
echo '[INFO] MD5 saved to $FILENAME.md5';
else
echo \"[ERROR] Download failed with code \$EXIT_CODE\";
fi
"
echo "✅ Fixed download started in screen '$SESSION'."
echo "Monitor with: screen -r $SESSION"
md5 check
on source:
md5sum data.jnl
e891800af42b979159191487910bd9ae  data.jnl
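The copies can then be verified against the source checksum; note that a full pass over a 1.3 TB file takes a while (a sketch):

for f in /hd/beta/blazegraph/data.jnl /hd/delta/blazegraph/data.jnl; do
  echo "e891800af42b979159191487910bd9ae  $f" | md5sum -c -
done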
check script
at 95% progress
tail -f screenlog.0
FILE: /hd/delta/blazegraph/data.jnl
-------------------------------------------------------------------------------
*** Download Progress Summary as of Mon May 5 07:36:28 2025 ***
===============================================================================
[#152de7 1,241GiB/1,296GiB(95%) CN:16 DL:16MiB ETA:55m32s]
FILE: /hd/delta/blazegraph/data.jnl
-------------------------------------------------------------------------------
[#152de7 1,242GiB/1,296GiB(95%) CN:16 DL:21MiB ETA:43m39s]
./check
Comparing:
file1 → /hd/beta/blazegraph/data.jnl
file2 → /hd/delta/blazegraph/data.jnl
blocksize=1MB start=0MB
== log-5 ==
[ 0] 1 MB ✅ MD5 match
[ 1] 5 MB ✅ MD5 match
[ 2] 25 MB ✅ MD5 match
[ 3] 125 MB ✅ MD5 match
[ 4] 625 MB ✅ MD5 match
[ 5] 3,125 MB ✅ MD5 match
[ 6] 15,625 MB ✅ MD5 match
[ 7] 78,125 MB ✅ MD5 match
[ 8] 390,625 MB ✅ MD5 match
Summary: {'✅': 9}
== log-2 ==
100%|██████████████████████████████████████████| 21/21 [00:00<00:00, 212.25it/s]
Summary: {'✅': 21}
== linear-2000 ==
100%|████████████████████████████████████████| 663/663 [00:05<00:00, 116.57it/s]
Summary: {'✅': 607, '⚠️': 56}
== linear-500 ==
100%|██████████████████████████████████████| 2651/2651 [00:21<00:00, 122.85it/s]
Summary: {'✅': 2435, '⚠️': 216}
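The check script itself is not reproduced here; a minimal bash sketch of the "log-5" sampling it performs, comparing 1 MB blocks of both copies at exponentially spaced offsets (1, 5, 25, ... MB), could look like this (hypothetical reimplementation):

#!/bin/bash
# compare 1 MB blocks at offsets 1, 5, 25, ... MB of both journal copies
f1=/hd/beta/blazegraph/data.jnl
f2=/hd/delta/blazegraph/data.jnl
size_mb=$(( $(stat -c%s "$f1") / 1024 / 1024 ))
offset=1
while [ "$offset" -lt "$size_mb" ]; do
  m1=$(dd if="$f1" bs=1M skip="$offset" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
  m2=$(dd if="$f2" bs=1M skip="$offset" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
  if [ "$m1" = "$m2" ]; then
    echo "[$offset MB] ✅ MD5 match"
  else
    echo "[$offset MB] ❌ mismatch"
  fi
  offset=$((offset * 5))
done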
at 100%
tail -15 screenlog.0
[#152de7 1,296GiB/1,296GiB(99%) CN:7 DL:12MiB ETA:6s]
FILE: /hd/delta/blazegraph/data.jnl
-------------------------------------------------------------------------------
[#152de7 1,296GiB/1,296GiB(99%) CN:1 DL:1.2MiB ETA:1s]
05/05 08:20:15 [NOTICE] Download complete: /hd/delta/blazegraph/data.jnl
Download Results:
gid |stat|avg speed |path/URI
======+====+===========+=======================================================
152de7|OK | 19MiB/s|/hd/delta/blazegraph/data.jnl
Status Legend:
(OK):download completed.
[INFO] Download finished.
tail -15 screenlog.0
[#50fb05 1,296GiB/1,296GiB(99%) CN:13 DL:19MiB ETA:6s]
FILE: /hd/beta/blazegraph/data.jnl
-------------------------------------------------------------------------------
[#50fb05 1,296GiB/1,296GiB(99%) CN:1 DL:1.5MiB]
05/05 08:15:19 [NOTICE] Download complete: /hd/beta/blazegraph/data.jnl
Download Results:
gid |stat|avg speed |path/URI
======+====+===========+=======================================================
50fb05|OK | 18MiB/s|/hd/beta/blazegraph/data.jnl
Status Legend:
(OK):download completed.
[INFO] Download finished.
fix aria2 issues using blockdownload
see https://github.com/WolfgangFahl/blockdownload
check the aria2-downloaded journal file hashes (heads only)
dcheck \
--head-only \
--url https://datasets.orbopengraph.com/blazegraph/data.jnl \
/hd/beta/blazegraph/data.jnl \
/hd/delta/blazegraph/data.jnl
Processing /hd/beta/blazegraph/data.jnl... (file size: 1296.92 GB)
Block 2656/2656 1296.88-1296.92 GB: : 1.39TB [00:00, 4.88TB/s]
/hd/beta/blazegraph/data.jnl.yaml created with 2657 blocks (1328045.19 MB processed)
Processing /hd/delta/blazegraph/data.jnl... (file size: 1296.92 GB)
Block 2656/2656 1296.88-1296.92 GB: : 1.39TB [00:00, 4.77TB/s]
/hd/delta/blazegraph/data.jnl.yaml created with 2657 blocks (1328045.19 MB processed)
2657✅ 0❌ 0⚠️: : 1.39TB [00:00, 6.80TB/s] ]
Final: 2657✅ 0❌ 0⚠️
check with full MD5 over 500 MB blocks
dcheck \
--url https://datasets.orbopengraph.com/blazegraph/data.jnl \
/hd/beta/blazegraph/data.jnl \
/hd/delta/blazegraph/data.jnl
Processing /hd/beta/blazegraph/data.jnl... (file size: 1296.92 GB)
Processing data.jnl: 62%|██████████████████████▏ | 857G/1.39T [26:34<16:24, 544MB/s]
grep block blazegraph.yaml | wc -l
2652
check parts
md5sum blazegraph-2656.part
493923964d5840438d9d06e560eaf15b blazegraph-2656.part
wf@wikidata:/hd/eneco/blazegraph/blazegraph-2025-05-06$ grep 493923964d5840438d9d06e560eaf15b blazegraph.yaml -B2
path: blazegraph-2656.part
offset: 1392508928000
md5: 493923964d5840438d9d06e560eaf15b
check missing blocks
grep block blazegraph.yaml | cut -f2 -d: | sort -un | awk 'NR==1{prev=$1; next} {for(i=prev+1;i<$1;i++) print i; prev=$1}'
2
4
7
30
131
209
642
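The missing 500 MB blocks can be re-fetched individually with HTTP range requests; the block size matches the offsets recorded in blazegraph.yaml (block 2656 starts at 2656 × 524288000 = 1392508928000). A sketch, with the blazegraph-N.part naming following the parts shown above:

BLOCKSIZE=524288000   # 500 MB, matching the offsets in blazegraph.yaml
URL="https://datasets.orbopengraph.com/blazegraph/data.jnl"
for n in 2 4 7 30 131 209 642; do
  start=$((n * BLOCKSIZE))
  end=$((start + BLOCKSIZE - 1))
  curl -s -r "$start-$end" -o "blazegraph-$n.part" "$URL"
  md5sum "blazegraph-$n.part"
done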
reassemble
first try
blockdownload --yaml blazegraph.yaml --output data.jnl --progress --name wikidata https://datasets.orbopengraph.com/blazegraph/data.jnl .
Creating target: 100%|███████████████████▉| 1.39T/1.39T [1:34:08<00:11, 310MB/s]created data.jnl - 1324545.19 MB
md5: 2106c6ae22ff4425b35834ad7bb65b07
File reassembled successfully: data.jnl
Creating target: 100%|███████████████████▉| 1.39T/1.39T [1:34:09<00:14, 246MB/s]
wrong md5sum, as expected given the missing blocks
attempt 2025-05-23
cat bdy
# reassemble
DATE="2025-05-23"
blockdownload https://datasets.orbopengraph.com/blazegraph/data.jnl \
blazegraph-$DATE \
--name blazegraph \
--progress \
--yaml blazegraph-$DATE/data.jnl.yaml \
--output /hd/mantax/blazegraph/data.jnl
./bdy
Creating target: 100%|██████████| 1.39T/1.39T [1:00:24<00:00, 384MB/s]
created /hd/mantax/blazegraph/data.jnl - 1328045.19 MB
md5: 1f6a822d9015aa3356714d304eeb8471
File reassembled successfully: /hd/mantax/blazegraph/data.jnl
wf@wikidata:/hd/eneco/blazegraph/blazegraph-2025-05-23$ head -1 md5sums.txt
1f6a822d9015aa3356714d304eeb8471 data.jnl
ls -l /hd/mantax/blazegraph/data.jnl
-rw-rw-r-- 1 wf wf 1392556310528 May 24 13:39 data.jnl
md5sum /hd/mantax/blazegraph/data.jnl
1f6a822d9015aa3356714d304eeb8471 data.jnl
followup 2025-05-31
All the following was done in tmux session 0 under the `jh` user.
With the MD5 checksum verified, start the stack with Docker Compose and open a bash shell in the `wdqs` container (the one running Blazegraph).
docker compose up --build -d
docker compose exec wdqs bash
Inside the container, run this command to catch the database up to the present day:
while true; do /wdqs/runUpdate.sh; sleep 10; done
This creates an infinite loop that continuously restarts the update script if it crashes for any reason. In most cases a crash can be overcome by simply restarting the script. The `sleep 10` ensures you are not spammed with restart attempts.
As of 05:05, 31 May 2025 (CEST) the data.jnl on /hd/delta is updated through the end of 2025-05-15. Wait a few days for it to catch up to present.
Next steps:
- Verify frontend can connect to query service (see the sketch after this list)
- Modify the retry loop one-liner to include logging of error events
- Formalize the one-liner into a script or Docker Compose configuration
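For the first item, a minimal connectivity check against the Blazegraph endpoint might look like this (a sketch, using the endpoint path that appears later in the update logs):

curl -s -H 'Accept: application/sparql-results+json' \
  --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1' \
  http://localhost:9999/bigdata/namespace/wdq/sparql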
followup 2025-06-03
- created apache config wdqs.conf with proxy to 8099 (a sketch follows below)
- started frontend
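A minimal sketch of such a wdqs.conf, assuming Apache with mod_proxy; the ServerName is an assumption:

cat > /etc/apache2/sites-available/wdqs.conf <<'EOF'
<VirtualHost *:80>
    # ServerName is an assumption; adjust to the real host name
    ServerName wdqs.bitplan.com
    ProxyPreserveHost On
    ProxyPass / http://localhost:8099/
    ProxyPassReverse / http://localhost:8099/
</VirtualHost>
EOF
a2enmod proxy proxy_http
a2ensite wdqs
systemctl reload apache2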
root@wikidata:/hd/delta/blazegraph/private-wikidata-query# docker compose ps
WARN[0000] /hd/delta/blazegraph/private-wikidata-query/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
private-wikidata-query-wdqs-proxy-1 caddy "caddy run --config …" wdqs-proxy 4 weeks ago Up 11 hours 80/tcp, 443/tcp, 2019/tcp, 443/udp
root@wikidata:/hd/delta/blazegraph/private-wikidata-query# docker compose up -d
WARN[0000] /hd/delta/blazegraph/private-wikidata-query/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
[+] Running 3/3
✔ Container private-wikidata-query-wdqs-1 Started 0.3s
✔ Container private-wikidata-query-wdqs-proxy-1 Running 0.0s
✔ Container private-wikidata-query-wdqs-frontend-1 Started 0.7s
root@wikidata:/hd/delta/blazegraph/private-wikidata-query# curl http://localhost:8099
<!DOCTYPE html><html lang="en" dir="ltr"><head><meta charset="utf-8"
followup 2025-06-04
see also https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
docker exec -it private-wikidata-query-wdqs-1 /bin/bash
while true; do /wdqs/runUpdate.sh; sleep 10; done&
15:33:59.522 [main] INFO org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T21:10:22Z (next: 20250601211022|2427659504) at (10.6, 6.5, 2.9) updates per second and (3412.9, 2094.7, 941.5) milliseconds per second
15:33:59.770 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 81 changes, from Q134675408@2355787733@20250601211022|2427659504 to Q127768397@2355787831@20250601211035|2427659605
15:34:08.856 [main] INFO org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T21:10:35Z (next: 20250601211035|2427659606) at (10.0, 6.5, 3.0) updates per second and (3095.0, 2071.5, 946.5) milliseconds per second
15:34:09.096 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 76 changes, from Q134675444@2355787832@20250601211035|2427659606 to Q134642638@2355787935@20250601211052|2427659706
15:34:13.024 [main] INFO org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T21:10:52Z (next: 20250601211052|2427659707) at (10.3, 6.6, 3.0) updates per second and (3055.4, 2080.3, 955.7) milliseconds per second
15:34:13.268 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 78 changes, from Q134675474@2355787931@20250601211052|2427659707 to Q95527928@2355788033@20250601211106|2427659806
# check updated state of copy of wikidata
#
# see https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
# and https://wiki.bitplan.com/index.php/Wikidata_Import_2025-05-02#followup_2025-06-04
PREFIX schema: <http://schema.org/>
# This query returns:
# - the total number of triples in the dataset
# - the dateModified of the <http://www.wikidata.org> entity, if available
SELECT * WHERE {
{
# Subquery: count all triples in the dataset
SELECT (COUNT(*) AS ?count) {
?s ?p ?o
}
}
UNION
{
# Subquery: get the schema:dateModified for the Wikidata root URI
SELECT * WHERE {
<http://www.wikidata.org> schema:dateModified ?y
}
}
}
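Saved as e.g. state.rq (the file name is an assumption), the query can be run against the local endpoint from the command line:

curl -s -H 'Accept: application/sparql-results+json' \
  --data-urlencode query@state.rq \
  http://localhost:9999/bigdata/namespace/wdq/sparql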
runUpdateLoop.sh
#!/bin/bash
# Run WDQS updater in a controlled loop with logging and signal handling.
# WF 2025-06-04
set -euo pipefail
LOGFILE="/var/log/wdqs/update-loop.log"
INTERVAL=10
trap 'echo "[$(date -Iseconds)] Caught termination signal. Exiting." >> "$LOGFILE"; exit 0' SIGINT SIGTERM
echo "[$(date -Iseconds)] Starting WDQS update loop..." >> "$LOGFILE"
while true; do
echo "[$(date -Iseconds)] Running /wdqs/runUpdate.sh ..." >> "$LOGFILE"
/wdqs/runUpdate.sh -n wdq >> "$LOGFILE" 2>&1 || true  # with set -e, an updater crash would otherwise end the loop
echo "[$(date -Iseconds)] Sleeping $INTERVAL seconds..." >> "$LOGFILE"
sleep $INTERVAL
done
bash-4.4# crontab -l
# do daily/weekly/monthly maintenance
# min hour day month weekday command
*/15 * * * * run-parts /etc/periodic/15min
0 * * * * run-parts /etc/periodic/hourly
0 2 * * * run-parts /etc/periodic/daily
0 3 * * 6 run-parts /etc/periodic/weekly
0 5 1 * * run-parts /etc/periodic/monthly
bash-4.4# chmod +x runUpdateLoop.sh
bash-4.4# ./runUpdateLoop.sh
bash-4.4# tail /var/log/wdqs/update-loop.log
Updating via http://localhost:9999/bigdata/namespace/wdq/sparql
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
16:08:56.668 [main] INFO org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.97 (89fdd891d231a500bd5999b08ea4b5a59dc0499d)
16:08:57.494 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
16:08:57.495 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
16:08:57.780 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
16:08:57.781 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Found start time in the RDF store: 2025-06-01T22:47:39Z
16:08:58.343 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 80 changes, from Q134682208@2355812225@20250601224755|2427686359 to Q24203239@2355812274@20250601224810|2427686410
16:09:04.065 [main] INFO org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T22:48:10Z (next: 20250601224811|2427686409) at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
16:09:04.352 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 80 changes, from Q1534239@2355812281@20250601224813|2427686417 to Q134682334@2355812375@20250601224844|2427686526
bash-4.4# tail -f /var/log/wdqs/update-loop.log
16:08:56.668 [main] INFO org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.97 (89fdd891d231a500bd5999b08ea4b5a59dc0499d)
16:08:57.494 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
16:08:57.495 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
16:08:57.780 [main] INFO o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
16:08:57.781 [main] INFO o.w.q.r.t.change.ChangeSourceContext - Found start time in the RDF store: 2025-06-01T22:47:39Z
16:08:58.343 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 80 changes, from Q134682208@2355812225@20250601224755|2427686359 to Q24203239@2355812274@20250601224810|2427686410
16:09:04.065 [main] INFO org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T22:48:10Z (next: 20250601224811|2427686409) at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
16:09:04.352 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 80 changes, from Q1534239@2355812281@20250601224813|2427686417 to Q134682334@2355812375@20250601224844|2427686526
16:09:11.273 [main] INFO org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T22:48:44Z (next: 20250601224845|2427686525) at (1.0, 0.2, 0.1) updates per second and (495.7, 102.5, 34.3) milliseconds per second
16:09:11.538 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 72 changes, from Q4795166@2355812384@20250601224847|2427686534 to Q134682364@2355812474@20250601224913|2427686632
16:09:18.719 [main] INFO org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T22:49:14Z (next: 20250601224914|2427686631) at (1.9, 0.4, 0.1) updates per second and (919.8, 209.7, 71.4) milliseconds per second
16:09:18.987 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 75 changes, from Q104196941@2355812537@20250601224932|2427686697 to Q134682396@2355812574@20250601224945|2427686738
followup 2025-06-05
tail -f /var/log/wdqs/update-loop.log
Caused by: java.net.ConnectException: Network unreachable (connect failed)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:338)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
        ... 21 more
Server needed reboot anyway ...
ls -l
total 1359918440
-rw-rw-r-- 1 666 66 1392556310528 Jun 5 06:23 data.jnl
root@wikidata:/hd/delta/blazegraph/private-wikidata-query/data# pv data.jnl > /hd/eneco/wikidata/2025-06-05/data.jnl
28.6GiB 0:00:29 [1.00GiB/s] [> ] 2% ETA 0:21:28
1.27TiB 0:27:10 [ 814MiB/s] [================================>] 100%
restart docker containers
docker compose down
docker compose up -d
[+] Running 4/4
✔ Network private-wikidata-query_default Created 0.1s
✔ Container private-wikidata-query-wdqs-1 Started 0.5s
✔ Container private-wikidata-query-wdqs-proxy-1 Started 0.7s
✔ Container private-wikidata-query-wdqs-frontend-1 Started 1.1s
improved runUpdateLoop.sh
#!/bin/bash
# Run WDQS updater in a controlled loop with logging and signal handling.
# WF 2025-06-04
set -euo pipefail
# Default values
LOGFILE="/var/log/wdqs/update-loop.log"
INTERVAL=10
# Function to show usage
show_help() {
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " -d, --daemon Detach and run in background via nohup"
echo " -l, --loop Run update loop in foreground"
echo " -h, --help Show this help message"
exit 0
}
# Function to handle daemon mode detachment
run_daemon_mode() {
if [[ "${DETACHED:-}" != "true" ]]; then
echo "Detaching process via nohup..."
export DETACHED=true
nohup "$0" --loop > "$LOGFILE" 2>&1 &
echo "Process detached. PID: $!"
exit 0
fi
run_main_loop
}
# Function to run the main update loop
run_main_loop() {
trap 'echo "[$(date -Iseconds)] Caught termination signal. Exiting." >> "$LOGFILE"; exit 0' SIGINT SIGTERM
echo "[$(date -Iseconds)] Starting WDQS update loop..." >> "$LOGFILE"
while true; do
echo "[$(date -Iseconds)] Running /wdqs/runUpdate.sh ..." >> "$LOGFILE"
/wdqs/runUpdate.sh -n wdq >> "$LOGFILE" 2>&1 || true  # with set -e, an updater crash would otherwise end the loop
echo "[$(date -Iseconds)] Sleeping $INTERVAL seconds..." >> "$LOGFILE"
sleep $INTERVAL
done
}
# Parse command line options
while [[ $# -gt 0 ]]; do
case $1 in
-d|--daemon)
run_daemon_mode
exit 0
;;
-l|--loop)
run_main_loop
exit 0
;;
-h|--help)
show_help
;;
*)
echo "Unknown option: $1" >&2
show_help
;;
esac
done
# No options provided
show_help
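Typical usage inside the wdqs container (a sketch):

chmod +x runUpdateLoop.sh
./runUpdateLoop.sh --daemon    # detach via nohup, log to /var/log/wdqs/update-loop.log
tail -f /var/log/wdqs/update-loop.log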
update catch-up state
07:53:13.105 [main] INFO o.w.q.r.t.change.RecentChangesPoller - Got 77 changes, from Q14666496@2356688709@20250604022159|2428616084 to Q14666547@2356688807@20250604022228|2428616186
07:53:19.646 [main] INFO org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-04T02:22:28Z (next: 20250604022228|2428616187) at (8.8, 3.7, 1.6) updates per second and (4843.1, 5320.4, 5494.0) milliseconds per second