Wikidata Import 2025-05-02


Import
state  ✅
url  https://wiki.bitplan.com/index.php/Wikidata_Import_2025-05-02
target  blazegraph
start  2025-05-02
end  2025-06-05
days  34
os  Ubuntu 22.04.3 LTS
cpu  Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (16 cores)
ram  512 GB
triples  16836266114
comment  seeded with 1.2 TB data.jnl file provided by James Hare


This "import" does not use the usual dump-and-index approach; instead it directly copies a Blazegraph journal file.

Steps

Copy journal file

Source: the https://scatter.red/ wikidata installation. Using aria2c with 16 connections, the copy initially took some 5 hours but was interrupted. Since aria2c ran in preallocation mode and the script's final message was "download finished", the file looked complete, which it was not.
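Because preallocation masks an incomplete download, comparing the local file size against the server-reported size is a more reliable completeness test than aria2c's final message. A minimal sketch (assuming the server reports Content-Length; URL and filename are illustrative):

```shell
#!/bin/bash
# Fetch the remote size and compare it with the local file size.
# remote=$(curl -sI "https://example.org/data.jnl" \
#   | awk 'tolower($1)=="content-length:"{print $2}' | tr -d '\r')
# actual=$(stat -c%s data.jnl)
check_complete() {
  local remote="$1" actual="$2"
  if [ "$remote" -eq "$actual" ]; then
    echo "complete"
  else
    echo "incomplete: $actual of $remote bytes"
  fi
}
```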

git clone the private-wikidata-query repository

git clone https://github.com/scatter-llc/private-wikidata-query
mkdir data
mv data.jnl private-wikidata-query/data
cd private-wikidata-query/data
# use the uid and gid expected by the container
chown 666:66 data.jnl
jh@wikidata:/hd/delta/blazegraph/private-wikidata-query/data$ ls -l
total 346081076
-rw-rw-r-- 1 666 66 1328514809856 May  2 22:07 data.jnl

start docker

docker compose up -d
WARN[0000] /hd/delta/blazegraph/private-wikidata-query/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
[+] Running 3/3
 ✔ Container private-wikidata-query-wdqs-1           Started               0.4s 
 ✔ Container private-wikidata-query-wdqs-proxy-1     Started               0.7s 
 ✔ Container private-wikidata-query-wdqs-frontend-1  Started               1.1s
docker ps | grep wdqs
36dad88ebfdc   wikibase/wdqs-frontend:wmde.11      "/entrypoint.sh ngin…"   About an hour ago   Up 3 minutes                    0.0.0.0:8099->80/tcp, [::]:8099->80/tcp                           private-wikidata-query-wdqs-frontend-1
f0d273cca376   caddy                               "caddy run --config …"   About an hour ago   Up 3 minutes                    80/tcp, 443/tcp, 2019/tcp, 443/udp                                private-wikidata-query-wdqs-proxy-1
d86124984e0f   wikibase/wdqs:0.3.97-wmde.8         "/entrypoint.sh /run…"   About an hour ago   Up 3 minutes                    0.0.0.0:9999->9999/tcp, [::]:9999->9999/tcp                       private-wikidata-query-wdqs-1
6011f5c1cc03   caddy                               "caddy run --config …"   12 months ago       Up 3 days                       80/tcp, 443/tcp, 2019/tcp, 443/udp                                wdqs-wdqs-proxy-1

Incompatible RWStore header version

see https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata/src/java/com/bigdata/rwstore/RWStore.java

docker logs private-wikidata-query-wdqs-1 2>&1 | grep -m 1 "Incompatible RWStore header version"
java.lang.RuntimeException: java.lang.IllegalStateException: Incompatible RWStore header version: storeVersion=0, cVersion=1024, demispace: true
docker exec -it private-wikidata-query-wdqs-1 /bin/bash
diff RWStore.properties RWStore.properties.bak-20250503 
--- RWStore.properties
+++ RWStore.properties.bak-20250503
@@ -56,6 +56,3 @@
    {"valueType":"DOUBLE","multiplier":"1000000000","serviceMapping":"LATITUDE"},\
    {"valueType":"LONG","multiplier":"1","minValue":"0","serviceMapping":"COORD_SYSTEM"}\
   ]}}
-
-# Added to fix Incompatible RWStore header version error
-com.bigdata.rwstore.RWStore.readBlobsAsync=false
docker compose restart wdqs
WARN[0000] /hd/delta/blazegraph/private-wikidata-query/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
[+] Restarting 1/1
 ✔ Container private-wikidata-query-wdqs-1  Started                      11.1s
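The diff above shows the fix: `com.bigdata.rwstore.RWStore.readBlobsAsync=false` was appended to `RWStore.properties` after taking a dated backup. A sketch of that change as a script (the function name and file path are illustrative):

```shell
#!/bin/bash
# Back up RWStore.properties, then append the workaround property.
fix_rwstore() {
  local props="$1"
  cp "$props" "$props.bak-$(date +%Y%m%d)"
  {
    echo ""
    echo "# Added to fix Incompatible RWStore header version error"
    echo "com.bigdata.rwstore.RWStore.readBlobsAsync=false"
  } >> "$props"
}
```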

Download multiple copies in parallel

The script below, named get, was used to download to /hd/delta and /hd/beta in parallel, totalling 32 connections.

#!/bin/bash
# Corrected aria2 download script (no prealloc, true progress)

ISO_DATE=$(date +%Y%m%d)
DISK=beta
SESSION="bg_${DISK}_${ISO_DATE}"
FILENAME="data.jnl"
DOWNLOAD_DIR="/hd/$DISK/blazegraph"
URL="https://datasets.orbopengraph.com/blazegraph/data.jnl"
CONNECTIONS=16

# Kill stale session
if screen -list | grep -q "$SESSION"; then
  echo "⚠️ Killing existing session '$SESSION'..."
  screen -S "$SESSION" -X quit
fi

# Launch corrected aria2 download
screen -dmS "$SESSION" -L bash -c "
  echo '[INFO] Starting corrected download...';
  cd \"$DOWNLOAD_DIR\";
  aria2c -c -x $CONNECTIONS -s $CONNECTIONS \
    --file-allocation=none \
    --auto-file-renaming=false \
    --dir=\"$DOWNLOAD_DIR\" \
    --out=\"$FILENAME\" \
    \"$URL\";
  EXIT_CODE=\$?;
  if [ \$EXIT_CODE -eq 0 ]; then
    echo '[INFO] Download finished.';
    md5sum \"$FILENAME\" > \"$FILENAME.md5\";
    echo '[INFO] MD5 saved to $FILENAME.md5';
  else
    echo \"[ERROR] Download failed with code \$EXIT_CODE\";
  fi
"

echo "✅ Fixed download started in screen '$SESSION'."
echo "Monitor with: screen -r $SESSION"

md5 check

on source:

md5sum data.jnl
e891800af42b979159191487910bd9ae  data.jnl

check script

at 95% progress

tail -f screenlog.0
FILE: /hd/delta/blazegraph/data.jnl
-------------------------------------------------------------------------------

 *** Download Progress Summary as of Mon May  5 07:36:28 2025 ***              
===============================================================================
[#152de7 1,241GiB/1,296GiB(95%) CN:16 DL:16MiB ETA:55m32s]
FILE: /hd/delta/blazegraph/data.jnl
-------------------------------------------------------------------------------

[#152de7 1,242GiB/1,296GiB(95%) CN:16 DL:21MiB ETA:43m39s]
./check 
Comparing:
  file1 → /hd/beta/blazegraph/data.jnl
  file2 → /hd/delta/blazegraph/data.jnl
  blocksize=1MB start=0MB

== log-5 ==
[  0]       1 MB  ✅  MD5 match
[  1]       5 MB  ✅  MD5 match
[  2]      25 MB  ✅  MD5 match
[  3]     125 MB  ✅  MD5 match
[  4]     625 MB  ✅  MD5 match
[  5]   3,125 MB  ✅  MD5 match
[  6]  15,625 MB  ✅  MD5 match
[  7]  78,125 MB  ✅  MD5 match
[  8] 390,625 MB  ✅  MD5 match

Summary: {'✅': 9}

== log-2 ==
100%|██████████████████████████████████████████| 21/21 [00:00<00:00, 212.25it/s]

Summary: {'✅': 21}

== linear-2000 ==
100%|████████████████████████████████████████| 663/663 [00:05<00:00, 116.57it/s]

Summary: {'✅': 607, '⚠️': 56}

== linear-500 ==
100%|██████████████████████████████████████| 2651/2651 [00:21<00:00, 122.85it/s]

Summary: {'✅': 2435, '⚠️': 216}
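The core idea behind the check script - hash the same slice of both copies and compare - can be sketched with `dd` and `md5sum` (the function names are illustrative, not the actual check script):

```shell
#!/bin/bash
# MD5 of a size_mb-MB block starting at offset_mb in a file.
block_md5() { # file offset_mb size_mb
  dd if="$1" bs=1M skip="$2" count="$3" 2>/dev/null | md5sum | cut -d' ' -f1
}

# Compare the same block in two files.
compare_block() { # file1 file2 offset_mb size_mb
  if [ "$(block_md5 "$1" "$3" "$4")" = "$(block_md5 "$2" "$3" "$4")" ]; then
    echo "MD5 match"
  else
    echo "MD5 mismatch"
  fi
}
```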

at 100%

 tail  -15 screenlog.0 
[#152de7 1,296GiB/1,296GiB(99%) CN:7 DL:12MiB ETA:6s]
FILE: /hd/delta/blazegraph/data.jnl
-------------------------------------------------------------------------------

[#152de7 1,296GiB/1,296GiB(99%) CN:1 DL:1.2MiB ETA:1s]                         
05/05 08:20:15 [NOTICE] Download complete: /hd/delta/blazegraph/data.jnl

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
152de7|OK  |    19MiB/s|/hd/delta/blazegraph/data.jnl

Status Legend:
(OK):download completed.
[INFO] Download finished.
tail  -15 screenlog.0 
[#50fb05 1,296GiB/1,296GiB(99%) CN:13 DL:19MiB ETA:6s]
FILE: /hd/beta/blazegraph/data.jnl
-------------------------------------------------------------------------------

[#50fb05 1,296GiB/1,296GiB(99%) CN:1 DL:1.5MiB]                                
05/05 08:15:19 [NOTICE] Download complete: /hd/beta/blazegraph/data.jnl

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
50fb05|OK  |    18MiB/s|/hd/beta/blazegraph/data.jnl

Status Legend:
(OK):download completed.
[INFO] Download finished.

fix aria2c download issues using blockdownload

see https://github.com/WolfgangFahl/blockdownload

check the hashes of the aria2c-downloaded journal files - heads only

dcheck \
  --head-only \
  --url https://datasets.orbopengraph.com/blazegraph/data.jnl \
  /hd/beta/blazegraph/data.jnl \
  /hd/delta/blazegraph/data.jnl
Processing /hd/beta/blazegraph/data.jnl... (file size: 1296.92 GB)
Block 2656/2656 1296.88-1296.92 GB: : 1.39TB [00:00, 4.88TB/s]                                   
/hd/beta/blazegraph/data.jnl.yaml created with 2657 blocks (1328045.19 MB processed)
Processing /hd/delta/blazegraph/data.jnl... (file size: 1296.92 GB)
Block 2656/2656 1296.88-1296.92 GB: : 1.39TB [00:00, 4.77TB/s]                                   
/hd/delta/blazegraph/data.jnl.yaml created with 2657 blocks (1328045.19 MB processed)
265700⚠️: : 1.39TB [00:00, 6.80TB/s]                                                       ]

Final: 265700⚠️

check with full MD5 checks over 500 MB blocks

dcheck \
  --url https://datasets.orbopengraph.com/blazegraph/data.jnl \
  /hd/beta/blazegraph/data.jnl \
  /hd/delta/blazegraph/data.jnl
Processing /hd/beta/blazegraph/data.jnl... (file size: 1296.92 GB)
Processing data.jnl:  62%|██████████████████████▏             | 857G/1.39T [26:34<16:24, 544MB/s]
grep block blazegraph.yaml  | wc -l
2652

check parts

md5sum blazegraph-2656.part 
493923964d5840438d9d06e560eaf15b  blazegraph-2656.part
wf@wikidata:/hd/eneco/blazegraph/blazegraph-2025-05-06$ grep 493923964d5840438d9d06e560eaf15b blazegraph.yaml -B2
  path: blazegraph-2656.part
  offset: 1392508928000
  md5: 493923964d5840438d9d06e560eaf15b

check missing blocks

grep block blazegraph.yaml | cut -f2 -d: | sort -un | awk 'NR==1{prev=$1; next} {for(i=prev+1;i<$1;i++) print i; prev=$1}'
2
4
7
30
131
209
642
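The gap-finding pipeline can be verified on sample input: for consecutive sorted indices, the awk program prints every number missing in between.

```shell
#!/bin/bash
# Print indices missing between consecutive sorted block numbers on stdin.
find_gaps() {
  sort -un | awk 'NR==1{prev=$1; next} {for(i=prev+1;i<$1;i++) print i; prev=$1}'
}

printf '1\n3\n6\n' | find_gaps
# prints 2, 4 and 5, each on its own line
```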

reassemble

first try

blockdownload --yaml blazegraph.yaml --output data.jnl --progress --name wikidata https://datasets.orbopengraph.com/blazegraph/data.jnl .
Creating target: 100%|███████████████████▉| 1.39T/1.39T [1:34:08<00:11, 310MB/s]created data.jnl - 1324545.19 MB
md5: 2106c6ae22ff4425b35834ad7bb65b07
File reassembled successfully: data.jnl
Creating target: 100%|███████████████████▉| 1.39T/1.39T [1:34:09<00:14, 246MB/s]

wrong md5sum, as expected given the missing blocks

attempt 2025-05-23

cat bdy
# reassemble 
DATE="2025-05-23"

blockdownload https://datasets.orbopengraph.com/blazegraph/data.jnl \
  blazegraph-$DATE \
  --name blazegraph \
  --progress \
  --yaml blazegraph-$DATE/data.jnl.yaml \
  --output /hd/mantax/blazegraph/data.jnl
./bdy
Creating target: 100%|██████████| 1.39T/1.39T [1:00:24<00:00, 384MB/s]
created /hd/mantax/blazegraph/data.jnl - 1328045.19 MB
md5: 1f6a822d9015aa3356714d304eeb8471
File reassembled successfully: /hd/mantax/blazegraph/data.jnl
wf@wikidata:/hd/eneco/blazegraph/blazegraph-2025-05-23$ head -1 md5sums.txt 
1f6a822d9015aa3356714d304eeb8471  data.jnl
ls -l /hd/mantax/blazegraph/data.jnl 
-rw-rw-r-- 1 wf wf 1392556310528 May 24 13:39 data.jnl
md5sum /hd/mantax/blazegraph/data.jnl 
1f6a822d9015aa3356714d304eeb8471 data.jnl

follow-up 2025-05-31

All the following was done in tmux session 0 under the `jh` user.

With the MD5 checksum verified, start the stack with Docker Compose and open a bash shell in the `wdqs` container (the one running Blazegraph).

docker compose up --build -d

docker compose exec wdqs bash

Inside the container, run this command to catch the database up to the present day:

while true; do /wdqs/runUpdate.sh; sleep 10; done

This creates an infinite loop that restarts the update script whenever it crashes; in most cases a crash can be overcome simply by restarting. The `sleep 10` prevents being spammed with restart attempts.

As of 05:05, 31 May 2025 (CEST) the data.jnl on /hd/delta is updated through the end of 2025-05-15. Wait a few days for it to catch up to the present.

Next steps:

  1. Verify frontend can connect to query service
  2. Modify the retry loop one-liner to include logging of error events
  3. Formalize the one-liner into a script or Docker Compose configuration

follow-up 2025-06-03

  • created apache config wdqs.conf with proxy to 8099
  • started frontend
root@wikidata:/hd/delta/blazegraph/private-wikidata-query# docker compose ps
WARN[0000] /hd/delta/blazegraph/private-wikidata-query/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
NAME                                  IMAGE     COMMAND                  SERVICE      CREATED       STATUS        PORTS
private-wikidata-query-wdqs-proxy-1   caddy     "caddy run --config …"   wdqs-proxy   4 weeks ago   Up 11 hours   80/tcp, 443/tcp, 2019/tcp, 443/udp
root@wikidata:/hd/delta/blazegraph/private-wikidata-query# docker compose up -d
WARN[0000] /hd/delta/blazegraph/private-wikidata-query/docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion 
[+] Running 3/3
 ✔ Container private-wikidata-query-wdqs-1           Started                               0.3s 
 ✔ Container private-wikidata-query-wdqs-proxy-1     Running                               0.0s 
 ✔ Container private-wikidata-query-wdqs-frontend-1  Started                               0.7s 
root@wikidata:/hd/delta/blazegraph/private-wikidata-query# curl http://localhost:8099
<!DOCTYPE html><html lang="en" dir="ltr"><head><meta charset="utf-8"

follow-up 2025-06-04

see also https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours

docker exec -it private-wikidata-query-wdqs-1 /bin/bash
while true; do /wdqs/runUpdate.sh; sleep 10; done&
15:33:59.522 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T21:10:22Z (next: 20250601211022|2427659504) at (10.6, 6.5, 2.9) updates per second and (3412.9, 2094.7, 941.5) milliseconds per second
15:33:59.770 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 81 changes, from Q134675408@2355787733@20250601211022|2427659504 to Q127768397@2355787831@20250601211035|2427659605
15:34:08.856 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T21:10:35Z (next: 20250601211035|2427659606) at (10.0, 6.5, 3.0) updates per second and (3095.0, 2071.5, 946.5) milliseconds per second
15:34:09.096 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 76 changes, from Q134675444@2355787832@20250601211035|2427659606 to Q134642638@2355787935@20250601211052|2427659706
15:34:13.024 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T21:10:52Z (next: 20250601211052|2427659707) at (10.3, 6.6, 3.0) updates per second and (3055.4, 2080.3, 955.7) milliseconds per second
15:34:13.268 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 78 changes, from Q134675474@2355787931@20250601211052|2427659707 to Q95527928@2355788033@20250601211106|2427659806

update check query

# check updated state of copy of wikidata
#
# see https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
# and https://wiki.bitplan.com/index.php/Wikidata_Import_2025-05-02#followup_2025-06-04
PREFIX schema: <http://schema.org/>

# This query returns:
# - the total number of triples in the dataset
# - the dateModified of the <http://www.wikidata.org> entity, if available
SELECT * WHERE {
  {
    # Subquery: count all triples in the dataset
    SELECT (COUNT(*) AS ?count) {
      ?s ?p ?o
    }
  }
  UNION
  {
    # Subquery: get the schema:dateModified for the Wikidata root URI
    SELECT * WHERE {
      <http://www.wikidata.org> schema:dateModified ?y
    }
  }
}
runUpdateLoop.sh
 
#!/bin/bash
# Run WDQS updater in a controlled loop with logging and signal handling.
# WF 2025-06-04

set -euo pipefail

LOGFILE="/var/log/wdqs/update-loop.log"
INTERVAL=10

trap 'echo "[$(date -Iseconds)] Caught termination signal. Exiting." >> "$LOGFILE"; exit 0' SIGINT SIGTERM

echo "[$(date -Iseconds)] Starting WDQS update loop..." >> "$LOGFILE"

while true; do
  echo "[$(date -Iseconds)] Running /wdqs/runUpdate.sh ..." >> "$LOGFILE"
  /wdqs/runUpdate.sh -n wdq >> "$LOGFILE" 2>&1
  echo "[$(date -Iseconds)] Sleeping $INTERVAL seconds..." >> "$LOGFILE"
  sleep $INTERVAL
done
 
bash-4.4# crontab -l
# do daily/weekly/monthly maintenance
# min	hour	day	month	weekday	command
*/15	*	*	*	*	run-parts /etc/periodic/15min
0	*	*	*	*	run-parts /etc/periodic/hourly
0	2	*	*	*	run-parts /etc/periodic/daily
0	3	*	*	6	run-parts /etc/periodic/weekly
0	5	1	*	*	run-parts /etc/periodic/monthly
 
bash-4.4# chmod +x runUpdateLoop.sh 
bash-4.4# ./runUpdateLoop.sh 
bash-4.4# tail /var/log/wdqs/update-loop.log 
Updating via http://localhost:9999/bigdata/namespace/wdq/sparql
#logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
16:08:56.668 [main] INFO  org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.97 (89fdd891d231a500bd5999b08ea4b5a59dc0499d)
16:08:57.494 [main] INFO  o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
16:08:57.495 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
16:08:57.780 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
16:08:57.781 [main] INFO  o.w.q.r.t.change.ChangeSourceContext - Found start time in the RDF store: 2025-06-01T22:47:39Z
16:08:58.343 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 80 changes, from Q134682208@2355812225@20250601224755|2427686359 to Q24203239@2355812274@20250601224810|2427686410
16:09:04.065 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T22:48:10Z (next: 20250601224811|2427686409) at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
16:09:04.352 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 80 changes, from Q1534239@2355812281@20250601224813|2427686417 to Q134682334@2355812375@20250601224844|2427686526
 
bash-4.4# tail -f /var/log/wdqs/update-loop.log 
16:08:56.668 [main] INFO  org.wikidata.query.rdf.tool.Update - Starting Updater 0.3.97 (89fdd891d231a500bd5999b08ea4b5a59dc0499d)
16:08:57.494 [main] INFO  o.w.q.r.t.change.ChangeSourceContext - Checking where we left off
16:08:57.495 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Checking for left off time from the updater
16:08:57.780 [main] INFO  o.w.query.rdf.tool.rdf.RdfRepository - Found left off time from the updater
16:08:57.781 [main] INFO  o.w.q.r.t.change.ChangeSourceContext - Found start time in the RDF store: 2025-06-01T22:47:39Z
16:08:58.343 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 80 changes, from Q134682208@2355812225@20250601224755|2427686359 to Q24203239@2355812274@20250601224810|2427686410
16:09:04.065 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T22:48:10Z (next: 20250601224811|2427686409) at (0.0, 0.0, 0.0) updates per second and (0.0, 0.0, 0.0) milliseconds per second
16:09:04.352 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 80 changes, from Q1534239@2355812281@20250601224813|2427686417 to Q134682334@2355812375@20250601224844|2427686526
16:09:11.273 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T22:48:44Z (next: 20250601224845|2427686525) at (1.0, 0.2, 0.1) updates per second and (495.7, 102.5, 34.3) milliseconds per second
16:09:11.538 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 72 changes, from Q4795166@2355812384@20250601224847|2427686534 to Q134682364@2355812474@20250601224913|2427686632
16:09:18.719 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-01T22:49:14Z (next: 20250601224914|2427686631) at (1.9, 0.4, 0.1) updates per second and (919.8, 209.7, 71.4) milliseconds per second
16:09:18.987 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 75 changes, from Q104196941@2355812537@20250601224932|2427686697 to Q134682396@2355812574@20250601224945|2427686738

follow-up 2025-06-05

tail -f /var/log/wdqs/update-loop.log 
Caused by: java.net.ConnectException: Network unreachable (connect failed)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:338)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
	... 21 more

The server needed a reboot anyway ...

ls -l
total 1359918440
-rw-rw-r-- 1 666 66 1392556310528 Jun  5 06:23 data.jnl
root@wikidata:/hd/delta/blazegraph/private-wikidata-query/data#  pv data.jnl > /hd/eneco/wikidata/2025-06-05/data.jnl
28.6GiB 0:00:29 [1.00GiB/s] [>                                 ]  2% ETA 0:21:28
1.27TiB 0:27:10 [ 814MiB/s] [================================>] 100%
restart docker containers
docker compose down
docker compose up -d
[+] Running 4/4
 ✔ Network private-wikidata-query_default            Created               0.1s 
 ✔ Container private-wikidata-query-wdqs-1           Started               0.5s 
 ✔ Container private-wikidata-query-wdqs-proxy-1     Started               0.7s 
 ✔ Container private-wikidata-query-wdqs-frontend-1  Started               1.1s
improved runUpdateLoop.sh
#!/bin/bash
# Run WDQS updater in a controlled loop with logging and signal handling.
# WF 2025-06-04

set -euo pipefail

# Default values
LOGFILE="/var/log/wdqs/update-loop.log"
INTERVAL=10

# Function to show usage
show_help() {
    echo "Usage: $0 [OPTIONS]"
    echo "Options:"
    echo "  -d, --daemon    Detach and run in background via nohup"
    echo "  -l, --loop      Run update loop in foreground"
    echo "  -h, --help      Show this help message"
    exit 0
}

# Function to handle daemon mode detachment
run_daemon_mode() {
    if [[ "${DETACHED:-}" != "true" ]]; then
        echo "Detaching process via nohup..."
        export DETACHED=true
        nohup "$0" --loop > "$LOGFILE" 2>&1 &
        echo "Process detached. PID: $!"
        exit 0
    fi
    run_main_loop
}

# Function to run the main update loop
run_main_loop() {
    trap 'echo "[$(date -Iseconds)] Caught termination signal. Exiting." >> "$LOGFILE"; exit 0' SIGINT SIGTERM
    echo "[$(date -Iseconds)] Starting WDQS update loop..." >> "$LOGFILE"
    
    while true; do
        echo "[$(date -Iseconds)] Running /wdqs/runUpdate.sh ..." >> "$LOGFILE"
        /wdqs/runUpdate.sh -n wdq >> "$LOGFILE" 2>&1
        echo "[$(date -Iseconds)] Sleeping $INTERVAL seconds..." >> "$LOGFILE"
        sleep $INTERVAL
    done
}

# Parse command line options
while [[ $# -gt 0 ]]; do
    case $1 in
        -d|--daemon)
            run_daemon_mode
            exit 0
            ;;
        -l|--loop)
            run_main_loop
            exit 0
            ;;
        -h|--help)
            show_help
            ;;
        *)
            echo "Unknown option: $1" >&2
            show_help
            ;;
    esac
done

# No options provided
show_help
update catch-up state
07:53:13.105 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 77 changes, from Q14666496@2356688709@20250604022159|2428616084 to Q14666547@2356688807@20250604022228|2428616186
07:53:19.646 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2025-06-04T02:22:28Z (next: 20250604022228|2428616187) at (8.8, 3.7, 1.6) updates per second and (4843.1, 5320.4, 5494.0) milliseconds per second