Server Admin Log

2025-10-20

05:24 marostegui@cumin1003: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84071 and previous config saved to /var/cache/conftool/dbconfig/20251020-052438-root.json
05:20 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1027 from dbctl T407595', diff saved to https://phabricator.wikimedia.org/P84070 and previous config saved to /var/cache/conftool/dbconfig/20251020-052057-marostegui.json
05:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1206 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84069 and previous config saved to /var/cache/conftool/dbconfig/20251020-051712-marostegui.json
05:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
05:05 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2248.codfw.wmnet onto db2245.codfw.wmnet
05:04 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2248.codfw.wmnet onto db2245.codfw.wmnet
05:04 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2248 - Depool db2248.codfw.wmnet to then clone it to db2245.codfw.wmnet - marostegui@cumin1003
05:03 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2248 - Depool db2248.codfw.wmnet to then clone it to db2245.codfw.wmnet - marostegui@cumin1003
05:03 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2248.codfw.wmnet onto db2245.codfw.wmnet
01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 52s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-19

01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 32s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-18

08:45 brett@dns1004: END - running authdns-update
08:44 brett@dns1004: START - running authdns-update
08:25 brett@dns1004: END - running authdns-update
08:23 brett@dns1004: START - running authdns-update

2025-10-17

21:49 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2058']
21:48 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
21:45 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2058']
21:44 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
21:43 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
21:43 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
21:43 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
21:42 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
21:29 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
21:29 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
21:26 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
21:26 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
21:24 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
21:21 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2058']
21:20 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
20:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
20:44 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
20:43 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
20:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
20:37 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS bookworm
20:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
20:18 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS bookworm
20:17 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
20:10 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
19:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
19:50 ejegg: donorwiki upgraded from 70a7050f to 039e5a15
19:50 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
19:49 ejegg: payments-wiki upgraded from 70a7050f to 039e5a15
19:11 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
19:11 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
18:47 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
18:45 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
17:09 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:08 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:09 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
16:01 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
15:33 Dreamy_Jazz: Ran `mwscript-k8s --comment='First emails to users to get them to confirm their email address for T58074' extensions/WikimediaMaintenance/sendVerifyEmailReminderNotification.php --wiki=metawiki 20250917000000`
13:09 vgutierrez: updating ca-certificates package on bookworm puppetservers
13:01 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84067 and previous config saved to /var/cache/conftool/dbconfig/20251017-130106-root.json
12:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
12:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
12:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
12:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
12:46 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84066 and previous config saved to /var/cache/conftool/dbconfig/20251017-124600-root.json
12:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84064 and previous config saved to /var/cache/conftool/dbconfig/20251017-123054-root.json
12:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84063 and previous config saved to /var/cache/conftool/dbconfig/20251017-121548-root.json
12:07 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1195 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84062 and previous config saved to /var/cache/conftool/dbconfig/20251017-120737-marostegui.json
12:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1195.eqiad.wmnet with reason: Maintenance
11:38 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2248.codfw.wmnet onto db2246.codfw.wmnet
11:38 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2248 gradually with 4 steps - Pool db2248.codfw.wmnet in after cloning
11:11 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
11:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
11:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:06 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:06 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:52 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2248 gradually with 4 steps - Pool db2248.codfw.wmnet in after cloning
10:44 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:43 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:36 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:35 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:35 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:34 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:08 eileen: civicrm upgraded from ab1d21dc to 7b70cb83
10:05 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:05 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:03 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:03 topranks: un-draining Arelion 100G transport eqiad <-> codfw following carrier fibre fix and return to stability T407578
10:03 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:02 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:02 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
09:37 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
09:36 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
08:47 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
08:46 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
08:19 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2248 - Depool db2248.codfw.wmnet to then clone it to db2246.codfw.wmnet - marostegui@cumin1003
08:19 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2248 - Depool db2248.codfw.wmnet to then clone it to db2246.codfw.wmnet - marostegui@cumin1003
08:19 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2248.codfw.wmnet onto db2246.codfw.wmnet
08:08 topranks: draining Arelion eqiad <-> codfw transport with OSPF metric and re-enabling port on cr1-eqiad
08:04 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2032.codfw.wmnet onto es2055.codfw.wmnet
07:42 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84056 and previous config saved to /var/cache/conftool/dbconfig/20251017-074221-root.json
07:27 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84055 and previous config saved to /var/cache/conftool/dbconfig/20251017-072715-root.json
07:12 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84054 and previous config saved to /var/cache/conftool/dbconfig/20251017-071209-root.json
06:57 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84053 and previous config saved to /var/cache/conftool/dbconfig/20251017-065703-root.json
06:41 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84052 and previous config saved to /var/cache/conftool/dbconfig/20251017-064157-root.json
06:26 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84051 and previous config saved to /var/cache/conftool/dbconfig/20251017-062651-root.json
06:11 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84050 and previous config saved to /var/cache/conftool/dbconfig/20251017-061145-root.json
05:56 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84049 and previous config saved to /var/cache/conftool/dbconfig/20251017-055639-root.json
05:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on es1027.eqiad.wmnet with reason: Cloning
05:45 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1027 T407595', diff saved to https://phabricator.wikimedia.org/P84048 and previous config saved to /var/cache/conftool/dbconfig/20251017-054458-marostegui.json
05:41 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84047 and previous config saved to /var/cache/conftool/dbconfig/20251017-054133-root.json
05:26 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84046 and previous config saved to /var/cache/conftool/dbconfig/20251017-052627-root.json
05:11 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84045 and previous config saved to /var/cache/conftool/dbconfig/20251017-051121-root.json
05:11 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1056 to dbctl T406488', diff saved to https://phabricator.wikimedia.org/P84044 and previous config saved to /var/cache/conftool/dbconfig/20251017-051114-marostegui.json
01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 04s)
01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-16

23:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/zotero: apply
23:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/zotero: apply
23:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
23:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
23:19 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
23:18 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/toolhub: apply
23:18 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/termbox: apply
23:17 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/termbox: apply
23:17 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
23:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
23:15 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
23:15 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
23:13 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
23:13 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
23:12 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
23:11 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
23:11 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
23:11 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
23:10 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
23:10 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
23:09 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
23:09 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
23:08 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
23:07 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
23:07 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
23:06 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
23:06 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
23:05 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/push-notifications: apply
23:05 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
23:03 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
23:03 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
23:03 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/page-analytics: apply
23:02 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
22:59 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
22:59 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
22:58 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply
22:58 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
22:57 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply
22:55 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
22:49 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
22:49 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
22:48 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
22:47 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
22:47 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
22:46 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
22:46 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
22:44 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
22:44 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
22:43 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
22:42 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
22:41 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
22:41 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
22:40 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
22:39 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
22:39 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
22:38 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
22:38 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
22:37 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
22:37 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
22:37 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
22:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
22:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
22:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
22:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
22:35 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
22:34 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
22:33 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
22:33 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply
22:33 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
22:32 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/data-gateway: apply
22:32 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
22:31 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
22:31 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
22:31 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
22:30 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:29 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
22:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
22:28 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
22:25 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
22:24 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
22:24 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
22:23 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
22:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
22:17 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/apertium: apply
22:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/apertium: apply
22:04 sbassett: Deployed security fix for T407131
21:46 jdlrobson@deploy2002: Finished scap sync-world: Backport for Temporary user banner should not have such a high z-index (T407549) (duration: 15m 21s)
21:42 jdlrobson@deploy2002: jdlrobson: Continuing with sync
21:35 jdlrobson@deploy2002: jdlrobson: Backport for Temporary user banner should not have such a high z-index (T407549) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:31 jdlrobson@deploy2002: Started scap sync-world: Backport for Temporary user banner should not have such a high z-index (T407549)
21:26 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7004.*
21:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
21:20 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7004.magru.wmnet} and A:cp
21:20 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7004.magru.wmnet
21:08 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7004.magru.wmnet} and A:cp
21:00 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: Debugging sre.cdn.roll-reboot bugs
20:59 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7004.*
20:56 bblack: see also https://phabricator.wikimedia.org/T407578 for above port disables
20:51 bblack: disabling cr1-eqiad:et-1/1/2 and cr1-codfw:et-1/0/2 (both ends of same Arelion transport, been erroring/flapping for a while)
20:50 eileen: civicrm upgraded from ac4c185b to ab1d21dc
20:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
20:43 ebernhardson@deploy2002: Finished scap sync-world: Backport for Add wgSitename for azwiktionary (T407358) (duration: 09m 29s)
20:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
20:38 ebernhardson@deploy2002: ebernhardson, nmw03: Continuing with sync
20:38 ebernhardson@deploy2002: ebernhardson, nmw03: Backport for Add wgSitename for azwiktionary (T407358) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:33 ebernhardson@deploy2002: Started scap sync-world: Backport for Add wgSitename for azwiktionary (T407358)
20:30 ebernhardson@deploy2002: Finished scap sync-world: Backport for Create "autopatrolled" user group on Danish Wikisource (T407281) (duration: 10m 57s)
20:26 ebernhardson@deploy2002: ebernhardson, hamishz: Continuing with sync
20:24 ebernhardson@deploy2002: ebernhardson, hamishz: Backport for Create "autopatrolled" user group on Danish Wikisource (T407281) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:20 ebernhardson@deploy2002: Started scap sync-world: Backport for Create "autopatrolled" user group on Danish Wikisource (T407281)
20:19 ejegg: fundraising python tools upgraded from 698309f1 to 3b0b3fc0
20:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
20:18 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
20:15 ebernhardson@deploy2002: Finished scap sync-world: Backport for Revert "cirrus: Start AB test of did-you-mean profiles" (T390858) (duration: 09m 36s)
20:11 ebernhardson@deploy2002: ebernhardson: Continuing with sync
20:10 ebernhardson@deploy2002: ebernhardson: Backport for Revert "cirrus: Start AB test of did-you-mean profiles" (T390858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:06 ebernhardson@deploy2002: Started scap sync-world: Backport for Revert "cirrus: Start AB test of did-you-mean profiles" (T390858)
19:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
19:38 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
19:38 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
19:25 dancy: dancy@deploy2002 Installation of scap version "4.214.0" completed for 2 hosts
19:22 dancy@deploy2002: Installing scap version "4.214.0" for 2 host(s)
19:03 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
18:57 andrew@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
18:44 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on gerrit2003.wikimedia.org with reason: no active host - disabled
18:42 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.clone_es (exit_code=99) of es2032.codfw.wmnet onto es2055.codfw.wmnet
18:26 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
18:26 brett@dns1004: END - running authdns-update
18:25 brett@dns1004: START - running authdns-update
18:08 brett: Import varnish 7.1.1-2~bpo13+wmf1 into trixie-wikimedia - T401832
17:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2003.codfw.wmnet with OS bookworm
17:38 swfrench@deploy2002: Finished scap sync-world: New PHP 8.3 production image (duration: 27m 32s)
17:28 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
17:24 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
17:17 mforns@deploy2002: Finished deploy [analytics/refinery@6b7edca] (thin): Regular analytics weekly train THIN [analytics/refinery@6b7edcac] (duration: 01m 29s)
17:16 mforns@deploy2002: Started deploy [analytics/refinery@6b7edca] (thin): Regular analytics weekly train THIN [analytics/refinery@6b7edcac]
17:16 mforns@deploy2002: Finished deploy [analytics/refinery@6b7edca]: Regular analytics weekly train [analytics/refinery@6b7edcac] (duration: 06m 48s)
17:12 swfrench@deploy2002: Started scap sync-world: New PHP 8.3 production image
17:10 topranks: re-enable BGP sessions for lvs1018 on cr1-eqiad, cr2-eqiad after maintenance on the lvs host T405499
17:09 mforns@deploy2002: Started deploy [analytics/refinery@6b7edca]: Regular analytics weekly train [analytics/refinery@6b7edcac]
17:06 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
17:00 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
16:58 mforns@deploy2002: Finished deploy [analytics/refinery@6b7edca] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6b7edcac] (duration: 01m 16s)
16:57 mforns@deploy2002: Started deploy [analytics/refinery@6b7edca] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6b7edcac]
16:56 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
16:46 swfrench-wmf: reprepro include php8.3_8.3.26-1+wmf11u2 in component/php83
16:34 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:20 topranks: disable BGP sessions for lvs1018 on cr1-eqiad, cr2-eqiad to move traffic to backup load-balancer lvs1020 T405499
16:19 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1018.eqiad.wmnet with reason: remove lvs1018 enp94s0f0np0 link to rack E1
16:14 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Depool es2032.codfw.wmnet to then clone it to es2055.codfw.wmnet - fceratto@cumin1003
16:13 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2032 - Depool es2032.codfw.wmnet to then clone it to es2055.codfw.wmnet - fceratto@cumin1003
16:13 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2032.codfw.wmnet onto es2055.codfw.wmnet
15:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2055.codfw.wmnet with reason: Setting up new ES host
15:33 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp700[7-8].magru.wmnet [reason: pool after firmware updated]
15:27 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7008.magru.wmnet
15:27 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for cp7008.magru.wmnet
15:20 jhancock@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7008']
15:15 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7008.magru.wmnet with reason: firmware upgrade
15:10 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7008']
15:10 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7007.magru.wmnet
15:10 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for cp7007.magru.wmnet
15:10 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7008.magru.wmnet [reason: updating firmware]
15:03 jhancock@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7007']
14:54 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7007']
14:51 ejegg: donorwiki upgraded from d903982c to 70a7050f
14:37 moritzm: installing libarchive security updates
14:33 urandom: starting `removenode` of aqs1012-b (id=bc700f01-8120-4d77-908f-eea943470a25) — T407414
14:30 moritzm: installing distro-info-data updates on Bookworm
14:27 urandom: starting `removenode` of aqs1012-a (id=0b0f0cd5-a1f8-44e2-a8e2-75800ebaea80) — T407414
14:17 tappof: bump space for prometheus k8s-dse in eqiad
14:09 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7008*} and A:cp
14:09 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7008.magru.wmnet
14:04 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
13:59 sukhe: sudo ipmitool -I lanplus -H "cp7008.mgmt.magru.wmnet" -U root -E chassis power cycle
13:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1260.eqiad.wmnet onto db1263.eqiad.wmnet
13:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
13:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
13:53 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
13:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
13:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
13:49 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
13:28 zabe@deploy2002: Finished scap sync-world: Backport for BETA: Try using Hadoop QueryPage computations (T309738) (duration: 08m 09s)
13:27 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7008*} and A:cp
13:24 zabe@deploy2002: zabe: Continuing with sync
13:22 zabe@deploy2002: zabe: Backport for BETA: Try using Hadoop QueryPage computations (T309738) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:20 zabe@deploy2002: Started scap sync-world: Backport for BETA: Try using Hadoop QueryPage computations (T309738)
13:13 esanders@deploy2002: Finished scap sync-world: Backport for LQT convert: Ignore duplicate key insert errors when command line flag set (T407357) (duration: 10m 14s)
13:12 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
13:09 esanders@deploy2002: esanders: Continuing with sync
13:06 esanders@deploy2002: esanders: Backport for LQT convert: Ignore duplicate key insert errors when command line flag set (T407357) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:03 esanders@deploy2002: Started scap sync-world: Backport for LQT convert: Ignore duplicate key insert errors when command line flag set (T407357)
12:51 moritzm: installing git security updates
12:36 moritzm: installing gst-plugins-base1.0 security updates
12:13 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2054 slowly with 10 steps - Pooling in new host
12:05 jmm@dns1004: END - running authdns-update
12:03 jmm@dns1004: START - running authdns-update
11:54 ozge@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
11:26 claime: sudo cumin 'A:cp' "enable-puppet 'Deploying gateway-check.lua changes - T406599 - cgoubert'"
11:22 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
11:21 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
11:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
11:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:21 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:21 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:19 hnowlan@deploy2002: Finished deploy [restbase/deploy@0be0059]: deploy 9 new wikis from r/1177553 (duration: 27m 01s)
11:12 moritzm: installing Squid security updates
11:08 claime: sudo cumin 'A:cp' "disable-puppet 'Deploying gateway-check.lua changes - T406599 - cgoubert'"
11:05 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
11:04 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
11:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:53 hnowlan@deploy2002: Started deploy [restbase/deploy@0be0059]: deploy 9 new wikis from r/1177553
10:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:21 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84027 and previous config saved to /var/cache/conftool/dbconfig/20251016-102110-root.json
10:15 moritzm: installing libfcgi security updates
10:06 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84025 and previous config saved to /var/cache/conftool/dbconfig/20251016-100605-root.json
09:57 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2054 slowly with 10 steps - Pooling in new host
09:56 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2054.codfw.wmnet
09:56 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2054.codfw.wmnet
09:55 fceratto@cumin1003: dbctl commit (dc=all): 'Add es2054 T402859', diff saved to https://phabricator.wikimedia.org/P84023 and previous config saved to /var/cache/conftool/dbconfig/20251016-095534-fceratto.json
09:51 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84022 and previous config saved to /var/cache/conftool/dbconfig/20251016-095058-root.json
09:35 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84021 and previous config saved to /var/cache/conftool/dbconfig/20251016-093553-root.json
09:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1260 - Depool db1260.eqiad.wmnet to then clone it to db1263.eqiad.wmnet - marostegui@cumin1003
09:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1260 - Depool db1260.eqiad.wmnet to then clone it to db1263.eqiad.wmnet - marostegui@cumin1003
09:30 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1260.eqiad.wmnet onto db1263.eqiad.wmnet
09:20 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84019 and previous config saved to /var/cache/conftool/dbconfig/20251016-092047-root.json
09:14 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ssw1-d1-eqiad.mgmt with reason: downtime ssw1-d1-eqiad until we have the monitoring checks fully working for the new platform
09:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84018 and previous config saved to /var/cache/conftool/dbconfig/20251016-091343-root.json
09:05 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84017 and previous config saved to /var/cache/conftool/dbconfig/20251016-090541-root.json
09:02 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:00 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gerrit2002.wikimedia.org with reason: T407110
09:00 cmooney@cumin1003: START - Cookbook sre.dns.netbox
08:58 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84016 and previous config saved to /var/cache/conftool/dbconfig/20251016-085837-root.json
08:57 cmooney@dns2005: END - running authdns-update
08:56 cmooney@dns2005: START - running authdns-update
08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1260.eqiad.wmnet onto db1262.eqiad.wmnet
08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
08:50 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84014 and previous config saved to /var/cache/conftool/dbconfig/20251016-085035-root.json
08:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84013 and previous config saved to /var/cache/conftool/dbconfig/20251016-084331-root.json
08:36 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es1026.eqiad.wmnet
08:36 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:36 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
08:35 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
08:35 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84011 and previous config saved to /var/cache/conftool/dbconfig/20251016-083529-root.json
08:32 marostegui@cumin1003: START - Cookbook sre.dns.netbox
08:32 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.23 refs T405679
08:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84010 and previous config saved to /var/cache/conftool/dbconfig/20251016-082825-root.json
08:26 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1026.eqiad.wmnet
08:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
08:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
08:22 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84009 and previous config saved to /var/cache/conftool/dbconfig/20251016-082237-root.json
08:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1235 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84007 and previous config saved to /var/cache/conftool/dbconfig/20251016-082031-marostegui.json
08:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1235.eqiad.wmnet with reason: Maintenance
08:20 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84006 and previous config saved to /var/cache/conftool/dbconfig/20251016-082023-root.json
08:15 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
08:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
08:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
08:09 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1026 from dbctl T407351', diff saved to https://phabricator.wikimedia.org/P84005 and previous config saved to /var/cache/conftool/dbconfig/20251016-080948-marostegui.json
08:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
08:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
08:07 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84004 and previous config saved to /var/cache/conftool/dbconfig/20251016-080731-root.json
08:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
08:05 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84002 and previous config saved to /var/cache/conftool/dbconfig/20251016-080518-root.json
08:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2033.codfw.wmnet onto es2054.codfw.wmnet
08:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2033 gradually with 4 steps - Pool es2033.codfw.wmnet in after cloning
07:55 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 264936
07:54 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 264936
07:52 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84000 and previous config saved to /var/cache/conftool/dbconfig/20251016-075225-root.json
07:50 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83999 and previous config saved to /var/cache/conftool/dbconfig/20251016-075012-root.json
07:41 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 100%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83997 and previous config saved to /var/cache/conftool/dbconfig/20251016-074122-root.json
07:41 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1055 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83996 and previous config saved to /var/cache/conftool/dbconfig/20251016-074118-marostegui.json
07:37 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83995 and previous config saved to /var/cache/conftool/dbconfig/20251016-073719-root.json
07:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2188 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83992 and previous config saved to /var/cache/conftool/dbconfig/20251016-072932-marostegui.json
07:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2188.codfw.wmnet with reason: Maintenance
07:26 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 75%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83991 and previous config saved to /var/cache/conftool/dbconfig/20251016-072610-root.json
07:18 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2033 gradually with 4 steps - Pool es2033.codfw.wmnet in after cloning
07:11 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83989 and previous config saved to /var/cache/conftool/dbconfig/20251016-071136-root.json
07:11 kostajh: UTC morning deploys done
07:11 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 60%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83988 and previous config saved to /var/cache/conftool/dbconfig/20251016-071104-root.json
07:09 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83987 and previous config saved to /var/cache/conftool/dbconfig/20251016-070916-root.json
06:56 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83986 and previous config saved to /var/cache/conftool/dbconfig/20251016-065630-root.json
06:56 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83985 and previous config saved to /var/cache/conftool/dbconfig/20251016-065612-root.json
06:55 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 50%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83984 and previous config saved to /var/cache/conftool/dbconfig/20251016-065558-root.json
06:54 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83983 and previous config saved to /var/cache/conftool/dbconfig/20251016-065410-root.json
06:41 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83982 and previous config saved to /var/cache/conftool/dbconfig/20251016-064124-root.json
06:41 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83981 and previous config saved to /var/cache/conftool/dbconfig/20251016-064106-root.json
06:40 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 30%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83980 and previous config saved to /var/cache/conftool/dbconfig/20251016-064052-root.json
06:39 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83979 and previous config saved to /var/cache/conftool/dbconfig/20251016-063904-root.json
06:26 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83978 and previous config saved to /var/cache/conftool/dbconfig/20251016-062618-root.json
06:26 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83977 and previous config saved to /var/cache/conftool/dbconfig/20251016-062600-root.json
06:25 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 25%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83976 and previous config saved to /var/cache/conftool/dbconfig/20251016-062546-root.json
06:24 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83975 and previous config saved to /var/cache/conftool/dbconfig/20251016-062358-root.json
06:18 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2145 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83974 and previous config saved to /var/cache/conftool/dbconfig/20251016-061818-marostegui.json
06:18 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
06:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83973 and previous config saved to /var/cache/conftool/dbconfig/20251016-061054-root.json
06:10 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 20%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83972 and previous config saved to /var/cache/conftool/dbconfig/20251016-061040-root.json
06:08 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83971 and previous config saved to /var/cache/conftool/dbconfig/20251016-060852-root.json
06:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1186 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83970 and previous config saved to /var/cache/conftool/dbconfig/20251016-060300-marostegui.json
06:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
05:55 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 10%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83969 and previous config saved to /var/cache/conftool/dbconfig/20251016-055534-root.json
05:53 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83968 and previous config saved to /var/cache/conftool/dbconfig/20251016-055346-root.json
05:51 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bookworm
05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83967 and previous config saved to /var/cache/conftool/dbconfig/20251016-054504-root.json
05:40 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 7%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83965 and previous config saved to /var/cache/conftool/dbconfig/20251016-054027-root.json
05:38 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83964 and previous config saved to /var/cache/conftool/dbconfig/20251016-053840-root.json
05:29 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83963 and previous config saved to /var/cache/conftool/dbconfig/20251016-052958-root.json
05:25 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 5%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83962 and previous config saved to /var/cache/conftool/dbconfig/20251016-052521-root.json
05:23 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83961 and previous config saved to /var/cache/conftool/dbconfig/20251016-052335-root.json
05:14 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83960 and previous config saved to /var/cache/conftool/dbconfig/20251016-051452-root.json
05:10 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 1%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83959 and previous config saved to /var/cache/conftool/dbconfig/20251016-051015-root.json
05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2248 to dbctl depooled T406551', diff saved to https://phabricator.wikimedia.org/P83958 and previous config saved to /var/cache/conftool/dbconfig/20251016-050917-marostegui.json
05:08 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83957 and previous config saved to /var/cache/conftool/dbconfig/20251016-050829-root.json
04:59 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83956 and previous config saved to /var/cache/conftool/dbconfig/20251016-045946-root.json
04:58 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
04:53 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83955 and previous config saved to /var/cache/conftool/dbconfig/20251016-045323-root.json
04:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2240.codfw.wmnet with reason: Maintenance
04:47 marostegui@dns1006: END - running authdns-update
04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2240 T407177', diff saved to https://phabricator.wikimedia.org/P83954 and previous config saved to /var/cache/conftool/dbconfig/20251016-044650-marostegui.json
04:46 marostegui@dns1006: START - running authdns-update
04:45 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2179 to s4 primary and set section read-write T407177', diff saved to https://phabricator.wikimedia.org/P83953 and previous config saved to /var/cache/conftool/dbconfig/20251016-044557-marostegui.json
04:45 marostegui@cumin1003: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T407177', diff saved to https://phabricator.wikimedia.org/P83952 and previous config saved to /var/cache/conftool/dbconfig/20251016-044533-marostegui.json
04:45 marostegui: Starting s4 codfw failover from db2240 to db2179 - T407177
04:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s4 T407177
04:39 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2179 with weight 0 T407177', diff saved to https://phabricator.wikimedia.org/P83951 and previous config saved to /var/cache/conftool/dbconfig/20251016-043920-marostegui.json
04:38 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83950 and previous config saved to /var/cache/conftool/dbconfig/20251016-043816-root.json
04:35 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1054 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83949 and previous config saved to /var/cache/conftool/dbconfig/20251016-043510-marostegui.json
04:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1260 - Depool db1260.eqiad.wmnet to then clone it to db1262.eqiad.wmnet - marostegui@cumin1003
04:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1260 - Depool db1260.eqiad.wmnet to then clone it to db1262.eqiad.wmnet - marostegui@cumin1003
04:30 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1260.eqiad.wmnet onto db1262.eqiad.wmnet
04:16 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS trixie
03:29 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
03:22 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
03:04 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS trixie
02:50 eileen: civicrm upgraded from 25df5996 to ac4c185b
00:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
00:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye

2025-10-15

23:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
23:36 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
23:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aqs1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
22:56 andrew@cumin2002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM cloudbackup1002-dev.eqiad.wmnet
21:35 andrew@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
21:29 bvibber@deploy2002: Finished scap sync-world: Backport for This copies .23's revert of the _broken version_ of the CORS image load fix! Production should work fine without it, but the broken version breaks things worse than the original bug. -bv (duration: 07m 13s)
21:25 bvibber@deploy2002: bvibber: Continuing with sync
21:24 bvibber@deploy2002: bvibber: Backport for This copies .23's revert of the _broken version_ of the CORS image load fix! Production should work fine without it, but the broken version breaks things worse than the original bug. -bv synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:22 bvibber@deploy2002: Started scap sync-world: Backport for This copies .23's revert of the _broken version_ of the CORS image load fix! Production should work fine without it, but the broken version breaks things worse than the original bug. -bv
21:05 cjming: end of UTC late backport window
21:03 cjming@deploy2002: Finished scap sync-world: Backport for Enable protection indicator for srwiki (T407183) (duration: 08m 25s)
21:03 andrewbogott: adding additional disk space to cloudbackup1002-dev with "sudo gnt-instance modify --disk add:size=60g cloudbackup1002-dev.eqiad.wmnet"
20:59 cjming@deploy2002: cjming, zoranzoki21: Continuing with sync
20:57 cjming@deploy2002: cjming, zoranzoki21: Backport for Enable protection indicator for srwiki (T407183) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:55 cjming@deploy2002: Started scap sync-world: Backport for Enable protection indicator for srwiki (T407183)
20:51 cjming@deploy2002: Finished scap sync-world: Backport for throttle rule for National Library Board Singapore workshop on 18oct2025 (T407422) (duration: 06m 48s)
20:47 cjming@deploy2002: cjming, robertsky: Continuing with sync
20:47 cjming@deploy2002: cjming, robertsky: Backport for throttle rule for National Library Board Singapore workshop on 18oct2025 (T407422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:44 cjming@deploy2002: Started scap sync-world: Backport for throttle rule for National Library Board Singapore workshop on 18oct2025 (T407422)
20:41 cjming@deploy2002: Finished scap sync-world: Backport for Add reader exp to common settings (T406916) (duration: 13m 51s)
20:36 cjming@deploy2002: ksarabia, cjming: Continuing with sync
20:33 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host aqs1012.eqiad.wmnet
20:29 cjming@deploy2002: ksarabia, cjming: Backport for Add reader exp to common settings (T406916) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:27 cjming@deploy2002: Started scap sync-world: Backport for Add reader exp to common settings (T406916)
20:24 cjming@deploy2002: Finished scap sync-world: Backport for Fix action_context for simple bot detection instrument (T406359) (duration: 07m 12s)
20:20 cjming@deploy2002: cjming: Continuing with sync
20:19 cjming@deploy2002: cjming: Backport for Fix action_context for simple bot detection instrument (T406359) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:17 cjming@deploy2002: Started scap sync-world: Backport for Fix action_context for simple bot detection instrument (T406359)
20:12 kemayo@deploy2002: Finished scap sync-world: Backport for DiscussionTools: enable thanking comments (T366095) (duration: 07m 04s)
20:08 kemayo@deploy2002: kemayo: Continuing with sync
20:07 kemayo@deploy2002: kemayo: Backport for DiscussionTools: enable thanking comments (T366095) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:05 kemayo@deploy2002: Started scap sync-world: Backport for DiscussionTools: enable thanking comments (T366095)
19:51 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2045.codfw.wmnet with OS bullseye
19:42 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: sync
19:42 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: sync
19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:38 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:27 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS bullseye
19:21 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:21 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:21 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:20 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:20 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:19 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7007.magru.wmnet with reason: hardware issues, depooled
19:19 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:12 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:12 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
19:03 sukhe: sudo ipmitool -I lanplus -H "cp7007.mgmt.magru.wmnet" -U root -E chassis power cycle
18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
18:55 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
18:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
18:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
18:45 eevans@cumin1003: START - Cookbook sre.hosts.dhcp for host aqs1012.eqiad.wmnet
18:30 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_magru
18:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7016.magru.wmnet
18:18 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
18:18 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6016.drmrs.wmnet
18:14 swfrench@deploy2002: Finished scap sync-world: Backport for Disable enrollment in PHP 8.3 (T405955) (duration: 10m 21s)
18:14 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
18:14 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6008.drmrs.wmnet
18:10 swfrench@deploy2002: swfrench: Continuing with sync
18:10 sukhe@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on A:cp-text_magru and not P{cp7001*} and A:cp
18:07 swfrench@deploy2002: swfrench: Backport for Disable enrollment in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
18:04 swfrench@deploy2002: Started scap sync-world: Backport for Disable enrollment in PHP 8.3 (T405955)
17:47 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7015.magru.wmnet
17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
17:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6015.drmrs.wmnet
17:34 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6007.drmrs.wmnet
17:26 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host aqs1012.eqiad.wmnet
17:23 swfrench@deploy2002: Finished scap sync-world: Revert to PHP 8.1 - T405955 (duration: 02m 47s)
17:21 swfrench@deploy2002: Started scap sync-world: Revert to PHP 8.1 - T405955
17:06 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
17:06 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
17:04 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7014.magru.wmnet
16:58 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
16:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
16:55 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6014.drmrs.wmnet
16:53 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6006.drmrs.wmnet
16:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
16:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
16:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
16:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
16:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:46 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:40 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2053 slowly with 10 steps - Pooling in new host
16:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
16:37 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
16:37 eevans@cumin1003: START - Cookbook sre.hosts.dhcp for host aqs1012.eqiad.wmnet
16:37 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
16:37 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
16:20 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7013.magru.wmnet
16:19 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7006.magru.wmnet
16:16 eevans@cumin1003: END (FAIL) - Cookbook sre.cassandra.roll-reboot (exit_code=1) rolling reboot on A:aqs-eqiad
16:14 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6013.drmrs.wmnet
16:12 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6005.drmrs.wmnet
15:57 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
15:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2206.codfw.wmnet onto db2247.codfw.wmnet
15:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
15:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7012.magru.wmnet
15:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7005.magru.wmnet
15:33 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6012.drmrs.wmnet
15:31 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6004.drmrs.wmnet
15:29 mforns@deploy2002: Finished deploy [analytics/refinery@94efa6e] (thin): Regular analytics weekly train THIN [analytics/refinery@94efa6e8] (duration: 01m 06s)
15:28 mforns@deploy2002: Started deploy [analytics/refinery@94efa6e] (thin): Regular analytics weekly train THIN [analytics/refinery@94efa6e8]
15:28 mforns@deploy2002: Finished deploy [analytics/refinery@94efa6e]: Regular analytics weekly train [analytics/refinery@94efa6e8] (duration: 06m 37s)
15:21 mforns@deploy2002: Started deploy [analytics/refinery@94efa6e]: Regular analytics weekly train [analytics/refinery@94efa6e8]
15:21 mforns@deploy2002: Finished deploy [analytics/refinery@94efa6e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@94efa6e8] (duration: 02m 17s)
15:19 mforns@deploy2002: Started deploy [analytics/refinery@94efa6e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@94efa6e8]
15:03 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
14:54 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7004.magru.wmnet
14:54 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7011.magru.wmnet
14:51 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6011.drmrs.wmnet
14:51 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6003.drmrs.wmnet
14:44 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2033 - Depool es2033.codfw.wmnet to then clone it to es2054.codfw.wmnet - fceratto@cumin1003
14:43 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2033 - Depool es2033.codfw.wmnet to then clone it to es2054.codfw.wmnet - fceratto@cumin1003
14:43 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2033.codfw.wmnet onto es2054.codfw.wmnet
14:41 claime: armed keyholder on deploy[1003|2002] following reboots
14:40 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
14:39 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
14:37 moritzm: armed keyholder on cumin1002 following reboot
14:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2054.codfw.wmnet with reason: Setting up new ES host
14:34 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:aqs-eqiad
14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1002.eqiad.wmnet
14:34 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
14:31 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parsoidtest1001.eqiad.wmnet
14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1002.eqiad.wmnet
14:29 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
14:26 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host parsoidtest1001.eqiad.wmnet
14:24 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2053 slowly with 10 steps - Pooling in new host
14:23 fceratto@cumin1002: dbctl commit (dc=all): 'es2053 set ipaddr before pool-in', diff saved to https://phabricator.wikimedia.org/P83930 and previous config saved to /var/cache/conftool/dbconfig/20251015-142339-fceratto.json
14:22 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) es2053 slowly with 10 steps - Pooling in new host
14:22 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2002.codfw.wmnet
14:20 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2001.codfw.wmnet
14:19 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
14:19 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1003.eqiad.wmnet
14:18 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
14:17 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
14:16 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug2002.codfw.wmnet
14:15 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1002.eqiad.wmnet
14:14 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug2001.codfw.wmnet
14:14 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2053 slowly with 10 steps - Pooling in new host
14:14 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug1002.eqiad.wmnet
14:12 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:12 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7003.magru.wmnet
14:11 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1001.eqiad.wmnet
14:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7010.magru.wmnet
14:11 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2053.codfw.wmnet
14:11 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2053.codfw.wmnet
14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:11 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2053.codfw.wmnet
14:11 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2053.codfw.wmnet
14:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6010.drmrs.wmnet
14:10 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6002.drmrs.wmnet
14:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:09 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:09 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host deploy1003.eqiad.wmnet
14:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:07 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug1001.eqiad.wmnet
14:05 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:04 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:04 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:04 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
14:04 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:03 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Add es2053 T402859', diff saved to https://phabricator.wikimedia.org/P83929 and previous config saved to /var/cache/conftool/dbconfig/20251015-135630-fceratto.json
13:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
13:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
13:33 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:33 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:31 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:31 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
13:29 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6009.drmrs.wmnet
13:29 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6001.drmrs.wmnet
13:29 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7002.magru.wmnet
13:28 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7009.magru.wmnet
13:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1260.eqiad.wmnet onto db1261.eqiad.wmnet
13:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
13:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
13:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:18 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
13:18 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
13:17 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_magru
13:16 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru and not P{cp7001*} and A:cp
13:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
13:16 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet [reason: already rebooted; pooling]
13:15 sukhe@cumin1003: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-text_magru
13:15 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru
13:14 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:14 tchin@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
13:00 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1026 T407351', diff saved to https://phabricator.wikimedia.org/P83925 and previous config saved to /var/cache/conftool/dbconfig/20251015-124927-marostegui.json
12:44 claime: enabling puppet on cp nodes for T406318
12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parsoidtest1001.eqiad.wmnet
12:29 claime: disabling puppet on cp nodes for T406318
12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host parsoidtest1001.eqiad.wmnet
12:26 moritzm: installing ghostscript security updates
12:25 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2009.codfw.wmnet
12:18 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2009.codfw.wmnet
12:18 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2008.codfw.wmnet
12:12 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2008.codfw.wmnet
12:12 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2007.codfw.wmnet
12:05 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2007.codfw.wmnet
12:05 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2006.codfw.wmnet
12:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2206 - Depool db2206.codfw.wmnet to then clone it to db2247.codfw.wmnet - marostegui@cumin1003
12:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2206 - Depool db2206.codfw.wmnet to then clone it to db2247.codfw.wmnet - marostegui@cumin1003
12:01 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2206.codfw.wmnet onto db2247.codfw.wmnet
11:57 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2006.codfw.wmnet
11:57 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2005.codfw.wmnet
11:50 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet
11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
11:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
11:19 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:18 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:17 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:16 claime: Enabling puppet on all cp nodes for 1195679: trafficserver: remove gateway-check group-specific routes for rest.php - T406318
11:16 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
11:15 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:14 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
11:12 claime: Enabling puppet on cp6015 for 1195679: trafficserver: remove gateway-check group-specific routes for rest.php - T406318
11:07 claime: disabling puppet on cp nodes for 1195679: trafficserver: remove gateway-check group-specific routes for rest.php - T406318
10:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
10:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
10:44 hashar@deploy2002: Finished scap sync-world: Backport for Replace call to deprecated method getImages (T407184) (duration: 32m 19s)
10:40 hashar@deploy2002: hashar: Continuing with sync
10:37 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1009.eqiad.wmnet
10:30 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1009.eqiad.wmnet
10:30 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1008.eqiad.wmnet
10:23 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1008.eqiad.wmnet
10:23 moritzm: installing libcommons-lang3-java security updates
10:23 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1007.eqiad.wmnet
10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2044.codfw.wmnet with OS trixie
10:18 hnowlan: deleted legacy EMEA/Americas business hours Splunk rotations
10:16 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1007.eqiad.wmnet
10:16 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1006.eqiad.wmnet
10:16 hashar@deploy2002: hashar: Backport for Replace call to deprecated method getImages (T407184) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
10:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:11 hashar@deploy2002: Started scap sync-world: Backport for Replace call to deprecated method getImages (T407184)
10:09 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1006.eqiad.wmnet
10:09 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1005.eqiad.wmnet
10:03 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
10:02 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1005.eqiad.wmnet
09:58 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
09:44 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS trixie
09:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS trixie
09:37 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS trixie
09:33 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2206.codfw.wmnet onto db2248.codfw.wmnet
09:32 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
09:32 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2043.codfw.wmnet']
09:32 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2043.codfw.wmnet']
09:31 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2045.codfw.wmnet']
09:31 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045.codfw.wmnet']
09:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
09:18 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
09:17 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
09:17 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
09:16 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
09:14 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
09:13 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
09:01 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2046.codfw.wmnet']
09:01 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
08:59 Amir1: mwscript-k8s -- purgeUserOptions.php --wiki=loginwiki (T406724)
08:57 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2047.codfw.wmnet']
08:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2047.codfw.wmnet']
08:49 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
08:47 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
08:44 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
08:41 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
08:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83918 and previous config saved to /var/cache/conftool/dbconfig/20251015-083339-root.json
08:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83917 and previous config saved to /var/cache/conftool/dbconfig/20251015-083333-root.json
08:30 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
08:29 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2050.codfw.wmnet']
08:22 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
08:22 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2051.codfw.wmnet']
08:19 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.23 refs T405679
08:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83916 and previous config saved to /var/cache/conftool/dbconfig/20251015-081833-root.json
08:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83915 and previous config saved to /var/cache/conftool/dbconfig/20251015-081827-root.json
08:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
08:14 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2052.codfw.wmnet']
08:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
08:13 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2052.codfw.wmnet']
08:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2032.codfw.wmnet onto es2053.codfw.wmnet
08:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2032 gradually with 4 steps - Pool es2032.codfw.wmnet in after cloning
08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
08:04 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
08:04 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2053.codfw.wmnet']
08:04 slyngshede@dns1004: END - running authdns-update
08:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83913 and previous config saved to /var/cache/conftool/dbconfig/20251015-080327-root.json
08:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83912 and previous config saved to /var/cache/conftool/dbconfig/20251015-080321-root.json
08:03 slyngshede@dns1004: START - running authdns-update
08:02 slyngs: Moving CAS/IDP/SSO to Trixie.
07:58 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
07:57 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2054.codfw.wmnet']
07:53 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
07:50 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
07:50 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2055.codfw.wmnet']
07:50 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2055.codfw.wmnet']
07:48 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83910 and previous config saved to /var/cache/conftool/dbconfig/20251015-074821-root.json
07:48 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83909 and previous config saved to /var/cache/conftool/dbconfig/20251015-074815-root.json
07:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83907 and previous config saved to /var/cache/conftool/dbconfig/20251015-073316-root.json
07:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83906 and previous config saved to /var/cache/conftool/dbconfig/20251015-073309-root.json
07:28 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2032 gradually with 4 steps - Pool es2032.codfw.wmnet in after cloning
07:27 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Enable on enwiki (T402366) (duration: 09m 02s)
07:23 kharlan@deploy2002: kharlan: Continuing with sync
07:21 kharlan@deploy2002: kharlan: Backport for hCaptcha: Enable on enwiki (T402366) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
07:18 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Enable on enwiki (T402366)
07:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83904 and previous config saved to /var/cache/conftool/dbconfig/20251015-071810-root.json
07:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83903 and previous config saved to /var/cache/conftool/dbconfig/20251015-071803-root.json
07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
07:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83901 and previous config saved to /var/cache/conftool/dbconfig/20251015-070304-root.json
07:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83900 and previous config saved to /var/cache/conftool/dbconfig/20251015-070258-root.json
06:48 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83899 and previous config saved to /var/cache/conftool/dbconfig/20251015-064758-root.json
06:47 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83898 and previous config saved to /var/cache/conftool/dbconfig/20251015-064752-root.json
06:46 jmm@dns1004: END - running authdns-update
06:45 jmm@dns1004: START - running authdns-update
06:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83897 and previous config saved to /var/cache/conftool/dbconfig/20251015-063252-root.json
06:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83896 and previous config saved to /var/cache/conftool/dbconfig/20251015-063246-root.json
06:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83895 and previous config saved to /var/cache/conftool/dbconfig/20251015-061746-root.json
06:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83894 and previous config saved to /var/cache/conftool/dbconfig/20251015-061740-root.json
06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1032.eqiad.wmnet onto es1055.eqiad.wmnet
06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1032 gradually with 4 steps - Pool es1032.eqiad.wmnet in after cloning
06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1031.eqiad.wmnet onto es1054.eqiad.wmnet
06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1031 gradually with 4 steps - Pool es1031.eqiad.wmnet in after cloning
06:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83891 and previous config saved to /var/cache/conftool/dbconfig/20251015-060240-root.json
06:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83890 and previous config saved to /var/cache/conftool/dbconfig/20251015-060234-root.json
06:02 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1052 and es1057 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83889 and previous config saved to /var/cache/conftool/dbconfig/20251015-060210-marostegui.json
05:55 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1260.eqiad.wmnet onto db1261.eqiad.wmnet
05:54 marostegui@cumin1003: dbctl commit (dc=all): 'Add db1260 to dbctl depooled T406550', diff saved to https://phabricator.wikimedia.org/P83886 and previous config saved to /var/cache/conftool/dbconfig/20251015-055457-marostegui.json
05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2206 - Depool db2206.codfw.wmnet to then clone it to db2248.codfw.wmnet - marostegui@cumin1003
05:43 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2206 - Depool db2206.codfw.wmnet to then clone it to db2248.codfw.wmnet - marostegui@cumin1003
05:43 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2206.codfw.wmnet onto db2248.codfw.wmnet
05:27 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1032 gradually with 4 steps - Pool es1032.eqiad.wmnet in after cloning
05:27 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1031 gradually with 4 steps - Pool es1031.eqiad.wmnet in after cloning
04:56 eileen: civicrm upgraded from 4d3107fc to 25df5996
01:40 musikanimal@deploy2002: Finished scap sync-world: Backport for Make tags be links to wish-index with filter applied (T406719) (duration: 07m 25s)
01:36 musikanimal@deploy2002: hmonroy, musikanimal: Continuing with sync
01:35 musikanimal@deploy2002: hmonroy, musikanimal: Backport for Make tags be links to wish-index with filter applied (T406719) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
01:33 musikanimal@deploy2002: Started scap sync-world: Backport for Make tags be links to wish-index with filter applied (T406719)
01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 06s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-14

23:39 musikanimal@deploy2002: Finished scap sync-world: Backport for wish-index: pass in wishesData so that initial filters are set (T400945) (duration: 07m 08s)
23:35 musikanimal@deploy2002: musikanimal: Continuing with sync
23:34 musikanimal@deploy2002: musikanimal: Backport for wish-index: pass in wishesData so that initial filters are set (T400945) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
23:32 musikanimal@deploy2002: Started scap sync-world: Backport for wish-index: pass in wishesData so that initial filters are set (T400945)
21:55 greg-g: (from eileen) civicrm upgraded from f68c287a to 4d3107fc
21:43 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set reader experiment to true (T406916) (duration: 11m 26s)
21:38 ladsgroup@deploy2002: ksarabia, ladsgroup: Continuing with sync
21:34 ladsgroup@deploy2002: ksarabia, ladsgroup: Backport for Set reader experiment to true (T406916) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:32 ladsgroup@deploy2002: Started scap sync-world: Backport for Set reader experiment to true (T406916)
21:31 ladsgroup@deploy2002: Finished scap sync-world: Backport for ImageBrowsing: fix UI bugs in Overlay, DetailView and VTOC (T405992) (duration: 14m 22s)
21:25 ladsgroup@deploy2002: ksarabia, ladsgroup: Continuing with sync
21:19 ladsgroup@deploy2002: ksarabia, ladsgroup: Backport for ImageBrowsing: fix UI bugs in Overlay, DetailView and VTOC (T405992) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:17 ladsgroup@deploy2002: Started scap sync-world: Backport for ImageBrowsing: fix UI bugs in Overlay, DetailView and VTOC (T405992)
21:16 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "Add icons for wikibase changes. WIP" (duration: 16m 34s)
21:10 ladsgroup@deploy2002: neslihanturan, ladsgroup: Continuing with sync
21:04 ladsgroup@deploy2002: neslihanturan, ladsgroup: Backport for Revert "Add icons for wikibase changes. WIP" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:59 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "Add icons for wikibase changes. WIP"
20:37 toyofuku@deploy2002: Finished scap sync-world: Backport for Add ReadingList Stream to EventStreamConfig (T406627) (duration: 11m 58s)
20:30 toyofuku@deploy2002: lmora, toyofuku: Continuing with sync
20:29 toyofuku@deploy2002: lmora, toyofuku: Backport for Add ReadingList Stream to EventStreamConfig (T406627) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:25 toyofuku@deploy2002: Started scap sync-world: Backport for Add ReadingList Stream to EventStreamConfig (T406627)
20:21 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5002.eqsin.wmnet with OS trixie
20:17 kemayo@deploy2002: Finished scap sync-world: Backport for Suggestions mode (T399612) (duration: 12m 47s)
20:09 kemayo@deploy2002: kemayo: Continuing with sync
20:09 kemayo@deploy2002: kemayo: Backport for Suggestions mode (T399612) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:05 kemayo@deploy2002: Started scap sync-world: Backport for Suggestions mode (T399612)
19:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
19:56 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
19:56 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/zotero: apply
19:56 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
19:56 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
19:55 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
19:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
19:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/toolhub: apply
19:54 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
19:54 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/termbox: apply
19:53 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
19:53 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
19:51 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
19:51 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
19:51 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
19:50 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
19:50 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
19:50 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
19:49 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
19:49 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
19:41 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
19:40 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
19:40 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
19:39 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
19:39 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
19:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
19:38 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
19:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
19:36 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
19:36 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
19:35 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
19:35 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
19:34 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
19:34 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
19:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
19:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
19:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
19:29 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
19:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
19:28 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply
19:15 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
19:09 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
19:08 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
19:08 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
19:06 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
19:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
19:05 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
19:05 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
19:04 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
19:04 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
19:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
19:03 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
19:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
19:02 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
19:01 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
19:01 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
19:00 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
19:00 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
18:59 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum5002.eqsin.wmnet with OS trixie
18:59 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
18:59 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
18:57 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
18:57 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
18:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
18:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
18:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
18:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
18:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
18:54 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
18:53 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
18:53 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
18:52 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
18:52 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
18:52 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
18:51 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
18:49 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
18:49 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
18:48 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
18:48 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
18:46 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:46 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:46 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4002.ulsfo.wmnet with OS trixie
18:44 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS trixie
18:44 rzl: rzl@deploy1003:~$ kube-env mw-script-deploy codfw; helm uninstall amfcta11 # HelmReleaseBadStatus alert was firing for this mw-script job in state pending-install, even though the job was long since finished
18:38 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS trixie
18:36 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:35 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
18:34 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
18:34 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
18:34 ejegg: fundraising civicrm upgraded from 9393addf to f68c287a
18:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
18:32 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
18:32 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
18:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
18:31 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:31 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
18:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
18:31 brett@dns1004: END - running authdns-update
18:29 brett@dns1004: START - running authdns-update
18:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
18:28 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
18:28 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/apertium: apply
18:26 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
18:23 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
18:23 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
18:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
18:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
18:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
18:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
18:19 brett@dns1004: END - running authdns-update
18:18 brett@dns1004: START - running authdns-update
18:17 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
18:11 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
18:11 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
18:11 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 1% of client sessions in PHP 8.3 (T405955) (duration: 19m 18s)
18:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum4002.ulsfo.wmnet with OS trixie
18:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum2002.codfw.wmnet with OS trixie
18:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum1001.eqiad.wmnet with OS trixie
18:01 swfrench@deploy2002: swfrench: Continuing with sync
17:56 swfrench@deploy2002: swfrench: Backport for Enroll 1% of client sessions in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
17:52 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 1% of client sessions in PHP 8.3 (T405955)
17:48 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
17:48 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
17:41 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:40 tchin@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:19 swfrench@deploy2002: Finished scap sync-world: Non-image-build scap run to scale 8.3 deployments - T405955 (duration: 05m 41s)
17:15 swfrench@deploy2002: Started scap sync-world: Non-image-build scap run to scale 8.3 deployments - T405955
16:55 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
16:55 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
16:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
16:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
16:36 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
16:36 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
16:32 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
16:32 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
16:28 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
16:27 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
16:27 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-codfw
16:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
16:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
16:19 mutante: rebooting backend of releases.wikimedia.org
16:19 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1003.eqiad.wmnet with reason: reboot
16:18 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test1003.eqiad.wmnet
16:18 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test1003.eqiad.wmnet with OS trixie
16:17 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:16 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:12 mutante: rebooting phab2002
16:11 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: reboot
16:04 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1003.eqiad.wmnet with reason: host reimage
16:03 mutante: CI should be back in operation as normal
15:57 mutante: rebooting main CI server - integration.wikimedia.org will be down for a minute
15:57 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1003.eqiad.wmnet with reason: host reimage
15:56 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: reboot
15:50 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on contint2002.wikimedia.org with reason: reboot
15:50 mutante: contint2002 - rebooting - (not the manager host)
15:47 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test1003.eqiad.wmnet with OS trixie
15:46 swfrench-wmf: rolling run-puppet-agent on A:cp hosts - T405955
15:33 swfrench-wmf: disable-puppet on A:cp hosts - T405955
15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
15:30 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test1003.eqiad.wmnet on all recursors
15:30 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test1003.eqiad.wmnet on all recursors
15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
15:21 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
15:20 moritzm: installing jq security updates
15:17 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-eqiad
15:05 fceratto@cumin1002: START - Cookbook sre.dns.netbox
15:05 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1003.eqiad.wmnet
15:04 brennen@deploy2002: Finished deploy [phabricator/deployment@16c9739]: deploy phab1004 for T407244 (duration: 00m 58s)
15:03 brennen@deploy2002: Started deploy [phabricator/deployment@16c9739]: deploy phab1004 for T407244
15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@16c9739]: deploy phab2002 for T407244 (duration: 00m 31s)
15:02 brennen@deploy2002: Started deploy [phabricator/deployment@16c9739]: deploy phab2002 for T407244
14:58 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T407244
14:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
14:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
14:36 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
14:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
14:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
14:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
14:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
14:32 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test1001.eqiad.wmnet
14:32 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test1001.eqiad.wmnet with OS trixie
14:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:30 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7001*} or P{cp4037*} and A:cp
14:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4037.ulsfo.wmnet
14:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
14:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
14:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
14:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
14:26 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
14:26 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-eqiad
14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
14:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
14:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
14:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
14:18 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1001.eqiad.wmnet with reason: host reimage
14:18 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-codfw
14:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:17 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2055.codfw.wmnet']
14:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:14 Lucas_WMDE: UTC afternoon backport+config window done
14:12 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1001.eqiad.wmnet with reason: host reimage
14:11 samtar@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575) (duration: 09m 25s)
14:09 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2055.codfw.wmnet']
14:09 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2056.codfw.wmnet']
14:09 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
14:07 samtar@deploy2002: samtar: Continuing with sync
14:06 samtar@deploy2002: samtar: Backport for ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
14:02 samtar@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575)
14:02 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test1001.eqiad.wmnet with OS trixie
14:01 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
14:01 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
14:00 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test1001.eqiad.wmnet on all recursors
14:00 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test1001.eqiad.wmnet on all recursors
14:00 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:00 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
14:00 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
14:00 phuedx@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents: simple-bot-detection: Use correct schema, ext.wikimediaEvents: simple-bot-detection: Use correct schema (duration: 10m 17s)
13:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
13:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
13:56 fceratto@cumin1002: START - Cookbook sre.dns.netbox
13:56 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1001.eqiad.wmnet
13:56 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
13:56 phuedx@deploy2002: phuedx: Continuing with sync
13:55 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
13:54 phuedx@deploy2002: phuedx: Backport for ext.wikimediaEvents: simple-bot-detection: Use correct schema, ext.wikimediaEvents: simple-bot-detection: Use correct schema synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:53 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2056.codfw.wmnet']
13:53 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
13:52 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2056.codfw.wmnet']
13:49 phuedx@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents: simple-bot-detection: Use correct schema, ext.wikimediaEvents: simple-bot-detection: Use correct schema
13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
13:46 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
13:46 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7001.magru.wmnet
13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
13:42 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
13:39 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2056.codfw.wmnet']
13:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
13:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
13:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:34 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7001*} or P{cp4037*} and A:cp
13:31 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
13:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
13:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
13:26 logmsgbot: daniel Deployed security patch for T405859
13:19 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-codfw
13:16 logmsgbot: daniel Deployed security patch for T405859
13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
13:07 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
13:06 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:05 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2084.codfw.wmnet with OS bullseye
13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
13:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1095.eqiad.wmnet
12:53 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1095.eqiad.wmnet
12:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1094.eqiad.wmnet with OS bullseye
12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
12:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
12:46 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
12:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
12:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
12:39 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Depool es2032.codfw.wmnet to then clone it to es2053.codfw.wmnet - fceratto@cumin1002
12:39 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2032 - Depool es2032.codfw.wmnet to then clone it to es2053.codfw.wmnet - fceratto@cumin1002
12:39 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2032.codfw.wmnet onto es2053.codfw.wmnet
12:34 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye
12:33 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
12:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1247.eqiad.wmnet onto db1260.eqiad.wmnet
12:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
12:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1247 gradually with 4 steps - Pool db1247.eqiad.wmnet in after cloning
12:30 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
12:18 dbrant@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
12:17 dbrant@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
12:17 dbrant@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
12:16 dbrant@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
12:15 dbrant@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
12:15 dbrant@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
12:13 dbrant@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:12 dbrant@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:12 dbrant@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:08 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1094.eqiad.wmnet with OS bullseye
12:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
12:07 ladsgroup@deploy2002: Finished scap sync-world: Backport for filebackend: Remove consistency check for multi-backend (T328872) (duration: 12m 46s)
12:07 dbrant@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:07 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1094.eqiad.wmnet
12:03 dbrant@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:03 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:03 dbrant@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:59 ladsgroup@deploy2002: ladsgroup: Backport for filebackend: Remove consistency check for multi-backend (T328872) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
11:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
11:54 ladsgroup@deploy2002: Started scap sync-world: Backport for filebackend: Remove consistency check for multi-backend (T328872)
11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
11:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1094.eqiad.wmnet
11:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1093.eqiad.wmnet
11:45 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1247 gradually with 4 steps - Pool db1247.eqiad.wmnet in after cloning
11:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
11:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
11:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1093.eqiad.wmnet
11:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1092.eqiad.wmnet
11:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
11:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
11:32 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1092.eqiad.wmnet
11:32 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1091.eqiad.wmnet
11:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2089.codfw.wmnet
11:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
11:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
11:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
11:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
11:26 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1091.eqiad.wmnet
11:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1090.eqiad.wmnet
11:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2089.codfw.wmnet
11:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2088.codfw.wmnet
11:19 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1090.eqiad.wmnet
11:18 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1089.eqiad.wmnet
11:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2088.codfw.wmnet
11:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2087.codfw.wmnet
10:58 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1087.eqiad.wmnet
10:58 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1086.eqiad.wmnet
10:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2085.codfw.wmnet
10:55 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test1002.eqiad.wmnet
10:55 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test1002.eqiad.wmnet with OS trixie
10:51 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1086.eqiad.wmnet
10:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1085.eqiad.wmnet
10:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2082.codfw.wmnet
10:49 hashar: Restarted Zuul to have it reconnect to Gerrit
10:48 fabfur: enable puppet on all DNS hosts for manual gerrit switch (T407200)
10:44 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1085.eqiad.wmnet
10:44 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1084.eqiad.wmnet
10:43 arnaudb@dns1004: END - running authdns-update
10:43 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1002.eqiad.wmnet with reason: host reimage
10:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2082.codfw.wmnet
10:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2081.codfw.wmnet
10:42 arnaudb@dns1004: START - running authdns-update
10:38 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1084.eqiad.wmnet
10:38 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1083.eqiad.wmnet
10:37 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1002.eqiad.wmnet with reason: host reimage
10:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2081.codfw.wmnet
10:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2080.codfw.wmnet
10:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2053.codfw.wmnet with reason: Setting up new ES host
10:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1083.eqiad.wmnet
10:31 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1082.eqiad.wmnet
10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2080.codfw.wmnet
10:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2079.codfw.wmnet
10:27 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test1002.eqiad.wmnet with OS trixie
10:23 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1082.eqiad.wmnet
10:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2079.codfw.wmnet
10:20 fabfur: disabling puppet on all DNS hosts for manual gerrit switch (T407200)
10:18 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
10:18 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
10:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test1002.eqiad.wmnet on all recursors
10:17 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test1002.eqiad.wmnet on all recursors
10:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:17 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
10:16 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1081.eqiad.wmnet
10:15 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
10:09 fceratto@cumin1002: START - Cookbook sre.dns.netbox
10:09 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1002.eqiad.wmnet
10:04 Amir1: mwscript-k8s --follow --dblist=group0 -- purgeUserOptions.php (T406724)
09:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
09:52 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test2002.codfw.wmnet
09:52 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test2002.codfw.wmnet with OS trixie
09:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
09:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2076.codfw.wmnet
09:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2076.codfw.wmnet
09:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2075.codfw.wmnet
09:38 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test2002.codfw.wmnet with reason: host reimage
09:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
09:34 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2075.codfw.wmnet
09:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2074.codfw.wmnet
09:33 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test2002.codfw.wmnet with reason: host reimage
09:26 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2074.codfw.wmnet
09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
09:25 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
09:22 arnaudb@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gerrit.wikimedia.org gerrit-replica.wikimedia.org on all recursors
09:22 arnaudb@cumin1003: START - Cookbook sre.dns.wipe-cache gerrit.wikimedia.org gerrit-replica.wikimedia.org on all recursors
09:22 arnaudb@dns1004: END - running authdns-update
09:19 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test2002.codfw.wmnet with OS trixie
09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
09:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
09:18 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2002.codfw.wmnet - fceratto@cumin1002"
09:18 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2002.codfw.wmnet - fceratto@cumin1002"
09:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test2002.codfw.wmnet on all recursors
09:17 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test2002.codfw.wmnet on all recursors
09:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:17 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2002.codfw.wmnet - fceratto@cumin1002"
09:17 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2002.codfw.wmnet - fceratto@cumin1002"
09:13 fceratto@cumin1002: START - Cookbook sre.dns.netbox
09:12 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test2002.codfw.wmnet
09:11 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
09:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
09:10 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1081.eqiad.wmnet
09:10 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1080.eqiad.wmnet
09:05 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1080.eqiad.wmnet
09:05 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1079.eqiad.wmnet
09:04 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
09:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
09:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
09:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
09:02 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
09:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
09:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
09:02 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
09:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
09:00 arnaudb@dns1004: START - running authdns-update
08:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1079.eqiad.wmnet
08:57 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1078.eqiad.wmnet
08:56 topranks: enable new inter.link IP transit circuit on cr1-drms T401104
08:56 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
08:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
08:50 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1078.eqiad.wmnet
08:50 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1077.eqiad.wmnet
08:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
08:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
08:45 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:44 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:44 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:42 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:41 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1077.eqiad.wmnet
08:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:40 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
08:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1076.eqiad.wmnet
08:38 brouberol@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:37 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.23 refs T405679
08:37 brouberol@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:33 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
08:33 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
08:32 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
08:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1076.eqiad.wmnet
08:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
08:31 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
08:30 brouberol@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
08:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
08:29 brouberol@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
08:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1247 - Depool db1247.eqiad.wmnet to then clone it to db1260.eqiad.wmnet - marostegui@cumin1003
08:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1247 - Depool db1247.eqiad.wmnet to then clone it to db1260.eqiad.wmnet - marostegui@cumin1003
08:25 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1247.eqiad.wmnet onto db1260.eqiad.wmnet
08:23 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
08:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
08:20 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
08:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
08:18 dcausse: closing the UTC morning backport window
08:14 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: test completion with default sort on simplewiki [3/3] (T404858), ext-EventLogging: Allowlist product_metrics.web_base_with_ip stream (T406332) (duration: 10m 46s)
08:12 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
08:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
08:10 dcausse@deploy2002: dcausse, phuedx: Continuing with sync
08:07 dcausse@deploy2002: dcausse, phuedx: Backport for cirrus: test completion with default sort on simplewiki [3/3] (T404858), ext-EventLogging: Allowlist product_metrics.web_base_with_ip stream (T406332) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
08:03 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: test completion with default sort on simplewiki [3/3] (T404858), ext-EventLogging: Allowlist product_metrics.web_base_with_ip stream (T406332)
08:02 dcausse@deploy2002: mwscript-k8s job started: namespaceDupes eswiktionary --fix # T407150
08:01 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
08:01 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
08:00 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83863 and previous config saved to /var/cache/conftool/dbconfig/20251014-080025-root.json
08:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:59 dcausse@deploy2002: Finished scap sync-world: Backport for [enwikibooks] Set $wgAutoConfirmCount to 5 (T407080), [eswiktionary] Create a Tesauro namespace (T407150), [kawiki] Enable NewUserMessage extension (T407076) (duration: 11m 29s)
07:56 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83862 and previous config saved to /var/cache/conftool/dbconfig/20251014-075608-root.json
07:54 dcausse@deploy2002: dcausse, superpes: Continuing with sync
07:51 dcausse@deploy2002: dcausse, superpes: Backport for [enwikibooks] Set $wgAutoConfirmCount to 5 (T407080), [eswiktionary] Create a Tesauro namespace (T407150), [kawiki] Enable NewUserMessage extension (T407076) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:47 dcausse@deploy2002: Started scap sync-world: Backport for [enwikibooks] Set $wgAutoConfirmCount to 5 (T407080), [eswiktionary] Create a Tesauro namespace (T407150), [kawiki] Enable NewUserMessage extension (T407076)
07:45 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83861 and previous config saved to /var/cache/conftool/dbconfig/20251014-074519-root.json
07:43 dcausse@deploy2002: Finished scap sync-world: Backport for Implement new usage types for statement with qualifiers and references (T401290) (duration: 10m 50s)
07:41 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83860 and previous config saved to /var/cache/conftool/dbconfig/20251014-074102-root.json
07:39 dcausse@deploy2002: joelyrookewmde, dcausse: Continuing with sync
07:36 dcausse@deploy2002: joelyrookewmde, dcausse: Backport for Implement new usage types for statement with qualifiers and references (T401290) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:32 dcausse@deploy2002: Started scap sync-world: Backport for Implement new usage types for statement with qualifiers and references (T401290)
07:30 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83859 and previous config saved to /var/cache/conftool/dbconfig/20251014-073013-root.json
07:28 dcausse@deploy2002: Finished scap sync-world: Backport for Remove artifact from Quechua Wikipedia wordmark (duration: 11m 46s)
07:25 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83858 and previous config saved to /var/cache/conftool/dbconfig/20251014-072556-root.json
07:22 dcausse@deploy2002: jhsoby, dcausse: Continuing with sync
07:21 dcausse@deploy2002: jhsoby, dcausse: Backport for Remove artifact from Quechua Wikipedia wordmark synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:16 dcausse@deploy2002: Started scap sync-world: Backport for Remove artifact from Quechua Wikipedia wordmark
07:15 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83857 and previous config saved to /var/cache/conftool/dbconfig/20251014-071507-root.json
07:10 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83856 and previous config saved to /var/cache/conftool/dbconfig/20251014-071050-root.json
07:00 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83855 and previous config saved to /var/cache/conftool/dbconfig/20251014-070001-root.json
06:55 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83854 and previous config saved to /var/cache/conftool/dbconfig/20251014-065544-root.json
06:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83853 and previous config saved to /var/cache/conftool/dbconfig/20251014-064455-root.json
06:40 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83852 and previous config saved to /var/cache/conftool/dbconfig/20251014-064038-root.json
06:37 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83851 and previous config saved to /var/cache/conftool/dbconfig/20251014-063724-root.json
06:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83850 and previous config saved to /var/cache/conftool/dbconfig/20251014-062949-root.json
06:25 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83848 and previous config saved to /var/cache/conftool/dbconfig/20251014-062532-root.json
06:22 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83847 and previous config saved to /var/cache/conftool/dbconfig/20251014-062218-root.json
06:21 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
06:14 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83846 and previous config saved to /var/cache/conftool/dbconfig/20251014-061444-root.json
06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1032 - Depool es1032.eqiad.wmnet to then clone it to es1055.eqiad.wmnet - marostegui@cumin1003
06:14 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
06:10 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83845 and previous config saved to /var/cache/conftool/dbconfig/20251014-061026-root.json
06:07 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83844 and previous config saved to /var/cache/conftool/dbconfig/20251014-060712-root.json
05:59 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83843 and previous config saved to /var/cache/conftool/dbconfig/20251014-055938-root.json
05:55 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83842 and previous config saved to /var/cache/conftool/dbconfig/20251014-055520-root.json
05:53 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1032 - Depool es1032.eqiad.wmnet to then clone it to es1055.eqiad.wmnet - marostegui@cumin1003
05:53 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1032.eqiad.wmnet onto es1055.eqiad.wmnet
05:52 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83840 and previous config saved to /var/cache/conftool/dbconfig/20251014-055206-root.json
05:46 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83839 and previous config saved to /var/cache/conftool/dbconfig/20251014-054631-root.json
05:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83838 and previous config saved to /var/cache/conftool/dbconfig/20251014-054432-root.json
05:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1244.eqiad.wmnet with reason: Maintenance
05:42 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1244 T407176', diff saved to https://phabricator.wikimedia.org/P83837 and previous config saved to /var/cache/conftool/dbconfig/20251014-054200-marostegui.json
05:41 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1160 to s4 primary T407176', diff saved to https://phabricator.wikimedia.org/P83836 and previous config saved to /var/cache/conftool/dbconfig/20251014-054118-marostegui.json
05:41 marostegui: Starting s4 eqiad failover from db1244 to db1160 - T407176
05:40 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83835 and previous config saved to /var/cache/conftool/dbconfig/20251014-054014-root.json
05:37 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T407176
05:36 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1160 with weight 0 T407176', diff saved to https://phabricator.wikimedia.org/P83834 and previous config saved to /var/cache/conftool/dbconfig/20251014-053654-marostegui.json
05:31 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83833 and previous config saved to /var/cache/conftool/dbconfig/20251014-053125-root.json
05:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83832 and previous config saved to /var/cache/conftool/dbconfig/20251014-052926-root.json
05:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1031 - Depool es1031.eqiad.wmnet to then clone it to es1054.eqiad.wmnet - marostegui@cumin1003
05:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1033.eqiad.wmnet onto es1056.eqiad.wmnet
05:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1033 gradually with 4 steps - Pool es1033.eqiad.wmnet in after cloning
05:25 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83830 and previous config saved to /var/cache/conftool/dbconfig/20251014-052508-root.json
05:20 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1031 - Depool es1031.eqiad.wmnet to then clone it to es1054.eqiad.wmnet - marostegui@cumin1003
05:20 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1031.eqiad.wmnet onto es1054.eqiad.wmnet
05:16 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83828 and previous config saved to /var/cache/conftool/dbconfig/20251014-051619-root.json
05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1031-1032].eqiad.wmnet with reason: Cloning
05:01 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83826 and previous config saved to /var/cache/conftool/dbconfig/20251014-050113-root.json
04:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1221 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83824 and previous config saved to /var/cache/conftool/dbconfig/20251014-045305-marostegui.json
04:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1221.eqiad.wmnet with reason: Maintenance
04:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Upgrading
04:41 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1033 gradually with 4 steps - Pool es1033.eqiad.wmnet in after cloning
04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.45.0-wmf.20 (duration: 02m 42s)
03:48 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.45.0-wmf.23 refs T405679 (duration: 45m 02s)
03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.45.0-wmf.23 refs T405679
02:24 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
02:20 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
02:09 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
02:05 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
01:58 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
01:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
01:45 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
01:39 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 20s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-13

23:50 musikanimal@deploy2002: Finished scap sync-world: Backport for Add 'accepted' status (T406674) (duration: 40m 01s)
23:38 musikanimal@deploy2002: musikanimal: Continuing with sync
23:36 musikanimal@deploy2002: musikanimal: Backport for Add 'accepted' status (T406674) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
23:29 btullis@cumin1003: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto an-presto cluster: Reboot Presto nodes
23:10 musikanimal@deploy2002: Started scap sync-world: Backport for Add 'accepted' status (T406674)
22:34 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
22:30 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
22:01 btullis@cumin1003: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
22:01 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
22:01 btullis@cumin1003: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
22:00 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
21:52 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1003.eqiad.wmnet
21:48 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
21:05 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
21:05 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
21:03 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
20:57 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
20:56 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
20:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
20:52 btullis@cumin1003: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
20:45 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
20:39 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
20:34 eileen: civicrm upgraded from 385f00d8 to 9393addf
20:25 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
20:22 dani@deploy2002: Finished scap sync-world: Backport for Undeploy Design Research participant recruitment survey on jawiki (T405577) (duration: 09m 01s)
20:19 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
20:18 dani@deploy2002: dani: Continuing with sync
20:17 dani@deploy2002: dani: Backport for Undeploy Design Research participant recruitment survey on jawiki (T405577) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:13 dani@deploy2002: Started scap sync-world: Backport for Undeploy Design Research participant recruitment survey on jawiki (T405577)
19:44 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
19:44 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
18:59 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
17:59 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test2001.codfw.wmnet
17:59 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test2001.codfw.wmnet with OS trixie
17:43 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test2001.codfw.wmnet with reason: host reimage
17:37 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test2001.codfw.wmnet with reason: host reimage
17:19 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test2001.codfw.wmnet with OS trixie
17:19 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2001.codfw.wmnet - fceratto@cumin1002"
17:19 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2001.codfw.wmnet - fceratto@cumin1002"
17:18 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test2001.codfw.wmnet on all recursors
17:18 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test2001.codfw.wmnet on all recursors
17:18 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:18 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2001.codfw.wmnet - fceratto@cumin1002"
17:17 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2001.codfw.wmnet - fceratto@cumin1002"
17:14 fceratto@cumin1002: START - Cookbook sre.dns.netbox
17:14 fceratto@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
17:11 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-eqiad
17:11 fceratto@cumin1002: START - Cookbook sre.dns.netbox
17:11 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test2001.codfw.wmnet
17:10 fceratto@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host db-test1001.eqiad.wmnet
17:10 fceratto@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
17:08 fceratto@cumin1002: START - Cookbook sre.dns.netbox
17:02 fceratto@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:59 fceratto@cumin1002: START - Cookbook sre.dns.netbox
16:59 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1001.eqiad.wmnet
16:05 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
15:59 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
15:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
15:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
15:51 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
15:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
15:47 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
15:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
15:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
15:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
15:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
15:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
15:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
15:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
15:31 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
15:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
15:24 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
15:24 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
15:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
15:17 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
15:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
15:16 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
15:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
15:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1005.eqiad.wmnet
15:12 btullis@cumin1003: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
15:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1005.eqiad.wmnet
15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1004.eqiad.wmnet
15:09 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
15:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1004.eqiad.wmnet
15:05 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
15:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
14:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
14:57 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1003.eqiad.wmnet
14:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1003.eqiad.wmnet
14:49 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
14:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
14:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
14:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
14:20 hnowlan: rest.php on rest-gateway at 100% for enwiki (and all other wikis)
14:19 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
14:15 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-eqiad
14:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
14:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
14:13 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
14:07 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
14:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
14:06 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
14:04 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2057.codfw.wmnet
14:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
14:03 btullis@cumin1003: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
13:58 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
13:49 btullis@cumin1003: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
13:46 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
13:43 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
13:40 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
13:40 phuedx: UTC afternoon backport window done
13:37 phuedx@deploy2002: Finished scap sync-world: Backport for Port Java Pageview definition to bot detection (T406359) (duration: 17m 39s)
13:34 btullis@cumin1003: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
13:33 phuedx@deploy2002: phuedx: Continuing with sync
13:33 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
13:31 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
13:31 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
13:30 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
13:26 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
13:24 jmm@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
13:24 phuedx@deploy2002: phuedx: Backport for Port Java Pageview definition to bot detection (T406359) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:20 phuedx@deploy2002: Started scap sync-world: Backport for Port Java Pageview definition to bot detection (T406359)
13:15 derick@deploy2002: Finished scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group2` wikis (T402808) (duration: 11m 39s)
13:11 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
13:11 derick@deploy2002: derick, d3r1ck01: Continuing with sync
13:09 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
13:09 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
13:08 derick@deploy2002: derick, d3r1ck01: Backport for session: Enable MultiBackendSessionStore on `group2` wikis (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:06 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
13:06 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
13:06 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
13:04 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
13:04 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
13:04 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
13:03 derick@deploy2002: Started scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group2` wikis (T402808)
13:01 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
13:01 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
12:59 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
12:59 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
12:57 Amir1: dropped flaggedrevs tables on lawikisource (T406424)
12:57 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
12:56 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
12:56 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
12:54 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
12:53 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
12:51 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
12:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
12:50 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
12:47 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83815 and previous config saved to /var/cache/conftool/dbconfig/20251013-124744-root.json
12:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
12:46 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
12:45 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
12:45 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
12:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83814 and previous config saved to /var/cache/conftool/dbconfig/20251013-124439-root.json
12:41 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
12:41 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
12:40 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
12:40 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
12:35 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
12:35 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
12:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83813 and previous config saved to /var/cache/conftool/dbconfig/20251013-123238-root.json
12:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
12:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83812 and previous config saved to /var/cache/conftool/dbconfig/20251013-122933-root.json
12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1002.eqiad.wmnet
12:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki1002.eqiad.wmnet
12:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mirror1001.wikimedia.org
12:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83811 and previous config saved to /var/cache/conftool/dbconfig/20251013-121732-root.json
12:16 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
12:14 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83810 and previous config saved to /var/cache/conftool/dbconfig/20251013-121427-root.json
12:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mirror1001.wikimedia.org
12:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83809 and previous config saved to /var/cache/conftool/dbconfig/20251013-120226-root.json
11:59 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83808 and previous config saved to /var/cache/conftool/dbconfig/20251013-115921-root.json
11:47 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83807 and previous config saved to /var/cache/conftool/dbconfig/20251013-114720-root.json
11:45 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
11:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83806 and previous config saved to /var/cache/conftool/dbconfig/20251013-114415-root.json
11:35 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83805 and previous config saved to /var/cache/conftool/dbconfig/20251013-113510-root.json
11:33 gehel: restarting blazegraph on wdqs1014 (BlazegraphFreeAllocatorsDecreasingRapidly) - `sudo depool && sleep 30 && sudo systemctl restart wdqs-blazegraph.service && sleep 30 && sudo pool`
11:32 moritzm: installing openssl security updates on Bullseye
11:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83804 and previous config saved to /var/cache/conftool/dbconfig/20251013-113214-root.json
11:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83803 and previous config saved to /var/cache/conftool/dbconfig/20251013-112909-root.json
11:20 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83802 and previous config saved to /var/cache/conftool/dbconfig/20251013-112004-root.json
11:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83801 and previous config saved to /var/cache/conftool/dbconfig/20251013-111708-root.json
11:14 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83800 and previous config saved to /var/cache/conftool/dbconfig/20251013-111403-root.json
11:04 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83799 and previous config saved to /var/cache/conftool/dbconfig/20251013-110458-root.json
11:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83798 and previous config saved to /var/cache/conftool/dbconfig/20251013-110203-root.json
10:58 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83797 and previous config saved to /var/cache/conftool/dbconfig/20251013-105857-root.json
10:49 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83796 and previous config saved to /var/cache/conftool/dbconfig/20251013-104952-root.json
10:49 moritzm: installing systemd bugfix updates on Bullseye
10:46 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83795 and previous config saved to /var/cache/conftool/dbconfig/20251013-104657-root.json
10:43 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83794 and previous config saved to /var/cache/conftool/dbconfig/20251013-104351-root.json
10:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1247 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83793 and previous config saved to /var/cache/conftool/dbconfig/20251013-104131-marostegui.json
10:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1247.eqiad.wmnet with reason: Maintenance
10:31 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83792 and previous config saved to /var/cache/conftool/dbconfig/20251013-103151-root.json
10:28 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83791 and previous config saved to /var/cache/conftool/dbconfig/20251013-102845-root.json
10:24 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83790 and previous config saved to /var/cache/conftool/dbconfig/20251013-102428-root.json
10:16 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83789 and previous config saved to /var/cache/conftool/dbconfig/20251013-101645-root.json
10:13 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83788 and previous config saved to /var/cache/conftool/dbconfig/20251013-101339-root.json
10:09 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83787 and previous config saved to /var/cache/conftool/dbconfig/20251013-100923-root.json
10:08 hashar@deploy2002: Finished deploy [gerrit/gerrit@93bde2a]: Fix link to task in the motd banner (duration: 00m 13s)
10:08 hashar@deploy2002: Started deploy [gerrit/gerrit@93bde2a]: Fix link to task in the motd banner
10:03 moritzm: installing Linux 5.10.244 on Bullseye hosts
09:54 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83786 and previous config saved to /var/cache/conftool/dbconfig/20251013-095416-root.json
09:39 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83785 and previous config saved to /var/cache/conftool/dbconfig/20251013-093910-root.json
09:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1160.eqiad.wmnet with reason: Maintenance
09:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: Cloning
09:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1160 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83784 and previous config saved to /var/cache/conftool/dbconfig/20251013-092903-marostegui.json
09:21 marostegui@cumin1003: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83783 and previous config saved to /var/cache/conftool/dbconfig/20251013-092152-root.json
09:15 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
09:11 kostajh: UTC morning deploys done
09:10 kharlan@deploy2002: Finished scap sync-world: Backport for ext.confirmEdit.hCaptcha.utils: Track hCaptcha execution rejections (T406925) (duration: 09m 19s)
09:06 marostegui@cumin1003: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83782 and previous config saved to /var/cache/conftool/dbconfig/20251013-090647-root.json
09:06 kharlan@deploy2002: kharlan: Continuing with sync
09:05 kharlan@deploy2002: kharlan: Backport for ext.confirmEdit.hCaptcha.utils: Track hCaptcha execution rejections (T406925) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
09:01 kharlan@deploy2002: Started scap sync-world: Backport for ext.confirmEdit.hCaptcha.utils: Track hCaptcha execution rejections (T406925)
08:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
08:10 kharlan@deploy2002: kharlan: Continuing with sync
08:09 kharlan@deploy2002: kharlan: Backport for Fix locally failing QUnit tests (T406615) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
08:08 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83776 and previous config saved to /var/cache/conftool/dbconfig/20251013-080837-root.json
08:04 kharlan@deploy2002: Started scap sync-world: Backport for Fix locally failing QUnit tests (T406615)
08:04 kharlan@deploy2002: Finished scap sync-world: Backport for kowikisource: Add "해석" namespace (T406405), kowiki: Restrict move ratelimit for non-extendedconfirmed users (T406849), wmgMonologChannels: Set CheckUser to info level, hCaptcha: Enable on testwiki (T402366), NetworkSession: enable only for private wikis (duration
07:57 kharlan@deploy2002: revi, kharlan, dcausse: Continuing with sync
07:55 kharlan@deploy2002: revi, kharlan, dcausse: Backport for kowikisource: Add "해석" namespace (T406405), kowiki: Restrict move ratelimit for non-extendedconfirmed users (T406849), wmgMonologChannels: Set CheckUser to info level, hCaptcha: Enable on testwiki (T402366), NetworkSession: enable only for private wikis synced to t
07:53 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83773 and previous config saved to /var/cache/conftool/dbconfig/20251013-075331-root.json
07:49 kharlan@deploy2002: Started scap sync-world: Backport for kowikisource: Add "해석" namespace (T406405), kowiki: Restrict move ratelimit for non-extendedconfirmed users (T406849), wmgMonologChannels: Set CheckUser to info level, hCaptcha: Enable on testwiki (T402366), NetworkSession: enable only for private wikis
07:46 mszwarc@deploy2002: Finished scap sync-world: Backport for arbcom_plwiki: Change favicon (T406883) (duration: 37m 46s)
07:38 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83772 and previous config saved to /var/cache/conftool/dbconfig/20251013-073825-root.json
07:33 mszwarc@deploy2002: mszwarc: Continuing with sync
07:33 mszwarc@deploy2002: mszwarc: Backport for arbcom_plwiki: Change favicon (T406883) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:23 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83771 and previous config saved to /var/cache/conftool/dbconfig/20251013-072320-root.json
07:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1199 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83770 and previous config saved to /var/cache/conftool/dbconfig/20251013-071521-marostegui.json
07:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1199.eqiad.wmnet with reason: Maintenance
07:08 mszwarc@deploy2002: Started scap sync-world: Backport for arbcom_plwiki: Change favicon (T406883)
06:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83769 and previous config saved to /var/cache/conftool/dbconfig/20251013-063046-root.json
06:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83768 and previous config saved to /var/cache/conftool/dbconfig/20251013-061540-root.json
06:00 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83767 and previous config saved to /var/cache/conftool/dbconfig/20251013-060034-root.json
05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83766 and previous config saved to /var/cache/conftool/dbconfig/20251013-054551-root.json
05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83765 and previous config saved to /var/cache/conftool/dbconfig/20251013-054528-root.json
05:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1238 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83764 and previous config saved to /var/cache/conftool/dbconfig/20251013-053723-marostegui.json
05:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1238.eqiad.wmnet with reason: Maintenance
05:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83763 and previous config saved to /var/cache/conftool/dbconfig/20251013-053045-root.json
05:20 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1033 - Depool es1033.eqiad.wmnet to then clone it to es1056.eqiad.wmnet - marostegui@cumin1003
05:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83762 and previous config saved to /var/cache/conftool/dbconfig/20251013-051540-root.json
05:06 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1033 - Depool es1033.eqiad.wmnet to then clone it to es1056.eqiad.wmnet - marostegui@cumin1003
05:06 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1033.eqiad.wmnet onto es1056.eqiad.wmnet
05:00 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83760 and previous config saved to /var/cache/conftool/dbconfig/20251013-050034-root.json
04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1241 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83759 and previous config saved to /var/cache/conftool/dbconfig/20251013-045230-marostegui.json
04:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1241.eqiad.wmnet with reason: Maintenance
04:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1027 - Depool es1027.eqiad.wmnet to then clone it to es1050.eqiad.wmnet - marostegui@cumin1003
04:49 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1027 - Depool es1027.eqiad.wmnet to then clone it to es1050.eqiad.wmnet - marostegui@cumin1003
04:49 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1027,1050].eqiad.wmnet with reason: Cloning
01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 25s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-12

01:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 09s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-11

12:34 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
09:35 hashar@deploy2002: Finished deploy [integration/docroot@99ef7e9]: build: Update phpunit/phpunit to 10.5.58 (duration: 00m 11s)
09:35 hashar@deploy2002: Started deploy [integration/docroot@99ef7e9]: build: Update phpunit/phpunit to 10.5.58
01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 25s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-10

21:16 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
21:16 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
21:00 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
20:57 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
17:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:17 cmooney@cumin1003: START - Cookbook sre.dns.netbox
16:50 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/zotero: apply
16:50 rzl@deploy1003: helmfile [staging] START helmfile.d/services/zotero: apply
16:49 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
16:49 rzl@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
16:49 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
16:48 rzl@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
16:48 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/toolhub: apply
16:47 rzl@deploy1003: helmfile [staging] START helmfile.d/services/toolhub: apply
16:46 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
16:46 rzl@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
16:46 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/termbox: apply
16:46 rzl@deploy1003: helmfile [staging] START helmfile.d/services/termbox: apply
16:45 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
16:45 rzl@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
16:43 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
16:43 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
16:43 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
16:42 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
16:41 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:41 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:40 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
16:40 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
16:39 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
16:39 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
16:39 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
16:38 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
16:38 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
16:37 rzl@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
16:37 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
16:37 rzl@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: apply
16:37 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
16:36 rzl@deploy1003: helmfile [staging] START helmfile.d/services/push-notifications: apply
16:36 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
16:36 rzl@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
16:36 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
16:36 rzl@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply
16:35 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
16:35 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
16:35 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
16:34 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
16:33 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:31 rzl@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
16:27 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
16:27 rzl@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
16:27 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply
16:27 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply
16:26 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
16:23 rzl@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
16:19 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
16:19 rzl@deploy1003: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
16:16 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
16:16 rzl@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
16:15 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
16:15 rzl@deploy1003: helmfile [staging] START helmfile.d/services/image-suggestion: apply
16:15 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
16:14 rzl@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply
16:14 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
16:14 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
16:14 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
16:13 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
16:11 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
16:11 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply
16:10 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
16:10 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
16:09 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
16:09 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
16:09 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
16:08 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
16:08 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
16:08 rzl@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply
16:07 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
16:07 rzl@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
16:06 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
16:06 rzl@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
16:05 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
16:05 rzl@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
16:04 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
16:04 rzl@deploy1003: helmfile [staging] START helmfile.d/services/data-gateway: apply
16:04 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
16:03 rzl@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
16:03 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
16:03 rzl@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
16:02 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
16:02 rzl@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
16:00 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
16:00 rzl@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
15:59 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
15:58 rzl@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
15:56 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
15:56 rzl@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
15:39 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker-codfw
15:10 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker-codfw
14:45 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
14:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83756 and previous config saved to /var/cache/conftool/dbconfig/20251010-141326-root.json
14:06 elukey@cumin1003: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sretest2001.codfw.wmnet: Renew puppet certificate - elukey@cumin1003
14:03 bking@dns1004: END - running authdns-update
14:02 bking@dns1004: START - running authdns-update
13:58 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83755 and previous config saved to /var/cache/conftool/dbconfig/20251010-135820-root.json
13:56 ejegg: donorwiki upgraded from 73c34ea4 to d903982c
13:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83754 and previous config saved to /var/cache/conftool/dbconfig/20251010-134314-root.json
13:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83753 and previous config saved to /var/cache/conftool/dbconfig/20251010-132808-root.json
13:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1242 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83752 and previous config saved to /var/cache/conftool/dbconfig/20251010-132003-marostegui.json
13:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1242.eqiad.wmnet with reason: Maintenance
13:17 fabfur: revert haproxykafka to v0.3.16 on cp5021 and cp7001 (T404427)
12:06 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83750 and previous config saved to /var/cache/conftool/dbconfig/20251010-120643-root.json
11:51 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83749 and previous config saved to /var/cache/conftool/dbconfig/20251010-115138-root.json
11:36 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83748 and previous config saved to /var/cache/conftool/dbconfig/20251010-113632-root.json
11:21 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83747 and previous config saved to /var/cache/conftool/dbconfig/20251010-112126-root.json
11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Change es2 eqiad master to es1030 T406488', diff saved to https://phabricator.wikimedia.org/P83746 and previous config saved to /var/cache/conftool/dbconfig/20251010-111653-marostegui.json
11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Change es1 eqiad master to es1029 T406488', diff saved to https://phabricator.wikimedia.org/P83745 and previous config saved to /var/cache/conftool/dbconfig/20251010-111630-marostegui.json
11:16 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Change es3 eqiad master to es1028 T406488', diff saved to https://phabricator.wikimedia.org/P83744 and previous config saved to /var/cache/conftool/dbconfig/20251010-111605-marostegui.json
11:15 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
11:15 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
11:14 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
11:13 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
11:13 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
11:13 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1243 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83743 and previous config saved to /var/cache/conftool/dbconfig/20251010-111306-marostegui.json
11:13 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1243.eqiad.wmnet with reason: Maintenance
11:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83742 and previous config saved to /var/cache/conftool/dbconfig/20251010-111020-root.json
10:55 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83741 and previous config saved to /var/cache/conftool/dbconfig/20251010-105514-root.json
10:40 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83740 and previous config saved to /var/cache/conftool/dbconfig/20251010-104008-root.json
10:33 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
10:32 vgutierrez: restarting acme-chief and nginx on acme-chief instances
10:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83739 and previous config saved to /var/cache/conftool/dbconfig/20251010-102502-root.json
10:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1248 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83738 and previous config saved to /var/cache/conftool/dbconfig/20251010-101720-marostegui.json
10:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1248.eqiad.wmnet with reason: Maintenance
09:34 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
09:34 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
09:20 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
06:24 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83737 and previous config saved to /var/cache/conftool/dbconfig/20251010-062406-root.json
06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1029.eqiad.wmnet onto es1052.eqiad.wmnet
06:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1029 gradually with 4 steps - Pool es1029.eqiad.wmnet in after cloning
06:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1034.eqiad.wmnet onto es1057.eqiad.wmnet
06:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1034 gradually with 4 steps - Pool es1034.eqiad.wmnet in after cloning
06:09 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83734 and previous config saved to /var/cache/conftool/dbconfig/20251010-060900-root.json
05:53 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83731 and previous config saved to /var/cache/conftool/dbconfig/20251010-055354-root.json
05:38 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83728 and previous config saved to /var/cache/conftool/dbconfig/20251010-053848-root.json
05:30 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1249 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83727 and previous config saved to /var/cache/conftool/dbconfig/20251010-053040-marostegui.json
05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1249.eqiad.wmnet with reason: Maintenance
05:25 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1029 gradually with 4 steps - Pool es1029.eqiad.wmnet in after cloning
05:25 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1034 gradually with 4 steps - Pool es1034.eqiad.wmnet in after cloning
01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 32s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-09

23:10 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2017.*
22:11 inflatador: bking@wdqs10(18|19|20) systemctl start load-categories-daily.service T405978
22:05 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1019.eqiad.wmnet
22:04 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1020.eqiad.wmnet
22:04 jdlrobson@deploy2002: Finished scap sync-world: Backport for Enable instrumentation of watchstar and other links that stopPropagation (T406390) (duration: 41m 38s)
22:00 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1018.eqiad.wmnet
21:51 dwisehaupt: started staging db restore in root screen session on frdb1006. restoring from db backups on 20251008
21:51 jdlrobson@deploy2002: jdlrobson: Continuing with sync
21:47 jdlrobson@deploy2002: jdlrobson: Backport for Enable instrumentation of watchstar and other links that stopPropagation (T406390) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:25 TimStarling: on db2202 cleaned up the tables I created for T400696
21:22 jdlrobson@deploy2002: Started scap sync-world: Backport for Enable instrumentation of watchstar and other links that stopPropagation (T406390)
21:20 wfan: payments-wiki upgraded from 028a0225 to d903982c
20:58 reedy@deploy2002: Finished scap sync-world: Backport for Enable New UI and Multiple Module support for OATHAuth in Wikimedia production (T399644) (duration: 20m 04s)
20:53 reedy@deploy2002: reedy, sbassett: Continuing with sync
20:46 Daimona: Run createAndPromote as in P83722#336349 (~100x, in series) to restore event-organizer membership # T401445
20:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
20:42 reedy@deploy2002: reedy, sbassett: Backport for Enable New UI and Multiple Module support for OATHAuth in Wikimedia production (T399644) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:38 reedy@deploy2002: Started scap sync-world: Backport for Enable New UI and Multiple Module support for OATHAuth in Wikimedia production (T399644)
20:32 mutante: logmsgbot do you still log - test log T284123
20:29 mutante: re-enabled QoS on gerrit servers - with previously stable config - T406774 gerrit:1194811
20:28 reedy@deploy2002: Finished scap sync-world: Backport for OATHAuth Recovery Code code improvement (T406501) (duration: 10m 19s)
20:25 mutante: re-enabling QoS on gerrit servers - with previously stable config - T406774
20:24 reedy@deploy2002: sbassett, reedy: Continuing with sync
20:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
20:23 reedy@deploy2002: sbassett, reedy: Backport for OATHAuth Recovery Code code improvement (T406501) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
20:18 reedy@deploy2002: Started scap sync-world: Backport for OATHAuth Recovery Code code improvement (T406501)
20:17 reedy@deploy2002: Finished scap sync-world: Backport for Update interwiki cache, Revert "Delete the event-organizer user group on medium and small wikis" (T401445), Assign campaignevents-generate-invitation-lists right explicitly (T401445) (duration: 10m 46s)
20:13 reedy@deploy2002: daimona, reedy: Continuing with sync
20:11 reedy@deploy2002: daimona, reedy: Backport for Update interwiki cache, Revert "Delete the event-organizer user group on medium and small wikis" (T401445), Assign campaignevents-generate-invitation-lists right explicitly (T401445) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:06 reedy@deploy2002: Started scap sync-world: Backport for Update interwiki cache, Revert "Delete the event-organizer user group on medium and small wikis" (T401445), Assign campaignevents-generate-invitation-lists right explicitly (T401445)
20:04 bking@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
20:00 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1020.eqiad.wmnet
19:59 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1019.eqiad.wmnet
19:59 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1018.eqiad.wmnet
19:29 eileen: civicrm upgraded from 14cc3125 to 748922f0
19:22 ejegg: donorwiki upgraded from e8ef5539 to 73c34ea4
19:13 ejegg: civicrm upgraded from 132211d5 to 14cc3125
19:04 jforrester@deploy2002: Finished scap sync-world: Backport for i18n: Pull forward wikimedia-boardelection2025-notification-body updates (duration: 11m 39s)
18:59 jforrester@deploy2002: jforrester: Continuing with sync
18:58 jforrester@deploy2002: jforrester: Backport for i18n: Pull forward wikimedia-boardelection2025-notification-body updates synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
18:53 jforrester@deploy2002: Started scap sync-world: Backport for i18n: Pull forward wikimedia-boardelection2025-notification-body updates
18:36 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
18:36 cmooney@cumin1003: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
18:02 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/apertium: apply
18:02 rzl@deploy1003: helmfile [staging] START helmfile.d/services/apertium: apply
17:31 topranks: begin work to move lvs1020 uplink cable from ssw1-f1-eqiad to ssw1-e1-eqiad
17:30 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs1020.eqiad.wmnet with reason: downtime lvs1020 to suppress alerts about enp94s0f0np0 going down and losing backend connectivity
17:08 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:06 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:06 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:05 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:04 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:02 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
16:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for inter.link transit IPs in drmrs - cmooney@cumin1003"
16:47 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for inter.link transit IPs in drmrs - cmooney@cumin1003"
16:38 cmooney@cumin1003: START - Cookbook sre.dns.netbox
16:33 cwhite: upgrade grafana-loki on grafana hosts T406478
16:30 tgr@deploy2002: Finished scap sync-world: Backport for session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634), session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634) (duration: 20m 07s)
16:26 tgr@deploy2002: tgr, d3r1ck01: Continuing with sync
16:18 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:18 sukhe: sukhe@lvs2013:~$ sudo systemctl restart pybal.service
16:14 tgr@deploy2002: tgr, d3r1ck01: Backport for session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634), session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
16:10 tgr@deploy2002: Started scap sync-world: Backport for session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634), session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634)
15:59 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
15:57 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
15:56 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
15:48 sukhe@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=proxoid,name=hcaptcha.* [reason: setting weight for proxoid hcaptcha dedicated VM]
15:48 sukhe@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=proxoid,name=hcatpcha.* [reason: setting weight for proxoid hcaptcha dedicated VM]
15:26 sukhe: sukhe@lvs1019:~$ sudo systemctl restart pybal.service
15:25 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
14:48 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2002.wikimedia.org with OS bookworm
14:47 sukhe: restart pybal on lvs1020
14:44 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1002.wikimedia.org with OS bookworm
14:42 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
14:42 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
14:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1003.eqiad.wmnet with OS bullseye
14:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2001.wikimedia.org with OS bookworm
14:37 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2046.codfw.wmnet']
14:36 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
14:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2046.codfw.wmnet']
14:36 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
14:35 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS bookworm
14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
14:34 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
14:31 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
14:29 hnowlan: rest.php group2-except-enwiki on rest-gateway at 10%
14:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
14:26 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
14:23 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
14:21 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
14:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
14:18 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
14:17 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
14:12 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
14:12 Lucas_WMDE: UTC afternoon backport+config window done
14:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Delete the event-organizer user group on medium and small wikis (T401445) (duration: 14m 47s)
14:08 sukhe: restart pybal on lvs1020 to pick up WDQS changes
14:05 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1050.eqiad.wmnet
14:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
14:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2002.wikimedia.org with OS bookworm
14:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1002.wikimedia.org with OS bookworm
14:02 Lucas_WMDE: for the record, the `foreachwikiindblist small+medium emptyUserGroup` maintenance script run (for T401445) did *not* work, running the maintenance script separately for small and medium worked better
14:01 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2001.wikimedia.org with OS bookworm
14:01 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS bookworm
14:00 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist medium emptyUserGroup --create-log '--log-reason=T401445' event-organizer # T401445
14:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for Delete the event-organizer user group on medium and small wikis (T401445) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:59 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1050.eqiad.wmnet
13:56 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
13:56 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist small emptyUserGroup --create-log '--log-reason=T401445' event-organizer # T401445
13:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
13:55 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Delete the event-organizer user group on medium and small wikis (T401445)
13:54 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist small+medium emptyUserGroup --create-log '--log-reason=T401445' event-organizer # T401445
13:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
13:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Assign CampaignEvents user rights to autoconfirmed in small and medium wikis (T401445) (duration: 11m 51s)
13:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
13:44 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Continuing with sync
13:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
13:43 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
13:41 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Backport for Assign CampaignEvents user rights to autoconfirmed in small and medium wikis (T401445) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:37 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
13:37 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
13:36 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
13:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Assign CampaignEvents user rights to autoconfirmed in small and medium wikis (T401445)
13:36 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
13:34 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
13:32 esanders@deploy2002: Finished scap sync-world: Backport for Revert "Invalidate Flow cache on enwiktionary" (duration: 08m 29s)
13:32 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
13:28 esanders@deploy2002: esanders: Continuing with sync
13:28 esanders@deploy2002: esanders: Backport for Revert "Invalidate Flow cache on enwiktionary" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:24 esanders@deploy2002: Started scap sync-world: Backport for Revert "Invalidate Flow cache on enwiktionary"
13:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
13:21 hashar: Zuul successfully reconnected to Gerrit
13:20 hashar: Closed jenkins-bot connections on Gerrit primary
13:08 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp2005.wikimedia.org
13:08 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp2005.wikimedia.org with OS trixie
13:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
13:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2053.codfw.wmnet with reason: Setting up new ES host
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
12:59 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
12:58 fabfur: enable puppet on A:cp to deploy https://gerrit.wikimedia.org/r/1194676 (T404427)
12:55 arnaudb@dns1004: END - running authdns-update
12:53 arnaudb@dns1004: START - running authdns-update
12:53 arnaudb@dns1004: START - running authdns-update
12:53 arnaudb@dns1004: START - running authdns-update
12:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp2005.wikimedia.org with reason: host reimage
12:47 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
12:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp2005.wikimedia.org with reason: host reimage
12:37 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
12:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
12:18 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp2005.wikimedia.org with OS trixie
12:18 fabfur: reloading haproxy on A:cp-eqsin (T404427)
12:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2005.wikimedia.org - slyngshede@cumin1003"
12:18 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2005.wikimedia.org - slyngshede@cumin1003"
12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp2005.wikimedia.org on all recursors
12:17 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp2005.wikimedia.org on all recursors
12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2005.wikimedia.org - slyngshede@cumin1003"
12:17 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2005.wikimedia.org - slyngshede@cumin1003"
12:13 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
12:13 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp2005.wikimedia.org
12:10 fabfur: enable puppet on A:cp-eqsin to deploy https://gerrit.wikimedia.org/r/1194676 (T404427)
12:07 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
12:06 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
12:04 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
12:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
12:03 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
12:03 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
12:03 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
12:02 arnaudb@dns1004: START - running authdns-update
11:59 moritzm: installing luajit security updates
11:53 fabfur: disable puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1194676 on cp5021 (T404427)
11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp1005.wikimedia.org
11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp1005.wikimedia.org with OS trixie
11:46 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dbprov2007.codfw.wmnet
11:40 jynus@cumin1002: START - Cookbook sre.hosts.reboot-single for host dbprov2007.codfw.wmnet
11:36 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp1005.wikimedia.org with reason: host reimage
11:32 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp1005.wikimedia.org with reason: host reimage
11:27 ladsgroup@cumin1003: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
11:21 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
11:21 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp1005.wikimedia.org with OS trixie
11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1005.wikimedia.org - slyngshede@cumin1003"
11:18 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1005.wikimedia.org - slyngshede@cumin1003"
11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp1005.wikimedia.org on all recursors
11:18 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp1005.wikimedia.org on all recursors
11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1005.wikimedia.org - slyngshede@cumin1003"
11:16 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1005.wikimedia.org - slyngshede@cumin1003"
11:14 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
11:13 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
11:13 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp1005.wikimedia.org
10:58 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
10:57 moritzm: installing qemu security updates
10:47 cmooney@dns2005: END - running authdns-update
10:46 cmooney@dns2005: START - running authdns-update
10:37 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
10:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
10:29 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:29 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:20 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
10:20 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
10:17 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
10:15 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
10:12 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
10:11 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
10:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
10:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
10:10 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:09 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:09 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
10:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
10:08 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
10:08 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
10:02 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
10:01 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
10:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
09:58 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83715 and previous config saved to /var/cache/conftool/dbconfig/20251009-095839-root.json
09:44 kharlan@deploy2002: Finished scap sync-world: Backport for Check against correct key in sortEntitiesByTimestamp (T406707) (duration: 11m 18s)
09:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83713 and previous config saved to /var/cache/conftool/dbconfig/20251009-094333-root.json
09:40 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
09:39 kharlan@deploy2002: kharlan: Continuing with sync
09:39 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
09:38 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
09:37 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
09:37 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
09:36 kharlan@deploy2002: kharlan: Backport for Check against correct key in sortEntitiesByTimestamp (T406707) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
09:36 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
09:32 kharlan@deploy2002: Started scap sync-world: Backport for Check against correct key in sortEntitiesByTimestamp (T406707)
09:31 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83711 and previous config saved to /var/cache/conftool/dbconfig/20251009-093131-root.json
09:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2046.codfw.wmnet']
09:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83709 and previous config saved to /var/cache/conftool/dbconfig/20251009-092827-root.json
09:24 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
09:23 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2045.codfw.wmnet']
09:23 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045.codfw.wmnet']
09:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
09:21 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
09:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83708 and previous config saved to /var/cache/conftool/dbconfig/20251009-091626-root.json
09:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83707 and previous config saved to /var/cache/conftool/dbconfig/20251009-091322-root.json
09:05 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1252 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83706 and previous config saved to /var/cache/conftool/dbconfig/20251009-090516-marostegui.json
09:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1252.eqiad.wmnet with reason: Maintenance
09:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83705 and previous config saved to /var/cache/conftool/dbconfig/20251009-090120-root.json
08:53 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
08:53 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
08:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2050.codfw.wmnet']
08:52 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
08:52 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2050.codfw.wmnet']
08:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
08:46 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83704 and previous config saved to /var/cache/conftool/dbconfig/20251009-084614-root.json
08:44 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
08:38 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2179 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83703 and previous config saved to /var/cache/conftool/dbconfig/20251009-083801-marostegui.json
08:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2179.codfw.wmnet with reason: Maintenance
08:34 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83702 and previous config saved to /var/cache/conftool/dbconfig/20251009-083432-root.json
08:26 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
08:26 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
08:22 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.22 refs T405678
08:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:19 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
08:19 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83701 and previous config saved to /var/cache/conftool/dbconfig/20251009-081926-root.json
08:19 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
08:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:18 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2043.codfw.wmnet']
08:18 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2043.codfw.wmnet']
08:12 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2044']
08:12 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044']
08:07 kharlan@deploy2002: Finished scap sync-world: Backport for ConfirmEdit/hCaptcha: Implement automatic failover (T404204) (duration: 13m 14s)
08:04 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83700 and previous config saved to /var/cache/conftool/dbconfig/20251009-080420-root.json
08:03 kharlan@deploy2002: kharlan: Continuing with sync
07:59 joal@deploy2002: Finished deploy [analytics/refinery@af75327] (thin): Analytics deploy - druid pageviews_daily - THIN [analytics/refinery@af753272] (duration: 02m 10s)
07:59 kharlan@deploy2002: kharlan: Backport for ConfirmEdit/hCaptcha: Implement automatic failover (T404204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:57 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
07:57 joal@deploy2002: Started deploy [analytics/refinery@af75327] (thin): Analytics deploy - druid pageviews_daily - THIN [analytics/refinery@af753272]
07:56 joal@deploy2002: Finished deploy [analytics/refinery@af75327]: Analytics deploy - druid pageviews_daily [analytics/refinery@af753272] (duration: 03m 53s)
07:54 kharlan@deploy2002: Started scap sync-world: Backport for ConfirmEdit/hCaptcha: Implement automatic failover (T404204)
07:53 kharlan@deploy2002: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /usr/local/bin/update-mediawiki-tools-release' returned non-zero exit status 1. (scap version: 4.213.0) (duration: 00m 00s)
07:53 joal@deploy2002: Started deploy [analytics/refinery@af75327]: Analytics deploy - druid pageviews_daily [analytics/refinery@af753272]
07:52 joal@deploy2002: Finished deploy [analytics/refinery@af75327] (hadoop-test): Analytics deploy - druid pageviews_daily - TEST [analytics/refinery@af753272] (duration: 00m 54s)
07:51 joal@deploy2002: Started deploy [analytics/refinery@af75327] (hadoop-test): Analytics deploy - druid pageviews_daily - TEST [analytics/refinery@af753272]
07:49 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83699 and previous config saved to /var/cache/conftool/dbconfig/20251009-074914-root.json
07:47 kharlan@deploy2002: Finished scap sync-world: Backport for EventStreamConfig: Fix user-agent exclusion config (T387600), EventStreamConfig: fix IP auto reveal stream (duration: 11m 53s)
07:43 kharlan@deploy2002: kharlan, bearloga: Continuing with sync
07:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1018.eqiad.wmnet with OS bullseye
07:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2147 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83698 and previous config saved to /var/cache/conftool/dbconfig/20251009-074055-marostegui.json
07:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2147.codfw.wmnet with reason: Maintenance
07:40 kharlan@deploy2002: kharlan, bearloga: Backport for EventStreamConfig: Fix user-agent exclusion config (T387600), EventStreamConfig: fix IP auto reveal stream synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:35 kharlan@deploy2002: Started scap sync-world: Backport for EventStreamConfig: Fix user-agent exclusion config (T387600), EventStreamConfig: fix IP auto reveal stream
07:31 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1034.eqiad.wmnet onto es1057.eqiad.wmnet
07:29 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Provide capabilities for failing over to alternate CAPTCHA type (T404204) (duration: 11m 54s)
07:25 kharlan@deploy2002: kharlan: Continuing with sync
07:24 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1029.eqiad.wmnet onto es1052.eqiad.wmnet
07:22 kharlan@deploy2002: kharlan: Backport for hCaptcha: Provide capabilities for failing over to alternate CAPTCHA type (T404204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1020.eqiad.wmnet -> wdqs1019.eqiad.wmnet w/ force delete existing files, repooling both afterwards
07:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
07:17 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Provide capabilities for failing over to alternate CAPTCHA type (T404204)
07:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1029,1034].eqiad.wmnet with reason: Cloning
07:14 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1034 and es1029 T406488', diff saved to https://phabricator.wikimedia.org/P83697 and previous config saved to /var/cache/conftool/dbconfig/20251009-071430-marostegui.json
07:05 moritzm: installing Redis security updates
06:53 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
06:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1019.eqiad.wmnet with OS bullseye
06:48 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
06:39 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
06:31 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83696 and previous config saved to /var/cache/conftool/dbconfig/20251009-063106-root.json
06:28 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1019.*
06:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1020.eqiad.wmnet -> wdqs1019.eqiad.wmnet w/ force delete existing files, repooling both afterwards
06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
06:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
06:26 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host (duration: 00m 13s)
06:26 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host
06:26 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host (duration: 00m 14s)
06:26 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host
06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
06:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83694 and previous config saved to /var/cache/conftool/dbconfig/20251009-061600-root.json
06:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1030.eqiad.wmnet onto es1053.eqiad.wmnet
06:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1030 gradually with 4 steps - Pool es1030.eqiad.wmnet in after cloning
06:00 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83691 and previous config saved to /var/cache/conftool/dbconfig/20251009-060054-root.json
05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83688 and previous config saved to /var/cache/conftool/dbconfig/20251009-054548-root.json
05:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1050 and es1053 depooled T406488', diff saved to https://phabricator.wikimedia.org/P83687 and previous config saved to /var/cache/conftool/dbconfig/20251009-054347-marostegui.json
05:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2155 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83686 and previous config saved to /var/cache/conftool/dbconfig/20251009-053730-marostegui.json
05:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2155.codfw.wmnet with reason: Maintenance
05:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
05:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1030 gradually with 4 steps - Pool es1030.eqiad.wmnet in after cloning
04:13 eileen: civicrm upgraded from 6f24d513 to 132211d5
02:11 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release 20251008
02:02 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release 20251008
01:54 mutante: [wdqs1020:~] $ sudo systemctl restart wdqs-blazegraph
01:32 eileen: civicrm upgraded from 4c13f904 to 6f24d513
01:18 eileen: civicrm upgraded from 2c6fedc8 to 4c13f904
01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 20s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-08

23:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
23:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
23:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
23:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
22:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
21:25 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1019.eqiad.wmnet with OS bullseye
21:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
21:19 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
21:18 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
21:13 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh internal-scholarly host T405978 (duration: 00m 12s)
21:13 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh internal-scholarly host T405978
21:10 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
20:36 tgr_: UTC late deploys done
20:35 tgr@deploy2002: Finished scap sync-world: Backport for Deploy JWT session cookies to group2 (T399631) (duration: 13m 53s)
20:31 tgr@deploy2002: tgr: Continuing with sync
20:26 tgr@deploy2002: tgr: Backport for Deploy JWT session cookies to group2 (T399631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:21 tgr@deploy2002: Started scap sync-world: Backport for Deploy JWT session cookies to group2 (T399631)
20:19 tgr@deploy2002: Finished scap sync-world: Backport for eswiki, commonswiki: lift IP cap for workshop (T406655), Launch VisualEditor EditCheck paste check a/b test to 22 wikis (T405422) (duration: 13m 03s)
20:15 tgr@deploy2002: tgr, kemayo, anzx: Continuing with sync
20:11 tgr@deploy2002: tgr, kemayo, anzx: Backport for eswiki, commonswiki: lift IP cap for workshop (T406655), Launch VisualEditor EditCheck paste check a/b test to 22 wikis (T405422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:06 tgr@deploy2002: Started scap sync-world: Backport for eswiki, commonswiki: lift IP cap for workshop (T406655), Launch VisualEditor EditCheck paste check a/b test to 22 wikis (T405422)
20:02 hashar: Disabled Gerrit Apache mod_qos by putting it to be logging only # T406774
19:30 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on remaining Wikipedias except enwiki (T403510), Disable wmgUseMdotRouting on enwiki (T403510) (duration: 09m 26s)
19:25 krinkle@deploy2002: krinkle: Continuing with sync
19:25 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on remaining Wikipedias except enwiki (T403510), Disable wmgUseMdotRouting on enwiki (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
19:20 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on remaining Wikipedias except enwiki (T403510), Disable wmgUseMdotRouting on enwiki (T403510)
19:10 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS bookworm
18:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
18:56 ssastry@deploy2002: Finished scap sync-world: Backport for Revert "Add a DOM version of the TOC markers pass" (duration: 16m 00s)
18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
18:50 ssastry@deploy2002: ssastry: Continuing with sync
18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
18:46 ssastry@deploy2002: ssastry: Backport for Revert "Add a DOM version of the TOC markers pass" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
18:43 hashar: For posterity: October 8th 2025. The day brett and Krinkle are getting rid of the last .m. subdomain.
18:40 ssastry@deploy2002: Started scap sync-world: Backport for Revert "Add a DOM version of the TOC markers pass"
18:36 brett: Enable unified mobile routing on en.wikipedia.org rollout complete - T403510
18:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS bookworm
18:33 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release vX.Y.Z - cmooney@cumin1003
18:32 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS bookworm
18:31 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release vX.Y.Z - cmooney@cumin1003
18:27 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on wdqs2017.codfw.wmnet with reason: finish getting host ready for production
18:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
17:59 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
17:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
17:54 swfrench-wmf: completed post-switchover right-sizing of large mediawiki services - T405955
17:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:51 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
17:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
17:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:49 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:49 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:49 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
17:45 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:45 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
17:44 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:44 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:42 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:42 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:42 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
17:39 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:39 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:34 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS bookworm
17:33 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:32 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:32 brett: Enable unified mobile routing on en.wikipedia.org - T403510
17:26 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:26 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:22 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2052 gradually with 4 steps - Pooling in new host
17:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:11 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
17:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
17:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:10 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
16:53 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2002.codfw.wmnet with reason: WIP
16:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: WIP
16:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
16:42 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
16:37 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2052 gradually with 4 steps - Pooling in new host
16:36 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2052.codfw.wmnet
16:36 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es2052.codfw.wmnet
16:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2017.codfw.wmnet with OS bullseye
16:26 fceratto@cumin1002: dbctl commit (dc=all): 'Add es2052 T402859', diff saved to https://phabricator.wikimedia.org/P83675 and previous config saved to /var/cache/conftool/dbconfig/20251008-162623-fceratto.json
16:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
16:10 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
15:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
15:51 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-launcher1003.eqiad.wmnet with OS bullseye
15:37 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
15:37 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
15:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
15:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
15:33 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:33 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:18 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-launcher1003.eqiad.wmnet with reason: host reimage
15:16 elukey: reboot ms-be1088 as a test for T404356
15:14 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be1088.eqiad.wmnet with reason: testing
15:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-launcher1003.eqiad.wmnet with reason: host reimage
15:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
15:11 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: testing
15:05 Lucas_WMDE: UTC afternoon backport+config window done
15:03 derick@deploy2002: Finished scap sync-world: Backport for SharedDomainHookHandler: Remove WebAuthn sitenotice, SharedDomainHookHandler: Remove WebAuthn sitenotice (duration: 42m 36s)
14:59 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-launcher1003.eqiad.wmnet with OS bullseye
14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
14:50 derick@deploy2002: d3r1ck01, derick: Continuing with sync
14:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
14:47 derick@deploy2002: d3r1ck01, derick: Backport for SharedDomainHookHandler: Remove WebAuthn sitenotice, SharedDomainHookHandler: Remove WebAuthn sitenotice synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
14:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
14:33 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
14:29 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
14:20 derick@deploy2002: Started scap sync-world: Backport for SharedDomainHookHandler: Remove WebAuthn sitenotice, SharedDomainHookHandler: Remove WebAuthn sitenotice
14:12 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:12 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:10 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Temporarily undeploy JWT session cookies (T399631), jwt: Use core cookie settings (T406621), jwt: Use core cookie settings (T406621), Force OATHManage to be on central domain (T401773), Force OATHManage to be on central domain (T401773) (duration: 14m 0
14:09 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: cr1-esams is back online and working after card re-seat, T406705]
14:09 cmooney@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: cr1-esams is back online and working after card re-seat, T406705]
14:08 topranks: re-pool esams in dns after cr1-esams restored to normal operation T406705
14:07 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:06 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:06 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:05 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:03 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde, reedy, tgr: Continuing with sync
14:01 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde, reedy, tgr: Backport for Temporarily undeploy JWT session cookies (T399631), jwt: Use core cookie settings (T406621), jwt: Use core cookie settings (T406621), Force OATHManage to be on central domain (T401773), Force OATHManage to be on central domain (T401773)
13:56 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Temporarily undeploy JWT session cookies (T399631), jwt: Use core cookie settings (T406621), jwt: Use core cookie settings (T406621), Force OATHManage to be on central domain (T401773), Force OATHManage to be on central domain (T401773)
13:54 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Disable mobilefrontend on donatewiki (T406638) (duration: 44m 23s)
13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
13:42 lucaswerkmeister-wmde@deploy2002: pcoombe, lucaswerkmeister-wmde: Continuing with sync
13:39 lucaswerkmeister-wmde@deploy2002: pcoombe, lucaswerkmeister-wmde: Backport for Disable mobilefrontend on donatewiki (T406638) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
13:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
13:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2001.codfw.wmnet
13:19 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2001.codfw.wmnet
13:14 jgleeson: civicrm upgraded from 9db8f0d5 to 2c6fedc8
13:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
13:10 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Disable mobilefrontend on donatewiki (T406638)
13:10 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2002.codfw.wmnet
13:03 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2002.codfw.wmnet
12:49 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2005.wikimedia.org
12:49 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test2005.wikimedia.org with OS trixie
12:45 derick@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=fywiki --logwiki=metawiki Constable31 Shogeneral # T406731
12:33 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
12:28 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
12:25 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
12:25 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
12:24 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
12:24 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
12:22 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
12:22 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bullseye
12:22 elukey@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
12:15 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ms-be[2083-2084].codfw.wmnet with reason: awaiting controller swap
12:10 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
12:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
12:10 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
12:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2005.wikimedia.org on all recursors
12:09 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp-test2005.wikimedia.org on all recursors
12:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
12:09 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
12:08 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on P{dse-k8s-worker2002.codfw.wmnet} and (A:dse-k8s-master-codfw or A:dse-k8s-worker-codfw)
12:07 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{dse-k8s-worker2002.codfw.wmnet} and (A:dse-k8s-master-codfw or A:dse-k8s-worker-codfw)
12:05 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
12:05 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
12:05 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2005.wikimedia.org
12:05 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:05 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2005.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
12:05 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2005.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
12:04 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
12:01 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
11:59 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
11:57 slyngshede@cumin1003: START - Cookbook sre.hosts.decommission for hosts idp-test2005.wikimedia.org
11:50 slyngshede@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test2005.wikimedia.org
11:50 slyngshede@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:47 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
11:47 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
11:43 slyngshede@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test2005.wikimedia.org
11:43 slyngshede@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:42 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2078
11:42 mvernon@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2078
11:40 mvernon@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2078
11:40 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2078.codfw.wmnet 239.32.192.10.in-addr.arpa 9.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:40 mvernon@cumin1002: START - Cookbook sre.dns.wipe-cache ms-be2078.codfw.wmnet 239.32.192.10.in-addr.arpa 9.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:40 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:40 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2078 - mvernon@cumin1002"
11:39 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
11:39 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
11:34 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2078 - mvernon@cumin1002"
11:34 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS bookworm
11:30 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker-codfw
11:28 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker-codfw
11:28 mvernon@cumin1002: START - Cookbook sre.dns.netbox
11:26 claime: Enabling puppet on cp nodes - 1193903: gateway-check: Group-based routing approach | https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193903 - T406318
11:25 mvernon@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:22 mvernon@cumin1002: START - Cookbook sre.dns.netbox
11:22 mvernon@cumin1002: START - Cookbook sre.hosts.move-vlan for host ms-be2078
11:22 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bullseye
11:19 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2078.codfw.wmnet with OS trixie
11:09 moritzm: imported megacli into thirdparty/hwraid (upstream repo doesn't cover trixie yet, copied over from bookworm) T391083
10:53 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS bookworm
10:43 claime: Disabling puppet on cp nodes - 1193903: gateway-check: Group-based routing approach | https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193903 - T406318
10:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
10:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:34 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
10:33 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
10:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:31 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
10:30 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
10:29 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
10:22 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
10:20 jmm@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
10:20 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
10:17 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
10:16 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
10:15 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS trixie
10:14 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
10:02 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
09:47 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
09:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2027.codfw.wmnet onto es2052.codfw.wmnet
09:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
09:36 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
09:24 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
09:24 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
09:08 topranks: disable BGP to asw*-esams from cr1-esams as the CR external links are also down
09:02 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams [reason: no reason specified, ]
09:02 Emperor: depool esams
09:02 mvernon@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: no reason specified, ]
08:52 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
08:50 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83669 and previous config saved to /var/cache/conftool/dbconfig/20251008-085005-root.json
08:44 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:35 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83667 and previous config saved to /var/cache/conftool/dbconfig/20251008-083459-root.json
08:33 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:31 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:19 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83666 and previous config saved to /var/cache/conftool/dbconfig/20251008-081953-root.json
08:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.22 refs T405678
08:04 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83665 and previous config saved to /var/cache/conftool/dbconfig/20251008-080448-root.json
08:03 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
08:02 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:00 moritzm: installing libxml2 security updates
07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2172 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83664 and previous config saved to /var/cache/conftool/dbconfig/20251008-075612-marostegui.json
07:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2172.codfw.wmnet with reason: Maintenance
07:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:47 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:46 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:44 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:37 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:27 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
07:22 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
07:21 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1030.eqiad.wmnet onto es1053.eqiad.wmnet
07:17 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
07:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1030 T406488', diff saved to https://phabricator.wikimedia.org/P83663 and previous config saved to /var/cache/conftool/dbconfig/20251008-071656-marostegui.json
07:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:15 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
07:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
06:57 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
06:55 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
06:53 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
06:31 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
06:29 moritzm: installing openssl security updates
06:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1027 T406488', diff saved to https://phabricator.wikimedia.org/P83662 and previous config saved to /var/cache/conftool/dbconfig/20251008-062752-marostegui.json
06:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1027,1030].eqiad.wmnet with reason: Cloning
06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1026.eqiad.wmnet onto es1049.eqiad.wmnet
06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1026 gradually with 4 steps - Pool es1026.eqiad.wmnet in after cloning
06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1028.eqiad.wmnet onto es1051.eqiad.wmnet
06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1028 gradually with 4 steps - Pool es1028.eqiad.wmnet in after cloning
06:24 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1049 and es1051 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83659 and previous config saved to /var/cache/conftool/dbconfig/20251008-062404-marostegui.json
06:12 moritzm: rebalance Ganeti eqiad/D following vmscape reboots
05:37 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1026 gradually with 4 steps - Pool es1026.eqiad.wmnet in after cloning
05:37 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1028 gradually with 4 steps - Pool es1028.eqiad.wmnet in after cloning
04:41 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
04:37 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host T405978 (duration: 00m 14s)
04:37 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host T405978
03:55 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978 (duration: 02m 01s)
03:53 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978
03:53 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978 (duration: 16m 11s)
03:52 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1018.*
03:41 ryankemper@cumin2002: conftool action : GET; selector: name=wdqs1018.eqiad.wmnet
03:38 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1018.*
03:37 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978
02:33 eileen: civicrm upgraded from 8228670e to 9db8f0d5
02:27 eileen: civicrm upgraded from 7a81fe1c to 8228670e
02:19 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
02:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1018.eqiad.wmnet with OS bullseye
02:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
02:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
02:05 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
01:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
01:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 13s)
01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
00:27 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
00:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
00:09 sbassett: Deployed security mitigation for T406664 to 1.45.0-wmf.22
00:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply

2025-10-07

23:58 sbassett: Deployed security mitigation for T406664 to 1.45.0-wmf.21
23:58 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:54 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:53 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:53 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:52 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:50 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:47 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:45 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:18 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
23:13 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
22:46 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
22:35 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
22:12 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "running per cookbook error suggestion - bking@cumin2002 - T399778"
22:11 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "running per cookbook error suggestion - bking@cumin2002 - T399778"
22:04 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
22:02 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1020\.eqiad\.wmnet
21:50 bking@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: T405978 (duration: 00m 45s)
21:49 bking@deploy2002: Started deploy [wdqs/wdqs@fea7794]: T405978
21:48 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on wdqs1020.eqiad.wmnet with reason: finish getting host ready for production
21:41 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer main graph to newly-reimaged host) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
21:41 tgr_: UTC late deploys done
21:40 tgr@deploy2002: Finished scap sync-world: Backport for session: Log actual class name in preventSessionsForUser exception (T406566), session: Log actual class name in preventSessionsForUser exception (T406566), session: Log cache write flags in `SessionStore::set()` (T405633 T405634), session: Log cache write flags in `SessionStore::set()` (T405
21:36 tgr@deploy2002: tgr: Continuing with sync
21:34 tgr@deploy2002: tgr: Backport for session: Log actual class name in preventSessionsForUser exception (T406566), session: Log actual class name in preventSessionsForUser exception (T406566), session: Log cache write flags in `SessionStore::set()` (T405633 T405634), session: Log cache write flags in `SessionStore::set()` (T405633 T405634) synced
21:30 tgr@deploy2002: Started scap sync-world: Backport for session: Log actual class name in preventSessionsForUser exception (T406566), session: Log actual class name in preventSessionsForUser exception (T406566), session: Log cache write flags in `SessionStore::set()` (T405633 T405634), session: Log cache write flags in `SessionStore::set()` (T4056
21:28 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:16 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
21:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
20:58 aaron@deploy2002: Finished scap sync-world: Backport for Add restbase spec JSON files to which /rest_v1/?spec can be routed (T397203 T396805) (duration: 10m 13s)
20:54 aaron@deploy2002: aaron: Continuing with sync
20:53 aaron@deploy2002: aaron: Backport for Add restbase spec JSON files to which /rest_v1/?spec can be routed (T397203 T396805) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
20:50 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
20:48 aaron@deploy2002: Started scap sync-world: Backport for Add restbase spec JSON files to which /rest_v1/?spec can be routed (T397203 T396805)
20:48 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer main graph to newly-reimaged host) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
20:45 kharlan@deploy2002: Finished scap sync-world: Backport for CheckUser/UserInfoCard: Remove enable-by-default mode for dewiki (T405342) (duration: 11m 05s)
20:41 brett: Enable unified mobile routing on all except en.wikipedia.org - T403510
20:41 kharlan@deploy2002: kharlan: Continuing with sync
20:38 kharlan@deploy2002: kharlan: Backport for CheckUser/UserInfoCard: Remove enable-by-default mode for dewiki (T405342) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:34 kharlan@deploy2002: Started scap sync-world: Backport for CheckUser/UserInfoCard: Remove enable-by-default mode for dewiki (T405342)
20:13 mstyles@deploy2002: Finished scap sync-world: Backport for OATHAuth: Increase 2FA opt-in to 40% of users (T399664) (duration: 09m 08s)
20:09 mstyles@deploy2002: mstyles: Continuing with sync
20:08 mstyles@deploy2002: mstyles: Backport for OATHAuth: Increase 2FA opt-in to 40% of users (T399664) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:05 ejegg: fundraising civicrm upgraded from eac2de65 to 7a81fe1c
20:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1020.eqiad.wmnet with OS bullseye
20:04 mstyles@deploy2002: Started scap sync-world: Backport for OATHAuth: Increase 2FA opt-in to 40% of users (T399664)
19:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
19:44 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
19:01 ejegg: standalone SmashPig upgraded from 86bde4e4 to 32dc5c72
18:09 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha1002.wikimedia.org
18:08 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1002.wikimedia.org with OS trixie
17:53 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
17:47 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
17:34 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1002.wikimedia.org with OS trixie
17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
17:34 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1002.wikimedia.org on all recursors
17:34 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1002.wikimedia.org on all recursors
17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
17:32 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
17:29 sukhe@cumin1003: START - Cookbook sre.dns.netbox
17:29 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1002.wikimedia.org
17:26 taavi: taavi@apt1002 ~ $ sudo -i reprepro -C thirdparty/tofu update trixie-wikimedia # T405742
17:05 mutante: releases2003 - re-enabling puppet - reacting to monitoring alert - T405352
16:30 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:26 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
16:25 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
16:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
16:13 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha1002.wikimedia.org
16:13 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:13 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
16:13 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
16:11 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2044.codfw.wmnet with OS bullseye
16:09 sukhe@cumin1003: START - Cookbook sre.dns.netbox
16:05 sukhe@cumin1003: START - Cookbook sre.hosts.decommission for hosts hcaptcha1002.wikimedia.org
16:04 sukhe@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host hcaptcha1002.wikimedia.org
16:03 sukhe@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:59 sukhe@cumin1003: START - Cookbook sre.dns.netbox
15:59 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1002.wikimedia.org
15:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:58 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS bullseye
15:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
15:55 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:52 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2044.codfw.wmnet with OS bookworm
15:52 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
15:49 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
15:49 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
15:47 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1001.eqiad.wmnet
15:47 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:47 jasmine@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
15:47 jasmine@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
15:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:42 sukhe@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host hcaptcha1002.wikimedia.org
15:42 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host hcaptcha1002.wikimedia.org with OS trixie
15:40 jasmine@cumin1003: START - Cookbook sre.dns.netbox
15:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
15:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:29 jasmine@cumin1003: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1001.eqiad.wmnet
15:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
15:26 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:24 hashar@deploy2002: Finished deploy [gerrit/gerrit@d0c47da]: Disable component rather than motd plugin (duration: 00m 11s)
15:23 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
15:23 hashar@deploy2002: Started deploy [gerrit/gerrit@d0c47da]: Disable component rather than motd plugin
15:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:20 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
15:11 jasmine_: homer 'cr*eqiad' commit "T383227"
15:09 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS bookworm
15:09 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
15:03 hashar@deploy2002: Finished deploy [gerrit/gerrit@21d2848]: Disable motd banner: maintenance window has closed - T387833 (duration: 00m 30s)
15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@f2d2c87]: deploy phab1004 for T406597 (duration: 00m 52s)
15:03 hashar@deploy2002: Started deploy [gerrit/gerrit@21d2848]: Disable motd banner: maintenance window has closed - T387833
15:02 brennen@deploy2002: Started deploy [phabricator/deployment@f2d2c87]: deploy phab1004 for T406597
15:02 brennen@deploy2002: Finished deploy [phabricator/deployment@f2d2c87]: deploy phab2002 for T406597 (duration: 00m 31s)
15:01 brennen@deploy2002: Started deploy [phabricator/deployment@f2d2c87]: deploy phab2002 for T406597
15:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
14:59 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T406597
14:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
14:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bookworm
14:53 jasmine@deploy2002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1001.eqiad.wmnet
14:51 jasmine@dns1004: END - running authdns-update
14:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
14:50 jasmine@dns1004: START - running authdns-update
14:42 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
14:41 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
14:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
14:29 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
14:22 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
14:22 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
14:22 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
14:21 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1002.wikimedia.org with OS trixie
14:21 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
14:21 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
14:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
14:21 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
14:21 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
14:17 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
14:16 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
14:16 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
14:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
14:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
14:11 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
14:04 Lucas_WMDE: UTC afternoon backport+config window done
14:04 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Fix calls to incrementStatsKey() (T406569), Fix calls to incrementStatsKey() (T406569) (duration: 09m 58s)
14:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
14:00 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1002.wikimedia.org on all recursors
14:00 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1002.wikimedia.org on all recursors
14:00 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:00 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
14:00 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
13:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
13:58 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2006.codfw.wmnet with reason: host reimage
13:58 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Fix calls to incrementStatsKey() (T406569), Fix calls to incrementStatsKey() (T406569) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:56 sukhe@cumin1003: START - Cookbook sre.dns.netbox
13:56 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1002.wikimedia.org
13:56 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
13:55 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:55 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
13:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
13:54 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Fix calls to incrementStatsKey() (T406569), Fix calls to incrementStatsKey() (T406569)
13:52 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2006.codfw.wmnet with reason: host reimage
13:51 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha1001.wikimedia.org
13:51 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS trixie
13:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
13:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
13:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
13:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
13:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
13:41 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
13:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
13:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
13:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
13:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
13:34 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
13:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
13:28 moritzm: rebalance Ganeti codfw/D following vmscape reboots
13:27 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
13:17 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:17 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS trixie
13:17 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:16 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
13:16 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1001.wikimedia.org on all recursors
13:14 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1001.wikimedia.org on all recursors
13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
13:14 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
13:13 esanders@deploy2002: Finished scap sync-world: Backport for Invalidate Flow cache on enwiktionary (T405080) (duration: 10m 07s)
13:10 sukhe@cumin1003: START - Cookbook sre.dns.netbox
13:10 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1001.wikimedia.org
13:09 esanders@deploy2002: esanders: Continuing with sync
13:08 esanders@deploy2002: esanders: Backport for Invalidate Flow cache on enwiktionary (T405080) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:06 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:06 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:05 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
13:03 esanders@deploy2002: Started scap sync-world: Backport for Invalidate Flow cache on enwiktionary (T405080)
12:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83649 and previous config saved to /var/cache/conftool/dbconfig/20251007-122526-root.json
12:23 moritzm: rebalance Ganeti eqiad/C following vmscape reboots
12:15 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
12:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83647 and previous config saved to /var/cache/conftool/dbconfig/20251007-121020-root.json
11:55 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83646 and previous config saved to /var/cache/conftool/dbconfig/20251007-115513-root.json
11:50 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
11:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
11:49 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
11:49 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:48 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:48 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:47 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83645 and previous config saved to /var/cache/conftool/dbconfig/20251007-114716-root.json
11:40 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83644 and previous config saved to /var/cache/conftool/dbconfig/20251007-114007-root.json
11:33 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
11:32 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83643 and previous config saved to /var/cache/conftool/dbconfig/20251007-113210-root.json
11:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2005.wikimedia.org
11:27 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2005.wikimedia.org
11:26 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2005.wikimedia.org
11:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83642 and previous config saved to /var/cache/conftool/dbconfig/20251007-112501-root.json
11:23 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
11:23 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
11:23 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idp-test2005.wikimedia.org
11:19 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
11:18 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:17 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83640 and previous config saved to /var/cache/conftool/dbconfig/20251007-111704-root.json
11:16 marostegui: Upgrade db1169 (s1) to 10.11.14 T406543
11:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: Upgrading
11:14 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1169 T406543', diff saved to https://phabricator.wikimedia.org/P83639 and previous config saved to /var/cache/conftool/dbconfig/20251007-111438-marostegui.json
11:13 moritzm: imported cas 7.1.6.2 for trixie-wikimedia T406455
11:12 moritzm: imported prometheus-jmx-exporter 0.15.0 for trixie-wikimedia T406455
11:08 moritzm: rebalance Ganeti codfw/C following vmscape reboots
11:07 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:07 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:04 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:04 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83638 and previous config saved to /var/cache/conftool/dbconfig/20251007-110158-root.json
10:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2206 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83637 and previous config saved to /var/cache/conftool/dbconfig/20251007-105337-marostegui.json
10:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2206.codfw.wmnet with reason: Maintenance
10:44 slyngshede@dns1004: END - running authdns-update
10:43 slyngshede@dns1004: START - running authdns-update
10:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names - cmooney@cumin1003"
10:38 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names - cmooney@cumin1003"
10:31 cmooney@cumin1003: START - Cookbook sre.dns.netbox
10:25 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:24 ladsgroup@deploy2002: Finished scap sync-world: Backport for mainstash: Disable multiPrimaryMode (T389893) (duration: 14m 51s)
10:20 ladsgroup@deploy2002: ladsgroup: Continuing with sync
10:19 cmooney@cumin1003: START - Cookbook sre.dns.netbox
10:14 ladsgroup@deploy2002: ladsgroup: Backport for mainstash: Disable multiPrimaryMode (T389893) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
10:09 ladsgroup@deploy2002: Started scap sync-world: Backport for mainstash: Disable multiPrimaryMode (T389893)
10:04 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
10:04 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
10:04 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
10:04 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
10:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for Undeploy FlaggedRevs from lawikisource (T406424) (duration: 09m 34s)
10:00 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1028.eqiad.wmnet onto es1051.eqiad.wmnet
09:59 aqu@deploy2002: Finished deploy [analytics/refinery@21fe78f] (thin): Regular analytics weekly train THIN [analytics/refinery@21fe78fb] (duration: 01m 05s)
09:58 aqu@deploy2002: Started deploy [analytics/refinery@21fe78f] (thin): Regular analytics weekly train THIN [analytics/refinery@21fe78fb]
09:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
09:57 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2027 - Depool es2027.codfw.wmnet to then clone it to es2052.codfw.wmnet - fceratto@cumin1002
09:57 ladsgroup@deploy2002: ladsgroup: Backport for Undeploy FlaggedRevs from lawikisource (T406424) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
09:56 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2027 - Depool es2027.codfw.wmnet to then clone it to es2052.codfw.wmnet - fceratto@cumin1002
09:56 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2027.codfw.wmnet onto es2052.codfw.wmnet
09:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1028,1051].eqiad.wmnet with reason: Cloning
09:55 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2029.codfw.wmnet
09:55 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es2029.codfw.wmnet
09:54 aqu@deploy2002: Finished deploy [analytics/refinery@21fe78f]: Regular analytics weekly train [analytics/refinery@21fe78fb] (duration: 42m 33s)
09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1028 to clone es1051 T406488', diff saved to https://phabricator.wikimedia.org/P83635 and previous config saved to /var/cache/conftool/dbconfig/20251007-095339-marostegui.json
09:52 ladsgroup@deploy2002: Started scap sync-world: Backport for Undeploy FlaggedRevs from lawikisource (T406424)
09:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2052.codfw.wmnet']
09:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2052.codfw.wmnet with reason: Setting up new ES host
09:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2029.codfw.wmnet with reason: Setting up new ES host
09:46 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2052.codfw.wmnet with reason: Setting up new ES host
09:33 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2052.codfw.wmnet with reason: Setting up new ES host
09:33 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1050.eqiad.wmnet with OS bookworm
09:26 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1003"
09:25 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1003"
09:22 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1026.eqiad.wmnet onto es1049.eqiad.wmnet
09:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
09:19 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1026,1049].eqiad.wmnet with reason: Cloning
09:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
09:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:17 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
09:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1026,1049].eqiad.wmnet with reason: Cloning
09:12 aqu@deploy2002: Started deploy [analytics/refinery@21fe78f]: Regular analytics weekly train [analytics/refinery@21fe78fb]
09:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repool es1029 and depool es1026 to clone es1049 T406488', diff saved to https://phabricator.wikimedia.org/P83634 and previous config saved to /var/cache/conftool/dbconfig/20251007-091011-marostegui.json
09:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1029 to clone es1049 T406488', diff saved to https://phabricator.wikimedia.org/P83633 and previous config saved to /var/cache/conftool/dbconfig/20251007-090826-marostegui.json
09:07 aqu@deploy2002: Finished deploy [analytics/refinery@21fe78f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@21fe78fb] (duration: 01m 12s)
09:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
09:06 aqu@deploy2002: Started deploy [analytics/refinery@21fe78f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@21fe78fb]
09:05 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
09:04 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
09:04 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
09:02 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
08:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
08:58 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
08:57 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
08:53 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83631 and previous config saved to /var/cache/conftool/dbconfig/20251007-085320-root.json
08:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
08:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
08:42 topranks: tighten up acl for ssh access on pfw1-codfw T390939
08:41 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
08:38 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83630 and previous config saved to /var/cache/conftool/dbconfig/20251007-083814-root.json
08:37 hashar: Stopped Gerrit on gerrit2003, deleted /srv/gerrit/git/* and restarted a full replication due to bad files ownership # T387833
08:37 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:27 elukey@cumin1003: START - Cookbook sre.hosts.provision for host es1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
08:23 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83629 and previous config saved to /var/cache/conftool/dbconfig/20251007-082309-root.json
08:20 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
08:17 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.22 refs T405678
08:08 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83628 and previous config saved to /var/cache/conftool/dbconfig/20251007-080803-root.json
08:06 moritzm: installing libsndfile security updates
08:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2210 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83627 and previous config saved to /var/cache/conftool/dbconfig/20251007-080015-marostegui.json
08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2210.codfw.wmnet with reason: Maintenance
07:43 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83626 and previous config saved to /var/cache/conftool/dbconfig/20251007-074342-root.json
07:34 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
07:33 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
07:28 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83625 and previous config saved to /var/cache/conftool/dbconfig/20251007-072837-root.json
07:21 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: stop copying ores weighted_tags (T389053), cirrus: test completion with default sort on simplewiki [2/3] (T404858) (duration: 15m 32s)
07:14 dcausse@deploy2002: dcausse: Continuing with sync
07:13 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83624 and previous config saved to /var/cache/conftool/dbconfig/20251007-071331-root.json
07:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
07:11 dcausse@deploy2002: dcausse: Backport for cirrus: stop copying ores weighted_tags (T389053), cirrus: test completion with default sort on simplewiki [2/3] (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:10 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1050.eqiad.wmnet with OS bookworm
07:05 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: stop copying ores weighted_tags (T389053), cirrus: test completion with default sort on simplewiki [2/3] (T404858)
06:58 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83623 and previous config saved to /var/cache/conftool/dbconfig/20251007-065825-root.json
06:50 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2219 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83622 and previous config saved to /var/cache/conftool/dbconfig/20251007-065019-marostegui.json
06:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2219.codfw.wmnet with reason: Maintenance
06:44 kart_: Updated cxserver to 2025-10-06-084053-production (T394982, T403574)
06:42 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
06:40 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:40 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
06:35 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
06:30 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83621 and previous config saved to /var/cache/conftool/dbconfig/20251007-063014-root.json
06:24 moritzm: rebalance Ganeti eqiad/B following vmscape reboots
06:24 moritzm: rebalance Ganeti codfw/B following vmscape reboots
06:15 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83620 and previous config saved to /var/cache/conftool/dbconfig/20251007-061509-root.json
06:07 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:06 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
06:00 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83619 and previous config saved to /var/cache/conftool/dbconfig/20251007-060003-root.json
05:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
05:44 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83618 and previous config saved to /var/cache/conftool/dbconfig/20251007-054457-root.json
05:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2237.codfw.wmnet with reason: Maintenance
05:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2237 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83617 and previous config saved to /var/cache/conftool/dbconfig/20251007-053628-root.json
05:36 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2237.codfw.wmnet with reason: Maintenance
05:03 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
05:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.45.0-wmf.19 (duration: 02m 32s)
03:48 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.45.0-wmf.22 refs T405678 (duration: 45m 18s)
03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.45.0-wmf.22 refs T405678
01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 28s)
01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
00:27 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2005-dev.codfw.wmnet with OS trixie

2025-10-06

23:35 jdlrobson@deploy2002: Finished scap sync-world: Backport for tempUserBanner: Set `relative` position to enable `z-index` (T404122) (duration: 11m 30s)
23:30 jdlrobson@deploy2002: jdlrobson: Continuing with sync
23:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
23:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
23:28 jdlrobson@deploy2002: jdlrobson: Backport for tempUserBanner: Set `relative` position to enable `z-index` (T404122) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
23:23 jdlrobson@deploy2002: Started scap sync-world: Backport for tempUserBanner: Set `relative` position to enable `z-index` (T404122)
23:13 jdlrobson@deploy2002: Finished scap sync-world: Backport for Remove old, unused ArticleSummaries Stream (T406361) (duration: 09m 47s)
23:08 jdlrobson@deploy2002: jdlrobson, lmora: Continuing with sync
23:07 jdlrobson@deploy2002: jdlrobson, lmora: Backport for Remove old, unused ArticleSummaries Stream (T406361) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
23:03 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
23:03 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2017.codfw.wmnet with OS bullseye
23:03 jdlrobson@deploy2002: Started scap sync-world: Backport for Remove old, unused ArticleSummaries Stream (T406361)
22:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
22:48 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
22:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
22:42 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
22:23 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
22:23 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
22:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
21:59 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
21:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
21:43 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
21:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
21:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.16 upgrade ()
21:37 eileen: config revision changed from 65339a1a to 02eee6ac
21:35 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.16 upgrade ()
21:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
21:29 sbassett: Deployed security mitigation for T251032
21:28 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
21:25 eileen: civicrm upgraded from 17092e23 to eac2de65
21:25 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:24 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:14 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:11 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.16 upgrade ()
20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.16 upgrade ()
20:51 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
20:40 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2001.codfw.wmnet with OS bookworm
20:40 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
20:39 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
20:35 dani@deploy2002: Finished scap sync-world: Backport for Undeploy reader foundational survey on enwiki (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577) (duration: 09m 37s)
20:31 dani@deploy2002: dani: Continuing with sync
20:30 dani@deploy2002: dani: Backport for Undeploy reader foundational survey on enwiki (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:26 dani@deploy2002: Started scap sync-world: Backport for Undeploy reader foundational survey on enwiki (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577)
20:24 arlolra@deploy2002: Finished scap sync-world: Backport for Deploy Parsoid Read Views to 26 Wikipedias (T406250) (duration: 10m 43s)
20:20 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
20:19 arlolra@deploy2002: arlolra: Continuing with sync
20:19 arlolra@deploy2002: arlolra: Backport for Deploy Parsoid Read Views to 26 Wikipedias (T406250) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:16 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
20:13 arlolra@deploy2002: Started scap sync-world: Backport for Deploy Parsoid Read Views to 26 Wikipedias (T406250)
20:10 samtar@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575) (duration: 14m 13s)
20:04 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
20:04 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:04 samtar@deploy2002: samtar: Continuing with sync
20:04 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:02 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.16 upgrade ()
20:01 samtar@deploy2002: samtar: Backport for ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
19:58 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.16 upgrade ()
19:58 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002"
19:58 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002
19:56 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002
19:56 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002"
19:56 samtar@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575)
19:49 btullis@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{dse-k8s-worker[1004-1019].eqiad.wmnet} and (A:dse-k8s-master-eqiad or A:dse-k8s-worker-eqiad)
19:45 musikanimal@deploy2002: Finished scap sync-world: Backport for WishRenderer: short-circuit and show error if proposer is invalid (T406194) (duration: 39m 00s)
19:33 musikanimal@deploy2002: musikanimal: Continuing with sync
19:32 musikanimal@deploy2002: musikanimal: Backport for WishRenderer: short-circuit and show error if proposer is invalid (T406194) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
19:13 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
19:10 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.16 upgrade ()
19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.16 upgrade ()
19:07 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
19:07 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
19:06 musikanimal@deploy2002: Started scap sync-world: Backport for WishRenderer: short-circuit and show error if proposer is invalid (T406194)
18:53 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
18:40 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 2.8.16 upgrade ()
18:36 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 2.8.16 upgrade ()
18:02 ejegg: fundraising python tools upgraded from 3fba9888 to 698309f1
17:59 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2051 gradually with 4 steps - Pooling in new host
17:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 2.8.16 upgrade ()
17:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 2.8.16 upgrade ()
17:42 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-misc2001.codfw.wmnet with OS bookworm
17:42 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:31 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:29 jasmine@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: decom
17:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.16 upgrade ()
17:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.16 upgrade ()
17:13 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2051 gradually with 4 steps - Pooling in new host
17:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2051 - Depooling host
17:12 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2051 - Depooling host
16:46 otto@deploy2002: Finished deploy [analytics/refinery@21fe78f]: deploying analytics/refinery to an-launcher1002 to pick up change for T389666 (duration: 02m 11s)
16:44 otto@deploy2002: Started deploy [analytics/refinery@21fe78f]: deploying analytics/refinery to an-launcher1002 to pick up change for T389666
16:40 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2051 gradually with 4 steps - Pooling in new host
16:32 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.16 upgrade ()
16:30 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.16 upgrade ()
16:22 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
16:17 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:06 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:06 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:05 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:55 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 59s)
15:55 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2051 gradually with 4 steps - Pooling in new host
15:53 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 59s)
15:46 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test1005.wikimedia.org
15:46 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1005.wikimedia.org with OS trixie
15:39 fceratto@cumin1002: dbctl commit (dc=all): 'Add es2051 T402859', diff saved to https://phabricator.wikimedia.org/P83607 and previous config saved to /var/cache/conftool/dbconfig/20251006-153927-fceratto.json
15:32 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1005.wikimedia.org with reason: host reimage
15:27 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1005.wikimedia.org with reason: host reimage
15:24 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:19 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha2002.wikimedia.org
15:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2002.wikimedia.org with OS trixie
15:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.16 upgrade ()
15:18 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:14 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.16 upgrade ()
15:08 moritzm: installing libxslt security updates
15:06 moritzm: installing libcpanel-json-xs-perl security updates
15:03 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
14:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
14:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
14:58 sukhe@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host hcaptcha1001.wikimedia.org
14:58 sukhe@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
14:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
14:56 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test1005.wikimedia.org with OS trixie
14:55 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
14:55 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
14:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox
14:51 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
14:50 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
14:42 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2002.wikimedia.org with OS trixie
14:42 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
14:40 marostegui@cumin1003: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
14:40 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
14:40 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
14:40 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha2002.wikimedia.org on all recursors
14:40 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha2002.wikimedia.org on all recursors
14:40 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:39 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
14:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1028.eqiad.wmnet with reason: Maintenance
14:39 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
14:38 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2058.codfw.wmnet']
14:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058.codfw.wmnet']
14:37 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2057.codfw.wmnet']
14:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2057.codfw.wmnet']
14:37 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2056.codfw.wmnet']
14:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
14:36 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2055.codfw.wmnet']
14:36 sukhe@cumin1003: START - Cookbook sre.dns.netbox
14:36 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha2002.wikimedia.org
14:34 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha2001.wikimedia.org
14:34 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2001.wikimedia.org with OS trixie
14:34 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.16 upgrade ()
14:34 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.16 upgrade ()
14:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
14:19 Lucas_WMDE: UTC afternoon backport+config window done
14:17 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: namespaceDupes diqwiki --fix # T328207
14:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Change Portal talk namespace name for diqwiki (T328207), UserInfoCard: Limit who can view past blocks and remove redundant data points (T406480) (duration: 11m 31s)
14:13 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
14:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kharlan, cappybaraa: Continuing with sync
14:06 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 2.8.16 upgrade ()
14:06 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kharlan, cappybaraa: Backport for Change Portal talk namespace name for diqwiki (T328207), UserInfoCard: Limit who can view past blocks and remove redundant data points (T406480) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
14:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 2.8.16 upgrade ()
14:04 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2052.codfw.wmnet']
14:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Change Portal talk namespace name for diqwiki (T328207), UserInfoCard: Limit who can view past blocks and remove redundant data points (T406480)
13:58 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
13:58 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
13:58 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
13:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
13:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
13:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
13:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2001.wikimedia.org with OS trixie
13:53 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
13:52 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
13:52 cdanis@deploy2002: Finished scap sync-world: Backport for EventStreamConfig - Enable hive ingestion for eventgate-logging-external based streams (T304373) (duration: 12m 24s)
13:52 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2055.codfw.wmnet']
13:52 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha2001.wikimedia.org on all recursors
13:52 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha2001.wikimedia.org on all recursors
13:52 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:52 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
13:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2054.codfw.wmnet']
13:52 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
13:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
13:51 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2053.codfw.wmnet']
13:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
13:48 sukhe@cumin1003: START - Cookbook sre.dns.netbox
13:48 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1001.wikimedia.org on all recursors
13:48 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1001.wikimedia.org on all recursors
13:48 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:47 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
13:47 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
13:46 cdanis@deploy2002: cdanis, otto: Continuing with sync
13:46 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha2001.wikimedia.org
13:46 cdanis@deploy2002: cdanis, otto: Backport for EventStreamConfig - Enable hive ingestion for eventgate-logging-external based streams (T304373) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:44 sukhe@cumin1003: START - Cookbook sre.dns.netbox
13:44 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1001.wikimedia.org
13:39 cdanis@deploy2002: Started scap sync-world: Backport for EventStreamConfig - Enable hive ingestion for eventgate-logging-external based streams (T304373)
13:37 bwojtowicz@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:34 bwojtowicz@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:29 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{dse-k8s-worker[1004-1019].eqiad.wmnet} and (A:dse-k8s-master-eqiad or A:dse-k8s-worker-eqiad)
13:24 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 2.8.16 upgrade ()
13:24 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 2.8.16 upgrade ()
13:19 mfossati@deploy2002: Finished scap sync-world: Backport for ReaderExperiments' ImageBrowsing: use edge uniques (T403259) (duration: 11m 32s)
13:15 mfossati@deploy2002: mfossati: Continuing with sync
13:15 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
13:14 mfossati@deploy2002: mfossati: Backport for ReaderExperiments' ImageBrowsing: use edge uniques (T403259) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:14 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2051.codfw.wmnet']
13:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
13:13 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2050.codfw.wmnet']
13:12 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
13:11 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
13:11 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
13:11 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
13:11 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
13:08 mfossati@deploy2002: Started scap sync-world: Backport for ReaderExperiments' ImageBrowsing: use edge uniques (T403259)
12:55 hashar: Restarting Zuul. Deadlocked due to zombie connections with Gerrit
12:48 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
12:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
12:44 jclark@cumin1002: START - Cookbook sre.dns.netbox
12:43 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest1005
12:43 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
12:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
12:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
12:41 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
12:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
12:39 arnaudb@dns1004: END - running authdns-update
12:38 arnaudb@dns1004: START - running authdns-update
12:37 jclark@cumin1002: START - Cookbook sre.dns.netbox
12:29 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
12:29 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
12:28 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
12:28 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
12:28 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
12:28 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
12:27 arnaudb@cumin1003: END (ERROR) - Cookbook sre.gerrit.failover (exit_code=97) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
12:25 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
12:25 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
12:25 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
12:25 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
12:25 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
12:22 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
12:22 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
12:20 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
12:08 moritzm: upgrade Envoy on yarn/turnilo hosts T403663
12:07 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
12:07 hashar: stopped CI Jenkins
12:07 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
12:05 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
12:05 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
12:05 arnaudb@dns1004: START - running authdns-update
12:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
12:04 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
12:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
11:25 Amir1: dropping interwiki table on group2 (T397367)
11:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru and not P{cp7008.magru.wmnet} and A:cp - 2.8.16 upgrade ()
11:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru and not P{cp7016.magru.wmnet} and A:cp - 2.8.16 upgrade ()
11:17 Amir1: dropping interwiki table on group1 (T397367)
11:15 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2047.codfw.wmnet']
10:54 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2047.codfw.wmnet']
10:54 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2046.codfw.wmnet']
10:54 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
10:54 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2045.codfw.wmnet']
10:53 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045.codfw.wmnet']
10:53 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 214657
10:52 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 214657
10:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
10:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
10:42 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2044.codfw.wmnet']
10:41 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2044.codfw.wmnet']
10:41 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
10:41 elukey: upgraded spicerack to 11.10.0 on all cumin nodes
10:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test1005.wikimedia.org on all recursors
10:40 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp-test1005.wikimedia.org on all recursors
10:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
10:40 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
10:39 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2043.codfw.wmnet']
10:39 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru and not P{cp7016.magru.wmnet} and A:cp - 2.8.16 upgrade ()
10:39 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2043.codfw.wmnet']
10:39 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru and not P{cp7008.magru.wmnet} and A:cp - 2.8.16 upgrade ()
10:39 vgutierrez: upgrading to haproxy 2.8.16 on magru - T406451
10:36 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
10:36 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test1005.wikimedia.org
10:33 moritzm: restarting postfix to pick up openssl security updates
10:26 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker-eqiad
10:12 moritzm: restarting spamassassin/clamav on VRTS to pick up OpenSSL updates
10:12 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[7008,7016].magru.wmnet} and A:cp - 2.8.16 upgrade ()
10:00 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[7008,7016].magru.wmnet} and A:cp - 2.8.16 upgrade ()
10:00 vgutierrez: upgrade to haproxy 2.8.16 on cp7008 and cp7016 - T406451
09:55 vgutierrez: fetch haproxy 2.8.16 on thirdparty/haproxy28-bullseye (apt.wm.o) - T406451
09:35 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
09:33 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
09:26 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
09:23 moritzm: upgrade Envoy on schema* T403663
09:18 elukey: uploaded spicerack_11.10.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
08:56 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker-eqiad
08:42 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
08:40 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
08:09 moritzm: installing OpenSSL security updates on trixie/bookworm
08:07 dcausse: closing the UTC morning backport window
08:06 dcausse@deploy2002: Finished scap sync-world: Backport for Allow AbuseFilter to block on ganwiki (T406220), cirrus: test completion with default sort on simplewiki [1/3] (T404858) (duration: 12m 48s)
08:01 dcausse@deploy2002: hamishz, dcausse: Continuing with sync
08:00 dcausse@deploy2002: hamishz, dcausse: Backport for Allow AbuseFilter to block on ganwiki (T406220), cirrus: test completion with default sort on simplewiki [1/3] (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:53 dcausse@deploy2002: Started scap sync-world: Backport for Allow AbuseFilter to block on ganwiki (T406220), cirrus: test completion with default sort on simplewiki [1/3] (T404858)
07:49 kharlan@deploy2002: Finished scap sync-world: Backport for MetricsPlatformAuthPreserveQueryParamsExperiments: Define hCaptcha A/B test (T405239) (duration: 11m 42s)
07:44 kharlan@deploy2002: kharlan: Continuing with sync
07:43 kharlan@deploy2002: kharlan: Backport for MetricsPlatformAuthPreserveQueryParamsExperiments: Define hCaptcha A/B test (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:37 kharlan@deploy2002: Started scap sync-world: Backport for MetricsPlatformAuthPreserveQueryParamsExperiments: Define hCaptcha A/B test (T405239)
07:34 kharlan@deploy2002: Finished scap sync-world: Backport for Implement AuthPreserveQueryParams for Metrics Platform mpo param (T404622), UserInfoCard: Hide new articles count when likely to be inaccurate (T399096) (duration: 14m 04s)
07:32 moritzm: rebalance Ganeti codfw/A following vmscape reboots
07:30 kharlan@deploy2002: kharlan: Continuing with sync
07:26 kharlan@deploy2002: kharlan: Backport for Implement AuthPreserveQueryParams for Metrics Platform mpo param (T404622), UserInfoCard: Hide new articles count when likely to be inaccurate (T399096) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:20 kharlan@deploy2002: Started scap sync-world: Backport for Implement AuthPreserveQueryParams for Metrics Platform mpo param (T404622), UserInfoCard: Hide new articles count when likely to be inaccurate (T399096)
07:02 kharlan@deploy2002: Finished scap sync-world: Backport for UserInfoCard: Hide reverted edit count if user has more than 1,000 edits (T401466) (duration: 42m 35s)
07:00 moritzm: rebalance Ganeti eqiad/A following vmscape reboots
06:49 kharlan@deploy2002: kharlan: Continuing with sync
06:47 kharlan@deploy2002: kharlan: Backport for UserInfoCard: Hide reverted edit count if user has more than 1,000 edits (T401466) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
06:19 kharlan@deploy2002: Started scap sync-world: Backport for UserInfoCard: Hide reverted edit count if user has more than 1,000 edits (T401466)
06:12 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Upgrade with minor cosmetic tweaks - oblivian@cumin1003"
06:12 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Upgrade with minor cosmetic tweaks - oblivian@cumin1003
06:11 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Upgrade with minor cosmetic tweaks - oblivian@cumin1003
06:11 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Upgrade with minor cosmetic tweaks - oblivian@cumin1003"
05:43 marostegui@dns1006: END - running authdns-update
05:41 marostegui@dns1006: START - running authdns-update
04:49 eileen: civicrm upgraded from ff529ecf to 17092e23
01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 31s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-05

23:50 eileen: civicrm upgraded from 7c31a25c to ff529ecf
23:19 eileen: config revision changed from 0d78c876 to 276d34f0
01:02 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 24s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-04

01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 44s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-03

19:37 mutante: LDAP added user btracy to group wmf T405366
19:07 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
19:07 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
18:50 ejegg: payments-wiki upgraded from e8ef5539 to 4b8293df
18:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
18:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:56 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:56 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:33 jasmine@dns1004: END - running authdns-update
17:31 jasmine@dns1004: START - running authdns-update
17:30 jasmine@dns1004: START - running authdns-update
17:27 jasmine@dns1004: START - running authdns-update
17:11 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2002.codfw.wmnet with reason: WIP
17:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
17:08 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
17:03 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: WIP
17:02 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
16:59 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
16:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
16:47 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
16:17 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2058.codfw.wmnet']
15:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058.codfw.wmnet']
15:44 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2057.codfw.wmnet']
15:38 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002"
15:38 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002
15:37 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002
15:37 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002"
15:27 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2057.codfw.wmnet']
13:37 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:18 stevemunene@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:16 stevemunene@cumin1003: START - Cookbook sre.dns.netbox
13:11 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:08 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:08 stevemunene@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
13:07 stevemunene@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
13:02 logmsgbot: reedy Deployed security patch for T406322
12:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:23 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:16 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:16 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:12 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:11 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:11 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:11 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
11:57 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
11:15 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
11:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:27 topranks: reset PIC 1/0 on cr2-eqiad to configure port 5 speed T402588
10:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cr[1-2]-eqiad,cr2-eqord,cr1-magru,ssw1-f1-eqiad with reason: reset PIC 0/1 in cr2 to set port 5 speed
10:21 topranks: drain traffic from cr2-codfw <-> ssw1-f1-codfw link to allow for cr2-codfw card reset T402588
10:17 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:15 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:15 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr2-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
10:14 btullis@cumin1003: START - Cookbook sre.dns.netbox
10:14 topranks: drain transport circuits on PIC 1/0 of cr2-eqiad to allow for card reboot T402588
10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts druid1008.eqiad.wmnet
10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr2-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
10:09 btullis@cumin1003: START - Cookbook sre.dns.netbox
10:02 cmooney@cumin1003: START - Cookbook sre.dns.netbox
10:01 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts druid1008.eqiad.wmnet
10:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1007.eqiad.wmnet
10:00 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:00 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: druid1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
09:59 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: druid1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
09:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2057.codfw.wmnet']
09:56 btullis@cumin1003: START - Cookbook sre.dns.netbox
09:55 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
09:48 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts druid1007.eqiad.wmnet
09:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2057.codfw.wmnet']
09:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2056.codfw.wmnet']
09:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
09:40 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2054.codfw.wmnet']
09:33 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
09:27 jynus@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
09:21 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
09:11 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2054.codfw.wmnet']
09:07 jynus@cumin1003: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
09:04 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
08:59 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2053.codfw.wmnet']
08:46 stevemunene@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
08:46 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2028.codfw.wmnet onto es2051.codfw.wmnet
08:46 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2028 gradually with 4 steps - Pool es2028.codfw.wmnet in after cloning
08:44 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:44 stevemunene@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
08:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:43 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
08:33 brouberol@cumin1003: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
08:31 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2053.codfw.wmnet']
08:29 brouberol@cumin1003: START - Cookbook sre.wdqs.restart
08:25 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
08:25 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2051.codfw.wmnet']
08:24 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
08:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2051.codfw.wmnet']
08:05 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
08:00 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2028 gradually with 4 steps - Pool es2028.codfw.wmnet in after cloning
07:51 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2051.codfw.wmnet']
07:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
07:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2050.codfw.wmnet']
07:43 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
07:43 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:42 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:40 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:39 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:38 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:38 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:16 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:16 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:12 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:12 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
04:47 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
04:41 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
04:40 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
04:32 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
03:36 tstarling@deploy2002: Finished scap sync-world: Backport for Fallback to first result row if none in baselang is found (T406196), Ensure linkUpdateComplete handler is only run for entities (T406192) (duration: 11m 15s)
03:31 tstarling@deploy2002: tstarling: Continuing with sync
03:30 tstarling@deploy2002: tstarling: Backport for Fallback to first result row if none in baselang is found (T406196), Ensure linkUpdateComplete handler is only run for entities (T406192) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
03:24 tstarling@deploy2002: Started scap sync-world: Backport for Fallback to first result row if none in baselang is found (T406196), Ensure linkUpdateComplete handler is only run for entities (T406192)
01:30 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 12s)
01:03 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
01:02 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
00:57 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
00:56 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
00:49 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
00:49 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
00:44 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
00:43 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
00:38 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
00:32 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm

2025-10-02

23:24 samwilson@deploy2002: Finished scap sync-world: Backport for Fetch wikitext from the translation lang subpage, not the baselang (duration: 16m 07s)
23:20 samwilson@deploy2002: samwilson: Continuing with sync
23:10 samwilson@deploy2002: samwilson: Backport for Fetch wikitext from the translation lang subpage, not the baselang synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
23:08 samwilson@deploy2002: Started scap sync-world: Backport for Fetch wikitext from the translation lang subpage, not the baselang
22:46 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
22:15 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
21:53 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
21:53 zabe@deploy2002: Finished scap sync-world: Backport for Stop setting CategoryLinksSchemaMigrationStage (T299951) (duration: 12m 37s)
21:47 zabe@deploy2002: zabe: Continuing with sync
21:46 zabe@deploy2002: zabe: Backport for Stop setting CategoryLinksSchemaMigrationStage (T299951) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:41 zabe@deploy2002: Started scap sync-world: Backport for Stop setting CategoryLinksSchemaMigrationStage (T299951)
21:37 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
21:30 ejegg: donorwiki upgraded from dc7cda24 to e8ef5539
21:30 ejegg: payments-wiki upgraded from 2b281477 to e8ef5539
21:27 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
21:27 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
21:26 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
21:25 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
21:25 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
21:17 samtar@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575) (duration: 12m 35s)
21:12 samtar@deploy2002: samtar: Continuing with sync
21:10 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
21:08 samtar@deploy2002: samtar: Backport for ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:04 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
21:04 samtar@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575)
21:03 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
20:58 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
20:47 ebomani@deploy2002: Finished scap sync-world: Backport for CommonSettings.php: Replace usage of $wgCaptchaWhitelist (T277936) (duration: 13m 17s)
20:45 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
20:42 ebomani@deploy2002: reedy, ebomani: Continuing with sync
20:40 ebomani@deploy2002: reedy, ebomani: Backport for CommonSettings.php: Replace usage of $wgCaptchaWhitelist (T277936) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:40 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
20:33 ebomani@deploy2002: Started scap sync-world: Backport for CommonSettings.php: Replace usage of $wgCaptchaWhitelist (T277936)
20:30 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
20:30 ebernhardson@deploy2002: Finished scap sync-world: Backport for cirrus: Start AB test of did-you-mean profiles (T390858) (duration: 09m 29s)
20:30 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
20:26 ebernhardson@deploy2002: ebernhardson: Continuing with sync
20:25 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Start AB test of did-you-mean profiles (T390858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s4 and s1 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83587 and previous config saved to /var/cache/conftool/dbconfig/20251002-202536-ladsgroup.json
20:23 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
20:23 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
20:21 ebernhardson@deploy2002: Started scap sync-world: Backport for cirrus: Start AB test of did-you-mean profiles (T390858)
20:16 dani@deploy2002: Finished scap sync-world: Backport for Deploy reader foundational survey on enwiki (T405410) (duration: 11m 29s)
20:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Harmonize weights in s1 in eqiad', diff saved to https://phabricator.wikimedia.org/P83586 and previous config saved to /var/cache/conftool/dbconfig/20251002-201611-ladsgroup.json
20:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s4 and s1 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83585 and previous config saved to /var/cache/conftool/dbconfig/20251002-201532-ladsgroup.json
20:15 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
20:12 dani@deploy2002: dani: Continuing with sync
20:11 dani@deploy2002: dani: Backport for Deploy reader foundational survey on enwiki (T405410) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:09 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
20:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Harmonize weights in s8 in eqiad', diff saved to https://phabricator.wikimedia.org/P83584 and previous config saved to /var/cache/conftool/dbconfig/20251002-200948-ladsgroup.json
20:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s8 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83583 and previous config saved to /var/cache/conftool/dbconfig/20251002-200621-ladsgroup.json
20:05 dani@deploy2002: Started scap sync-world: Backport for Deploy reader foundational survey on enwiki (T405410)
20:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s8 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83582 and previous config saved to /var/cache/conftool/dbconfig/20251002-200354-ladsgroup.json
20:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s7 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83581 and previous config saved to /var/cache/conftool/dbconfig/20251002-200143-ladsgroup.json
19:59 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
19:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s7 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83580 and previous config saved to /var/cache/conftool/dbconfig/20251002-195426-ladsgroup.json
19:49 samtar@deploy2002: Finished scap sync-world: Backport for EventStreamConfig and stream registration for watchlist click tracking (T401575) (duration: 10m 46s)
19:44 samtar@deploy2002: samtar: Continuing with sync
19:44 samtar@deploy2002: samtar: Backport for EventStreamConfig and stream registration for watchlist click tracking (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
19:38 samtar@deploy2002: Started scap sync-world: Backport for EventStreamConfig and stream registration for watchlist click tracking (T401575)
19:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s5 and s6 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83579 and previous config saved to /var/cache/conftool/dbconfig/20251002-193217-ladsgroup.json
19:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s5 and s6 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83578 and previous config saved to /var/cache/conftool/dbconfig/20251002-192928-ladsgroup.json
19:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s2 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83577 and previous config saved to /var/cache/conftool/dbconfig/20251002-192726-ladsgroup.json
19:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s2 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83576 and previous config saved to /var/cache/conftool/dbconfig/20251002-191918-ladsgroup.json
19:14 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
19:11 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
19:08 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
18:58 ladsgroup@deploy2002: Finished scap sync-world: Backport for db-production: Enable shuffle sharding (T405087) (duration: 22m 32s)
18:53 ladsgroup@deploy2002: ladsgroup: Continuing with sync
18:41 ladsgroup@deploy2002: ladsgroup: Backport for db-production: Enable shuffle sharding (T405087) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
18:35 ladsgroup@deploy2002: Started scap sync-world: Backport for db-production: Enable shuffle sharding (T405087)
18:27 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
17:50 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:44 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:43 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
17:40 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:40 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:25 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Repool services in Eqiad following DC switchover (T399891) - T399891
17:03 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Repool services in Eqiad following DC switchover (T399891) - T399891
16:42 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: Repool Eqiad following DC switchover (T399891), T399891]
16:42 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: Repool Eqiad following DC switchover (T399891), T399891]
15:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:52 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:52 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox
15:51 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:51 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:51 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:50 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:46 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:46 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2035.codfw.wmnet
15:42 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2035.codfw.wmnet
15:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:36 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:31 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:31 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:30 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
15:12 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
15:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:58 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:58 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr1-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
14:58 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr1-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
14:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox
14:36 topranks: reset PIC 0/1 on cr1-eqiad to set port speed for port 5 T402588
14:36 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cr[1-2]-eqiad,ssw1-e1-eqiad with reason: reset PIC 0/1 in cr1-eqiad to set port 5 speed
14:28 topranks: drain link from cr1-eqiad <-> ssw1-e1-eqiad to allow PIC card reboot on cr1-eqiad T402588
14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
14:25 taavi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'k8s.svc.toolsbeta.eqiad1.wikimedia.cloud$' on eqiad recursors
14:25 taavi@cumin1002: START - Cookbook sre.dns.wipe-cache 'k8s.svc.toolsbeta.eqiad1.wikimedia.cloud$' on eqiad recursors
14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
14:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
14:17 topranks: drain transport circuit cr1-eqiad <-> cr1-codfw to allow for PIC card reboot on cr1-eqiad T402588
14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
14:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
14:10 tgr_: UTC afternoon deploys done
14:10 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
14:08 tgr@deploy2002: Finished scap sync-world: Backport for Enable JWT session cookies on group1 (T399631) (duration: 17m 41s)
14:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
14:04 tgr@deploy2002: tgr: Continuing with sync
14:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2046.codfw.wmnet
14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2046.codfw.wmnet
13:58 tgr@deploy2002: tgr: Backport for Enable JWT session cookies on group1 (T399631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2046.codfw.wmnet
13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
13:51 tgr@deploy2002: Started scap sync-world: Backport for Enable JWT session cookies on group1 (T399631)
13:47 jforrester@deploy2002: Finished scap sync-world: Backport for Revert^2 "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (duration: 11m 39s)
13:44 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
13:44 moritzm: failover Ganeti master in eqiad to ganeti1048
13:43 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
13:42 jforrester@deploy2002: jforrester: Continuing with sync
13:42 jforrester@deploy2002: jforrester: Backport for Revert^2 "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:41 jhancock@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2035']
13:39 jayme@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2035.codfw.wmnet with reason: Hardware failure
13:35 jforrester@deploy2002: Started scap sync-world: Backport for Revert^2 "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator"
13:34 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for session: Lookup authenticated store first before anon store (T402808), session: Lookup authenticated store first before anon store (T402808) (duration: 12m 56s)
13:29 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
13:27 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Backport for session: Lookup authenticated store first before anon store (T402808), session: Lookup authenticated store first before anon store (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:23 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
13:21 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for session: Lookup authenticated store first before anon store (T402808), session: Lookup authenticated store first before anon store (T402808)
13:17 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox
13:16 dani@deploy2002: Finished scap sync-world: Backport for Update reader foundational survey on enwiki (T405410) (duration: 11m 54s)
13:11 dani@deploy2002: dani: Continuing with sync
13:11 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
13:10 dani@deploy2002: dani: Backport for Update reader foundational survey on enwiki (T405410) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
13:10 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
13:04 dani@deploy2002: Started scap sync-world: Backport for Update reader foundational survey on enwiki (T405410)
12:57 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2035']
12:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2046.codfw.wmnet
12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
12:32 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:31 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
12:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
12:10 moritzm: failover Ganeti master in codfw to ganeti2048
12:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
12:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
12:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2028 - Depool es2028.codfw.wmnet to then clone it to es2051.codfw.wmnet - fceratto@cumin1002
12:06 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2028 - Depool es2028.codfw.wmnet to then clone it to es2051.codfw.wmnet - fceratto@cumin1002
12:06 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2028.codfw.wmnet onto es2051.codfw.wmnet
12:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
11:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
11:45 stevemunene@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on druid[1007-1008].eqiad.wmnet with reason: Decommissioning druid_public hosts
11:40 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:39 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
11:35 moritzm: failover Ganeti master in drmrs02 to ganeti6002
11:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
11:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
11:21 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:20 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:19 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:19 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
11:18 moritzm: installing postgresql security updates on netboxdb nodes
11:17 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6003.drmrs.wmnet
11:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6003.drmrs.wmnet
11:12 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:12 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
11:08 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
11:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
11:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
11:04 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
11:02 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:02 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/zotero: apply
10:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
10:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
10:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
10:57 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
10:52 zabe@deploy2002: Finished scap sync-world: Backport for Revert "RevisionStore: Find identical revisions without using rev_sha1" (duration: 11m 06s)
10:48 moritzm: failover Ganeti master in drmrs01 to ganeti6001
10:48 zabe@deploy2002: zabe: Continuing with sync
10:47 zabe@deploy2002: zabe: Backport for Revert "RevisionStore: Find identical revisions without using rev_sha1" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
10:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:41 zabe@deploy2002: Started scap sync-world: Backport for Revert "RevisionStore: Find identical revisions without using rev_sha1"
10:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
10:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
10:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
10:15 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1239.eqiad.wmnet with reason: Maintenance
10:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
10:11 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
10:02 moritzm: installing OpenSSL security updates on trixie/bookworm
10:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1222.eqiad.wmnet with reason: Maintenance
09:59 moritzm: failover Ganeti master in eqsin to ganeti5007
09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
09:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
09:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
09:17 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.21 refs T405677
09:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
09:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
09:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2051.codfw.wmnet with reason: Setting up new ES host
09:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
08:55 awight@deploy2002: Finished scap sync-world: Backport for Revert "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (T406185 T397401 T401682), UX changes for reference context item (T404690), Nasty fix for main ref change in main+details (T406002) (duration: 48m 54s)
08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
08:43 awight@deploy2002: awight, hashar: Continuing with sync
08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
08:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
08:35 hashar@deploy2002: Finished deploy [gerrit/gerrit@3ef5714]: Add a banner for a Gerrit switch over maintenance - T387833 (duration: 00m 12s)
08:35 hashar@deploy2002: Started deploy [gerrit/gerrit@3ef5714]: Add a banner for a Gerrit switch over maintenance - T387833
08:35 hashar@deploy2002: deploy aborted: Add a banner for a Gerrit switch over maintenance - T387833 (duration: 00m 00s)
08:35 hashar@deploy2002: Started deploy [gerrit/gerrit@3ef5714]: Add a banner for a Gerrit switch over maintenance - T387833
08:34 awight@deploy2002: awight, hashar: Backport for Revert "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (T406185 T397401 T401682), UX changes for reference context item (T404690), Nasty fix for main ref change in main+details (T406002) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verif
08:16 brouberol@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-druid1007.eqiad.wmnet with reason: Hosts are being decommissioned
08:10 moritzm: failover Ganeti master in ulsfo to ganeti4008
08:06 awight@deploy2002: Started scap sync-world: Backport for Revert "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (T406185 T397401 T401682), UX changes for reference context item (T404690), Nasty fix for main ref change in main+details (T406002)
08:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
08:05 root@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host wikikube-worker2035.codfw.wmnet
08:02 root@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2035.codfw.wmnet
07:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
07:54 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
07:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
07:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
07:45 hashar@deploy2002: Finished scap sync-world: Backport for Add abusefilter-modify-restricted to enwiki EFM (T405999) (duration: 15m 40s)
07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
07:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
07:41 hashar@deploy2002: eggroll97, hashar: Continuing with sync
07:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
07:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
07:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:36 hashar@deploy2002: eggroll97, hashar: Backport for Add abusefilter-modify-restricted to enwiki EFM (T405999) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
07:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
07:29 hashar@deploy2002: Started scap sync-world: Backport for Add abusefilter-modify-restricted to enwiki EFM (T405999)
07:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
07:19 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:07 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
07:07 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
07:06 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
06:26 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Commons (T403510), Disable wmgUseMdotRouting on id, fr, de, es, ru, and ja.wikipedia (T403510) (duration: 23m 01s)
06:21 krinkle@deploy2002: krinkle: Continuing with sync
06:09 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on Commons (T403510), Disable wmgUseMdotRouting on id, fr, de, es, ru, and ja.wikipedia (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
06:03 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on Commons (T403510), Disable wmgUseMdotRouting on id, fr, de, es, ru, and ja.wikipedia (T403510)
03:43 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes # T402967
02:55 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes # T402967
02:27 musikanimal@deploy2002: Finished scap sync-world: Backport for Enable debug logging for CommunityRequests (T402967) (duration: 13m 47s)
02:22 musikanimal@deploy2002: musikanimal: Continuing with sync
02:20 musikanimal@deploy2002: musikanimal: Backport for Enable debug logging for CommunityRequests (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
02:13 musikanimal@deploy2002: Started scap sync-world: Backport for Enable debug logging for CommunityRequests (T402967)
02:02 musikanimal@deploy2002: Finished scap sync-world: Backport for FocusAreaStore: use virtual DB connection when counting wishes (T402967) (duration: 12m 25s)
01:57 musikanimal@deploy2002: musikanimal: Continuing with sync
01:56 musikanimal@deploy2002: musikanimal: Backport for FocusAreaStore: use virtual DB connection when counting wishes (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
01:50 musikanimal@deploy2002: Started scap sync-world: Backport for FocusAreaStore: use virtual DB connection when counting wishes (T402967)
01:16 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 38s)
01:02 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
01:02 musikanimal@deploy2002: Finished scap sync-world: Backport for WishStore: don't use virtual domain when querying for actor ID (T402967) (duration: 11m 14s)
00:57 musikanimal@deploy2002: musikanimal: Continuing with sync
00:57 musikanimal@deploy2002: musikanimal: Backport for WishStore: don't use virtual domain when querying for actor ID (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
00:50 musikanimal@deploy2002: Started scap sync-world: Backport for WishStore: don't use virtual domain when querying for actor ID (T402967)
00:29 musikanimal@deploy2002: Finished scap sync-world: Backport for Increase timeout for MessageIndex lock (T402967) (duration: 13m 30s)
00:22 musikanimal@deploy2002: musikanimal: Continuing with sync
00:22 musikanimal@deploy2002: musikanimal: Backport for Increase timeout for MessageIndex lock (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
00:15 musikanimal@deploy2002: Started scap sync-world: Backport for Increase timeout for MessageIndex lock (T402967)

2025-10-01

23:16 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2017.codfw.wmnet with OS bullseye
23:14 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
22:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
22:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
22:52 bvibber@deploy2002: Finished scap sync-world: Backport for Add ReaderExperiments extension (T404398), Deploy ReaderExperiments to Beta cluster (T404398), Enable ReaderExperiments on Beta (T404398), Load ReaderExperiments extension in CommonSettings-labs.php (T404398) (duration: 40m 32s)
22:40 bvibber@deploy2002: egardner, bvibber: Continuing with sync
22:39 bvibber@deploy2002: egardner, bvibber: Backport for Add ReaderExperiments extension (T404398), Deploy ReaderExperiments to Beta cluster (T404398), Enable ReaderExperiments on Beta (T404398), Load ReaderExperiments extension in CommonSettings-labs.php (T404398) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes
22:13 TimStarling: migrating wishes to CommunityRequests with migrateFromGadget.php
22:12 bvibber@deploy2002: Started scap sync-world: Backport for Add ReaderExperiments extension (T404398), Deploy ReaderExperiments to Beta cluster (T404398), Enable ReaderExperiments on Beta (T404398), Load ReaderExperiments extension in CommonSettings-labs.php (T404398)
22:08 tstarling@deploy2002: Finished scap sync-world: Backport for Enable CommunityRequests on metawiki (T402967), metawiki: Configure permissions for CommunityRequests (T402967) (duration: 10m 42s)
22:04 tstarling@deploy2002: musikanimal, tstarling: Continuing with sync
22:02 tstarling@deploy2002: musikanimal, tstarling: Backport for Enable CommunityRequests on metawiki (T402967), metawiki: Configure permissions for CommunityRequests (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:57 tstarling@deploy2002: Started scap sync-world: Backport for Enable CommunityRequests on metawiki (T402967), metawiki: Configure permissions for CommunityRequests (T402967)
21:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
21:56 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
21:39 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
21:36 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
21:35 jforrester@deploy2002: Finished scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator (T397401 T401682) (duration: 09m 39s)
21:31 jforrester@deploy2002: jforrester: Continuing with sync
21:30 jforrester@deploy2002: jforrester: Backport for Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator (T397401 T401682) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:25 jforrester@deploy2002: Started scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator (T397401 T401682)
21:18 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
21:17 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
21:17 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
21:16 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
21:15 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
21:15 tstarling@deploy2002: Finished scap sync-world: Backport for Configure CommunityRequests virtual domain (T402967) (duration: 07m 36s)
21:15 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
21:11 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
21:11 tstarling@deploy2002: tstarling: Continuing with sync
21:10 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
21:10 tstarling@deploy2002: tstarling: Backport for Configure CommunityRequests virtual domain (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
21:10 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
21:09 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
21:07 tstarling@deploy2002: Started scap sync-world: Backport for Configure CommunityRequests virtual domain (T402967)
21:07 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
21:06 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
21:05 arlolra@deploy2002: Finished scap sync-world: Backport for Revert "Add parsoid support in ProofreadPage extension" (duration: 09m 47s)
21:00 arlolra@deploy2002: arlolra: Continuing with sync
20:59 arlolra@deploy2002: arlolra: Backport for Revert "Add parsoid support in ProofreadPage extension" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:55 arlolra@deploy2002: Started scap sync-world: Backport for Revert "Add parsoid support in ProofreadPage extension"
20:51 derick@deploy2002: Finished scap sync-world: Backport for Revert^2 "session: Enable MultiBackendSessionStore on `group1` wikis" (duration: 12m 46s)
20:46 derick@deploy2002: d3r1ck01, derick: Continuing with sync
20:44 derick@deploy2002: d3r1ck01, derick: Backport for Revert^2 "session: Enable MultiBackendSessionStore on `group1` wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:38 derick@deploy2002: Started scap sync-world: Backport for Revert^2 "session: Enable MultiBackendSessionStore on `group1` wikis"
20:34 derick@deploy2002: Finished scap sync-world: Backport for session: Handle an edge-case in MultiBackendSessionStore::set() (T402808), session: Handle an edge-case in MultiBackendSessionStore::set() (T402808) (duration: 12m 57s)
20:30 derick@deploy2002: derick, d3r1ck01: Continuing with sync
20:27 derick@deploy2002: derick, d3r1ck01: Backport for session: Handle an edge-case in MultiBackendSessionStore::set() (T402808), session: Handle an edge-case in MultiBackendSessionStore::set() (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
20:21 derick@deploy2002: Started scap sync-world: Backport for session: Handle an edge-case in MultiBackendSessionStore::set() (T402808), session: Handle an edge-case in MultiBackendSessionStore::set() (T402808)
19:49 mutante: cloud
19:13 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Enable A/B test for frwiki (T405239) (duration: 26m 24s)
19:11 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:11 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:10 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:10 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:10 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:10 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:10 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:08 kharlan@deploy2002: kharlan: Continuing with sync
18:53 kharlan@deploy2002: kharlan: Backport for hCaptcha: Enable A/B test for frwiki (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
18:46 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Enable A/B test for frwiki (T405239)
18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.21 refs T405677
16:39 swfrench@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
16:34 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
16:33 swfrench@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
16:23 kharlan@deploy2002: Finished scap sync-world: Backport for SimpleCaptcha::canSkipCaptcha: Remove unneeded Config parameter, CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), [[gerrit:1192923|Sim
16:19 kharlan@deploy2002: kharlan: Continuing with sync
16:17 kharlan@deploy2002: kharlan: Backport for SimpleCaptcha::canSkipCaptcha: Remove unneeded Config parameter, CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), [[gerrit:1192923|SimpleCaptcha::canSk
16:15 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
16:10 kharlan@deploy2002: Started scap sync-world: Backport for SimpleCaptcha::canSkipCaptcha: Remove unneeded Config parameter, CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), [[gerrit:1192923|Simp
16:07 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
16:07 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
15:57 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
15:56 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
15:51 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:51 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:49 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:49 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:46 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:46 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:35 claime: Finished eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
15:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
15:34 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
15:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:33 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
15:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
15:33 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
15:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
15:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:31 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
15:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
15:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
15:30 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
15:28 cgoubert@deploy2002: Finished scap sync-world: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703 (duration: 03m 16s)
15:27 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
15:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
15:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
15:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
15:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repool db1259 after maint T401906', diff saved to https://phabricator.wikimedia.org/P83573 and previous config saved to /var/cache/conftool/dbconfig/20251001-152620-ladsgroup.json
15:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
15:25 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
15:25 cgoubert@deploy2002: Started scap sync-world: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
15:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
15:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
15:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
15:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
15:23 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
15:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
15:22 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
15:22 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
15:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
15:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
15:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
15:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
15:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
15:20 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
15:20 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
15:20 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
15:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
15:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
15:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
15:18 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
15:18 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
15:18 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
15:18 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
15:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
15:17 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
15:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
15:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
15:16 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
15:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:16 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
15:15 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
15:07 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
15:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
15:04 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:04 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:49 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
14:45 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
14:45 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
14:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
14:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
14:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
14:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
14:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
14:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:43 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:43 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:41 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:41 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:41 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:41 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:40 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
14:40 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:40 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
14:40 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
14:38 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
14:38 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
14:38 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
14:38 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
14:38 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
14:37 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
14:34 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
14:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
14:33 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
14:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
14:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
14:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
14:31 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
14:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
14:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
14:30 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
14:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
14:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
14:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
14:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
14:28 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
14:28 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
14:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
14:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
14:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
14:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
14:25 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
14:25 cgoubert@deploy2002: Started scap sync-world: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
14:25 cgoubert@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703 (duration: 201m 05s)
14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
14:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
14:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
14:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
14:23 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
14:22 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
14:22 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
14:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
14:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
14:20 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
14:20 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
14:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
14:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
14:18 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
14:18 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
14:18 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
14:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
14:16 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=thumbor.*,name=codfw
14:16 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=swift.*,name=eqiad
14:16 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=thumbor.*,name=eqiad
14:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
14:15 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
14:14 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
14:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
14:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
14:13 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
14:12 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
14:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
14:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
14:09 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
14:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
14:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:06 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
14:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
14:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: apply
14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T401906)', diff saved to https://phabricator.wikimedia.org/P83572 and previous config saved to /var/cache/conftool/dbconfig/20251001-140538-fceratto.json
14:05 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
14:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
14:04 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: apply
14:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1259 (T401906)', diff saved to https://phabricator.wikimedia.org/P83571 and previous config saved to /var/cache/conftool/dbconfig/20251001-140422-fceratto.json
14:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1259.eqiad.wmnet with reason: Maintenance
14:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T401906)', diff saved to https://phabricator.wikimedia.org/P83570 and previous config saved to /var/cache/conftool/dbconfig/20251001-140400-fceratto.json
14:03 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:02 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
14:01 cgoubert@cumin1003: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=toolhub.*
14:00 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
13:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
13:56 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=wdqs2016\.codfw\.wmnet
13:53 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
13:51 jelto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 239 hosts with reason: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
13:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P83569 and previous config saved to /var/cache/conftool/dbconfig/20251001-134852-fceratto.json
13:46 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
13:44 SandraEbele_: Deployed refinery-source using jenkins (weekly deployment train)
13:44 cgoubert@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-ctrl[1001-1004].eqiad.wmnet
13:44 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl[1001-1004].eqiad.wmnet
13:35 cgoubert@cumin1003: END (FAIL) - Cookbook sre.k8s.wipe-cluster (exit_code=99) Wipe the K8s cluster wikikube-eqiad: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
13:35 cgoubert@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
13:34 cgoubert@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
13:34 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:33 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
13:33 cgoubert@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
13:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P83568 and previous config saved to /var/cache/conftool/dbconfig/20251001-133344-fceratto.json
13:33 cgoubert@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
13:33 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:31 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:31 cgoubert@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
13:30 cgoubert@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
13:30 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
13:30 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:30 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:30 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:29 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:28 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:28 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:24 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:24 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:24 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:23 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T401906)', diff saved to https://phabricator.wikimedia.org/P83566 and previous config saved to /var/cache/conftool/dbconfig/20251001-131836-fceratto.json
13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T401906)', diff saved to https://phabricator.wikimedia.org/P83565 and previous config saved to /var/cache/conftool/dbconfig/20251001-131719-fceratto.json
13:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1254.eqiad.wmnet with reason: Maintenance
13:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
13:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T401906)', diff saved to https://phabricator.wikimedia.org/P83564 and previous config saved to /var/cache/conftool/dbconfig/20251001-131639-fceratto.json
13:13 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
13:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repool db1172 after upgrade T406008', diff saved to https://phabricator.wikimedia.org/P83563 and previous config saved to /var/cache/conftool/dbconfig/20251001-131033-ladsgroup.json
13:07 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1258* gradually with 4 steps - Work done
13:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P83561 and previous config saved to /var/cache/conftool/dbconfig/20251001-130131-fceratto.json
12:56 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=thumbor.*,name=eqiad
12:53 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=swift.*,name=eqiad
12:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1172 for upgrade T406008', diff saved to https://phabricator.wikimedia.org/P83559 and previous config saved to /var/cache/conftool/dbconfig/20251001-125120-ladsgroup.json
12:50 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1172.eqiad.wmnet with reason: Upgrade to 10.11
12:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P83558 and previous config saved to /var/cache/conftool/dbconfig/20251001-124622-fceratto.json
12:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T401906)', diff saved to https://phabricator.wikimedia.org/P83556 and previous config saved to /var/cache/conftool/dbconfig/20251001-123115-fceratto.json
12:31 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=swift.*,name=eqiad
12:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T401906)', diff saved to https://phabricator.wikimedia.org/P83555 and previous config saved to /var/cache/conftool/dbconfig/20251001-122959-fceratto.json
12:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
12:29 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=thumbor.*,name=eqiad
12:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T401906)', diff saved to https://phabricator.wikimedia.org/P83554 and previous config saved to /var/cache/conftool/dbconfig/20251001-122936-fceratto.json
12:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
12:21 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1258* gradually with 4 steps - Work done
12:21 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1258.eqiad.wmnet
12:19 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
12:19 mvernon@cumin2002: END (ERROR) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=97) rolling restart_daemons on A:swift-fe-eqiad
12:19 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
12:15 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1258 - Upgrading db1258.eqiad.wmnet
12:15 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1258 - Upgrading db1258.eqiad.wmnet
12:15 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1258.eqiad.wmnet
12:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P83552 and previous config saved to /var/cache/conftool/dbconfig/20251001-121429-fceratto.json
12:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1258 T406116', diff saved to https://phabricator.wikimedia.org/P83551 and previous config saved to /var/cache/conftool/dbconfig/20251001-121339-ladsgroup.json
12:12 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
12:11 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
12:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
12:08 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
12:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Promote db1255 to x3 primary T406116', diff saved to https://phabricator.wikimedia.org/P83550 and previous config saved to /var/cache/conftool/dbconfig/20251001-120629-ladsgroup.json
12:06 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
12:06 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
12:06 Amir1: Starting x3 eqiad failover from db1258 to db1255 - T406116
12:05 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:04 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set db1255 with weight 0 T406116', diff saved to https://phabricator.wikimedia.org/P83549 and previous config saved to /var/cache/conftool/dbconfig/20251001-120140-ladsgroup.json
12:00 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 16 hosts with reason: Primary switchover x3 T406116
11:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P83548 and previous config saved to /var/cache/conftool/dbconfig/20251001-115922-fceratto.json
11:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:58 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:48 cgoubert@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster wikikube-eqiad: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T401906)', diff saved to https://phabricator.wikimedia.org/P83547 and previous config saved to /var/cache/conftool/dbconfig/20251001-114414-fceratto.json
11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T401906)', diff saved to https://phabricator.wikimedia.org/P83546 and previous config saved to /var/cache/conftool/dbconfig/20251001-114259-fceratto.json
11:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
11:42 hnowlan: manually bumped thumbor replicas in codfw to 140
11:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
11:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T401906)', diff saved to https://phabricator.wikimedia.org/P83545 and previous config saved to /var/cache/conftool/dbconfig/20251001-114214-fceratto.json
11:41 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=thumbor.*,name=eqiad
11:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:39 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:37 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:37 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:35 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:35 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:29 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=swift.*,name=eqiad
11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P83544 and previous config saved to /var/cache/conftool/dbconfig/20251001-112707-fceratto.json
11:25 Amir1: dropping two unused tables in phabricator db (T403542)
11:18 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=thumbor.*,name=codfw
11:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P83542 and previous config saved to /var/cache/conftool/dbconfig/20251001-111159-fceratto.json
11:05 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=toolhub.*
11:04 cgoubert@cumin1003: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool toolhub in eqiad: maintenance
11:04 cgoubert@cumin1003: START - Cookbook sre.discovery.service-route depool toolhub in eqiad: maintenance
11:03 cgoubert@deploy2002: Locking from deployment [ALL REPOSITORIES]: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
11:03 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
11:03 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
11:03 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
11:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
11:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
11:02 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
11:02 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
11:01 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
11:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
11:01 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
11:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/ratelimit: apply
11:01 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
11:00 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
10:59 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
10:59 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
10:59 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
10:59 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
10:58 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
10:58 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
10:58 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
10:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
10:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
10:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T401906)', diff saved to https://phabricator.wikimedia.org/P83541 and previous config saved to /var/cache/conftool/dbconfig/20251001-105652-fceratto.json
10:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T401906)', diff saved to https://phabricator.wikimedia.org/P83540 and previous config saved to /var/cache/conftool/dbconfig/20251001-105538-fceratto.json
10:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
10:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T401906)', diff saved to https://phabricator.wikimedia.org/P83539 and previous config saved to /var/cache/conftool/dbconfig/20251001-105514-fceratto.json
10:55 claime: Starting eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
10:45 hashar@deploy2002: Finished scap sync-world: Backport for Revert "Replace LoginNotify::getInstance with service injection" (T406094) (duration: 13m 47s)
10:40 hashar@deploy2002: hashar, dreamyjazz: Continuing with sync
10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P83538 and previous config saved to /var/cache/conftool/dbconfig/20251001-104006-fceratto.json
10:36 hashar@deploy2002: hashar, dreamyjazz: Backport for Revert "Replace LoginNotify::getInstance with service injection" (T406094) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
10:31 hashar@deploy2002: Started scap sync-world: Backport for Revert "Replace LoginNotify::getInstance with service injection" (T406094)
10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P83537 and previous config saved to /var/cache/conftool/dbconfig/20251001-102458-fceratto.json
10:11 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:11 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:11 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:10 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T401906)', diff saved to https://phabricator.wikimedia.org/P83536 and previous config saved to /var/cache/conftool/dbconfig/20251001-100951-fceratto.json
10:09 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:08 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T401906)', diff saved to https://phabricator.wikimedia.org/P83535 and previous config saved to /var/cache/conftool/dbconfig/20251001-100837-fceratto.json
10:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T401906)', diff saved to https://phabricator.wikimedia.org/P83534 and previous config saved to /var/cache/conftool/dbconfig/20251001-100814-fceratto.json
09:59 kharlan@deploy2002: Finished scap sync-world: Backport for CreateAccount: Fix server side logging of CAPTCHA class (T405239), CreateAccount: Fix server side logging of CAPTCHA class (T405239) (duration: 15m 47s)
09:54 kharlan@deploy2002: kharlan: Continuing with sync
09:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P83533 and previous config saved to /var/cache/conftool/dbconfig/20251001-095306-fceratto.json
09:50 kharlan@deploy2002: kharlan: Backport for CreateAccount: Fix server side logging of CAPTCHA class (T405239), CreateAccount: Fix server side logging of CAPTCHA class (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
09:48 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
09:44 kharlan@deploy2002: Started scap sync-world: Backport for CreateAccount: Fix server side logging of CAPTCHA class (T405239), CreateAccount: Fix server side logging of CAPTCHA class (T405239)
09:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P83532 and previous config saved to /var/cache/conftool/dbconfig/20251001-093758-fceratto.json
09:28 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
09:28 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
09:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T401906)', diff saved to https://phabricator.wikimedia.org/P83531 and previous config saved to /var/cache/conftool/dbconfig/20251001-092251-fceratto.json
09:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T401906)', diff saved to https://phabricator.wikimedia.org/P83530 and previous config saved to /var/cache/conftool/dbconfig/20251001-092136-fceratto.json
09:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
09:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T401906)', diff saved to https://phabricator.wikimedia.org/P83529 and previous config saved to /var/cache/conftool/dbconfig/20251001-092112-fceratto.json
09:17 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
09:17 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
09:14 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:14 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:12 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:11 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:06 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:06 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P83528 and previous config saved to /var/cache/conftool/dbconfig/20251001-090604-fceratto.json
08:57 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
08:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P83527 and previous config saved to /var/cache/conftool/dbconfig/20251001-085056-fceratto.json
08:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T401906)', diff saved to https://phabricator.wikimedia.org/P83526 and previous config saved to /var/cache/conftool/dbconfig/20251001-083549-fceratto.json
08:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T401906)', diff saved to https://phabricator.wikimedia.org/P83525 and previous config saved to /var/cache/conftool/dbconfig/20251001-083435-fceratto.json
08:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
08:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T401906)', diff saved to https://phabricator.wikimedia.org/P83524 and previous config saved to /var/cache/conftool/dbconfig/20251001-083412-fceratto.json
08:19 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
08:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P83523 and previous config saved to /var/cache/conftool/dbconfig/20251001-081905-fceratto.json
08:13 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
08:10 Emperor: restart swift on ms-fe2012 T360913
08:08 bwojtowicz@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
08:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P83522 and previous config saved to /var/cache/conftool/dbconfig/20251001-080357-fceratto.json
07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T401906)', diff saved to https://phabricator.wikimedia.org/P83521 and previous config saved to /var/cache/conftool/dbconfig/20251001-074850-fceratto.json
07:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T401906)', diff saved to https://phabricator.wikimedia.org/P83520 and previous config saved to /var/cache/conftool/dbconfig/20251001-074736-fceratto.json
07:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
07:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
07:10 kharlan@deploy2002: Finished scap sync-world: Backport for CreateAccount: Track interactions with the captchaWord field (T394744), CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239) (duration: 14m 09s)
07:05 kharlan@deploy2002: kharlan: Continuing with sync
07:02 kharlan@deploy2002: kharlan: Backport for CreateAccount: Track interactions with the captchaWord field (T394744), CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
06:55 kharlan@deploy2002: Started scap sync-world: Backport for CreateAccount: Track interactions with the captchaWord field (T394744), CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239)
06:40 kharlan@deploy2002: Finished scap sync-world: Backport for CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239), CreateAccount: Track interactions with the captchaWord field (T394744) (duration: 22m 34s)
06:35 kharlan@deploy2002: kharlan: Continuing with sync
06:22 kharlan@deploy2002: kharlan: Backport for CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239), CreateAccount: Track interactions with the captchaWord field (T394744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
06:17 kharlan@deploy2002: Started scap sync-world: Backport for CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239), CreateAccount: Track interactions with the captchaWord field (T394744)
04:54 TimStarling: on x1 metawiki creating tables for CommunityRequests
02:31 musikanimal@deploy2002: Finished scap sync-world: Backport for AbstractRenderer: fix existence dependency on Votes subpage (duration: 12m 19s)
02:26 musikanimal@deploy2002: musikanimal: Continuing with sync
02:26 musikanimal@deploy2002: musikanimal: Backport for AbstractRenderer: fix existence dependency on Votes subpage synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
02:19 musikanimal@deploy2002: Started scap sync-world: Backport for AbstractRenderer: fix existence dependency on Votes subpage
01:52 musikanimal@deploy2002: Finished scap sync-world: Backport for Call WikiPage::doPurge to try and clear cache after language is set (T404748) (duration: 10m 47s)
01:47 musikanimal@deploy2002: musikanimal: Continuing with sync
01:46 musikanimal@deploy2002: musikanimal: Backport for Call WikiPage::doPurge to try and clear cache after language is set (T404748) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
01:41 musikanimal@deploy2002: Started scap sync-world: Backport for Call WikiPage::doPurge to try and clear cache after language is set (T404748)
01:28 musikanimal@deploy2002: Finished scap sync-world: Backport for migrateFromGadget: add a few more missing transformations (T405826 T404138 T404234) (duration: 10m 53s)
01:23 musikanimal@deploy2002: musikanimal: Continuing with sync
01:22 musikanimal@deploy2002: musikanimal: Backport for migrateFromGadget: add a few more missing transformations (T405826 T404138 T404234) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
01:17 musikanimal@deploy2002: Started scap sync-world: Backport for migrateFromGadget: add a few more missing transformations (T405826 T404138 T404234)
01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 33s)
01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
00:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov1007.eqiad.wmnet with OS bookworm
00:00 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Wikidata (T403510) (duration: 13m 23s)

Other archives

See Server Admin Log/Archives.