Server Admin Log

From Wikitech

2025-10-20

  • 05:24 marostegui@cumin1003: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84071 and previous config saved to /var/cache/conftool/dbconfig/20251020-052438-root.json
  • 05:20 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1027 from dbctl T407595', diff saved to https://phabricator.wikimedia.org/P84070 and previous config saved to /var/cache/conftool/dbconfig/20251020-052057-marostegui.json
  • 05:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1206 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84069 and previous config saved to /var/cache/conftool/dbconfig/20251020-051712-marostegui.json
  • 05:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2248.codfw.wmnet onto db2245.codfw.wmnet
  • 05:04 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2248.codfw.wmnet onto db2245.codfw.wmnet
  • 05:04 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2248 - Depool db2248.codfw.wmnet to then clone it to db2245.codfw.wmnet - marostegui@cumin1003
  • 05:03 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2248 - Depool db2248.codfw.wmnet to then clone it to db2245.codfw.wmnet - marostegui@cumin1003
  • 05:03 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2248.codfw.wmnet onto db2245.codfw.wmnet
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 52s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-19

  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 32s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-18

  • 08:45 brett@dns1004: END - running authdns-update
  • 08:44 brett@dns1004: START - running authdns-update
  • 08:25 brett@dns1004: END - running authdns-update
  • 08:23 brett@dns1004: START - running authdns-update

2025-10-17

  • 21:49 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2058']
  • 21:48 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:45 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2058']
  • 21:44 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:43 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 21:43 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:43 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 21:42 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:29 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 21:29 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:26 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 21:26 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:24 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 21:21 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2058']
  • 21:20 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 20:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:44 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 20:43 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 20:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:37 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS bookworm
  • 20:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 20:18 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS bookworm
  • 20:17 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 20:10 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 19:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 19:50 ejegg: donorwiki upgraded from 70a7050f to 039e5a15
  • 19:50 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:49 ejegg: payments-wiki upgraded from 70a7050f to 039e5a15
  • 19:11 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:11 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 18:47 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 18:45 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 17:09 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:08 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:09 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 16:01 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 15:33 Dreamy_Jazz: Ran `mwscript-k8s --comment='First emails to users to get them to confirm their email address for T58074' extensions/WikimediaMaintenance/sendVerifyEmailReminderNotification.php --wiki=metawiki 20250917000000`
  • 13:09 vgutierrez: updating ca-certificates package on bookworm puppetservers
  • 13:01 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84067 and previous config saved to /var/cache/conftool/dbconfig/20251017-130106-root.json
  • 12:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 12:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 12:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 12:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 12:46 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84066 and previous config saved to /var/cache/conftool/dbconfig/20251017-124600-root.json
  • 12:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84064 and previous config saved to /var/cache/conftool/dbconfig/20251017-123054-root.json
  • 12:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84063 and previous config saved to /var/cache/conftool/dbconfig/20251017-121548-root.json
  • 12:07 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1195 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84062 and previous config saved to /var/cache/conftool/dbconfig/20251017-120737-marostegui.json
  • 12:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2248.codfw.wmnet onto db2246.codfw.wmnet
  • 11:38 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2248 gradually with 4 steps - Pool db2248.codfw.wmnet in after cloning
  • 11:11 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:06 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:06 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:52 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2248 gradually with 4 steps - Pool db2248.codfw.wmnet in after cloning
  • 10:44 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:43 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:36 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:35 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:35 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:34 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:08 eileen: civicrm upgraded from ab1d21dc to 7b70cb83
  • 10:05 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:05 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:03 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:03 topranks: un-draining Arelion 100G transport eqiad <-> codfw following carrier fibre fix and return to stability T407578
  • 10:03 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:02 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:02 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:37 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:36 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 08:47 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 08:46 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 08:19 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2248 - Depool db2248.codfw.wmnet to then clone it to db2246.codfw.wmnet - marostegui@cumin1003
  • 08:19 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2248 - Depool db2248.codfw.wmnet to then clone it to db2246.codfw.wmnet - marostegui@cumin1003
  • 08:19 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2248.codfw.wmnet onto db2246.codfw.wmnet
  • 08:08 topranks: draining Arelion eqiad <-> codfw transport with OSPF metric and re-enabling port on cr1-eqiad
  • 08:04 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2032.codfw.wmnet onto es2055.codfw.wmnet
  • 07:42 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84056 and previous config saved to /var/cache/conftool/dbconfig/20251017-074221-root.json
  • 07:27 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84055 and previous config saved to /var/cache/conftool/dbconfig/20251017-072715-root.json
  • 07:12 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84054 and previous config saved to /var/cache/conftool/dbconfig/20251017-071209-root.json
  • 06:57 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84053 and previous config saved to /var/cache/conftool/dbconfig/20251017-065703-root.json
  • 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84052 and previous config saved to /var/cache/conftool/dbconfig/20251017-064157-root.json
  • 06:26 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84051 and previous config saved to /var/cache/conftool/dbconfig/20251017-062651-root.json
  • 06:11 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84050 and previous config saved to /var/cache/conftool/dbconfig/20251017-061145-root.json
  • 05:56 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84049 and previous config saved to /var/cache/conftool/dbconfig/20251017-055639-root.json
  • 05:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on es1027.eqiad.wmnet with reason: Cloning
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1027 T407595', diff saved to https://phabricator.wikimedia.org/P84048 and previous config saved to /var/cache/conftool/dbconfig/20251017-054458-marostegui.json
  • 05:41 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84047 and previous config saved to /var/cache/conftool/dbconfig/20251017-054133-root.json
  • 05:26 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84046 and previous config saved to /var/cache/conftool/dbconfig/20251017-052627-root.json
  • 05:11 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84045 and previous config saved to /var/cache/conftool/dbconfig/20251017-051121-root.json
  • 05:11 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1056 to dbctl T406488', diff saved to https://phabricator.wikimedia.org/P84044 and previous config saved to /var/cache/conftool/dbconfig/20251017-051114-marostegui.json
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 04s)
  • 01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-16

  • 23:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 23:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 23:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 23:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 23:19 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 23:18 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 23:18 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 23:17 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 23:17 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 23:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 23:15 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 23:15 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 23:13 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 23:13 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 23:12 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:11 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:11 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 23:11 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 23:10 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 23:10 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 23:09 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 23:09 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 23:08 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 23:07 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 23:07 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 23:06 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 23:06 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 23:05 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 23:05 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 23:03 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
  • 23:03 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 23:03 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 23:02 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 22:59 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 22:59 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 22:58 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 22:58 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 22:57 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 22:55 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 22:49 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 22:49 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 22:48 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 22:47 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 22:47 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 22:46 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 22:46 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 22:44 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 22:44 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 22:43 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:42 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 22:41 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 22:41 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 22:40 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 22:39 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 22:39 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 22:38 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 22:38 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 22:37 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 22:37 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 22:37 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 22:35 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 22:34 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 22:33 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 22:33 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 22:33 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 22:32 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 22:32 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 22:31 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 22:31 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
  • 22:31 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
  • 22:30 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:29 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 22:28 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 22:25 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 22:24 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 22:24 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 22:23 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 22:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 22:17 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 22:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 22:04 sbassett: Deployed security fix for T407131
  • 21:46 jdlrobson@deploy2002: Finished scap sync-world: Backport for Temporary user banner should not have such a high z-index (T407549) (duration: 15m 21s)
  • 21:42 jdlrobson@deploy2002: jdlrobson: Continuing with sync
  • 21:35 jdlrobson@deploy2002: jdlrobson: Backport for Temporary user banner should not have such a high z-index (T407549) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:31 jdlrobson@deploy2002: Started scap sync-world: Backport for Temporary user banner should not have such a high z-index (T407549)
  • 21:26 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7004.*
  • 21:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 21:20 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7004.magru.wmnet} and A:cp
  • 21:20 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7004.magru.wmnet
  • 21:08 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7004.magru.wmnet} and A:cp
  • 21:00 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: Debugging sre.cdn.roll-reboot bugs
  • 20:59 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7004.*
  • 20:56 bblack: see also https://phabricator.wikimedia.org/T407578 for above port disables
  • 20:51 bblack: disabling cr1-eqiad:et-1/1/2 and cr1-codfw:et-1/0/2 (both ends of same Arelion transport, been erroring/flapping for a while)
  • 20:50 eileen: civicrm upgraded from ac4c185b to ab1d21dc
  • 20:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:43 ebernhardson@deploy2002: Finished scap sync-world: Backport for Add wgSitename for azwiktionary (T407358) (duration: 09m 29s)
  • 20:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:38 ebernhardson@deploy2002: ebernhardson, nmw03: Continuing with sync
  • 20:38 ebernhardson@deploy2002: ebernhardson, nmw03: Backport for Add wgSitename for azwiktionary (T407358) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:33 ebernhardson@deploy2002: Started scap sync-world: Backport for Add wgSitename for azwiktionary (T407358)
  • 20:30 ebernhardson@deploy2002: Finished scap sync-world: Backport for Create "autopatrolled" user group on Danish Wikisource (T407281) (duration: 10m 57s)
  • 20:26 ebernhardson@deploy2002: ebernhardson, hamishz: Continuing with sync
  • 20:24 ebernhardson@deploy2002: ebernhardson, hamishz: Backport for Create "autopatrolled" user group on Danish Wikisource (T407281) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:20 ebernhardson@deploy2002: Started scap sync-world: Backport for Create "autopatrolled" user group on Danish Wikisource (T407281)
  • 20:19 ejegg: fundraising python tools upgraded from 698309f1 to 3b0b3fc0
  • 20:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 20:18 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 20:15 ebernhardson@deploy2002: Finished scap sync-world: Backport for Revert "cirrus: Start AB test of did-you-mean profiles" (T390858) (duration: 09m 36s)
  • 20:11 ebernhardson@deploy2002: ebernhardson: Continuing with sync
  • 20:10 ebernhardson@deploy2002: ebernhardson: Backport for Revert "cirrus: Start AB test of did-you-mean profiles" (T390858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 ebernhardson@deploy2002: Started scap sync-world: Backport for Revert "cirrus: Start AB test of did-you-mean profiles" (T390858)
  • 19:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:38 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:38 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:25 dancy: dancy@deploy2002 Installation of scap version "4.214.0" completed for 2 hosts
  • 19:22 dancy@deploy2002: Installing scap version "4.214.0" for 2 host(s)
  • 19:03 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 18:57 andrew@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
  • 18:44 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on gerrit2003.wikimedia.org with reason: no active host - disabled
  • 18:42 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.clone_es (exit_code=99) of es2032.codfw.wmnet onto es2055.codfw.wmnet
  • 18:26 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 18:26 brett@dns1004: END - running authdns-update
  • 18:25 brett@dns1004: START - running authdns-update
  • 18:08 brett: Import varnish 7.1.1-2~bpo13+wmf1 into trixie-wikimedia - T401832
  • 17:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2003.codfw.wmnet with OS bookworm
  • 17:38 swfrench@deploy2002: Finished scap sync-world: New PHP 8.3 production image (duration: 27m 32s)
  • 17:28 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 17:24 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 17:17 mforns@deploy2002: Finished deploy [analytics/refinery@6b7edca] (thin): Regular analytics weekly train THIN [analytics/refinery@6b7edcac] (duration: 01m 29s)
  • 17:16 mforns@deploy2002: Started deploy [analytics/refinery@6b7edca] (thin): Regular analytics weekly train THIN [analytics/refinery@6b7edcac]
  • 17:16 mforns@deploy2002: Finished deploy [analytics/refinery@6b7edca]: Regular analytics weekly train [analytics/refinery@6b7edcac] (duration: 06m 48s)
  • 17:12 swfrench@deploy2002: Started scap sync-world: New PHP 8.3 production image
  • 17:10 topranks: re-enable BGP sessions for lvs1018 on cr1-eqiad, cr2-eqiad after maintenance on the lvs host T405499
  • 17:09 mforns@deploy2002: Started deploy [analytics/refinery@6b7edca]: Regular analytics weekly train [analytics/refinery@6b7edcac]
  • 17:06 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
  • 17:00 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 16:58 mforns@deploy2002: Finished deploy [analytics/refinery@6b7edca] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6b7edcac] (duration: 01m 16s)
  • 16:57 mforns@deploy2002: Started deploy [analytics/refinery@6b7edca] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6b7edcac]
  • 16:56 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 16:46 swfrench-wmf: reprepro include php8.3_8.3.26-1+wmf11u2 in component/php83
  • 16:34 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:20 topranks: disable BGP sessions for lvs1018 on cr1-eqiad, cr2-eqiad to move traffic to backup load-balancer lvs1020 T405499
  • 16:19 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1018.eqiad.wmnet with reason: remove lvs1018 enp94s0f0np0 link to rack E1
  • 16:14 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Depool es2032.codfw.wmnet to then clone it to es2055.codfw.wmnet - fceratto@cumin1003
  • 16:13 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2032 - Depool es2032.codfw.wmnet to then clone it to es2055.codfw.wmnet - fceratto@cumin1003
  • 16:13 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2032.codfw.wmnet onto es2055.codfw.wmnet
  • 15:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2055.codfw.wmnet with reason: Setting up new ES host
  • 15:33 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp700[7-8].magru.wmnet [reason: pool after firmware updated]
  • 15:27 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7008.magru.wmnet
  • 15:27 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for cp7008.magru.wmnet
  • 15:20 jhancock@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7008']
  • 15:15 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7008.magru.wmnet with reason: firmware upgrade
  • 15:10 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7008']
  • 15:10 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7007.magru.wmnet
  • 15:10 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for cp7007.magru.wmnet
  • 15:10 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7008.magru.wmnet [reason: updating firmware]
  • 15:03 jhancock@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7007']
  • 14:54 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7007']
  • 14:51 ejegg: donorwiki upgraded from d903982c to 70a7050f
  • 14:37 moritzm: installing libarchive security updates
  • 14:33 urandom: starting `removenode` of aqs1012-b (id=bc700f01-8120-4d77-908f-eea943470a25) — T407414
  • 14:30 moritzm: installing distro-info-data updates on Bookworm
  • 14:27 urandom: starting `removenode` of aqs1012-a (id=0b0f0cd5-a1f8-44e2-a8e2-75800ebaea80) — T407414
  • 14:17 tappof: bump space for prometheus k8s-dse in eqiad
  • 14:09 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7008*} and A:cp
  • 14:09 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7008.magru.wmnet
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
  • 13:59 sukhe: sudo ipmitool -I lanplus -H "cp7008.mgmt.magru.wmnet" -U root -E chassis power cycle
  • 13:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1260.eqiad.wmnet onto db1263.eqiad.wmnet
  • 13:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 13:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 13:53 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 13:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 13:49 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
  • 13:28 zabe@deploy2002: Finished scap sync-world: Backport for BETA: Try using Hadoop QueryPage computations (T309738) (duration: 08m 09s)
  • 13:27 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7008*} and A:cp
  • 13:24 zabe@deploy2002: zabe: Continuing with sync
  • 13:22 zabe@deploy2002: zabe: Backport for BETA: Try using Hadoop QueryPage computations (T309738) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:20 zabe@deploy2002: Started scap sync-world: Backport for BETA: Try using Hadoop QueryPage computations (T309738)
  • 13:13 esanders@deploy2002: Finished scap sync-world: Backport for LQT convert: Ignore duplicate key insert errors when command line flag set (T407357) (duration: 10m 14s)
  • 13:12 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 13:09 esanders@deploy2002: esanders: Continuing with sync
  • 13:06 esanders@deploy2002: esanders: Backport for LQT convert: Ignore duplicate key insert errors when command line flag set (T407357) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:03 esanders@deploy2002: Started scap sync-world: Backport for LQT convert: Ignore duplicate key insert errors when command line flag set (T407357)
  • 12:51 moritzm: installing git security updates
  • 12:36 moritzm: installing gst-plugins-base1.0 security updates
  • 12:13 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2054 slowly with 10 steps - Pooling in new host
  • 12:05 jmm@dns1004: END - running authdns-update
  • 12:03 jmm@dns1004: START - running authdns-update
  • 11:54 ozge@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:26 claime: sudo cumin 'A:cp' "enable-puppet 'Deploying gateway-check.lua changes - T406599 - cgoubert'"
  • 11:22 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:19 hnowlan@deploy2002: Finished deploy [restbase/deploy@0be0059]: deploy 9 new wikis from r/1177553 (duration: 27m 01s)
  • 11:12 moritzm: installing Squid security updates
  • 11:08 claime: sudo cumin 'A:cp' "disable-puppet 'Deploying gateway-check.lua changes - T406599 - cgoubert'"
  • 11:05 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 11:04 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 11:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:53 hnowlan@deploy2002: Started deploy [restbase/deploy@0be0059]: deploy 9 new wikis from r/1177553
  • 10:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:21 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84027 and previous config saved to /var/cache/conftool/dbconfig/20251016-102110-root.json
  • 10:15 moritzm: installing libfcgi security updates
  • 10:06 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84025 and previous config saved to /var/cache/conftool/dbconfig/20251016-100605-root.json
  • 09:57 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2054 slowly with 10 steps - Pooling in new host
  • 09:56 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2054.codfw.wmnet
  • 09:56 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2054.codfw.wmnet
  • 09:55 fceratto@cumin1003: dbctl commit (dc=all): 'Add es2054 T402859', diff saved to https://phabricator.wikimedia.org/P84023 and previous config saved to /var/cache/conftool/dbconfig/20251016-095534-fceratto.json
  • 09:51 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84022 and previous config saved to /var/cache/conftool/dbconfig/20251016-095058-root.json
  • 09:35 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84021 and previous config saved to /var/cache/conftool/dbconfig/20251016-093553-root.json
  • 09:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1260 - Depool db1260.eqiad.wmnet to then clone it to db1263.eqiad.wmnet - marostegui@cumin1003
  • 09:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1260 - Depool db1260.eqiad.wmnet to then clone it to db1263.eqiad.wmnet - marostegui@cumin1003
  • 09:30 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1260.eqiad.wmnet onto db1263.eqiad.wmnet
  • 09:20 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84019 and previous config saved to /var/cache/conftool/dbconfig/20251016-092047-root.json
  • 09:14 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ssw1-d1-eqiad.mgmt with reason: downtime ssw1-d1-eqiad until we have the monitoring checks fully working for the new platform
  • 09:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84018 and previous config saved to /var/cache/conftool/dbconfig/20251016-091343-root.json
  • 09:05 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84017 and previous config saved to /var/cache/conftool/dbconfig/20251016-090541-root.json
  • 09:02 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:00 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gerrit2002.wikimedia.org with reason: T407110
  • 09:00 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84016 and previous config saved to /var/cache/conftool/dbconfig/20251016-085837-root.json
  • 08:57 cmooney@dns2005: END - running authdns-update
  • 08:56 cmooney@dns2005: START - running authdns-update
  • 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1260.eqiad.wmnet onto db1262.eqiad.wmnet
  • 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 08:50 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84014 and previous config saved to /var/cache/conftool/dbconfig/20251016-085035-root.json
  • 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84013 and previous config saved to /var/cache/conftool/dbconfig/20251016-084331-root.json
  • 08:36 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es1026.eqiad.wmnet
  • 08:36 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:36 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 08:35 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 08:35 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84011 and previous config saved to /var/cache/conftool/dbconfig/20251016-083529-root.json
  • 08:32 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 08:32 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.23 refs T405679
  • 08:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84010 and previous config saved to /var/cache/conftool/dbconfig/20251016-082825-root.json
  • 08:26 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1026.eqiad.wmnet
  • 08:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:22 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84009 and previous config saved to /var/cache/conftool/dbconfig/20251016-082237-root.json
  • 08:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1235 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84007 and previous config saved to /var/cache/conftool/dbconfig/20251016-082031-marostegui.json
  • 08:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 08:20 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84006 and previous config saved to /var/cache/conftool/dbconfig/20251016-082023-root.json
  • 08:15 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 08:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:09 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1026 from dbctl T407351', diff saved to https://phabricator.wikimedia.org/P84005 and previous config saved to /var/cache/conftool/dbconfig/20251016-080948-marostegui.json
  • 08:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:07 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84004 and previous config saved to /var/cache/conftool/dbconfig/20251016-080731-root.json
  • 08:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 08:05 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84002 and previous config saved to /var/cache/conftool/dbconfig/20251016-080518-root.json
  • 08:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2033.codfw.wmnet onto es2054.codfw.wmnet
  • 08:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2033 gradually with 4 steps - Pool es2033.codfw.wmnet in after cloning
  • 07:55 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 264936
  • 07:54 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 264936
  • 07:52 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84000 and previous config saved to /var/cache/conftool/dbconfig/20251016-075225-root.json
  • 07:50 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83999 and previous config saved to /var/cache/conftool/dbconfig/20251016-075012-root.json
  • 07:41 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 100%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83997 and previous config saved to /var/cache/conftool/dbconfig/20251016-074122-root.json
  • 07:41 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1055 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83996 and previous config saved to /var/cache/conftool/dbconfig/20251016-074118-marostegui.json
  • 07:37 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83995 and previous config saved to /var/cache/conftool/dbconfig/20251016-073719-root.json
  • 07:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2188 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83992 and previous config saved to /var/cache/conftool/dbconfig/20251016-072932-marostegui.json
  • 07:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 07:26 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 75%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83991 and previous config saved to /var/cache/conftool/dbconfig/20251016-072610-root.json
  • 07:18 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2033 gradually with 4 steps - Pool es2033.codfw.wmnet in after cloning
  • 07:11 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83989 and previous config saved to /var/cache/conftool/dbconfig/20251016-071136-root.json
  • 07:11 kostajh: UTC morning deploys done
  • 07:11 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 60%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83988 and previous config saved to /var/cache/conftool/dbconfig/20251016-071104-root.json
  • 07:09 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83987 and previous config saved to /var/cache/conftool/dbconfig/20251016-070916-root.json
  • 06:56 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83986 and previous config saved to /var/cache/conftool/dbconfig/20251016-065630-root.json
  • 06:56 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83985 and previous config saved to /var/cache/conftool/dbconfig/20251016-065612-root.json
  • 06:55 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 50%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83984 and previous config saved to /var/cache/conftool/dbconfig/20251016-065558-root.json
  • 06:54 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83983 and previous config saved to /var/cache/conftool/dbconfig/20251016-065410-root.json
  • 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83982 and previous config saved to /var/cache/conftool/dbconfig/20251016-064124-root.json
  • 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83981 and previous config saved to /var/cache/conftool/dbconfig/20251016-064106-root.json
  • 06:40 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 30%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83980 and previous config saved to /var/cache/conftool/dbconfig/20251016-064052-root.json
  • 06:39 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83979 and previous config saved to /var/cache/conftool/dbconfig/20251016-063904-root.json
  • 06:26 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83978 and previous config saved to /var/cache/conftool/dbconfig/20251016-062618-root.json
  • 06:26 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83977 and previous config saved to /var/cache/conftool/dbconfig/20251016-062600-root.json
  • 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 25%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83976 and previous config saved to /var/cache/conftool/dbconfig/20251016-062546-root.json
  • 06:24 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83975 and previous config saved to /var/cache/conftool/dbconfig/20251016-062358-root.json
  • 06:18 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2145 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83974 and previous config saved to /var/cache/conftool/dbconfig/20251016-061818-marostegui.json
  • 06:18 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83973 and previous config saved to /var/cache/conftool/dbconfig/20251016-061054-root.json
  • 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 20%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83972 and previous config saved to /var/cache/conftool/dbconfig/20251016-061040-root.json
  • 06:08 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83971 and previous config saved to /var/cache/conftool/dbconfig/20251016-060852-root.json
  • 06:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1186 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83970 and previous config saved to /var/cache/conftool/dbconfig/20251016-060300-marostegui.json
  • 06:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:55 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 10%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83969 and previous config saved to /var/cache/conftool/dbconfig/20251016-055534-root.json
  • 05:53 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83968 and previous config saved to /var/cache/conftool/dbconfig/20251016-055346-root.json
  • 05:51 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bookworm
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83967 and previous config saved to /var/cache/conftool/dbconfig/20251016-054504-root.json
  • 05:40 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 7%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83965 and previous config saved to /var/cache/conftool/dbconfig/20251016-054027-root.json
  • 05:38 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83964 and previous config saved to /var/cache/conftool/dbconfig/20251016-053840-root.json
  • 05:29 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83963 and previous config saved to /var/cache/conftool/dbconfig/20251016-052958-root.json
  • 05:25 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 5%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83962 and previous config saved to /var/cache/conftool/dbconfig/20251016-052521-root.json
  • 05:23 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83961 and previous config saved to /var/cache/conftool/dbconfig/20251016-052335-root.json
  • 05:14 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83960 and previous config saved to /var/cache/conftool/dbconfig/20251016-051452-root.json
  • 05:10 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 1%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83959 and previous config saved to /var/cache/conftool/dbconfig/20251016-051015-root.json
  • 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2248 to dbctl depooled T406551', diff saved to https://phabricator.wikimedia.org/P83958 and previous config saved to /var/cache/conftool/dbconfig/20251016-050917-marostegui.json
  • 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83957 and previous config saved to /var/cache/conftool/dbconfig/20251016-050829-root.json
  • 04:59 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83956 and previous config saved to /var/cache/conftool/dbconfig/20251016-045946-root.json
  • 04:58 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
  • 04:53 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83955 and previous config saved to /var/cache/conftool/dbconfig/20251016-045323-root.json
  • 04:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 04:47 marostegui@dns1006: END - running authdns-update
  • 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2240 T407177', diff saved to https://phabricator.wikimedia.org/P83954 and previous config saved to /var/cache/conftool/dbconfig/20251016-044650-marostegui.json
  • 04:46 marostegui@dns1006: START - running authdns-update
  • 04:45 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2179 to s4 primary and set section read-write T407177', diff saved to https://phabricator.wikimedia.org/P83953 and previous config saved to /var/cache/conftool/dbconfig/20251016-044557-marostegui.json
  • 04:45 marostegui@cumin1003: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T407177', diff saved to https://phabricator.wikimedia.org/P83952 and previous config saved to /var/cache/conftool/dbconfig/20251016-044533-marostegui.json
  • 04:45 marostegui: Starting s4 codfw failover from db2240 to db2179 - T407177
  • 04:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s4 T407177
  • 04:39 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2179 with weight 0 T407177', diff saved to https://phabricator.wikimedia.org/P83951 and previous config saved to /var/cache/conftool/dbconfig/20251016-043920-marostegui.json
  • 04:38 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83950 and previous config saved to /var/cache/conftool/dbconfig/20251016-043816-root.json
  • 04:35 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1054 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83949 and previous config saved to /var/cache/conftool/dbconfig/20251016-043510-marostegui.json
  • 04:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1260 - Depool db1260.eqiad.wmnet to then clone it to db1262.eqiad.wmnet - marostegui@cumin1003
  • 04:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1260 - Depool db1260.eqiad.wmnet to then clone it to db1262.eqiad.wmnet - marostegui@cumin1003
  • 04:30 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1260.eqiad.wmnet onto db1262.eqiad.wmnet
  • 04:16 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS trixie
  • 03:29 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 03:22 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 03:04 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS trixie
  • 02:50 eileen: civicrm upgraded from 25df5996 to ac4c185b
  • 00:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 00:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye

2025-10-15

  • 23:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 23:36 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 23:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aqs1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 22:56 andrew@cumin2002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM cloudbackup1002-dev.eqiad.wmnet
  • 21:35 andrew@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
  • 21:29 bvibber@deploy2002: Finished scap sync-world: Backport for This copies .23's revert of the _broken version_ of the CORS image load fix! Production should work fine without it, but the broken version breaks things worse than the original bug. -bv (duration: 07m 13s)
  • 21:25 bvibber@deploy2002: bvibber: Continuing with sync
  • 21:24 bvibber@deploy2002: bvibber: Backport for This copies .23's revert of the _broken version_ of the CORS image load fix! Production should work fine without it, but the broken version breaks things worse than the original bug. -bv synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:22 bvibber@deploy2002: Started scap sync-world: Backport for This copies .23's revert of the _broken version_ of the CORS image load fix! Production should work fine without it, but the broken version breaks things worse than the original bug. -bv
  • 21:05 cjming: end of UTC late backport window
  • 21:03 cjming@deploy2002: Finished scap sync-world: Backport for Enable protection indicator for srwiki (T407183) (duration: 08m 25s)
  • 21:03 andrewbogott: adding additional disk space to cloudbackup1002-dev with "sudo gnt-instance modify --disk add:size=60g cloudbackup1002-dev.eqiad.wmnet"
  • 20:59 cjming@deploy2002: cjming, zoranzoki21: Continuing with sync
  • 20:57 cjming@deploy2002: cjming, zoranzoki21: Backport for Enable protection indicator for srwiki (T407183) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:55 cjming@deploy2002: Started scap sync-world: Backport for Enable protection indicator for srwiki (T407183)
  • 20:51 cjming@deploy2002: Finished scap sync-world: Backport for throttle rule for National Library Board Singapore workshop on 18oct2025 (T407422) (duration: 06m 48s)
  • 20:47 cjming@deploy2002: cjming, robertsky: Continuing with sync
  • 20:47 cjming@deploy2002: cjming, robertsky: Backport for throttle rule for National Library Board Singapore workshop on 18oct2025 (T407422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:44 cjming@deploy2002: Started scap sync-world: Backport for throttle rule for National Library Board Singapore workshop on 18oct2025 (T407422)
  • 20:41 cjming@deploy2002: Finished scap sync-world: Backport for Add reader exp to common settings (T406916) (duration: 13m 51s)
  • 20:36 cjming@deploy2002: ksarabia, cjming: Continuing with sync
  • 20:33 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host aqs1012.eqiad.wmnet
  • 20:29 cjming@deploy2002: ksarabia, cjming: Backport for Add reader exp to common settings (T406916) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:27 cjming@deploy2002: Started scap sync-world: Backport for Add reader exp to common settings (T406916)
  • 20:24 cjming@deploy2002: Finished scap sync-world: Backport for Fix action_context for simple bot detection instrument (T406359) (duration: 07m 12s)
  • 20:20 cjming@deploy2002: cjming: Continuing with sync
  • 20:19 cjming@deploy2002: cjming: Backport for Fix action_context for simple bot detection instrument (T406359) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:17 cjming@deploy2002: Started scap sync-world: Backport for Fix action_context for simple bot detection instrument (T406359)
  • 20:12 kemayo@deploy2002: Finished scap sync-world: Backport for DiscussionTools: enable thanking comments (T366095) (duration: 07m 04s)
  • 20:08 kemayo@deploy2002: kemayo: Continuing with sync
  • 20:07 kemayo@deploy2002: kemayo: Backport for DiscussionTools: enable thanking comments (T366095) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:05 kemayo@deploy2002: Started scap sync-world: Backport for DiscussionTools: enable thanking comments (T366095)
  • 19:51 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2045.codfw.wmnet with OS bullseye
  • 19:42 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: sync
  • 19:42 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: sync
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:38 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:27 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS bullseye
  • 19:21 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:21 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:21 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:20 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:20 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:19 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7007.magru.wmnet with reason: hardware issues, depooled
  • 19:19 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:12 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:12 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:03 sukhe: sudo ipmitool -I lanplus -H "cp7007.mgmt.magru.wmnet" -U root -E chassis power cycle
  • 18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:55 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:45 eevans@cumin1003: START - Cookbook sre.hosts.dhcp for host aqs1012.eqiad.wmnet
  • 18:30 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_magru
  • 18:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7016.magru.wmnet
  • 18:18 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
  • 18:18 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6016.drmrs.wmnet
  • 18:14 swfrench@deploy2002: Finished scap sync-world: Backport for Disable enrollment in PHP 8.3 (T405955) (duration: 10m 21s)
  • 18:14 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
  • 18:14 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6008.drmrs.wmnet
  • 18:10 swfrench@deploy2002: swfrench: Continuing with sync
  • 18:10 sukhe@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on A:cp-text_magru and not P{cp7001*} and A:cp
  • 18:07 swfrench@deploy2002: swfrench: Backport for Disable enrollment in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:04 swfrench@deploy2002: Started scap sync-world: Backport for Disable enrollment in PHP 8.3 (T405955)
  • 17:47 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7015.magru.wmnet
  • 17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6015.drmrs.wmnet
  • 17:34 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6007.drmrs.wmnet
  • 17:26 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host aqs1012.eqiad.wmnet
  • 17:23 swfrench@deploy2002: Finished scap sync-world: Revert to PHP 8.1 - T405955 (duration: 02m 47s)
  • 17:21 swfrench@deploy2002: Started scap sync-world: Revert to PHP 8.1 - T405955
  • 17:06 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:06 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:04 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7014.magru.wmnet
  • 16:58 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:55 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6014.drmrs.wmnet
  • 16:53 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6006.drmrs.wmnet
  • 16:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:46 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:40 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2053 slowly with 10 steps - Pooling in new host
  • 16:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:37 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:37 eevans@cumin1003: START - Cookbook sre.hosts.dhcp for host aqs1012.eqiad.wmnet
  • 16:37 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:37 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:20 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7013.magru.wmnet
  • 16:19 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7006.magru.wmnet
  • 16:16 eevans@cumin1003: END (FAIL) - Cookbook sre.cassandra.roll-reboot (exit_code=1) rolling reboot on A:aqs-eqiad
  • 16:14 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6013.drmrs.wmnet
  • 16:12 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6005.drmrs.wmnet
  • 15:57 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 15:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2206.codfw.wmnet onto db2247.codfw.wmnet
  • 15:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
  • 15:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7012.magru.wmnet
  • 15:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7005.magru.wmnet
  • 15:33 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6012.drmrs.wmnet
  • 15:31 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6004.drmrs.wmnet
  • 15:29 mforns@deploy2002: Finished deploy [analytics/refinery@94efa6e] (thin): Regular analytics weekly train THIN [analytics/refinery@94efa6e8] (duration: 01m 06s)
  • 15:28 mforns@deploy2002: Started deploy [analytics/refinery@94efa6e] (thin): Regular analytics weekly train THIN [analytics/refinery@94efa6e8]
  • 15:28 mforns@deploy2002: Finished deploy [analytics/refinery@94efa6e]: Regular analytics weekly train [analytics/refinery@94efa6e8] (duration: 06m 37s)
  • 15:21 mforns@deploy2002: Started deploy [analytics/refinery@94efa6e]: Regular analytics weekly train [analytics/refinery@94efa6e8]
  • 15:21 mforns@deploy2002: Finished deploy [analytics/refinery@94efa6e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@94efa6e8] (duration: 02m 17s)
  • 15:19 mforns@deploy2002: Started deploy [analytics/refinery@94efa6e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@94efa6e8]
  • 15:03 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
  • 14:54 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7004.magru.wmnet
  • 14:54 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7011.magru.wmnet
  • 14:51 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6011.drmrs.wmnet
  • 14:51 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6003.drmrs.wmnet
  • 14:44 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2033 - Depool es2033.codfw.wmnet to then clone it to es2054.codfw.wmnet - fceratto@cumin1003
  • 14:43 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2033 - Depool es2033.codfw.wmnet to then clone it to es2054.codfw.wmnet - fceratto@cumin1003
  • 14:43 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2033.codfw.wmnet onto es2054.codfw.wmnet
  • 14:41 claime: armed keyholder on deploy[1003|2002] following reboots
  • 14:40 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
  • 14:39 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:37 moritzm: armed keyholder on cumin1002 following reboot
  • 14:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2054.codfw.wmnet with reason: Setting up new ES host
  • 14:34 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:aqs-eqiad
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1002.eqiad.wmnet
  • 14:34 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:31 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parsoidtest1001.eqiad.wmnet
  • 14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1002.eqiad.wmnet
  • 14:29 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
  • 14:26 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host parsoidtest1001.eqiad.wmnet
  • 14:24 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2053 slowly with 10 steps - Pooling in new host
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'es2053 set ipaddr before pool-in', diff saved to https://phabricator.wikimedia.org/P83930 and previous config saved to /var/cache/conftool/dbconfig/20251015-142339-fceratto.json
  • 14:22 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) es2053 slowly with 10 steps - Pooling in new host
  • 14:22 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2002.codfw.wmnet
  • 14:20 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2001.codfw.wmnet
  • 14:19 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:19 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1003.eqiad.wmnet
  • 14:18 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 14:17 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 14:16 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug2002.codfw.wmnet
  • 14:15 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1002.eqiad.wmnet
  • 14:14 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug2001.codfw.wmnet
  • 14:14 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2053 slowly with 10 steps - Pooling in new host
  • 14:14 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug1002.eqiad.wmnet
  • 14:12 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7003.magru.wmnet
  • 14:11 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1001.eqiad.wmnet
  • 14:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7010.magru.wmnet
  • 14:11 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2053.codfw.wmnet
  • 14:11 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2053.codfw.wmnet
  • 14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2053.codfw.wmnet
  • 14:11 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2053.codfw.wmnet
  • 14:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6010.drmrs.wmnet
  • 14:10 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6002.drmrs.wmnet
  • 14:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host deploy1003.eqiad.wmnet
  • 14:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug1001.eqiad.wmnet
  • 14:05 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 14:04 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Add es2053 T402859', diff saved to https://phabricator.wikimedia.org/P83929 and previous config saved to /var/cache/conftool/dbconfig/20251015-135630-fceratto.json
  • 13:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:33 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:33 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:31 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:31 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:29 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6009.drmrs.wmnet
  • 13:29 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6001.drmrs.wmnet
  • 13:29 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7002.magru.wmnet
  • 13:28 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7009.magru.wmnet
  • 13:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1260.eqiad.wmnet onto db1261.eqiad.wmnet
  • 13:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 13:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 13:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:18 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
  • 13:18 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
  • 13:17 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_magru
  • 13:16 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru and not P{cp7001*} and A:cp
  • 13:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 13:16 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet [reason: already rebooted; pooling]
  • 13:15 sukhe@cumin1003: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-text_magru
  • 13:15 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru
  • 13:14 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:14 tchin@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:00 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1026 T407351', diff saved to https://phabricator.wikimedia.org/P83925 and previous config saved to /var/cache/conftool/dbconfig/20251015-124927-marostegui.json
  • 12:44 claime: enabling puppet on cp nodes for T406318
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parsoidtest1001.eqiad.wmnet
  • 12:29 claime: disabling puppet on cp nodes for T406318
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host parsoidtest1001.eqiad.wmnet
  • 12:26 moritzm: installing ghostscript security updates
  • 12:25 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2009.codfw.wmnet
  • 12:18 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2009.codfw.wmnet
  • 12:18 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2008.codfw.wmnet
  • 12:12 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2008.codfw.wmnet
  • 12:12 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2007.codfw.wmnet
  • 12:05 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2007.codfw.wmnet
  • 12:05 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2006.codfw.wmnet
  • 12:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2206 - Depool db2206.codfw.wmnet to then clone it to db2247.codfw.wmnet - marostegui@cumin1003
  • 12:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2206 - Depool db2206.codfw.wmnet to then clone it to db2247.codfw.wmnet - marostegui@cumin1003
  • 12:01 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2206.codfw.wmnet onto db2247.codfw.wmnet
  • 11:57 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2006.codfw.wmnet
  • 11:57 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2005.codfw.wmnet
  • 11:50 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 11:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 11:19 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:18 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:17 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:16 claime: Enabling puppet on all cp nodes for 1195679: trafficserver: remove gateway-check group-specific routes for rest.php - T406318
  • 11:16 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:15 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:14 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:12 claime: Enabling puppet on cp6015 for 1195679: trafficserver: remove gateway-check group-specific routes for rest.php - T406318
  • 11:07 claime: disabling puppet on cp nodes for 1195679: trafficserver: remove gateway-check group-specific routes for rest.php - T406318
  • 10:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:44 hashar@deploy2002: Finished scap sync-world: Backport for Replace call to deprecated method getImages (T407184) (duration: 32m 19s)
  • 10:40 hashar@deploy2002: hashar: Continuing with sync
  • 10:37 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1009.eqiad.wmnet
  • 10:30 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1009.eqiad.wmnet
  • 10:30 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1008.eqiad.wmnet
  • 10:23 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1008.eqiad.wmnet
  • 10:23 moritzm: installing libcommons-lang3-java security updates
  • 10:23 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1007.eqiad.wmnet
  • 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2044.codfw.wmnet with OS trixie
  • 10:18 hnowlan: deleted legacy EMEA/Americas business hours Splunk rotations
  • 10:16 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1007.eqiad.wmnet
  • 10:16 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1006.eqiad.wmnet
  • 10:16 hashar@deploy2002: hashar: Backport for Replace call to deprecated method getImages (T407184) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:11 hashar@deploy2002: Started scap sync-world: Backport for Replace call to deprecated method getImages (T407184)
  • 10:09 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1006.eqiad.wmnet
  • 10:09 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1005.eqiad.wmnet
  • 10:03 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
  • 10:02 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1005.eqiad.wmnet
  • 09:58 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
  • 09:44 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS trixie
  • 09:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS trixie
  • 09:37 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS trixie
  • 09:33 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2206.codfw.wmnet onto db2248.codfw.wmnet
  • 09:32 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
  • 09:32 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 09:32 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 09:31 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 09:31 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 09:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 09:18 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 09:17 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:17 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:16 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:14 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:13 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:01 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 09:01 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 08:59 Amir1: mwscript-k8s -- purgeUserOptions.php --wiki=loginwiki (T406724)
  • 08:57 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2047.codfw.wmnet']
  • 08:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2047.codfw.wmnet']
  • 08:49 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:47 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
  • 08:44 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
  • 08:41 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 08:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83918 and previous config saved to /var/cache/conftool/dbconfig/20251015-083339-root.json
  • 08:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83917 and previous config saved to /var/cache/conftool/dbconfig/20251015-083333-root.json
  • 08:30 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 08:29 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:22 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:22 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:19 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.23 refs T405679
  • 08:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83916 and previous config saved to /var/cache/conftool/dbconfig/20251015-081833-root.json
  • 08:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83915 and previous config saved to /var/cache/conftool/dbconfig/20251015-081827-root.json
  • 08:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:14 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 08:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 08:13 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 08:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2032.codfw.wmnet onto es2053.codfw.wmnet
  • 08:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2032 gradually with 4 steps - Pool es2032.codfw.wmnet in after cloning
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
  • 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
  • 08:04 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 08:04 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:04 slyngshede@dns1004: END - running authdns-update
  • 08:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83913 and previous config saved to /var/cache/conftool/dbconfig/20251015-080327-root.json
  • 08:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83912 and previous config saved to /var/cache/conftool/dbconfig/20251015-080321-root.json
  • 08:03 slyngshede@dns1004: START - running authdns-update
  • 08:02 slyngs: Moving CAS/IDP/SSO to Trixie.
  • 07:58 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 07:57 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 07:53 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
  • 07:50 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 07:50 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 07:50 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 07:48 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83910 and previous config saved to /var/cache/conftool/dbconfig/20251015-074821-root.json
  • 07:48 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83909 and previous config saved to /var/cache/conftool/dbconfig/20251015-074815-root.json
  • 07:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83907 and previous config saved to /var/cache/conftool/dbconfig/20251015-073316-root.json
  • 07:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83906 and previous config saved to /var/cache/conftool/dbconfig/20251015-073309-root.json
  • 07:28 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2032 gradually with 4 steps - Pool es2032.codfw.wmnet in after cloning
  • 07:27 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Enable on enwiki (T402366) (duration: 09m 02s)
  • 07:23 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:21 kharlan@deploy2002: kharlan: Backport for hCaptcha: Enable on enwiki (T402366) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
  • 07:18 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Enable on enwiki (T402366)
  • 07:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83904 and previous config saved to /var/cache/conftool/dbconfig/20251015-071810-root.json
  • 07:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83903 and previous config saved to /var/cache/conftool/dbconfig/20251015-071803-root.json
  • 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
  • 07:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83901 and previous config saved to /var/cache/conftool/dbconfig/20251015-070304-root.json
  • 07:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83900 and previous config saved to /var/cache/conftool/dbconfig/20251015-070258-root.json
  • 06:48 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83899 and previous config saved to /var/cache/conftool/dbconfig/20251015-064758-root.json
  • 06:47 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83898 and previous config saved to /var/cache/conftool/dbconfig/20251015-064752-root.json
  • 06:46 jmm@dns1004: END - running authdns-update
  • 06:45 jmm@dns1004: START - running authdns-update
  • 06:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83897 and previous config saved to /var/cache/conftool/dbconfig/20251015-063252-root.json
  • 06:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83896 and previous config saved to /var/cache/conftool/dbconfig/20251015-063246-root.json
  • 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83895 and previous config saved to /var/cache/conftool/dbconfig/20251015-061746-root.json
  • 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83894 and previous config saved to /var/cache/conftool/dbconfig/20251015-061740-root.json
  • 06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1032.eqiad.wmnet onto es1055.eqiad.wmnet
  • 06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1032 gradually with 4 steps - Pool es1032.eqiad.wmnet in after cloning
  • 06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1031.eqiad.wmnet onto es1054.eqiad.wmnet
  • 06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1031 gradually with 4 steps - Pool es1031.eqiad.wmnet in after cloning
  • 06:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83891 and previous config saved to /var/cache/conftool/dbconfig/20251015-060240-root.json
  • 06:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83890 and previous config saved to /var/cache/conftool/dbconfig/20251015-060234-root.json
  • 06:02 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1052 and es1057 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83889 and previous config saved to /var/cache/conftool/dbconfig/20251015-060210-marostegui.json
  • 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1260.eqiad.wmnet onto db1261.eqiad.wmnet
  • 05:54 marostegui@cumin1003: dbctl commit (dc=all): 'Add db1260 to dbctl depooled T406550', diff saved to https://phabricator.wikimedia.org/P83886 and previous config saved to /var/cache/conftool/dbconfig/20251015-055457-marostegui.json
  • 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2206 - Depool db2206.codfw.wmnet to then clone it to db2248.codfw.wmnet - marostegui@cumin1003
  • 05:43 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2206 - Depool db2206.codfw.wmnet to then clone it to db2248.codfw.wmnet - marostegui@cumin1003
  • 05:43 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2206.codfw.wmnet onto db2248.codfw.wmnet
  • 05:27 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1032 gradually with 4 steps - Pool es1032.eqiad.wmnet in after cloning
  • 05:27 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1031 gradually with 4 steps - Pool es1031.eqiad.wmnet in after cloning
  • 04:56 eileen: civicrm upgraded from 4d3107fc to 25df5996
  • 01:40 musikanimal@deploy2002: Finished scap sync-world: Backport for Make tags be links to wish-index with filter applied (T406719) (duration: 07m 25s)
  • 01:36 musikanimal@deploy2002: hmonroy, musikanimal: Continuing with sync
  • 01:35 musikanimal@deploy2002: hmonroy, musikanimal: Backport for Make tags be links to wish-index with filter applied (T406719) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:33 musikanimal@deploy2002: Started scap sync-world: Backport for Make tags be links to wish-index with filter applied (T406719)
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 06s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-14

  • 23:39 musikanimal@deploy2002: Finished scap sync-world: Backport for wish-index: pass in wishesData so that initial filters are set (T400945) (duration: 07m 08s)
  • 23:35 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 23:34 musikanimal@deploy2002: musikanimal: Backport for wish-index: pass in wishesData so that initial filters are set (T400945) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:32 musikanimal@deploy2002: Started scap sync-world: Backport for wish-index: pass in wishesData so that initial filters are set (T400945)
  • 21:55 greg-g: (from eileen) civicrm upgraded from f68c287a to 4d3107fc
  • 21:43 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set reader experiment to true (T406916) (duration: 11m 26s)
  • 21:38 ladsgroup@deploy2002: ksarabia, ladsgroup: Continuing with sync
  • 21:34 ladsgroup@deploy2002: ksarabia, ladsgroup: Backport for Set reader experiment to true (T406916) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:32 ladsgroup@deploy2002: Started scap sync-world: Backport for Set reader experiment to true (T406916)
  • 21:31 ladsgroup@deploy2002: Finished scap sync-world: Backport for ImageBrowsing: fix UI bugs in Overlay, DetailView and VTOC (T405992) (duration: 14m 22s)
  • 21:25 ladsgroup@deploy2002: ksarabia, ladsgroup: Continuing with sync
  • 21:19 ladsgroup@deploy2002: ksarabia, ladsgroup: Backport for ImageBrowsing: fix UI bugs in Overlay, DetailView and VTOC (T405992) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:17 ladsgroup@deploy2002: Started scap sync-world: Backport for ImageBrowsing: fix UI bugs in Overlay, DetailView and VTOC (T405992)
  • 21:16 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "Add icons for wikibase changes. WIP" (duration: 16m 34s)
  • 21:10 ladsgroup@deploy2002: neslihanturan, ladsgroup: Continuing with sync
  • 21:04 ladsgroup@deploy2002: neslihanturan, ladsgroup: Backport for Revert "Add icons for wikibase changes. WIP" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:59 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "Add icons for wikibase changes. WIP"
  • 20:37 toyofuku@deploy2002: Finished scap sync-world: Backport for Add ReadingList Stream to EventStreamConfig (T406627) (duration: 11m 58s)
  • 20:30 toyofuku@deploy2002: lmora, toyofuku: Continuing with sync
  • 20:29 toyofuku@deploy2002: lmora, toyofuku: Backport for Add ReadingList Stream to EventStreamConfig (T406627) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 toyofuku@deploy2002: Started scap sync-world: Backport for Add ReadingList Stream to EventStreamConfig (T406627)
  • 20:21 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5002.eqsin.wmnet with OS trixie
  • 20:17 kemayo@deploy2002: Finished scap sync-world: Backport for Suggestions mode (T399612) (duration: 12m 47s)
  • 20:09 kemayo@deploy2002: kemayo: Continuing with sync
  • 20:09 kemayo@deploy2002: kemayo: Backport for Suggestions mode (T399612) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:05 kemayo@deploy2002: Started scap sync-world: Backport for Suggestions mode (T399612)
  • 19:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 19:56 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 19:56 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 19:56 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 19:56 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 19:55 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 19:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 19:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 19:54 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 19:54 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 19:53 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 19:53 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 19:50 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 19:50 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:50 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:49 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 19:49 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 19:41 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 19:40 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 19:40 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 19:39 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 19:39 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 19:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 19:38 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 19:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 19:36 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 19:36 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 19:35 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 19:35 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 19:34 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 19:34 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 19:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 19:28 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 19:15 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 19:09 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 19:08 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 19:08 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 19:06 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 19:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 19:05 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 19:05 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 19:04 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 19:04 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 19:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 19:03 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 19:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 19:02 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 19:01 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:01 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:00 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:00 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:59 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum5002.eqsin.wmnet with OS trixie
  • 18:59 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:59 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:57 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:57 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 18:54 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 18:53 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 18:53 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 18:52 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 18:52 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 18:52 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 18:51 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 18:49 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 18:49 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 18:48 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 18:48 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 18:46 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:46 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:46 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4002.ulsfo.wmnet with OS trixie
  • 18:44 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS trixie
  • 18:44 rzl: rzl@deploy1003:~$ kube-env mw-script-deploy codfw; helm uninstall amfcta11 # HelmReleaseBadStatus alert was firing for this mw-script job in state pending-install, even though the job was long since finished
  • 18:38 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS trixie
  • 18:36 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:35 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 18:34 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 18:34 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:34 ejegg: fundraising civicrm upgraded from 9393addf to f68c287a
  • 18:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 18:32 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 18:32 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 18:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 18:31 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:31 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 18:31 brett@dns1004: END - running authdns-update
  • 18:29 brett@dns1004: START - running authdns-update
  • 18:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 18:28 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 18:28 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 18:26 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 18:23 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 18:23 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 18:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 18:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 18:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:19 brett@dns1004: END - running authdns-update
  • 18:18 brett@dns1004: START - running authdns-update
  • 18:17 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 18:11 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:11 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:11 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 1% of client sessions in PHP 8.3 (T405955) (duration: 19m 18s)
  • 18:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum4002.ulsfo.wmnet with OS trixie
  • 18:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum2002.codfw.wmnet with OS trixie
  • 18:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum1001.eqiad.wmnet with OS trixie
  • 18:01 swfrench@deploy2002: swfrench: Continuing with sync
  • 17:56 swfrench@deploy2002: swfrench: Backport for Enroll 1% of client sessions in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:52 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 1% of client sessions in PHP 8.3 (T405955)
  • 17:48 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 17:48 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 17:41 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:40 tchin@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:19 swfrench@deploy2002: Finished scap sync-world: Non-image-build scap run to scale 8.3 deployments - T405955 (duration: 05m 41s)
  • 17:15 swfrench@deploy2002: Started scap sync-world: Non-image-build scap run to scale 8.3 deployments - T405955
  • 16:55 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 16:55 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 16:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 16:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 16:36 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:36 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:32 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:32 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:28 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 16:27 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 16:27 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-codfw
  • 16:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 16:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 16:19 mutante: rebooting backend of releases.wikimedia.org
  • 16:19 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1003.eqiad.wmnet with reason: reboot
  • 16:18 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test1003.eqiad.wmnet
  • 16:18 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test1003.eqiad.wmnet with OS trixie
  • 16:17 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:16 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:12 mutante: rebooting phab2002
  • 16:11 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: reboot
  • 16:04 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1003.eqiad.wmnet with reason: host reimage
  • 16:03 mutante: CI should be back in operation as normal
  • 15:57 mutante: rebooting main CI server - integration.wikimedia.org will be down for a minute
  • 15:57 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1003.eqiad.wmnet with reason: host reimage
  • 15:56 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: reboot
  • 15:50 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on contint2002.wikimedia.org with reason: reboot
  • 15:50 mutante: contint2002 - rebooting - (not the manager host)
  • 15:47 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test1003.eqiad.wmnet with OS trixie
  • 15:46 swfrench-wmf: rolling run-puppet-agent on A:cp hosts - T405955
  • 15:33 swfrench-wmf: disable-puppet on A:cp hosts - T405955
  • 15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
  • 15:30 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
  • 15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test1003.eqiad.wmnet on all recursors
  • 15:30 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test1003.eqiad.wmnet on all recursors
  • 15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
  • 15:21 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
  • 15:20 moritzm: installing jq security updates
  • 15:17 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-eqiad
  • 15:05 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 15:05 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1003.eqiad.wmnet
  • 15:04 brennen@deploy2002: Finished deploy [phabricator/deployment@16c9739]: deploy phab1004 for T407244 (duration: 00m 58s)
  • 15:03 brennen@deploy2002: Started deploy [phabricator/deployment@16c9739]: deploy phab1004 for T407244
  • 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@16c9739]: deploy phab2002 for T407244 (duration: 00m 31s)
  • 15:02 brennen@deploy2002: Started deploy [phabricator/deployment@16c9739]: deploy phab2002 for T407244
  • 14:58 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T407244
  • 14:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 14:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 14:36 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 14:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 14:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 14:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 14:32 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test1001.eqiad.wmnet
  • 14:32 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test1001.eqiad.wmnet with OS trixie
  • 14:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:30 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7001*} or P{cp4037*} and A:cp
  • 14:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4037.ulsfo.wmnet
  • 14:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 14:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 14:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 14:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 14:26 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:26 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-eqiad
  • 14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 14:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 14:18 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1001.eqiad.wmnet with reason: host reimage
  • 14:18 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-codfw
  • 14:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:17 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 14:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:14 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:12 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1001.eqiad.wmnet with reason: host reimage
  • 14:11 samtar@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575) (duration: 09m 25s)
  • 14:09 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 14:09 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 14:09 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 14:07 samtar@deploy2002: samtar: Continuing with sync
  • 14:06 samtar@deploy2002: samtar: Backport for ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:02 samtar@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575)
  • 14:02 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test1001.eqiad.wmnet with OS trixie
  • 14:01 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
  • 14:01 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
  • 14:00 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test1001.eqiad.wmnet on all recursors
  • 14:00 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test1001.eqiad.wmnet on all recursors
  • 14:00 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
  • 14:00 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
  • 14:00 phuedx@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents: simple-bot-detection: Use correct schema, ext.wikimediaEvents: simple-bot-detection: Use correct schema (duration: 10m 17s)
  • 13:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 13:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 13:56 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 13:56 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1001.eqiad.wmnet
  • 13:56 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:56 phuedx@deploy2002: phuedx: Continuing with sync
  • 13:55 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:54 phuedx@deploy2002: phuedx: Backport for ext.wikimediaEvents: simple-bot-detection: Use correct schema, ext.wikimediaEvents: simple-bot-detection: Use correct schema synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:53 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:53 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:52 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:49 phuedx@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents: simple-bot-detection: Use correct schema, ext.wikimediaEvents: simple-bot-detection: Use correct schema
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
  • 13:46 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:46 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7001.magru.wmnet
  • 13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
  • 13:42 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
  • 13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 13:39 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 13:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 13:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
  • 13:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:34 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7001*} or P{cp4037*} and A:cp
  • 13:31 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
  • 13:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:26 logmsgbot: daniel Deployed security patch for T405859
  • 13:19 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-codfw
  • 13:16 logmsgbot: daniel Deployed security patch for T405859
  • 13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 13:07 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:06 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:05 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2084.codfw.wmnet with OS bullseye
  • 13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
  • 13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
  • 13:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1095.eqiad.wmnet
  • 12:53 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1095.eqiad.wmnet
  • 12:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1094.eqiad.wmnet with OS bullseye
  • 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
  • 12:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:46 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
  • 12:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:39 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Depool es2032.codfw.wmnet to then clone it to es2053.codfw.wmnet - fceratto@cumin1002
  • 12:39 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2032 - Depool es2032.codfw.wmnet to then clone it to es2053.codfw.wmnet - fceratto@cumin1002
  • 12:39 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2032.codfw.wmnet onto es2053.codfw.wmnet
  • 12:34 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye
  • 12:33 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
  • 12:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1247.eqiad.wmnet onto db1260.eqiad.wmnet
  • 12:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
  • 12:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1247 gradually with 4 steps - Pool db1247.eqiad.wmnet in after cloning
  • 12:30 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
  • 12:18 dbrant@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:17 dbrant@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:17 dbrant@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:16 dbrant@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:15 dbrant@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:15 dbrant@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
  • 12:13 dbrant@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:12 dbrant@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:12 dbrant@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:08 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1094.eqiad.wmnet with OS bullseye
  • 12:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
  • 12:07 ladsgroup@deploy2002: Finished scap sync-world: Backport for filebackend: Remove consistency check for multi-backend (T328872) (duration: 12m 46s)
  • 12:07 dbrant@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:07 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1094.eqiad.wmnet
  • 12:03 dbrant@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:03 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:03 dbrant@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:59 ladsgroup@deploy2002: ladsgroup: Backport for filebackend: Remove consistency check for multi-backend (T328872) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 11:54 ladsgroup@deploy2002: Started scap sync-world: Backport for filebackend: Remove consistency check for multi-backend (T328872)
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
  • 11:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1094.eqiad.wmnet
  • 11:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1093.eqiad.wmnet
  • 11:45 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1247 gradually with 4 steps - Pool db1247.eqiad.wmnet in after cloning
  • 11:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1093.eqiad.wmnet
  • 11:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1092.eqiad.wmnet
  • 11:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:32 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1092.eqiad.wmnet
  • 11:32 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1091.eqiad.wmnet
  • 11:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2089.codfw.wmnet
  • 11:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:26 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1091.eqiad.wmnet
  • 11:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1090.eqiad.wmnet
  • 11:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2089.codfw.wmnet
  • 11:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2088.codfw.wmnet
  • 11:19 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1090.eqiad.wmnet
  • 11:18 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1089.eqiad.wmnet
  • 11:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2088.codfw.wmnet
  • 11:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2087.codfw.wmnet
  • 10:58 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1087.eqiad.wmnet
  • 10:58 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1086.eqiad.wmnet
  • 10:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2085.codfw.wmnet
  • 10:55 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test1002.eqiad.wmnet
  • 10:55 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test1002.eqiad.wmnet with OS trixie
  • 10:51 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1086.eqiad.wmnet
  • 10:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1085.eqiad.wmnet
  • 10:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2082.codfw.wmnet
  • 10:49 hashar: Restarted Zuul to have it reconnect to Gerrit
  • 10:48 fabfur: enable puppet on all DNS hosts for manual gerrit switch (T407200)
  • 10:44 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1085.eqiad.wmnet
  • 10:44 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1084.eqiad.wmnet
  • 10:43 arnaudb@dns1004: END - running authdns-update
  • 10:43 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1002.eqiad.wmnet with reason: host reimage
  • 10:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2082.codfw.wmnet
  • 10:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2081.codfw.wmnet
  • 10:42 arnaudb@dns1004: START - running authdns-update
  • 10:38 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1084.eqiad.wmnet
  • 10:38 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1083.eqiad.wmnet
  • 10:37 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1002.eqiad.wmnet with reason: host reimage
  • 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2081.codfw.wmnet
  • 10:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2080.codfw.wmnet
  • 10:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2053.codfw.wmnet with reason: Setting up new ES host
  • 10:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1083.eqiad.wmnet
  • 10:31 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1082.eqiad.wmnet
  • 10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2080.codfw.wmnet
  • 10:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2079.codfw.wmnet
  • 10:27 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test1002.eqiad.wmnet with OS trixie
  • 10:23 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1082.eqiad.wmnet
  • 10:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2079.codfw.wmnet
  • 10:20 fabfur: disabling puppet on all DNS hosts for manual gerrit switch (T407200)
  • 10:18 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
  • 10:18 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
  • 10:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test1002.eqiad.wmnet on all recursors
  • 10:17 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test1002.eqiad.wmnet on all recursors
  • 10:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
  • 10:16 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1081.eqiad.wmnet
  • 10:15 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
  • 10:09 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 10:09 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1002.eqiad.wmnet
  • 10:04 Amir1: mwscript-k8s --follow --dblist=group0 -- purgeUserOptions.php (T406724)
  • 09:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
  • 09:52 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test2002.codfw.wmnet
  • 09:52 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test2002.codfw.wmnet with OS trixie
  • 09:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
  • 09:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2076.codfw.wmnet
  • 09:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2076.codfw.wmnet
  • 09:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2075.codfw.wmnet
  • 09:38 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test2002.codfw.wmnet with reason: host reimage
  • 09:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
  • 09:34 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2075.codfw.wmnet
  • 09:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2074.codfw.wmnet
  • 09:33 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test2002.codfw.wmnet with reason: host reimage
  • 09:26 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2074.codfw.wmnet
  • 09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
  • 09:25 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 09:22 arnaudb@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gerrit.wikimedia.org gerrit-replica.wikimedia.org on all recursors
  • 09:22 arnaudb@cumin1003: START - Cookbook sre.dns.wipe-cache gerrit.wikimedia.org gerrit-replica.wikimedia.org on all recursors
  • 09:22 arnaudb@dns1004: END - running authdns-update
  • 09:19 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test2002.codfw.wmnet with OS trixie
  • 09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
  • 09:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
  • 09:18 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2002.codfw.wmnet - fceratto@cumin1002"
  • 09:18 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2002.codfw.wmnet - fceratto@cumin1002"
  • 09:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test2002.codfw.wmnet on all recursors
  • 09:17 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test2002.codfw.wmnet on all recursors
  • 09:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:17 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2002.codfw.wmnet - fceratto@cumin1002"
  • 09:17 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2002.codfw.wmnet - fceratto@cumin1002"
  • 09:13 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 09:12 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test2002.codfw.wmnet
  • 09:11 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
  • 09:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
  • 09:10 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1081.eqiad.wmnet
  • 09:10 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1080.eqiad.wmnet
  • 09:05 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1080.eqiad.wmnet
  • 09:05 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1079.eqiad.wmnet
  • 09:04 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 09:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
  • 09:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
  • 09:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 09:02 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 09:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 09:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 09:02 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 09:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 09:00 arnaudb@dns1004: START - running authdns-update
  • 08:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1079.eqiad.wmnet
  • 08:57 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1078.eqiad.wmnet
  • 08:56 topranks: enable new inter.link IP transit circuit on cr1-drms T401104
  • 08:56 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
  • 08:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 08:50 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1078.eqiad.wmnet
  • 08:50 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1077.eqiad.wmnet
  • 08:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 08:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 08:45 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:44 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:44 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:42 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:41 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1077.eqiad.wmnet
  • 08:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 08:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1076.eqiad.wmnet
  • 08:38 brouberol@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:37 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.23 refs T405679
  • 08:37 brouberol@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:33 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:33 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 08:32 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 08:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1076.eqiad.wmnet
  • 08:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 08:31 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
  • 08:30 brouberol@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 08:29 brouberol@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1247 - Depool db1247.eqiad.wmnet to then clone it to db1260.eqiad.wmnet - marostegui@cumin1003
  • 08:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1247 - Depool db1247.eqiad.wmnet to then clone it to db1260.eqiad.wmnet - marostegui@cumin1003
  • 08:25 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1247.eqiad.wmnet onto db1260.eqiad.wmnet
  • 08:23 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
  • 08:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 08:20 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
  • 08:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 08:18 dcausse: closing the UTC morning backport window
  • 08:14 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: test completion with default sort on simplewiki [3/3] (T404858), ext-EventLogging: Allowlist product_metrics.web_base_with_ip stream (T406332) (duration: 10m 46s)
  • 08:12 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
  • 08:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 08:10 dcausse@deploy2002: dcausse, phuedx: Continuing with sync
  • 08:07 dcausse@deploy2002: dcausse, phuedx: Backport for cirrus: test completion with default sort on simplewiki [3/3] (T404858), ext-EventLogging: Allowlist product_metrics.web_base_with_ip stream (T406332) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:03 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: test completion with default sort on simplewiki [3/3] (T404858), ext-EventLogging: Allowlist product_metrics.web_base_with_ip stream (T406332)
  • 08:02 dcausse@deploy2002: mwscript-k8s job started: namespaceDupes eswiktionary --fix # T407150
  • 08:01 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 08:01 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 08:00 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83863 and previous config saved to /var/cache/conftool/dbconfig/20251014-080025-root.json
  • 08:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:59 dcausse@deploy2002: Finished scap sync-world: Backport for [enwikibooks] Set $wgAutoConfirmCount to 5 (T407080), [eswiktionary] Create a Tesauro namespace (T407150), [kawiki] Enable NewUserMessage extension (T407076) (duration: 11m 29s)
  • 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83862 and previous config saved to /var/cache/conftool/dbconfig/20251014-075608-root.json
  • 07:54 dcausse@deploy2002: dcausse, superpes: Continuing with sync
  • 07:51 dcausse@deploy2002: dcausse, superpes: Backport for [enwikibooks] Set $wgAutoConfirmCount to 5 (T407080), [eswiktionary] Create a Tesauro namespace (T407150), [kawiki] Enable NewUserMessage extension (T407076) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:47 dcausse@deploy2002: Started scap sync-world: Backport for [enwikibooks] Set $wgAutoConfirmCount to 5 (T407080), [eswiktionary] Create a Tesauro namespace (T407150), [kawiki] Enable NewUserMessage extension (T407076)
  • 07:45 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83861 and previous config saved to /var/cache/conftool/dbconfig/20251014-074519-root.json
  • 07:43 dcausse@deploy2002: Finished scap sync-world: Backport for Implement new usage types for statement with qualifiers and references (T401290) (duration: 10m 50s)
  • 07:41 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83860 and previous config saved to /var/cache/conftool/dbconfig/20251014-074102-root.json
  • 07:39 dcausse@deploy2002: joelyrookewmde, dcausse: Continuing with sync
  • 07:36 dcausse@deploy2002: joelyrookewmde, dcausse: Backport for Implement new usage types for statement with qualifiers and references (T401290) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:32 dcausse@deploy2002: Started scap sync-world: Backport for Implement new usage types for statement with qualifiers and references (T401290)
  • 07:30 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83859 and previous config saved to /var/cache/conftool/dbconfig/20251014-073013-root.json
  • 07:28 dcausse@deploy2002: Finished scap sync-world: Backport for Remove artifact from Quechua Wikipedia wordmark (duration: 11m 46s)
  • 07:25 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83858 and previous config saved to /var/cache/conftool/dbconfig/20251014-072556-root.json
  • 07:22 dcausse@deploy2002: jhsoby, dcausse: Continuing with sync
  • 07:21 dcausse@deploy2002: jhsoby, dcausse: Backport for Remove artifact from Quechua Wikipedia wordmark synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:16 dcausse@deploy2002: Started scap sync-world: Backport for Remove artifact from Quechua Wikipedia wordmark
  • 07:15 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83857 and previous config saved to /var/cache/conftool/dbconfig/20251014-071507-root.json
  • 07:10 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83856 and previous config saved to /var/cache/conftool/dbconfig/20251014-071050-root.json
  • 07:00 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83855 and previous config saved to /var/cache/conftool/dbconfig/20251014-070001-root.json
  • 06:55 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83854 and previous config saved to /var/cache/conftool/dbconfig/20251014-065544-root.json
  • 06:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83853 and previous config saved to /var/cache/conftool/dbconfig/20251014-064455-root.json
  • 06:40 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83852 and previous config saved to /var/cache/conftool/dbconfig/20251014-064038-root.json
  • 06:37 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83851 and previous config saved to /var/cache/conftool/dbconfig/20251014-063724-root.json
  • 06:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83850 and previous config saved to /var/cache/conftool/dbconfig/20251014-062949-root.json
  • 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83848 and previous config saved to /var/cache/conftool/dbconfig/20251014-062532-root.json
  • 06:22 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83847 and previous config saved to /var/cache/conftool/dbconfig/20251014-062218-root.json
  • 06:21 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
  • 06:14 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83846 and previous config saved to /var/cache/conftool/dbconfig/20251014-061444-root.json
  • 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1032 - Depool es1032.eqiad.wmnet to then clone it to es1055.eqiad.wmnet - marostegui@cumin1003
  • 06:14 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
  • 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83845 and previous config saved to /var/cache/conftool/dbconfig/20251014-061026-root.json
  • 06:07 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83844 and previous config saved to /var/cache/conftool/dbconfig/20251014-060712-root.json
  • 05:59 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83843 and previous config saved to /var/cache/conftool/dbconfig/20251014-055938-root.json
  • 05:55 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83842 and previous config saved to /var/cache/conftool/dbconfig/20251014-055520-root.json
  • 05:53 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1032 - Depool es1032.eqiad.wmnet to then clone it to es1055.eqiad.wmnet - marostegui@cumin1003
  • 05:53 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1032.eqiad.wmnet onto es1055.eqiad.wmnet
  • 05:52 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83840 and previous config saved to /var/cache/conftool/dbconfig/20251014-055206-root.json
  • 05:46 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83839 and previous config saved to /var/cache/conftool/dbconfig/20251014-054631-root.json
  • 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83838 and previous config saved to /var/cache/conftool/dbconfig/20251014-054432-root.json
  • 05:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1244 T407176', diff saved to https://phabricator.wikimedia.org/P83837 and previous config saved to /var/cache/conftool/dbconfig/20251014-054200-marostegui.json
  • 05:41 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1160 to s4 primary T407176', diff saved to https://phabricator.wikimedia.org/P83836 and previous config saved to /var/cache/conftool/dbconfig/20251014-054118-marostegui.json
  • 05:41 marostegui: Starting s4 eqiad failover from db1244 to db1160 - T407176
  • 05:40 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83835 and previous config saved to /var/cache/conftool/dbconfig/20251014-054014-root.json
  • 05:37 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T407176
  • 05:36 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1160 with weight 0 T407176', diff saved to https://phabricator.wikimedia.org/P83834 and previous config saved to /var/cache/conftool/dbconfig/20251014-053654-marostegui.json
  • 05:31 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83833 and previous config saved to /var/cache/conftool/dbconfig/20251014-053125-root.json
  • 05:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83832 and previous config saved to /var/cache/conftool/dbconfig/20251014-052926-root.json
  • 05:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1031 - Depool es1031.eqiad.wmnet to then clone it to es1054.eqiad.wmnet - marostegui@cumin1003
  • 05:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1033.eqiad.wmnet onto es1056.eqiad.wmnet
  • 05:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1033 gradually with 4 steps - Pool es1033.eqiad.wmnet in after cloning
  • 05:25 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83830 and previous config saved to /var/cache/conftool/dbconfig/20251014-052508-root.json
  • 05:20 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1031 - Depool es1031.eqiad.wmnet to then clone it to es1054.eqiad.wmnet - marostegui@cumin1003
  • 05:20 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1031.eqiad.wmnet onto es1054.eqiad.wmnet
  • 05:16 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83828 and previous config saved to /var/cache/conftool/dbconfig/20251014-051619-root.json
  • 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1031-1032].eqiad.wmnet with reason: Cloning
  • 05:01 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83826 and previous config saved to /var/cache/conftool/dbconfig/20251014-050113-root.json
  • 04:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1221 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83824 and previous config saved to /var/cache/conftool/dbconfig/20251014-045305-marostegui.json
  • 04:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 04:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Upgrading
  • 04:41 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1033 gradually with 4 steps - Pool es1033.eqiad.wmnet in after cloning
  • 04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.45.0-wmf.20 (duration: 02m 42s)
  • 03:48 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.45.0-wmf.23 refs T405679 (duration: 45m 02s)
  • 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.45.0-wmf.23 refs T405679
  • 02:24 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 02:20 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 02:09 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 02:05 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 01:58 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 01:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 01:45 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 01:39 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 20s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-13

  • 23:50 musikanimal@deploy2002: Finished scap sync-world: Backport for Add 'accepted' status (T406674) (duration: 40m 01s)
  • 23:38 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 23:36 musikanimal@deploy2002: musikanimal: Backport for Add 'accepted' status (T406674) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:29 btullis@cumin1003: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto an-presto cluster: Reboot Presto nodes
  • 23:10 musikanimal@deploy2002: Started scap sync-world: Backport for Add 'accepted' status (T406674)
  • 22:34 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
  • 22:30 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
  • 22:01 btullis@cumin1003: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
  • 22:01 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 22:01 btullis@cumin1003: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
  • 22:00 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 21:52 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1003.eqiad.wmnet
  • 21:48 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
  • 21:05 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 21:05 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
  • 21:03 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 20:57 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 20:56 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
  • 20:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
  • 20:52 btullis@cumin1003: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
  • 20:45 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
  • 20:39 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
  • 20:34 eileen: civicrm upgraded from 385f00d8 to 9393addf
  • 20:25 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
  • 20:22 dani@deploy2002: Finished scap sync-world: Backport for Undeploy Design Research participant recruitment survey on jawiki (T405577) (duration: 09m 01s)
  • 20:19 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
  • 20:18 dani@deploy2002: dani: Continuing with sync
  • 20:17 dani@deploy2002: dani: Backport for Undeploy Design Research participant recruitment survey on jawiki (T405577) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:13 dani@deploy2002: Started scap sync-world: Backport for Undeploy Design Research participant recruitment survey on jawiki (T405577)
  • 19:44 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
  • 19:44 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
  • 18:59 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
  • 17:59 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test2001.codfw.wmnet
  • 17:59 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test2001.codfw.wmnet with OS trixie
  • 17:43 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test2001.codfw.wmnet with reason: host reimage
  • 17:37 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test2001.codfw.wmnet with reason: host reimage
  • 17:19 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test2001.codfw.wmnet with OS trixie
  • 17:19 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2001.codfw.wmnet - fceratto@cumin1002"
  • 17:19 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2001.codfw.wmnet - fceratto@cumin1002"
  • 17:18 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test2001.codfw.wmnet on all recursors
  • 17:18 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test2001.codfw.wmnet on all recursors
  • 17:18 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:18 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2001.codfw.wmnet - fceratto@cumin1002"
  • 17:17 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2001.codfw.wmnet - fceratto@cumin1002"
  • 17:14 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 17:14 fceratto@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:11 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-eqiad
  • 17:11 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 17:11 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test2001.codfw.wmnet
  • 17:10 fceratto@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host db-test1001.eqiad.wmnet
  • 17:10 fceratto@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:08 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 17:02 fceratto@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:59 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 16:59 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1001.eqiad.wmnet
  • 16:05 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
  • 15:59 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
  • 15:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 15:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
  • 15:51 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
  • 15:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 15:47 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
  • 15:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
  • 15:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
  • 15:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 15:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
  • 15:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
  • 15:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 15:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 15:31 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 15:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
  • 15:24 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 15:24 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 15:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
  • 15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
  • 15:17 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 15:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
  • 15:16 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 15:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
  • 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1005.eqiad.wmnet
  • 15:12 btullis@cumin1003: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
  • 15:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1005.eqiad.wmnet
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1004.eqiad.wmnet
  • 15:09 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 15:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
  • 15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1004.eqiad.wmnet
  • 15:05 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
  • 15:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
  • 14:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
  • 14:57 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1003.eqiad.wmnet
  • 14:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1003.eqiad.wmnet
  • 14:49 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
  • 14:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
  • 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
  • 14:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
  • 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
  • 14:20 hnowlan: rest.php on rest-gateway at 100% for enwiki (and all other wikis)
  • 14:19 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
  • 14:15 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-eqiad
  • 14:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
  • 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 14:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 14:13 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 14:07 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 14:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 14:06 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
  • 14:04 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2057.codfw.wmnet
  • 14:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 14:03 btullis@cumin1003: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
  • 13:58 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:49 btullis@cumin1003: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
  • 13:46 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:43 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:40 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:40 phuedx: UTC afternoon backport window done
  • 13:37 phuedx@deploy2002: Finished scap sync-world: Backport for Port Java Pageview definition to bot detection (T406359) (duration: 17m 39s)
  • 13:34 btullis@cumin1003: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
  • 13:33 phuedx@deploy2002: phuedx: Continuing with sync
  • 13:33 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
  • 13:31 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
  • 13:31 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
  • 13:30 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
  • 13:26 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:24 jmm@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:24 phuedx@deploy2002: phuedx: Backport for Port Java Pageview definition to bot detection (T406359) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:20 phuedx@deploy2002: Started scap sync-world: Backport for Port Java Pageview definition to bot detection (T406359)
  • 13:15 derick@deploy2002: Finished scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group2` wikis (T402808) (duration: 11m 39s)
  • 13:11 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
  • 13:11 derick@deploy2002: derick, d3r1ck01: Continuing with sync
  • 13:09 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
  • 13:09 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
  • 13:08 derick@deploy2002: derick, d3r1ck01: Backport for session: Enable MultiBackendSessionStore on `group2` wikis (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:06 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
  • 13:06 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 13:06 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 13:04 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
  • 13:04 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 13:04 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 13:03 derick@deploy2002: Started scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group2` wikis (T402808)
  • 13:01 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 13:01 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 12:59 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 12:59 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
  • 12:57 Amir1: dropped flaggedrevs tables on lawikisource (T406424)
  • 12:57 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 12:56 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
  • 12:56 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
  • 12:54 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
  • 12:53 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
  • 12:51 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
  • 12:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 12:50 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
  • 12:47 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83815 and previous config saved to /var/cache/conftool/dbconfig/20251013-124744-root.json
  • 12:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
  • 12:46 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 12:45 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
  • 12:45 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
  • 12:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83814 and previous config saved to /var/cache/conftool/dbconfig/20251013-124439-root.json
  • 12:41 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 12:41 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 12:40 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
  • 12:40 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
  • 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
  • 12:35 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 12:35 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
  • 12:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83813 and previous config saved to /var/cache/conftool/dbconfig/20251013-123238-root.json
  • 12:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
  • 12:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83812 and previous config saved to /var/cache/conftool/dbconfig/20251013-122933-root.json
  • 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1002.eqiad.wmnet
  • 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki1002.eqiad.wmnet
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mirror1001.wikimedia.org
  • 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83811 and previous config saved to /var/cache/conftool/dbconfig/20251013-121732-root.json
  • 12:16 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
  • 12:14 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83810 and previous config saved to /var/cache/conftool/dbconfig/20251013-121427-root.json
  • 12:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mirror1001.wikimedia.org
  • 12:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83809 and previous config saved to /var/cache/conftool/dbconfig/20251013-120226-root.json
  • 11:59 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83808 and previous config saved to /var/cache/conftool/dbconfig/20251013-115921-root.json
  • 11:47 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83807 and previous config saved to /var/cache/conftool/dbconfig/20251013-114720-root.json
  • 11:45 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
  • 11:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83806 and previous config saved to /var/cache/conftool/dbconfig/20251013-114415-root.json
  • 11:35 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83805 and previous config saved to /var/cache/conftool/dbconfig/20251013-113510-root.json
  • 11:33 gehel: restarting blazegraph on wdqs1014 (BlazegraphFreeAllocatorsDecreasingRapidly) - `sudo depool && sleep 30 && sudo systemctl restart wdqs-blazegraph.service && sleep 30 && sudo pool`
  • 11:32 moritzm: installing openssl security updates on Bullseye
  • 11:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83804 and previous config saved to /var/cache/conftool/dbconfig/20251013-113214-root.json
  • 11:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83803 and previous config saved to /var/cache/conftool/dbconfig/20251013-112909-root.json
  • 11:20 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83802 and previous config saved to /var/cache/conftool/dbconfig/20251013-112004-root.json
  • 11:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83801 and previous config saved to /var/cache/conftool/dbconfig/20251013-111708-root.json
  • 11:14 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83800 and previous config saved to /var/cache/conftool/dbconfig/20251013-111403-root.json
  • 11:04 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83799 and previous config saved to /var/cache/conftool/dbconfig/20251013-110458-root.json
  • 11:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83798 and previous config saved to /var/cache/conftool/dbconfig/20251013-110203-root.json
  • 10:58 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83797 and previous config saved to /var/cache/conftool/dbconfig/20251013-105857-root.json
  • 10:49 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83796 and previous config saved to /var/cache/conftool/dbconfig/20251013-104952-root.json
  • 10:49 moritzm: installing systemd bugfix updates on Bullseye
  • 10:46 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83795 and previous config saved to /var/cache/conftool/dbconfig/20251013-104657-root.json
  • 10:43 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83794 and previous config saved to /var/cache/conftool/dbconfig/20251013-104351-root.json
  • 10:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1247 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83793 and previous config saved to /var/cache/conftool/dbconfig/20251013-104131-marostegui.json
  • 10:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83792 and previous config saved to /var/cache/conftool/dbconfig/20251013-103151-root.json
  • 10:28 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83791 and previous config saved to /var/cache/conftool/dbconfig/20251013-102845-root.json
  • 10:24 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83790 and previous config saved to /var/cache/conftool/dbconfig/20251013-102428-root.json
  • 10:16 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83789 and previous config saved to /var/cache/conftool/dbconfig/20251013-101645-root.json
  • 10:13 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83788 and previous config saved to /var/cache/conftool/dbconfig/20251013-101339-root.json
  • 10:09 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83787 and previous config saved to /var/cache/conftool/dbconfig/20251013-100923-root.json
  • 10:08 hashar@deploy2002: Finished deploy [gerrit/gerrit@93bde2a]: Fix link to task in the motd banner (duration: 00m 13s)
  • 10:08 hashar@deploy2002: Started deploy [gerrit/gerrit@93bde2a]: Fix link to task in the motd banner
  • 10:03 moritzm: installing Linux 5.10.244 on Bullseye hosts
  • 09:54 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83786 and previous config saved to /var/cache/conftool/dbconfig/20251013-095416-root.json
  • 09:39 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83785 and previous config saved to /var/cache/conftool/dbconfig/20251013-093910-root.json
  • 09:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: Cloning
  • 09:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1160 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83784 and previous config saved to /var/cache/conftool/dbconfig/20251013-092903-marostegui.json
  • 09:21 marostegui@cumin1003: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83783 and previous config saved to /var/cache/conftool/dbconfig/20251013-092152-root.json
  • 09:15 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 09:11 kostajh: UTC morning deploys done
  • 09:10 kharlan@deploy2002: Finished scap sync-world: Backport for ext.confirmEdit.hCaptcha.utils: Track hCaptcha execution rejections (T406925) (duration: 09m 19s)
  • 09:06 marostegui@cumin1003: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83782 and previous config saved to /var/cache/conftool/dbconfig/20251013-090647-root.json
  • 09:06 kharlan@deploy2002: kharlan: Continuing with sync
  • 09:05 kharlan@deploy2002: kharlan: Backport for ext.confirmEdit.hCaptcha.utils: Track hCaptcha execution rejections (T406925) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:01 kharlan@deploy2002: Started scap sync-world: Backport for ext.confirmEdit.hCaptcha.utils: Track hCaptcha execution rejections (T406925)
  • 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 08:10 kharlan@deploy2002: kharlan: Continuing with sync
  • 08:09 kharlan@deploy2002: kharlan: Backport for Fix locally failing QUnit tests (T406615) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83776 and previous config saved to /var/cache/conftool/dbconfig/20251013-080837-root.json
  • 08:04 kharlan@deploy2002: Started scap sync-world: Backport for Fix locally failing QUnit tests (T406615)
  • 08:04 kharlan@deploy2002: Finished scap sync-world: Backport for kowikisource: Add "해석" namespace (T406405), kowiki: Restrict move ratelimit for non-extendedconfirmed users (T406849), wmgMonologChannels: Set CheckUser to info level, hCaptcha: Enable on testwiki (T402366), NetworkSession: enable only for private wikis (duration
  • 07:57 kharlan@deploy2002: revi, kharlan, dcausse: Continuing with sync
  • 07:55 kharlan@deploy2002: revi, kharlan, dcausse: Backport for kowikisource: Add "해석" namespace (T406405), kowiki: Restrict move ratelimit for non-extendedconfirmed users (T406849), wmgMonologChannels: Set CheckUser to info level, hCaptcha: Enable on testwiki (T402366), NetworkSession: enable only for private wikis synced to t
  • 07:53 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83773 and previous config saved to /var/cache/conftool/dbconfig/20251013-075331-root.json
  • 07:49 kharlan@deploy2002: Started scap sync-world: Backport for kowikisource: Add "해석" namespace (T406405), kowiki: Restrict move ratelimit for non-extendedconfirmed users (T406849), wmgMonologChannels: Set CheckUser to info level, hCaptcha: Enable on testwiki (T402366), NetworkSession: enable only for private wikis
  • 07:46 mszwarc@deploy2002: Finished scap sync-world: Backport for arbcom_plwiki: Change favicon (T406883) (duration: 37m 46s)
  • 07:38 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83772 and previous config saved to /var/cache/conftool/dbconfig/20251013-073825-root.json
  • 07:33 mszwarc@deploy2002: mszwarc: Continuing with sync
  • 07:33 mszwarc@deploy2002: mszwarc: Backport for arbcom_plwiki: Change favicon (T406883) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:23 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83771 and previous config saved to /var/cache/conftool/dbconfig/20251013-072320-root.json
  • 07:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1199 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83770 and previous config saved to /var/cache/conftool/dbconfig/20251013-071521-marostegui.json
  • 07:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 07:08 mszwarc@deploy2002: Started scap sync-world: Backport for arbcom_plwiki: Change favicon (T406883)
  • 06:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83769 and previous config saved to /var/cache/conftool/dbconfig/20251013-063046-root.json
  • 06:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83768 and previous config saved to /var/cache/conftool/dbconfig/20251013-061540-root.json
  • 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83767 and previous config saved to /var/cache/conftool/dbconfig/20251013-060034-root.json
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83766 and previous config saved to /var/cache/conftool/dbconfig/20251013-054551-root.json
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83765 and previous config saved to /var/cache/conftool/dbconfig/20251013-054528-root.json
  • 05:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1238 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83764 and previous config saved to /var/cache/conftool/dbconfig/20251013-053723-marostegui.json
  • 05:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 05:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83763 and previous config saved to /var/cache/conftool/dbconfig/20251013-053045-root.json
  • 05:20 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1033 - Depool es1033.eqiad.wmnet to then clone it to es1056.eqiad.wmnet - marostegui@cumin1003
  • 05:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83762 and previous config saved to /var/cache/conftool/dbconfig/20251013-051540-root.json
  • 05:06 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1033 - Depool es1033.eqiad.wmnet to then clone it to es1056.eqiad.wmnet - marostegui@cumin1003
  • 05:06 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1033.eqiad.wmnet onto es1056.eqiad.wmnet
  • 05:00 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83760 and previous config saved to /var/cache/conftool/dbconfig/20251013-050034-root.json
  • 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1241 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83759 and previous config saved to /var/cache/conftool/dbconfig/20251013-045230-marostegui.json
  • 04:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1027 - Depool es1027.eqiad.wmnet to then clone it to es1050.eqiad.wmnet - marostegui@cumin1003
  • 04:49 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1027 - Depool es1027.eqiad.wmnet to then clone it to es1050.eqiad.wmnet - marostegui@cumin1003
  • 04:49 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
  • 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1027,1050].eqiad.wmnet with reason: Cloning
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 25s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-12

  • 01:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 09s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-11

  • 12:34 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
  • 09:35 hashar@deploy2002: Finished deploy [integration/docroot@99ef7e9]: build: Update phpunit/phpunit to 10.5.58 (duration: 00m 11s)
  • 09:35 hashar@deploy2002: Started deploy [integration/docroot@99ef7e9]: build: Update phpunit/phpunit to 10.5.58
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 25s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-10

  • 21:16 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 21:16 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 21:00 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
  • 20:57 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 17:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 16:50 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:50 rzl@deploy1003: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:49 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 16:49 rzl@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 16:49 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 16:48 rzl@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 16:48 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 16:47 rzl@deploy1003: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 16:46 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:46 rzl@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:46 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 16:46 rzl@deploy1003: helmfile [staging] START helmfile.d/services/termbox: apply
  • 16:45 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 16:45 rzl@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 16:43 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 16:43 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 16:43 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:42 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 16:41 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:41 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:40 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 16:40 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 16:39 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:39 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 16:39 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 16:38 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 16:38 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 16:37 rzl@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 16:37 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 16:37 rzl@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 16:37 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 16:36 rzl@deploy1003: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 16:36 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 16:36 rzl@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
  • 16:36 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 16:36 rzl@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 16:35 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:35 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:35 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:34 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:33 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:31 rzl@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:27 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 16:27 rzl@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 16:27 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 16:27 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 16:26 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 16:23 rzl@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 16:19 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 16:19 rzl@deploy1003: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 16:16 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:16 rzl@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:15 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 16:15 rzl@deploy1003: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 16:15 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 16:14 rzl@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 16:14 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 16:14 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 16:14 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 16:13 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 16:11 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 16:11 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 16:10 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:10 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 16:09 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:09 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:09 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:08 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:08 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 16:08 rzl@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 16:07 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 16:07 rzl@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 16:06 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 16:06 rzl@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
  • 16:05 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 16:05 rzl@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 16:04 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 16:04 rzl@deploy1003: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 16:04 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 16:03 rzl@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 16:03 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 16:03 rzl@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 16:02 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 16:02 rzl@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 16:00 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 16:00 rzl@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 15:59 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:58 rzl@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:56 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:56 rzl@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:39 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker-codfw
  • 15:10 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker-codfw
  • 14:45 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83756 and previous config saved to /var/cache/conftool/dbconfig/20251010-141326-root.json
  • 14:06 elukey@cumin1003: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sretest2001.codfw.wmnet: Renew puppet certificate - elukey@cumin1003
  • 14:03 bking@dns1004: END - running authdns-update
  • 14:02 bking@dns1004: START - running authdns-update
  • 13:58 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83755 and previous config saved to /var/cache/conftool/dbconfig/20251010-135820-root.json
  • 13:56 ejegg: donorwiki upgraded from 73c34ea4 to d903982c
  • 13:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83754 and previous config saved to /var/cache/conftool/dbconfig/20251010-134314-root.json
  • 13:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83753 and previous config saved to /var/cache/conftool/dbconfig/20251010-132808-root.json
  • 13:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1242 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83752 and previous config saved to /var/cache/conftool/dbconfig/20251010-132003-marostegui.json
  • 13:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 13:17 fabfur: revert haproxykafka to v0.3.16 on cp5021 and cp7001 (T404427)
  • 12:06 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83750 and previous config saved to /var/cache/conftool/dbconfig/20251010-120643-root.json
  • 11:51 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83749 and previous config saved to /var/cache/conftool/dbconfig/20251010-115138-root.json
  • 11:36 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83748 and previous config saved to /var/cache/conftool/dbconfig/20251010-113632-root.json
  • 11:21 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83747 and previous config saved to /var/cache/conftool/dbconfig/20251010-112126-root.json
  • 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Change es2 eqiad master to es1030 T406488', diff saved to https://phabricator.wikimedia.org/P83746 and previous config saved to /var/cache/conftool/dbconfig/20251010-111653-marostegui.json
  • 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Change es1 eqiad master to es1029 T406488', diff saved to https://phabricator.wikimedia.org/P83745 and previous config saved to /var/cache/conftool/dbconfig/20251010-111630-marostegui.json
  • 11:16 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Change es3 eqiad master to es1028 T406488', diff saved to https://phabricator.wikimedia.org/P83744 and previous config saved to /var/cache/conftool/dbconfig/20251010-111605-marostegui.json
  • 11:15 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 11:15 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:14 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 11:13 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:13 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:13 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1243 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83743 and previous config saved to /var/cache/conftool/dbconfig/20251010-111306-marostegui.json
  • 11:13 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 11:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83742 and previous config saved to /var/cache/conftool/dbconfig/20251010-111020-root.json
  • 10:55 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83741 and previous config saved to /var/cache/conftool/dbconfig/20251010-105514-root.json
  • 10:40 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83740 and previous config saved to /var/cache/conftool/dbconfig/20251010-104008-root.json
  • 10:33 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:32 vgutierrez: restarting acme-chief and nginx on acme-chief instances
  • 10:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83739 and previous config saved to /var/cache/conftool/dbconfig/20251010-102502-root.json
  • 10:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1248 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83738 and previous config saved to /var/cache/conftool/dbconfig/20251010-101720-marostegui.json
  • 10:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 09:34 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 09:34 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 09:20 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 06:24 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83737 and previous config saved to /var/cache/conftool/dbconfig/20251010-062406-root.json
  • 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1029.eqiad.wmnet onto es1052.eqiad.wmnet
  • 06:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1029 gradually with 4 steps - Pool es1029.eqiad.wmnet in after cloning
  • 06:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1034.eqiad.wmnet onto es1057.eqiad.wmnet
  • 06:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1034 gradually with 4 steps - Pool es1034.eqiad.wmnet in after cloning
  • 06:09 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83734 and previous config saved to /var/cache/conftool/dbconfig/20251010-060900-root.json
  • 05:53 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83731 and previous config saved to /var/cache/conftool/dbconfig/20251010-055354-root.json
  • 05:38 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83728 and previous config saved to /var/cache/conftool/dbconfig/20251010-053848-root.json
  • 05:30 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1249 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83727 and previous config saved to /var/cache/conftool/dbconfig/20251010-053040-marostegui.json
  • 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1029 gradually with 4 steps - Pool es1029.eqiad.wmnet in after cloning
  • 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1034 gradually with 4 steps - Pool es1034.eqiad.wmnet in after cloning
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 32s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-09

  • 23:10 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2017.*
  • 22:11 inflatador: bking@wdqs10(18|19|20) systemctl start load-categories-daily.service T405978
  • 22:05 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1019.eqiad.wmnet
  • 22:04 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1020.eqiad.wmnet
  • 22:04 jdlrobson@deploy2002: Finished scap sync-world: Backport for Enable instrumentation of watchstar and other links that stopPropagation (T406390) (duration: 41m 38s)
  • 22:00 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1018.eqiad.wmnet
  • 21:51 dwisehaupt: started staging db restore in root screen session on frdb1006. restoring from db backups on 20251008
  • 21:51 jdlrobson@deploy2002: jdlrobson: Continuing with sync
  • 21:47 jdlrobson@deploy2002: jdlrobson: Backport for Enable instrumentation of watchstar and other links that stopPropagation (T406390) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:25 TimStarling: on db2202 cleaned up the tables I created for T400696
  • 21:22 jdlrobson@deploy2002: Started scap sync-world: Backport for Enable instrumentation of watchstar and other links that stopPropagation (T406390)
  • 21:20 wfan: payments-wiki upgraded from 028a0225 to d903982c
  • 20:58 reedy@deploy2002: Finished scap sync-world: Backport for Enable New UI and Multiple Module support for OATHAuth in Wikimedia production (T399644) (duration: 20m 04s)
  • 20:53 reedy@deploy2002: reedy, sbassett: Continuing with sync
  • 20:46 Daimona: Run createAndPromote as in P83722#336349 (~100x, in series) to restore event-organizer membership # T401445
  • 20:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 20:42 reedy@deploy2002: reedy, sbassett: Backport for Enable New UI and Multiple Module support for OATHAuth in Wikimedia production (T399644) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 reedy@deploy2002: Started scap sync-world: Backport for Enable New UI and Multiple Module support for OATHAuth in Wikimedia production (T399644)
  • 20:32 mutante: logmsgbot do you still log - test log T284123
  • 20:29 mutante: re-enabled QoS on gerrit servers - with previously stable config - T406774 gerrit:1194811
  • 20:28 reedy@deploy2002: Finished scap sync-world: Backport for OATHAuth Recovery Code code improvement (T406501) (duration: 10m 19s)
  • 20:25 mutante: re-enabling QoS on gerrit servers - with previously stable config - T406774
  • 20:24 reedy@deploy2002: sbassett, reedy: Continuing with sync
  • 20:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
  • 20:23 reedy@deploy2002: sbassett, reedy: Backport for OATHAuth Recovery Code code improvement (T406501) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
  • 20:18 reedy@deploy2002: Started scap sync-world: Backport for OATHAuth Recovery Code code improvement (T406501)
  • 20:17 reedy@deploy2002: Finished scap sync-world: Backport for Update interwiki cache, Revert "Delete the event-organizer user group on medium and small wikis" (T401445), Assign campaignevents-generate-invitation-lists right explicitly (T401445) (duration: 10m 46s)
  • 20:13 reedy@deploy2002: daimona, reedy: Continuing with sync
  • 20:11 reedy@deploy2002: daimona, reedy: Backport for Update interwiki cache, Revert "Delete the event-organizer user group on medium and small wikis" (T401445), Assign campaignevents-generate-invitation-lists right explicitly (T401445) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 reedy@deploy2002: Started scap sync-world: Backport for Update interwiki cache, Revert "Delete the event-organizer user group on medium and small wikis" (T401445), Assign campaignevents-generate-invitation-lists right explicitly (T401445)
  • 20:04 bking@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 20:00 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1020.eqiad.wmnet
  • 19:59 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1019.eqiad.wmnet
  • 19:59 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1018.eqiad.wmnet
  • 19:29 eileen: civicrm upgraded from 14cc3125 to 748922f0
  • 19:22 ejegg: donorwiki upgraded from e8ef5539 to 73c34ea4
  • 19:13 ejegg: civicrm upgraded from 132211d5 to 14cc3125
  • 19:04 jforrester@deploy2002: Finished scap sync-world: Backport for i18n: Pull forward wikimedia-boardelection2025-notification-body updates (duration: 11m 39s)
  • 18:59 jforrester@deploy2002: jforrester: Continuing with sync
  • 18:58 jforrester@deploy2002: jforrester: Backport for i18n: Pull forward wikimedia-boardelection2025-notification-body updates synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:53 jforrester@deploy2002: Started scap sync-world: Backport for i18n: Pull forward wikimedia-boardelection2025-notification-body updates
  • 18:36 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
  • 18:36 cmooney@cumin1003: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
  • 18:02 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 18:02 rzl@deploy1003: helmfile [staging] START helmfile.d/services/apertium: apply
  • 17:31 topranks: begin work to move lvs1020 uplink cable from ssw1-f1-eqiad to ssw1-e1-eqiad
  • 17:30 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs1020.eqiad.wmnet with reason: downtime lvs1020 to suppress alerts about enp94s0f0np0 going down and losing backend connectivity
  • 17:08 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:06 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:06 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:05 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:04 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for inter.link transit IPs in drmrs - cmooney@cumin1003"
  • 16:47 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for inter.link transit IPs in drmrs - cmooney@cumin1003"
  • 16:38 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 16:33 cwhite: upgrade grafana-loki on grafana hosts T406478
  • 16:30 tgr@deploy2002: Finished scap sync-world: Backport for session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634), session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634) (duration: 20m 07s)
  • 16:26 tgr@deploy2002: tgr, d3r1ck01: Continuing with sync
  • 16:18 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:18 sukhe: sukhe@lvs2013:~$ sudo systemctl restart pybal.service
  • 16:14 tgr@deploy2002: tgr, d3r1ck01: Backport for session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634), session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:10 tgr@deploy2002: Started scap sync-world: Backport for session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634), session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634)
  • 15:59 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:57 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:56 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:48 sukhe@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=proxoid,name=hcaptcha.* [reason: setting weight for proxoid hcaptcha dedicated VM]
  • 15:48 sukhe@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=proxoid,name=hcatpcha.* [reason: setting weight for proxoid hcaptcha dedicated VM]
  • 15:26 sukhe: sukhe@lvs1019:~$ sudo systemctl restart pybal.service
  • 15:25 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 14:48 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2002.wikimedia.org with OS bookworm
  • 14:47 sukhe: restart pybal on lvs1020
  • 14:44 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1002.wikimedia.org with OS bookworm
  • 14:42 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
  • 14:42 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1003.eqiad.wmnet with OS bullseye
  • 14:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2001.wikimedia.org with OS bookworm
  • 14:37 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 14:36 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 14:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 14:36 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 14:35 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS bookworm
  • 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:34 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:31 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
  • 14:29 hnowlan: rest.php group2-except-enwiki on rest-gateway at 10%
  • 14:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
  • 14:26 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:23 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
  • 14:21 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
  • 14:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 14:18 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
  • 14:17 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
  • 14:12 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 14:12 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Delete the event-organizer user group on medium and small wikis (T401445) (duration: 14m 47s)
  • 14:08 sukhe: restart pybal on lvs1020 to pick up WDQS changes
  • 14:05 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1050.eqiad.wmnet
  • 14:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 14:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2002.wikimedia.org with OS bookworm
  • 14:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1002.wikimedia.org with OS bookworm
  • 14:02 Lucas_WMDE: for the record, the `foreachwikiindblist small+medium emptyUserGroup` maintenance script run (for T401445) did *not* work, running the maintenance script separately for small and medium worked better
  • 14:01 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2001.wikimedia.org with OS bookworm
  • 14:01 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS bookworm
  • 14:00 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist medium emptyUserGroup --create-log '--log-reason=T401445' event-organizer # T401445
  • 14:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for Delete the event-organizer user group on medium and small wikis (T401445) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:59 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1050.eqiad.wmnet
  • 13:56 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
  • 13:56 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist small emptyUserGroup --create-log '--log-reason=T401445' event-organizer # T401445
  • 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:55 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Delete the event-organizer user group on medium and small wikis (T401445)
  • 13:54 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist small+medium emptyUserGroup --create-log '--log-reason=T401445' event-organizer # T401445
  • 13:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 13:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Assign CampaignEvents user rights to autoconfirmed in small and medium wikis (T401445) (duration: 11m 51s)
  • 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:44 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Continuing with sync
  • 13:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
  • 13:43 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
  • 13:41 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Backport for Assign CampaignEvents user rights to autoconfirmed in small and medium wikis (T401445) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:37 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 13:37 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 13:36 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 13:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Assign CampaignEvents user rights to autoconfirmed in small and medium wikis (T401445)
  • 13:36 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 13:34 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:32 esanders@deploy2002: Finished scap sync-world: Backport for Revert "Invalidate Flow cache on enwiktionary" (duration: 08m 29s)
  • 13:32 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 13:28 esanders@deploy2002: esanders: Continuing with sync
  • 13:28 esanders@deploy2002: esanders: Backport for Revert "Invalidate Flow cache on enwiktionary" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:24 esanders@deploy2002: Started scap sync-world: Backport for Revert "Invalidate Flow cache on enwiktionary"
  • 13:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 13:21 hashar: Zuul successfully reconnected to Gerrit
  • 13:20 hashar: Closed jenkins-bot connections on Gerrit primary
  • 13:08 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp2005.wikimedia.org
  • 13:08 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp2005.wikimedia.org with OS trixie
  • 13:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2053.codfw.wmnet with reason: Setting up new ES host
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
  • 12:59 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:58 fabfur: enable puppet on A:cp to deploy https://gerrit.wikimedia.org/r/1194676 (T404427)
  • 12:55 arnaudb@dns1004: END - running authdns-update
  • 12:53 arnaudb@dns1004: START - running authdns-update
  • 12:53 arnaudb@dns1004: START - running authdns-update
  • 12:53 arnaudb@dns1004: START - running authdns-update
  • 12:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp2005.wikimedia.org with reason: host reimage
  • 12:47 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 12:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp2005.wikimedia.org with reason: host reimage
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 12:18 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp2005.wikimedia.org with OS trixie
  • 12:18 fabfur: reloading haproxy on A:cp-eqsin (T404427)
  • 12:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2005.wikimedia.org - slyngshede@cumin1003"
  • 12:18 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2005.wikimedia.org - slyngshede@cumin1003"
  • 12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp2005.wikimedia.org on all recursors
  • 12:17 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp2005.wikimedia.org on all recursors
  • 12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2005.wikimedia.org - slyngshede@cumin1003"
  • 12:17 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2005.wikimedia.org - slyngshede@cumin1003"
  • 12:13 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 12:13 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp2005.wikimedia.org
  • 12:10 fabfur: enable puppet on A:cp-eqsin to deploy https://gerrit.wikimedia.org/r/1194676 (T404427)
  • 12:07 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 12:06 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 12:04 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 12:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 12:03 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:03 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:03 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:02 arnaudb@dns1004: START - running authdns-update
  • 11:59 moritzm: installing luajit security updates
  • 11:53 fabfur: disable puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1194676 on cp5021 (T404427)
  • 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp1005.wikimedia.org
  • 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp1005.wikimedia.org with OS trixie
  • 11:46 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dbprov2007.codfw.wmnet
  • 11:40 jynus@cumin1002: START - Cookbook sre.hosts.reboot-single for host dbprov2007.codfw.wmnet
  • 11:36 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp1005.wikimedia.org with reason: host reimage
  • 11:32 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp1005.wikimedia.org with reason: host reimage
  • 11:27 ladsgroup@cumin1003: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
  • 11:21 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
  • 11:21 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp1005.wikimedia.org with OS trixie
  • 11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1005.wikimedia.org - slyngshede@cumin1003"
  • 11:18 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1005.wikimedia.org - slyngshede@cumin1003"
  • 11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp1005.wikimedia.org on all recursors
  • 11:18 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp1005.wikimedia.org on all recursors
  • 11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1005.wikimedia.org - slyngshede@cumin1003"
  • 11:16 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1005.wikimedia.org - slyngshede@cumin1003"
  • 11:14 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
  • 11:13 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 11:13 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp1005.wikimedia.org
  • 10:58 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:57 moritzm: installing qemu security updates
  • 10:47 cmooney@dns2005: END - running authdns-update
  • 10:46 cmooney@dns2005: START - running authdns-update
  • 10:37 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:29 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:29 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:20 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:20 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:17 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:15 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:12 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:11 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:10 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:09 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:09 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:08 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:08 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:02 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:01 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83715 and previous config saved to /var/cache/conftool/dbconfig/20251009-095839-root.json
  • 09:44 kharlan@deploy2002: Finished scap sync-world: Backport for Check against correct key in sortEntitiesByTimestamp (T406707) (duration: 11m 18s)
  • 09:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83713 and previous config saved to /var/cache/conftool/dbconfig/20251009-094333-root.json
  • 09:40 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 09:39 kharlan@deploy2002: kharlan: Continuing with sync
  • 09:39 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:38 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:37 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:37 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 kharlan@deploy2002: kharlan: Backport for Check against correct key in sortEntitiesByTimestamp (T406707) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:36 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:32 kharlan@deploy2002: Started scap sync-world: Backport for Check against correct key in sortEntitiesByTimestamp (T406707)
  • 09:31 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83711 and previous config saved to /var/cache/conftool/dbconfig/20251009-093131-root.json
  • 09:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 09:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83709 and previous config saved to /var/cache/conftool/dbconfig/20251009-092827-root.json
  • 09:24 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 09:23 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 09:23 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 09:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 09:21 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 09:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83708 and previous config saved to /var/cache/conftool/dbconfig/20251009-091626-root.json
  • 09:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83707 and previous config saved to /var/cache/conftool/dbconfig/20251009-091322-root.json
  • 09:05 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1252 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83706 and previous config saved to /var/cache/conftool/dbconfig/20251009-090516-marostegui.json
  • 09:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1252.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83705 and previous config saved to /var/cache/conftool/dbconfig/20251009-090120-root.json
  • 08:53 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 08:53 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 08:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:52 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:52 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:46 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83704 and previous config saved to /var/cache/conftool/dbconfig/20251009-084614-root.json
  • 08:44 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:38 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2179 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83703 and previous config saved to /var/cache/conftool/dbconfig/20251009-083801-marostegui.json
  • 08:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83702 and previous config saved to /var/cache/conftool/dbconfig/20251009-083432-root.json
  • 08:26 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 08:26 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 08:22 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.22 refs T405678
  • 08:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:19 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 08:19 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83701 and previous config saved to /var/cache/conftool/dbconfig/20251009-081926-root.json
  • 08:19 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 08:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:18 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 08:18 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 08:12 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2044']
  • 08:12 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044']
  • 08:07 kharlan@deploy2002: Finished scap sync-world: Backport for ConfirmEdit/hCaptcha: Implement automatic failover (T404204) (duration: 13m 14s)
  • 08:04 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83700 and previous config saved to /var/cache/conftool/dbconfig/20251009-080420-root.json
  • 08:03 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:59 joal@deploy2002: Finished deploy [analytics/refinery@af75327] (thin): Analytics deploy - druid pageviews_daily - THIN [analytics/refinery@af753272] (duration: 02m 10s)
  • 07:59 kharlan@deploy2002: kharlan: Backport for ConfirmEdit/hCaptcha: Implement automatic failover (T404204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:57 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 07:57 joal@deploy2002: Started deploy [analytics/refinery@af75327] (thin): Analytics deploy - druid pageviews_daily - THIN [analytics/refinery@af753272]
  • 07:56 joal@deploy2002: Finished deploy [analytics/refinery@af75327]: Analytics deploy - druid pageviews_daily [analytics/refinery@af753272] (duration: 03m 53s)
  • 07:54 kharlan@deploy2002: Started scap sync-world: Backport for ConfirmEdit/hCaptcha: Implement automatic failover (T404204)
  • 07:53 kharlan@deploy2002: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /usr/local/bin/update-mediawiki-tools-release' returned non-zero exit status 1. (scap version: 4.213.0) (duration: 00m 00s)
  • 07:53 joal@deploy2002: Started deploy [analytics/refinery@af75327]: Analytics deploy - druid pageviews_daily [analytics/refinery@af753272]
  • 07:52 joal@deploy2002: Finished deploy [analytics/refinery@af75327] (hadoop-test): Analytics deploy - druid pageviews_daily - TEST [analytics/refinery@af753272] (duration: 00m 54s)
  • 07:51 joal@deploy2002: Started deploy [analytics/refinery@af75327] (hadoop-test): Analytics deploy - druid pageviews_daily - TEST [analytics/refinery@af753272]
  • 07:49 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83699 and previous config saved to /var/cache/conftool/dbconfig/20251009-074914-root.json
  • 07:47 kharlan@deploy2002: Finished scap sync-world: Backport for EventStreamConfig: Fix user-agent exclusion config (T387600), EventStreamConfig: fix IP auto reveal stream (duration: 11m 53s)
  • 07:43 kharlan@deploy2002: kharlan, bearloga: Continuing with sync
  • 07:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 07:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2147 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83698 and previous config saved to /var/cache/conftool/dbconfig/20251009-074055-marostegui.json
  • 07:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 07:40 kharlan@deploy2002: kharlan, bearloga: Backport for EventStreamConfig: Fix user-agent exclusion config (T387600), EventStreamConfig: fix IP auto reveal stream synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:35 kharlan@deploy2002: Started scap sync-world: Backport for EventStreamConfig: Fix user-agent exclusion config (T387600), EventStreamConfig: fix IP auto reveal stream
  • 07:31 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1034.eqiad.wmnet onto es1057.eqiad.wmnet
  • 07:29 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Provide capabilities for failing over to alternate CAPTCHA type (T404204) (duration: 11m 54s)
  • 07:25 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:24 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1029.eqiad.wmnet onto es1052.eqiad.wmnet
  • 07:22 kharlan@deploy2002: kharlan: Backport for hCaptcha: Provide capabilities for failing over to alternate CAPTCHA type (T404204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1020.eqiad.wmnet -> wdqs1019.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 07:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 07:17 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Provide capabilities for failing over to alternate CAPTCHA type (T404204)
  • 07:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1029,1034].eqiad.wmnet with reason: Cloning
  • 07:14 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1034 and es1029 T406488', diff saved to https://phabricator.wikimedia.org/P83697 and previous config saved to /var/cache/conftool/dbconfig/20251009-071430-marostegui.json
  • 07:05 moritzm: installing Redis security updates
  • 06:53 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 06:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1019.eqiad.wmnet with OS bullseye
  • 06:48 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
  • 06:39 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
  • 06:31 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83696 and previous config saved to /var/cache/conftool/dbconfig/20251009-063106-root.json
  • 06:28 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1019.*
  • 06:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1020.eqiad.wmnet -> wdqs1019.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 06:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 06:26 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host (duration: 00m 13s)
  • 06:26 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host
  • 06:26 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host (duration: 00m 14s)
  • 06:26 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host
  • 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
  • 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
  • 06:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83694 and previous config saved to /var/cache/conftool/dbconfig/20251009-061600-root.json
  • 06:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1030.eqiad.wmnet onto es1053.eqiad.wmnet
  • 06:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1030 gradually with 4 steps - Pool es1030.eqiad.wmnet in after cloning
  • 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83691 and previous config saved to /var/cache/conftool/dbconfig/20251009-060054-root.json
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83688 and previous config saved to /var/cache/conftool/dbconfig/20251009-054548-root.json
  • 05:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1050 and es1053 depooled T406488', diff saved to https://phabricator.wikimedia.org/P83687 and previous config saved to /var/cache/conftool/dbconfig/20251009-054347-marostegui.json
  • 05:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2155 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83686 and previous config saved to /var/cache/conftool/dbconfig/20251009-053730-marostegui.json
  • 05:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 05:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
  • 05:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1030 gradually with 4 steps - Pool es1030.eqiad.wmnet in after cloning
  • 04:13 eileen: civicrm upgraded from 6f24d513 to 132211d5
  • 02:11 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release 20251008
  • 02:02 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release 20251008
  • 01:54 mutante: [wdqs1020:~] $ sudo systemctl restart wdqs-blazegraph
  • 01:32 eileen: civicrm upgraded from 4c13f904 to 6f24d513
  • 01:18 eileen: civicrm upgraded from 2c6fedc8 to 4c13f904
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 20s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-08

  • 23:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
  • 23:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
  • 23:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
  • 23:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
  • 22:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:25 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1019.eqiad.wmnet with OS bullseye
  • 21:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:19 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:18 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:13 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh internal-scholarly host T405978 (duration: 00m 12s)
  • 21:13 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh internal-scholarly host T405978
  • 21:10 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 20:36 tgr_: UTC late deploys done
  • 20:35 tgr@deploy2002: Finished scap sync-world: Backport for Deploy JWT session cookies to group2 (T399631) (duration: 13m 53s)
  • 20:31 tgr@deploy2002: tgr: Continuing with sync
  • 20:26 tgr@deploy2002: tgr: Backport for Deploy JWT session cookies to group2 (T399631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:21 tgr@deploy2002: Started scap sync-world: Backport for Deploy JWT session cookies to group2 (T399631)
  • 20:19 tgr@deploy2002: Finished scap sync-world: Backport for eswiki, commonswiki: lift IP cap for workshop (T406655), Launch VisualEditor EditCheck paste check a/b test to 22 wikis (T405422) (duration: 13m 03s)
  • 20:15 tgr@deploy2002: tgr, kemayo, anzx: Continuing with sync
  • 20:11 tgr@deploy2002: tgr, kemayo, anzx: Backport for eswiki, commonswiki: lift IP cap for workshop (T406655), Launch VisualEditor EditCheck paste check a/b test to 22 wikis (T405422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 tgr@deploy2002: Started scap sync-world: Backport for eswiki, commonswiki: lift IP cap for workshop (T406655), Launch VisualEditor EditCheck paste check a/b test to 22 wikis (T405422)
  • 20:02 hashar: Disabled Gerrit Apache mod_qos by setting it to log-only mode # T406774
  • 19:30 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on remaining Wikipedias except enwiki (T403510), Disable wmgUseMdotRouting on enwiki (T403510) (duration: 09m 26s)
  • 19:25 krinkle@deploy2002: krinkle: Continuing with sync
  • 19:25 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on remaining Wikipedias except enwiki (T403510), Disable wmgUseMdotRouting on enwiki (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:20 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on remaining Wikipedias except enwiki (T403510), Disable wmgUseMdotRouting on enwiki (T403510)
  • 19:10 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS bookworm
  • 18:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
  • 18:56 ssastry@deploy2002: Finished scap sync-world: Backport for Revert "Add a DOM version of the TOC markers pass" (duration: 16m 00s)
  • 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 18:50 ssastry@deploy2002: ssastry: Continuing with sync
  • 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 18:46 ssastry@deploy2002: ssastry: Backport for Revert "Add a DOM version of the TOC markers pass" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:43 hashar: For posterity: October 8th 2025. The day brett and Krinkle are getting rid of the last .m. subdomain.
  • 18:40 ssastry@deploy2002: Started scap sync-world: Backport for Revert "Add a DOM version of the TOC markers pass"
  • 18:36 brett: Enable unified mobile routing on en.wikipedia.org rollout complete - T403510
  • 18:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS bookworm
  • 18:33 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release vX.Y.Z - cmooney@cumin1003
  • 18:32 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS bookworm
  • 18:31 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release vX.Y.Z - cmooney@cumin1003
  • 18:27 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on wdqs2017.codfw.wmnet with reason: finish getting host ready for production
  • 18:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 17:59 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 17:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 17:54 swfrench-wmf: completed post-switchover right-sizing of large mediawiki services - T405955
  • 17:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:51 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:49 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:49 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:49 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 17:45 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:45 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:44 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:44 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:42 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:42 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:42 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
  • 17:39 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:39 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:34 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS bookworm
  • 17:33 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:32 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:32 brett: Enable unified mobile routing on en.wikipedia.org - T403510
  • 17:26 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:26 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:22 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2052 gradually with 4 steps - Pooling in new host
  • 17:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:11 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:10 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 16:53 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2002.codfw.wmnet with reason: WIP
  • 16:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: WIP
  • 16:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 16:42 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
  • 16:37 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2052 gradually with 4 steps - Pooling in new host
  • 16:36 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2052.codfw.wmnet
  • 16:36 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es2052.codfw.wmnet
  • 16:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2017.codfw.wmnet with OS bullseye
  • 16:26 fceratto@cumin1002: dbctl commit (dc=all): 'Add es2052 T402859', diff saved to https://phabricator.wikimedia.org/P83675 and previous config saved to /var/cache/conftool/dbconfig/20251008-162623-fceratto.json
  • 16:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
  • 16:10 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
  • 15:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 15:51 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-launcher1003.eqiad.wmnet with OS bullseye
  • 15:37 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:37 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 15:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:33 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:33 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:18 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-launcher1003.eqiad.wmnet with reason: host reimage
  • 15:16 elukey: reboot ms-be1088 as a test for T404356
  • 15:14 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be1088.eqiad.wmnet with reason: testing
  • 15:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-launcher1003.eqiad.wmnet with reason: host reimage
  • 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
  • 15:11 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: testing
  • 15:05 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:03 derick@deploy2002: Finished scap sync-world: Backport for SharedDomainHookHandler: Remove WebAuthn sitenotice, SharedDomainHookHandler: Remove WebAuthn sitenotice (duration: 42m 36s)
  • 14:59 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-launcher1003.eqiad.wmnet with OS bullseye
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 14:50 derick@deploy2002: d3r1ck01, derick: Continuing with sync
  • 14:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 14:47 derick@deploy2002: d3r1ck01, derick: Backport for SharedDomainHookHandler: Remove WebAuthn sitenotice, SharedDomainHookHandler: Remove WebAuthn sitenotice synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 14:33 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 14:29 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 14:20 derick@deploy2002: Started scap sync-world: Backport for SharedDomainHookHandler: Remove WebAuthn sitenotice, SharedDomainHookHandler: Remove WebAuthn sitenotice
  • 14:12 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Temporarily undeploy JWT session cookies (T399631), jwt: Use core cookie settings (T406621), jwt: Use core cookie settings (T406621), Force OATHManage to be on central domain (T401773), Force OATHManage to be on central domain (T401773) (duration: 14m 0
  • 14:09 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: cr1-esams is back online and working after card re-seat, T406705]
  • 14:09 cmooney@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: cr1-esams is back online and working after card re-seat, T406705]
  • 14:08 topranks: re-pool esams in dns after cr1-esams restored to normal operation T406705
  • 14:07 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde, reedy, tgr: Continuing with sync
  • 14:01 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde, reedy, tgr: Backport for Temporarily undeploy JWT session cookies (T399631), jwt: Use core cookie settings (T406621), jwt: Use core cookie settings (T406621), Force OATHManage to be on central domain (T401773), Force OATHManage to be on central domain (T401773)
  • 13:56 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Temporarily undeploy JWT session cookies (T399631), jwt: Use core cookie settings (T406621), jwt: Use core cookie settings (T406621), Force OATHManage to be on central domain (T401773), Force OATHManage to be on central domain (T401773)
  • 13:54 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Disable mobilefrontend on donatewiki (T406638) (duration: 44m 23s)
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:42 lucaswerkmeister-wmde@deploy2002: pcoombe, lucaswerkmeister-wmde: Continuing with sync
  • 13:39 lucaswerkmeister-wmde@deploy2002: pcoombe, lucaswerkmeister-wmde: Backport for Disable mobilefrontend on donatewiki (T406638) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 13:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 13:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2001.codfw.wmnet
  • 13:19 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2001.codfw.wmnet
  • 13:14 jgleeson: civicrm upgraded from 9db8f0d5 to 2c6fedc8
  • 13:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:10 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Disable mobilefrontend on donatewiki (T406638)
  • 13:10 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2002.codfw.wmnet
  • 13:03 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2002.codfw.wmnet
  • 12:49 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2005.wikimedia.org
  • 12:49 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test2005.wikimedia.org with OS trixie
  • 12:45 derick@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=fywiki --logwiki=metawiki Constable31 Shogeneral # T406731
  • 12:33 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
  • 12:28 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
  • 12:25 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 12:25 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 12:24 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 12:24 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 12:22 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 12:22 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bullseye
  • 12:22 elukey@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 12:15 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ms-be[2083-2084].codfw.wmnet with reason: awaiting controller swap
  • 12:10 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 12:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
  • 12:10 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
  • 12:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2005.wikimedia.org on all recursors
  • 12:09 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp-test2005.wikimedia.org on all recursors
  • 12:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
  • 12:09 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
  • 12:08 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on P{dse-k8s-worker2002.codfw.wmnet} and (A:dse-k8s-master-codfw or A:dse-k8s-worker-codfw)
  • 12:07 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{dse-k8s-worker2002.codfw.wmnet} and (A:dse-k8s-master-codfw or A:dse-k8s-worker-codfw)
  • 12:05 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 12:05 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
  • 12:05 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2005.wikimedia.org
  • 12:05 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2005.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 12:05 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2005.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 12:04 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 12:01 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 11:59 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 11:57 slyngshede@cumin1003: START - Cookbook sre.hosts.decommission for hosts idp-test2005.wikimedia.org
  • 11:50 slyngshede@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test2005.wikimedia.org
  • 11:50 slyngshede@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:47 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 11:47 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
  • 11:43 slyngshede@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test2005.wikimedia.org
  • 11:43 slyngshede@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:42 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2078
  • 11:42 mvernon@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2078
  • 11:40 mvernon@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2078
  • 11:40 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2078.codfw.wmnet 239.32.192.10.in-addr.arpa 9.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:40 mvernon@cumin1002: START - Cookbook sre.dns.wipe-cache ms-be2078.codfw.wmnet 239.32.192.10.in-addr.arpa 9.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:40 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:40 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2078 - mvernon@cumin1002"
  • 11:39 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 11:39 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
  • 11:34 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2078 - mvernon@cumin1002"
  • 11:34 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS bookworm
  • 11:30 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker-codfw
  • 11:28 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker-codfw
  • 11:28 mvernon@cumin1002: START - Cookbook sre.dns.netbox
  • 11:26 claime: Enabling puppet on cp nodes - 1193903: gateway-check: Group-based routing approach | https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193903 - T406318
  • 11:25 mvernon@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:22 mvernon@cumin1002: START - Cookbook sre.dns.netbox
  • 11:22 mvernon@cumin1002: START - Cookbook sre.hosts.move-vlan for host ms-be2078
  • 11:22 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bullseye
  • 11:19 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2078.codfw.wmnet with OS trixie
  • 11:09 moritzm: imported megacli into thirdparty/hwraid (upstream repo doesn't cover trixie yet, copied over from bookworm) T391083
  • 10:53 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS bookworm
  • 10:43 claime: Disabling puppet on cp nodes - 1193903: gateway-check: Group-based routing approach | https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193903 - T406318
  • 10:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 10:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:34 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 10:33 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 10:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:31 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 10:30 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 10:29 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 10:22 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 10:20 jmm@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 10:17 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
  • 10:16 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 10:15 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS trixie
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 10:02 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 09:47 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 09:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2027.codfw.wmnet onto es2052.codfw.wmnet
  • 09:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
  • 09:36 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 09:24 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 09:24 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 09:08 topranks: disable BGP to asw*-esams from cr1-esams as the CR external links are also down
  • 09:02 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams [reason: no reason specified, ]
  • 09:02 Emperor: depool esams
  • 09:02 mvernon@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: no reason specified, ]
  • 08:52 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
  • 08:50 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83669 and previous config saved to /var/cache/conftool/dbconfig/20251008-085005-root.json
  • 08:44 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:35 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83667 and previous config saved to /var/cache/conftool/dbconfig/20251008-083459-root.json
  • 08:33 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:31 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:19 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83666 and previous config saved to /var/cache/conftool/dbconfig/20251008-081953-root.json
  • 08:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.22 refs T405678
  • 08:04 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83665 and previous config saved to /var/cache/conftool/dbconfig/20251008-080448-root.json
  • 08:03 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
  • 08:02 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:00 moritzm: installing libxml2 security updates
  • 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2172 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83664 and previous config saved to /var/cache/conftool/dbconfig/20251008-075612-marostegui.json
  • 07:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 07:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:47 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:46 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:44 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:37 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:27 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 07:22 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 07:21 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1030.eqiad.wmnet onto es1053.eqiad.wmnet
  • 07:17 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 07:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1030 T406488', diff saved to https://phabricator.wikimedia.org/P83663 and previous config saved to /var/cache/conftool/dbconfig/20251008-071656-marostegui.json
  • 07:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:15 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 06:57 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 06:55 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
  • 06:53 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 06:31 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
  • 06:29 moritzm: installing openssl security updates
  • 06:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1027 T406488', diff saved to https://phabricator.wikimedia.org/P83662 and previous config saved to /var/cache/conftool/dbconfig/20251008-062752-marostegui.json
  • 06:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1027,1030].eqiad.wmnet with reason: Cloning
  • 06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1026.eqiad.wmnet onto es1049.eqiad.wmnet
  • 06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1026 gradually with 4 steps - Pool es1026.eqiad.wmnet in after cloning
  • 06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1028.eqiad.wmnet onto es1051.eqiad.wmnet
  • 06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1028 gradually with 4 steps - Pool es1028.eqiad.wmnet in after cloning
  • 06:24 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1049 and es1051 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83659 and previous config saved to /var/cache/conftool/dbconfig/20251008-062404-marostegui.json
  • 06:12 moritzm: rebalance Ganeti eqiad/D following vmscape reboots
  • 05:37 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1026 gradually with 4 steps - Pool es1026.eqiad.wmnet in after cloning
  • 05:37 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1028 gradually with 4 steps - Pool es1028.eqiad.wmnet in after cloning
  • 04:41 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 04:37 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host T405978 (duration: 00m 14s)
  • 04:37 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host T405978
  • 03:55 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978 (duration: 02m 01s)
  • 03:53 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978
  • 03:53 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978 (duration: 16m 11s)
  • 03:52 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1018.*
  • 03:41 ryankemper@cumin2002: conftool action : GET; selector: name=wdqs1018.eqiad.wmnet
  • 03:38 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1018.*
  • 03:37 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978
  • 02:33 eileen: civicrm upgraded from 8228670e to 9db8f0d5
  • 02:27 eileen: civicrm upgraded from 7a81fe1c to 8228670e
  • 02:19 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 02:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 02:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 02:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 02:05 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 01:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
  • 01:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 13s)
  • 01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:27 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 00:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 00:09 sbassett: Deployed security mitigation for T406664 to 1.45.0-wmf.22
  • 00:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply

2025-10-07

  • 23:58 sbassett: Deployed security mitigation for T406664 to 1.45.0-wmf.21
  • 23:58 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:54 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:53 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:53 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:52 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:50 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:47 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:45 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:18 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:13 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 22:46 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 22:35 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 22:12 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "running per cookbook error suggestion - bking@cumin2002 - T399778"
  • 22:11 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "running per cookbook error suggestion - bking@cumin2002 - T399778"
  • 22:04 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 22:02 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1020\.eqiad\.wmnet
  • 21:50 bking@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: T405978 (duration: 00m 45s)
  • 21:49 bking@deploy2002: Started deploy [wdqs/wdqs@fea7794]: T405978
  • 21:48 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on wdqs1020.eqiad.wmnet with reason: finish getting host ready for production
  • 21:41 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer main graph to newly-reimaged host) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:41 tgr_: UTC late deploys done
  • 21:40 tgr@deploy2002: Finished scap sync-world: Backport for session: Log actual class name in preventSessionsForUser exception (T406566), session: Log actual class name in preventSessionsForUser exception (T406566), session: Log cache write flags in `SessionStore::set()` (T405633 T405634), [[gerrit:1194282|session: Log cache write flags in `SessionStore::set()` (T405
  • 21:36 tgr@deploy2002: tgr: Continuing with sync
  • 21:34 tgr@deploy2002: tgr: Backport for session: Log actual class name in preventSessionsForUser exception (T406566), session: Log actual class name in preventSessionsForUser exception (T406566), session: Log cache write flags in `SessionStore::set()` (T405633 T405634), session: Log cache write flags in `SessionStore::set()` (T405633 T405634) synced
  • 21:30 tgr@deploy2002: Started scap sync-world: Backport for session: Log actual class name in preventSessionsForUser exception (T406566), session: Log actual class name in preventSessionsForUser exception (T406566), session: Log cache write flags in `SessionStore::set()` (T405633 T405634), [[gerrit:1194282|session: Log cache write flags in `SessionStore::set()` (T4056
  • 21:28 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:16 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 20:58 aaron@deploy2002: Finished scap sync-world: Backport for Add restbase spec JSON files to which /rest_v1/?spec can be routed (T397203 T396805) (duration: 10m 13s)
  • 20:54 aaron@deploy2002: aaron: Continuing with sync
  • 20:53 aaron@deploy2002: aaron: Backport for Add restbase spec JSON files to which /rest_v1/?spec can be routed (T397203 T396805) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 20:50 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 20:48 aaron@deploy2002: Started scap sync-world: Backport for Add restbase spec JSON files to which /rest_v1/?spec can be routed (T397203 T396805)
  • 20:48 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer main graph to newly-reimaged host) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:45 kharlan@deploy2002: Finished scap sync-world: Backport for CheckUser/UserInfoCard: Remove enable-by-default mode for dewiki (T405342) (duration: 11m 05s)
  • 20:41 brett: Enable unified mobile routing on all except en.wikipedia.org - T403510
  • 20:41 kharlan@deploy2002: kharlan: Continuing with sync
  • 20:38 kharlan@deploy2002: kharlan: Backport for CheckUser/UserInfoCard: Remove enable-by-default mode for dewiki (T405342) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:34 kharlan@deploy2002: Started scap sync-world: Backport for CheckUser/UserInfoCard: Remove enable-by-default mode for dewiki (T405342)
  • 20:13 mstyles@deploy2002: Finished scap sync-world: Backport for OATHAuth: Increase 2FA opt-in to 40% of users (T399664) (duration: 09m 08s)
  • 20:09 mstyles@deploy2002: mstyles: Continuing with sync
  • 20:08 mstyles@deploy2002: mstyles: Backport for OATHAuth: Increase 2FA opt-in to 40% of users (T399664) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:05 ejegg: fundraising civicrm upgraded from eac2de65 to 7a81fe1c
  • 20:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 20:04 mstyles@deploy2002: Started scap sync-world: Backport for OATHAuth: Increase 2FA opt-in to 40% of users (T399664)
  • 19:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
  • 19:44 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
  • 19:01 ejegg: standalone SmashPig upgraded from 86bde4e4 to 32dc5c72
  • 18:09 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha1002.wikimedia.org
  • 18:08 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1002.wikimedia.org with OS trixie
  • 17:53 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
  • 17:47 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
  • 17:34 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1002.wikimedia.org with OS trixie
  • 17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 17:34 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1002.wikimedia.org on all recursors
  • 17:34 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1002.wikimedia.org on all recursors
  • 17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 17:32 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 17:29 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 17:29 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1002.wikimedia.org
  • 17:26 taavi: taavi@apt1002 ~ $ sudo -i reprepro -C thirdparty/tofu update trixie-wikimedia # T405742
  • 17:05 mutante: releases2003 - re-enabling puppet - reacting to monitoring alert - T405352
  • 16:30 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:26 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 16:25 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 16:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 16:13 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha1002.wikimedia.org
  • 16:13 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
  • 16:13 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
  • 16:11 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2044.codfw.wmnet with OS bullseye
  • 16:09 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 16:05 sukhe@cumin1003: START - Cookbook sre.hosts.decommission for hosts hcaptcha1002.wikimedia.org
  • 16:04 sukhe@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host hcaptcha1002.wikimedia.org
  • 16:03 sukhe@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:59 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 15:59 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1002.wikimedia.org
  • 15:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:58 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS bullseye
  • 15:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
  • 15:55 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:52 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2044.codfw.wmnet with OS bookworm
  • 15:52 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 15:49 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
  • 15:49 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 15:47 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1001.eqiad.wmnet
  • 15:47 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:47 jasmine@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
  • 15:47 jasmine@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
  • 15:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:42 sukhe@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host hcaptcha1002.wikimedia.org
  • 15:42 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host hcaptcha1002.wikimedia.org with OS trixie
  • 15:40 jasmine@cumin1003: START - Cookbook sre.dns.netbox
  • 15:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 15:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:29 jasmine@cumin1003: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1001.eqiad.wmnet
  • 15:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
  • 15:26 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:24 hashar@deploy2002: Finished deploy [gerrit/gerrit@d0c47da]: Disable component rather than motd plugin (duration: 00m 11s)
  • 15:23 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
  • 15:23 hashar@deploy2002: Started deploy [gerrit/gerrit@d0c47da]: Disable component rather than motd plugin
  • 15:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:20 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 15:11 jasmine_: homer ‘cr*eqiad’ commit "T383227"
  • 15:09 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS bookworm
  • 15:09 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 15:03 hashar@deploy2002: Finished deploy [gerrit/gerrit@21d2848]: Disable motd banner: maintenance window has closed - T387833 (duration: 00m 30s)
  • 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@f2d2c87]: deploy phab1004 for T406597 (duration: 00m 52s)
  • 15:03 hashar@deploy2002: Started deploy [gerrit/gerrit@21d2848]: Disable motd banner: maintenance window has closed - T387833
  • 15:02 brennen@deploy2002: Started deploy [phabricator/deployment@f2d2c87]: deploy phab1004 for T406597
  • 15:02 brennen@deploy2002: Finished deploy [phabricator/deployment@f2d2c87]: deploy phab2002 for T406597 (duration: 00m 31s)
  • 15:01 brennen@deploy2002: Started deploy [phabricator/deployment@f2d2c87]: deploy phab2002 for T406597
  • 15:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:59 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T406597
  • 14:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 14:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bookworm
  • 14:53 jasmine@deploy2002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1001.eqiad.wmnet
  • 14:51 jasmine@dns1004: END - running authdns-update
  • 14:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 14:50 jasmine@dns1004: START - running authdns-update
  • 14:42 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:41 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:29 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:22 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:22 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 14:22 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 14:21 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1002.wikimedia.org with OS trixie
  • 14:21 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 14:21 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 14:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:21 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 14:21 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 14:17 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:16 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:16 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 14:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:11 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:04 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:04 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Fix calls to incrementStatsKey() (T406569), Fix calls to incrementStatsKey() (T406569) (duration: 09m 58s)
  • 14:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:00 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1002.wikimedia.org on all recursors
  • 14:00 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1002.wikimedia.org on all recursors
  • 14:00 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 14:00 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 13:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 13:58 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2006.codfw.wmnet with reason: host reimage
  • 13:58 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Fix calls to incrementStatsKey() (T406569), Fix calls to incrementStatsKey() (T406569) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:56 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:56 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1002.wikimedia.org
  • 13:56 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 13:55 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:55 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 13:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 13:54 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Fix calls to incrementStatsKey() (T406569), Fix calls to incrementStatsKey() (T406569)
  • 13:52 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2006.codfw.wmnet with reason: host reimage
  • 13:51 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha1001.wikimedia.org
  • 13:51 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS trixie
  • 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 13:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 13:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 13:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 13:41 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 13:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 13:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 13:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 13:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 13:34 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 13:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 13:28 moritzm: rebalance Ganeti codfw/D following vmscape reboots
  • 13:27 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 13:17 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:17 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS trixie
  • 13:17 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:16 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:16 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1001.wikimedia.org on all recursors
  • 13:14 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1001.wikimedia.org on all recursors
  • 13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:14 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:13 esanders@deploy2002: Finished scap sync-world: Backport for Invalidate Flow cache on enwiktionary (T405080) (duration: 10m 07s)
  • 13:10 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:10 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1001.wikimedia.org
  • 13:09 esanders@deploy2002: esanders: Continuing with sync
  • 13:08 esanders@deploy2002: esanders: Backport for Invalidate Flow cache on enwiktionary (T405080) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:06 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:06 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:05 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 13:03 esanders@deploy2002: Started scap sync-world: Backport for Invalidate Flow cache on enwiktionary (T405080)
  • 12:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83649 and previous config saved to /var/cache/conftool/dbconfig/20251007-122526-root.json
  • 12:23 moritzm: rebalance Ganeti eqiad/C following vmscape reboots
  • 12:15 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
  • 12:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83647 and previous config saved to /var/cache/conftool/dbconfig/20251007-121020-root.json
  • 11:55 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83646 and previous config saved to /var/cache/conftool/dbconfig/20251007-115513-root.json
  • 11:50 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:49 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:49 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:48 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:48 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:47 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83645 and previous config saved to /var/cache/conftool/dbconfig/20251007-114716-root.json
  • 11:40 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83644 and previous config saved to /var/cache/conftool/dbconfig/20251007-114007-root.json
  • 11:33 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 11:32 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83643 and previous config saved to /var/cache/conftool/dbconfig/20251007-113210-root.json
  • 11:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2005.wikimedia.org
  • 11:27 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2005.wikimedia.org
  • 11:26 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2005.wikimedia.org
  • 11:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83642 and previous config saved to /var/cache/conftool/dbconfig/20251007-112501-root.json
  • 11:23 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:23 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:23 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idp-test2005.wikimedia.org
  • 11:19 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:18 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:17 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83640 and previous config saved to /var/cache/conftool/dbconfig/20251007-111704-root.json
  • 11:16 marostegui: Upgrade db1169 (s1) to 10.11.14 T406543
  • 11:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: Upgrading
  • 11:14 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1169 T406543', diff saved to https://phabricator.wikimedia.org/P83639 and previous config saved to /var/cache/conftool/dbconfig/20251007-111438-marostegui.json
  • 11:13 moritzm: imported cas 7.1.6.2 for trixie-wikimedia T406455
  • 11:12 moritzm: imported prometheus-jmx-exporter 0.15.0 for trixie-wikimedia T406455
  • 11:08 moritzm: rebalance Ganeti codfw/C following vmscape reboots
  • 11:07 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:07 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:04 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:04 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83638 and previous config saved to /var/cache/conftool/dbconfig/20251007-110158-root.json
  • 10:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2206 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83637 and previous config saved to /var/cache/conftool/dbconfig/20251007-105337-marostegui.json
  • 10:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 10:44 slyngshede@dns1004: END - running authdns-update
  • 10:43 slyngshede@dns1004: START - running authdns-update
  • 10:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names - cmooney@cumin1003"
  • 10:38 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names - cmooney@cumin1003"
  • 10:31 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:25 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:24 ladsgroup@deploy2002: Finished scap sync-world: Backport for mainstash: Disable multiPrimaryMode (T389893) (duration: 14m 51s)
  • 10:20 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 10:19 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:14 ladsgroup@deploy2002: ladsgroup: Backport for mainstash: Disable multiPrimaryMode (T389893) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:09 ladsgroup@deploy2002: Started scap sync-world: Backport for mainstash: Disable multiPrimaryMode (T389893)
  • 10:04 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
  • 10:04 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
  • 10:04 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
  • 10:04 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
  • 10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
  • 10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
  • 10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
  • 10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
  • 10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
  • 10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
  • 10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
  • 10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
  • 10:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for Undeploy FlaggedRevs from lawikisource (T406424) (duration: 09m 34s)
  • 10:00 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1028.eqiad.wmnet onto es1051.eqiad.wmnet
  • 09:59 aqu@deploy2002: Finished deploy [analytics/refinery@21fe78f] (thin): Regular analytics weekly train THIN [analytics/refinery@21fe78fb] (duration: 01m 05s)
  • 09:58 aqu@deploy2002: Started deploy [analytics/refinery@21fe78f] (thin): Regular analytics weekly train THIN [analytics/refinery@21fe78fb]
  • 09:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 09:57 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2027 - Depool es2027.codfw.wmnet to then clone it to es2052.codfw.wmnet - fceratto@cumin1002
  • 09:57 ladsgroup@deploy2002: ladsgroup: Backport for Undeploy FlaggedRevs from lawikisource (T406424) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:56 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2027 - Depool es2027.codfw.wmnet to then clone it to es2052.codfw.wmnet - fceratto@cumin1002
  • 09:56 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2027.codfw.wmnet onto es2052.codfw.wmnet
  • 09:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1028,1051].eqiad.wmnet with reason: Cloning
  • 09:55 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2029.codfw.wmnet
  • 09:55 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es2029.codfw.wmnet
  • 09:54 aqu@deploy2002: Finished deploy [analytics/refinery@21fe78f]: Regular analytics weekly train [analytics/refinery@21fe78fb] (duration: 42m 33s)
  • 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1028 to clone es1051 T406488', diff saved to https://phabricator.wikimedia.org/P83635 and previous config saved to /var/cache/conftool/dbconfig/20251007-095339-marostegui.json
  • 09:52 ladsgroup@deploy2002: Started scap sync-world: Backport for Undeploy FlaggedRevs from lawikisource (T406424)
  • 09:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 09:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2052.codfw.wmnet with reason: Setting up new ES host
  • 09:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2029.codfw.wmnet with reason: Setting up new ES host
  • 09:46 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2052.codfw.wmnet with reason: Setting up new ES host
  • 09:33 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2052.codfw.wmnet with reason: Setting up new ES host
  • 09:33 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1050.eqiad.wmnet with OS bookworm
  • 09:26 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1003"
  • 09:25 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1003"
  • 09:22 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1026.eqiad.wmnet onto es1049.eqiad.wmnet
  • 09:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 09:19 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1026,1049].eqiad.wmnet with reason: Cloning
  • 09:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 09:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:17 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 09:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1026,1049].eqiad.wmnet with reason: Cloning
  • 09:12 aqu@deploy2002: Started deploy [analytics/refinery@21fe78f]: Regular analytics weekly train [analytics/refinery@21fe78fb]
  • 09:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repool es1029 and depool es1026 to clone es1049 T406488', diff saved to https://phabricator.wikimedia.org/P83634 and previous config saved to /var/cache/conftool/dbconfig/20251007-091011-marostegui.json
  • 09:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1029 to clone es1049 T406488', diff saved to https://phabricator.wikimedia.org/P83633 and previous config saved to /var/cache/conftool/dbconfig/20251007-090826-marostegui.json
  • 09:07 aqu@deploy2002: Finished deploy [analytics/refinery@21fe78f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@21fe78fb] (duration: 01m 12s)
  • 09:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
  • 09:06 aqu@deploy2002: Started deploy [analytics/refinery@21fe78f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@21fe78fb]
  • 09:05 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:04 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:04 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:02 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
  • 08:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 08:58 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 08:57 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 08:53 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83631 and previous config saved to /var/cache/conftool/dbconfig/20251007-085320-root.json
  • 08:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 08:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 08:42 topranks: tighten up acl for ssh access on pfw1-codfw T390939
  • 08:41 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 08:38 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83630 and previous config saved to /var/cache/conftool/dbconfig/20251007-083814-root.json
  • 08:37 hashar: Stopped Gerrit on gerrit2003, deleted /srv/gerrit/git/* and restarted a full replication due to bad file ownership # T387833
  • 08:37 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:27 elukey@cumin1003: START - Cookbook sre.hosts.provision for host es1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83629 and previous config saved to /var/cache/conftool/dbconfig/20251007-082309-root.json
  • 08:20 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
  • 08:17 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.22 refs T405678
  • 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83628 and previous config saved to /var/cache/conftool/dbconfig/20251007-080803-root.json
  • 08:06 moritzm: installing libsndfile security updates
  • 08:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2210 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83627 and previous config saved to /var/cache/conftool/dbconfig/20251007-080015-marostegui.json
  • 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 07:43 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83626 and previous config saved to /var/cache/conftool/dbconfig/20251007-074342-root.json
  • 07:34 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 07:33 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
  • 07:28 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83625 and previous config saved to /var/cache/conftool/dbconfig/20251007-072837-root.json
  • 07:21 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: stop copying ores weighted_tags (T389053), cirrus: test completion with default sort on simplewiki [2/3] (T404858) (duration: 15m 32s)
  • 07:14 dcausse@deploy2002: dcausse: Continuing with sync
  • 07:13 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83624 and previous config saved to /var/cache/conftool/dbconfig/20251007-071331-root.json
  • 07:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 07:11 dcausse@deploy2002: dcausse: Backport for cirrus: stop copying ores weighted_tags (T389053), cirrus: test completion with default sort on simplewiki [2/3] (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:10 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1050.eqiad.wmnet with OS bookworm
  • 07:05 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: stop copying ores weighted_tags (T389053), cirrus: test completion with default sort on simplewiki [2/3] (T404858)
  • 06:58 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83623 and previous config saved to /var/cache/conftool/dbconfig/20251007-065825-root.json
  • 06:50 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2219 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83622 and previous config saved to /var/cache/conftool/dbconfig/20251007-065019-marostegui.json
  • 06:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 06:44 kart_: Updated cxserver to 2025-10-06-084053-production (T394982, T403574)
  • 06:42 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 06:40 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:40 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 06:35 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
  • 06:30 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83621 and previous config saved to /var/cache/conftool/dbconfig/20251007-063014-root.json
  • 06:24 moritzm: rebalance Ganeti eqiad/B following vmscape reboots
  • 06:24 moritzm: rebalance Ganeti codfw/B following vmscape reboots
  • 06:15 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83620 and previous config saved to /var/cache/conftool/dbconfig/20251007-061509-root.json
  • 06:07 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:06 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83619 and previous config saved to /var/cache/conftool/dbconfig/20251007-060003-root.json
  • 05:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83618 and previous config saved to /var/cache/conftool/dbconfig/20251007-054457-root.json
  • 05:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 05:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2237 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83617 and previous config saved to /var/cache/conftool/dbconfig/20251007-053628-root.json
  • 05:36 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 05:03 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 05:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.45.0-wmf.19 (duration: 02m 32s)
  • 03:48 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.45.0-wmf.22 refs T405678 (duration: 45m 18s)
  • 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.45.0-wmf.22 refs T405678
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 28s)
  • 01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:27 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2005-dev.codfw.wmnet with OS trixie

2025-10-06

  • 23:35 jdlrobson@deploy2002: Finished scap sync-world: Backport for tempUserBanner: Set `relative` position to enable `z-index` (T404122) (duration: 11m 30s)
  • 23:30 jdlrobson@deploy2002: jdlrobson: Continuing with sync
  • 23:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 23:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 23:28 jdlrobson@deploy2002: jdlrobson: Backport for tempUserBanner: Set `relative` position to enable `z-index` (T404122) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:23 jdlrobson@deploy2002: Started scap sync-world: Backport for tempUserBanner: Set `relative` position to enable `z-index` (T404122)
  • 23:13 jdlrobson@deploy2002: Finished scap sync-world: Backport for Remove old, unused ArticleSummaries Stream (T406361) (duration: 09m 47s)
  • 23:08 jdlrobson@deploy2002: jdlrobson, lmora: Continuing with sync
  • 23:07 jdlrobson@deploy2002: jdlrobson, lmora: Backport for Remove old, unused ArticleSummaries Stream (T406361) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:03 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 23:03 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2017.codfw.wmnet with OS bullseye
  • 23:03 jdlrobson@deploy2002: Started scap sync-world: Backport for Remove old, unused ArticleSummaries Stream (T406361)
  • 22:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 22:48 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 22:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:42 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 22:23 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 22:23 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:59 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 21:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 21:43 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 21:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 21:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.16 upgrade ()
  • 21:37 eileen: config revision changed from 65339a1a to 02eee6ac
  • 21:35 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.16 upgrade ()
  • 21:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 21:29 sbassett: Deployed security mitigation for T251032
  • 21:28 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 21:25 eileen: civicrm upgraded from 17092e23 to eac2de65
  • 21:25 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:24 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:14 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:11 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.16 upgrade ()
  • 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.16 upgrade ()
  • 20:51 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 20:40 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2001.codfw.wmnet with OS bookworm
  • 20:40 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 20:39 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 20:35 dani@deploy2002: Finished scap sync-world: Backport for Undeploy reader foundational survey on enwiki (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577) (duration: 09m 37s)
  • 20:31 dani@deploy2002: dani: Continuing with sync
  • 20:30 dani@deploy2002: dani: Backport for Undeploy reader foundational survey on enwiki (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:26 dani@deploy2002: Started scap sync-world: Backport for Undeploy reader foundational survey on enwiki (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577)
  • 20:24 arlolra@deploy2002: Finished scap sync-world: Backport for Deploy Parsoid Read Views to 26 Wikipedias (T406250) (duration: 10m 43s)
  • 20:20 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
  • 20:19 arlolra@deploy2002: arlolra: Continuing with sync
  • 20:19 arlolra@deploy2002: arlolra: Backport for Deploy Parsoid Read Views to 26 Wikipedias (T406250) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:16 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
  • 20:13 arlolra@deploy2002: Started scap sync-world: Backport for Deploy Parsoid Read Views to 26 Wikipedias (T406250)
  • 20:10 samtar@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575) (duration: 14m 13s)
  • 20:04 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
  • 20:04 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:04 samtar@deploy2002: samtar: Continuing with sync
  • 20:04 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:02 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.16 upgrade ()
  • 20:01 samtar@deploy2002: samtar: Backport for ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:58 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.16 upgrade ()
  • 19:58 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002"
  • 19:58 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002
  • 19:56 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002
  • 19:56 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002"
  • 19:56 samtar@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575)
  • 19:49 btullis@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{dse-k8s-worker[1004-1019].eqiad.wmnet} and (A:dse-k8s-master-eqiad or A:dse-k8s-worker-eqiad)
  • 19:45 musikanimal@deploy2002: Finished scap sync-world: Backport for WishRenderer: short-circuit and show error if proposer is invalid (T406194) (duration: 39m 00s)
  • 19:33 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 19:32 musikanimal@deploy2002: musikanimal: Backport for WishRenderer: short-circuit and show error if proposer is invalid (T406194) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:13 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 19:10 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.16 upgrade ()
  • 19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.16 upgrade ()
  • 19:07 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 19:07 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 19:06 musikanimal@deploy2002: Started scap sync-world: Backport for WishRenderer: short-circuit and show error if proposer is invalid (T406194)
  • 18:53 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
  • 18:40 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 2.8.16 upgrade ()
  • 18:36 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 2.8.16 upgrade ()
  • 18:02 ejegg: fundraising python tools upgraded from 3fba9888 to 698309f1
  • 17:59 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2051 gradually with 4 steps - Pooling in new host
  • 17:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 2.8.16 upgrade ()
  • 17:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 2.8.16 upgrade ()
  • 17:42 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-misc2001.codfw.wmnet with OS bookworm
  • 17:42 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:31 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:29 jasmine@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: decom
  • 17:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.16 upgrade ()
  • 17:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.16 upgrade ()
  • 17:13 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2051 gradually with 4 steps - Pooling in new host
  • 17:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2051 - Depooling host
  • 17:12 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2051 - Depooling host
  • 16:46 otto@deploy2002: Finished deploy [analytics/refinery@21fe78f]: deploying analytics/refinery to an-launcher1002 to pick up change for T389666 (duration: 02m 11s)
  • 16:44 otto@deploy2002: Started deploy [analytics/refinery@21fe78f]: deploying analytics/refinery to an-launcher1002 to pick up change for T389666
  • 16:40 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2051 gradually with 4 steps - Pooling in new host
  • 16:32 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.16 upgrade ()
  • 16:30 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.16 upgrade ()
  • 16:22 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
  • 16:17 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:06 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:06 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:05 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:55 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 59s)
  • 15:55 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2051 gradually with 4 steps - Pooling in new host
  • 15:53 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 59s)
  • 15:46 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test1005.wikimedia.org
  • 15:46 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1005.wikimedia.org with OS trixie
  • 15:39 fceratto@cumin1002: dbctl commit (dc=all): 'Add es2051 T402859', diff saved to https://phabricator.wikimedia.org/P83607 and previous config saved to /var/cache/conftool/dbconfig/20251006-153927-fceratto.json
  • 15:32 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1005.wikimedia.org with reason: host reimage
  • 15:27 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1005.wikimedia.org with reason: host reimage
  • 15:24 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:19 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha2002.wikimedia.org
  • 15:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2002.wikimedia.org with OS trixie
  • 15:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.16 upgrade ()
  • 15:18 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:14 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.16 upgrade ()
  • 15:08 moritzm: installing libxslt security updates
  • 15:06 moritzm: installing libcpanel-json-xs-perl security updates
  • 15:03 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
  • 14:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
  • 14:58 sukhe@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host hcaptcha1001.wikimedia.org
  • 14:58 sukhe@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:56 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test1005.wikimedia.org with OS trixie
  • 14:55 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
  • 14:55 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
  • 14:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 14:51 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 14:50 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 14:42 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2002.wikimedia.org with OS trixie
  • 14:42 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:40 marostegui@cumin1003: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:40 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
  • 14:40 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
  • 14:40 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha2002.wikimedia.org on all recursors
  • 14:40 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha2002.wikimedia.org on all recursors
  • 14:40 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
  • 14:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1028.eqiad.wmnet with reason: Maintenance
  • 14:39 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
  • 14:38 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2058.codfw.wmnet']
  • 14:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058.codfw.wmnet']
  • 14:37 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 14:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 14:37 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 14:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 14:36 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 14:36 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 14:36 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha2002.wikimedia.org
  • 14:34 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha2001.wikimedia.org
  • 14:34 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2001.wikimedia.org with OS trixie
  • 14:34 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.16 upgrade ()
  • 14:34 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.16 upgrade ()
  • 14:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
  • 14:19 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:17 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: namespaceDupes diqwiki --fix # T328207
  • 14:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Change Portal talk namespace name for diqwiki (T328207), UserInfoCard: Limit who can view past blocks and remove redundant data points (T406480) (duration: 11m 31s)
  • 14:13 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
  • 14:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kharlan, cappybaraa: Continuing with sync
  • 14:06 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 2.8.16 upgrade ()
  • 14:06 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kharlan, cappybaraa: Backport for Change Portal talk namespace name for diqwiki (T328207), UserInfoCard: Limit who can view past blocks and remove redundant data points (T406480) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 2.8.16 upgrade ()
  • 14:04 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 14:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Change Portal talk namespace name for diqwiki (T328207), UserInfoCard: Limit who can view past blocks and remove redundant data points (T406480)
  • 13:58 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:58 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:58 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2001.wikimedia.org with OS trixie
  • 13:53 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
  • 13:52 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
  • 13:52 cdanis@deploy2002: Finished scap sync-world: Backport for EventStreamConfig - Enable hive ingestion for eventgate-logging-external based streams (T304373) (duration: 12m 24s)
  • 13:52 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 13:52 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha2001.wikimedia.org on all recursors
  • 13:52 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha2001.wikimedia.org on all recursors
  • 13:52 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:52 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
  • 13:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 13:52 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
  • 13:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 13:51 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 13:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 13:48 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:48 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1001.wikimedia.org on all recursors
  • 13:48 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1001.wikimedia.org on all recursors
  • 13:48 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:47 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:46 cdanis@deploy2002: cdanis, otto: Continuing with sync
  • 13:46 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha2001.wikimedia.org
  • 13:46 cdanis@deploy2002: cdanis, otto: Backport for EventStreamConfig - Enable hive ingestion for eventgate-logging-external based streams (T304373) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:44 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:44 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1001.wikimedia.org
  • 13:39 cdanis@deploy2002: Started scap sync-world: Backport for EventStreamConfig - Enable hive ingestion for eventgate-logging-external based streams (T304373)
  • 13:37 bwojtowicz@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:34 bwojtowicz@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:29 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{dse-k8s-worker[1004-1019].eqiad.wmnet} and (A:dse-k8s-master-eqiad or A:dse-k8s-worker-eqiad)
  • 13:24 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 2.8.16 upgrade ()
  • 13:24 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 2.8.16 upgrade ()
  • 13:19 mfossati@deploy2002: Finished scap sync-world: Backport for ReaderExperiments' ImageBrowsing: use edge uniques (T403259) (duration: 11m 32s)
  • 13:15 mfossati@deploy2002: mfossati: Continuing with sync
  • 13:15 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 13:14 mfossati@deploy2002: mfossati: Backport for ReaderExperiments' ImageBrowsing: use edge uniques (T403259) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:14 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 13:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 13:13 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 13:12 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 13:11 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 13:11 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 13:11 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 13:11 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 13:08 mfossati@deploy2002: Started scap sync-world: Backport for ReaderExperiments' ImageBrowsing: use edge uniques (T403259)
  • 12:55 hashar: Restarting Zuul. Deadlocked due to zombie connections with Gerrit
  • 12:48 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
  • 12:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
  • 12:44 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:43 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest1005
  • 12:43 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
  • 12:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:41 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
  • 12:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
  • 12:39 arnaudb@dns1004: END - running authdns-update
  • 12:38 arnaudb@dns1004: START - running authdns-update
  • 12:37 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:29 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:29 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 12:28 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 12:28 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 12:28 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 12:28 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:27 arnaudb@cumin1003: END (ERROR) - Cookbook sre.gerrit.failover (exit_code=97) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:25 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 12:25 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 12:25 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 12:25 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 12:25 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:22 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:22 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:20 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:08 moritzm: upgrade Envoy on yarn/turnilo hosts T403663
  • 12:07 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 12:07 hashar: stopped CI Jenkins
  • 12:07 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 12:05 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 12:05 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 12:05 arnaudb@dns1004: START - running authdns-update
  • 12:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:04 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 11:25 Amir1: dropping interwiki table on group2 (T397367)
  • 11:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru and not P{cp7008.magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 11:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru and not P{cp7016.magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 11:17 Amir1: dropping interwiki table on group1 (T397367)
  • 11:15 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2047.codfw.wmnet']
  • 10:54 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2047.codfw.wmnet']
  • 10:54 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 10:54 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 10:54 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 10:53 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 10:53 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 214657
  • 10:52 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 214657
  • 10:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:42 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:41 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:41 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:41 elukey: upgraded spicerack to 11.10.0 on all cumin nodes
  • 10:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test1005.wikimedia.org on all recursors
  • 10:40 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp-test1005.wikimedia.org on all recursors
  • 10:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
  • 10:40 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
  • 10:39 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 10:39 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru and not P{cp7016.magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 10:39 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 10:39 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru and not P{cp7008.magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 10:39 vgutierrez: upgrading to haproxy 2.8.16 on magru - T406451
  • 10:36 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 10:36 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test1005.wikimedia.org
  • 10:33 moritzm: restarting postfix to pick up openssl security updates
  • 10:26 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker-eqiad
  • 10:12 moritzm: restarting spamassassin/clamav on VRTS to pick up OpenSSL updates
  • 10:12 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[7008,7016].magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 10:00 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[7008,7016].magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 10:00 vgutierrez: upgrade to haproxy 2.8.16 on cp7008 and cp7016 - T406451
  • 09:55 vgutierrez: fetch haproxy 2.8.16 on thirdparty/haproxy28-bullseye (apt.wm.o) - T406451
  • 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 09:33 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 09:26 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 09:23 moritzm: upgrade Envoy on schema* T403663
  • 09:18 elukey: uploaded spicerack_11.10.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 08:56 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker-eqiad
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 08:40 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 08:09 moritzm: installing OpenSSL security updates on trixie/bookworm
  • 08:07 dcausse: closing the UTC morning backport window
  • 08:06 dcausse@deploy2002: Finished scap sync-world: Backport for Allow AbuseFilter to block on ganwiki (T406220), cirrus: test completion with default sort on simplewiki [1/3] (T404858) (duration: 12m 48s)
  • 08:01 dcausse@deploy2002: hamishz, dcausse: Continuing with sync
  • 08:00 dcausse@deploy2002: hamishz, dcausse: Backport for Allow AbuseFilter to block on ganwiki (T406220), cirrus: test completion with default sort on simplewiki [1/3] (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:53 dcausse@deploy2002: Started scap sync-world: Backport for Allow AbuseFilter to block on ganwiki (T406220), cirrus: test completion with default sort on simplewiki [1/3] (T404858)
  • 07:49 kharlan@deploy2002: Finished scap sync-world: Backport for MetricsPlatformAuthPreserveQueryParamsExperiments: Define hCaptcha A/B test (T405239) (duration: 11m 42s)
  • 07:44 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:43 kharlan@deploy2002: kharlan: Backport for MetricsPlatformAuthPreserveQueryParamsExperiments: Define hCaptcha A/B test (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:37 kharlan@deploy2002: Started scap sync-world: Backport for MetricsPlatformAuthPreserveQueryParamsExperiments: Define hCaptcha A/B test (T405239)
  • 07:34 kharlan@deploy2002: Finished scap sync-world: Backport for Implement AuthPreserveQueryParams for Metrics Platform mpo param (T404622), UserInfoCard: Hide new articles count when likely to be inaccurate (T399096) (duration: 14m 04s)
  • 07:32 moritzm: rebalance Ganeti codfw/A following vmscape reboots
  • 07:30 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:26 kharlan@deploy2002: kharlan: Backport for Implement AuthPreserveQueryParams for Metrics Platform mpo param (T404622), UserInfoCard: Hide new articles count when likely to be inaccurate (T399096) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:20 kharlan@deploy2002: Started scap sync-world: Backport for Implement AuthPreserveQueryParams for Metrics Platform mpo param (T404622), UserInfoCard: Hide new articles count when likely to be inaccurate (T399096)
  • 07:02 kharlan@deploy2002: Finished scap sync-world: Backport for UserInfoCard: Hide reverted edit count if user has more than 1,000 edits (T401466) (duration: 42m 35s)
  • 07:00 moritzm: rebalance Ganeti eqiad/A following vmscape reboots
  • 06:49 kharlan@deploy2002: kharlan: Continuing with sync
  • 06:47 kharlan@deploy2002: kharlan: Backport for UserInfoCard: Hide reverted edit count if user has more than 1,000 edits (T401466) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:19 kharlan@deploy2002: Started scap sync-world: Backport for UserInfoCard: Hide reverted edit count if user has more than 1,000 edits (T401466)
  • 06:12 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Upgrade with minor cosmetic tweaks - oblivian@cumin1003"
  • 06:12 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Upgrade with minor cosmetic tweaks - oblivian@cumin1003
  • 06:11 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Upgrade with minor cosmetic tweaks - oblivian@cumin1003
  • 06:11 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Upgrade with minor cosmetic tweaks - oblivian@cumin1003"
  • 05:43 marostegui@dns1006: END - running authdns-update
  • 05:41 marostegui@dns1006: START - running authdns-update
  • 04:49 eileen: civicrm upgraded from ff529ecf to 17092e23
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 31s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-05

  • 23:50 eileen: civicrm upgraded from 7c31a25c to ff529ecf
  • 23:19 eileen: config revision changed from 0d78c876 to 276d34f0
  • 01:02 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 24s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-04

  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 44s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-03

  • 19:37 mutante: LDAP added user btracy to group wmf T405366
  • 19:07 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
  • 19:07 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 18:50 ejegg: payments-wiki upgraded from e8ef5539 to 4b8293df
  • 18:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 18:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:56 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:56 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:33 jasmine@dns1004: END - running authdns-update
  • 17:31 jasmine@dns1004: START - running authdns-update
  • 17:30 jasmine@dns1004: START - running authdns-update
  • 17:27 jasmine@dns1004: START - running authdns-update
  • 17:11 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2002.codfw.wmnet with reason: WIP
  • 17:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:08 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
  • 17:03 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: WIP
  • 17:02 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 16:59 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:47 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:17 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2058.codfw.wmnet']
  • 15:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058.codfw.wmnet']
  • 15:44 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 15:38 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002"
  • 15:38 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002
  • 15:37 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002
  • 15:37 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002"
  • 15:27 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 13:37 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:18 stevemunene@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:16 stevemunene@cumin1003: START - Cookbook sre.dns.netbox
  • 13:11 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:08 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:08 stevemunene@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:07 stevemunene@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:02 logmsgbot: reedy Deployed security patch for T406322
  • 12:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:23 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:16 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:16 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:12 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:11 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:11 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:11 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:57 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:15 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:27 topranks: reset PIC 1/0 on cr2-eqiad to configure port 5 speed T402588
  • 10:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cr[1-2]-eqiad,cr2-eqord,cr1-magru,ssw1-f1-eqiad with reason: reset PIC 0/1 in cr2 to set port 5 speed
  • 10:21 topranks: drain traffic from cr2-codfw <-> ssw1-f1-codfw link to allow for cr2-codfw card reset T402588
  • 10:17 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:15 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:15 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr2-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
  • 10:14 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:14 topranks: drain transport circuits on PIC 1/0 of cr2-eqiad to allow for card reboot T402588
  • 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts druid1008.eqiad.wmnet
  • 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr2-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
  • 10:09 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:02 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:01 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts druid1008.eqiad.wmnet
  • 10:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1007.eqiad.wmnet
  • 10:00 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:00 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: druid1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 09:59 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: druid1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 09:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 09:56 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:55 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
  • 09:48 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts druid1007.eqiad.wmnet
  • 09:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 09:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 09:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 09:40 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 09:33 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
  • 09:27 jynus@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
  • 09:21 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 09:11 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 09:07 jynus@cumin1003: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 09:04 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 08:59 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:46 stevemunene@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 08:46 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2028.codfw.wmnet onto es2051.codfw.wmnet
  • 08:46 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2028 gradually with 4 steps - Pool es2028.codfw.wmnet in after cloning
  • 08:44 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:44 stevemunene@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 08:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:43 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:33 brouberol@cumin1003: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 08:31 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:29 brouberol@cumin1003: START - Cookbook sre.wdqs.restart
  • 08:25 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:25 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:24 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:05 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:00 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2028 gradually with 4 steps - Pool es2028.codfw.wmnet in after cloning
  • 07:51 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 07:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 07:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 07:43 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 07:43 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:42 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:40 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:39 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:38 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:38 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:16 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:16 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:12 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:12 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 04:47 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 04:41 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 04:40 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 04:32 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 03:36 tstarling@deploy2002: Finished scap sync-world: Backport for Fallback to first result row if none in baselang is found (T406196), Ensure linkUpdateComplete handler is only run for entities (T406192) (duration: 11m 15s)
  • 03:31 tstarling@deploy2002: tstarling: Continuing with sync
  • 03:30 tstarling@deploy2002: tstarling: Backport for Fallback to first result row if none in baselang is found (T406196), Ensure linkUpdateComplete handler is only run for entities (T406192) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 03:24 tstarling@deploy2002: Started scap sync-world: Backport for Fallback to first result row if none in baselang is found (T406196), Ensure linkUpdateComplete handler is only run for entities (T406192)
  • 01:30 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 12s)
  • 01:03 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 01:02 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:57 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 00:56 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 00:49 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 00:49 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 00:44 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 00:43 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 00:38 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 00:32 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm

2025-10-02

  • 23:24 samwilson@deploy2002: Finished scap sync-world: Backport for Fetch wikitext from the translation lang subpage, not the baselang (duration: 16m 07s)
  • 23:20 samwilson@deploy2002: samwilson: Continuing with sync
  • 23:10 samwilson@deploy2002: samwilson: Backport for Fetch wikitext from the translation lang subpage, not the baselang synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:08 samwilson@deploy2002: Started scap sync-world: Backport for Fetch wikitext from the translation lang subpage, not the baselang
  • 22:46 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 22:15 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 21:53 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 21:53 zabe@deploy2002: Finished scap sync-world: Backport for Stop setting CategoryLinksSchemaMigrationStage (T299951) (duration: 12m 37s)
  • 21:47 zabe@deploy2002: zabe: Continuing with sync
  • 21:46 zabe@deploy2002: zabe: Backport for Stop setting CategoryLinksSchemaMigrationStage (T299951) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:41 zabe@deploy2002: Started scap sync-world: Backport for Stop setting CategoryLinksSchemaMigrationStage (T299951)
  • 21:37 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 21:30 ejegg: donorwiki upgraded from dc7cda24 to e8ef5539
  • 21:30 ejegg: payments-wiki upgraded from 2b281477 to e8ef5539
  • 21:27 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 21:27 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:26 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:25 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:25 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:17 samtar@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575) (duration: 12m 35s)
  • 21:12 samtar@deploy2002: samtar: Continuing with sync
  • 21:10 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:08 samtar@deploy2002: samtar: Backport for ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:04 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:04 samtar@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575)
  • 21:03 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:58 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:47 ebomani@deploy2002: Finished scap sync-world: Backport for CommonSettings.php: Replace usage of $wgCaptchaWhitelist (T277936) (duration: 13m 17s)
  • 20:45 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 20:42 ebomani@deploy2002: reedy, ebomani: Continuing with sync
  • 20:40 ebomani@deploy2002: reedy, ebomani: Backport for CommonSettings.php: Replace usage of $wgCaptchaWhitelist (T277936) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:40 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 20:33 ebomani@deploy2002: Started scap sync-world: Backport for CommonSettings.php: Replace usage of $wgCaptchaWhitelist (T277936)
  • 20:30 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:30 ebernhardson@deploy2002: Finished scap sync-world: Backport for cirrus: Start AB test of did-you-mean profiles (T390858) (duration: 09m 29s)
  • 20:30 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:26 ebernhardson@deploy2002: ebernhardson: Continuing with sync
  • 20:25 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Start AB test of did-you-mean profiles (T390858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s4 and s1 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83587 and previous config saved to /var/cache/conftool/dbconfig/20251002-202536-ladsgroup.json
  • 20:23 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:23 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:21 ebernhardson@deploy2002: Started scap sync-world: Backport for cirrus: Start AB test of did-you-mean profiles (T390858)
  • 20:16 dani@deploy2002: Finished scap sync-world: Backport for Deploy reader foundational survey on enwiki (T405410) (duration: 11m 29s)
  • 20:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Harmonize weights in s1 in eqiad', diff saved to https://phabricator.wikimedia.org/P83586 and previous config saved to /var/cache/conftool/dbconfig/20251002-201611-ladsgroup.json
  • 20:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s4 and s1 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83585 and previous config saved to /var/cache/conftool/dbconfig/20251002-201532-ladsgroup.json
  • 20:15 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:12 dani@deploy2002: dani: Continuing with sync
  • 20:11 dani@deploy2002: dani: Backport for Deploy reader foundational survey on enwiki (T405410) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:09 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Harmonize weights in s8 in eqiad', diff saved to https://phabricator.wikimedia.org/P83584 and previous config saved to /var/cache/conftool/dbconfig/20251002-200948-ladsgroup.json
  • 20:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s8 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83583 and previous config saved to /var/cache/conftool/dbconfig/20251002-200621-ladsgroup.json
  • 20:05 dani@deploy2002: Started scap sync-world: Backport for Deploy reader foundational survey on enwiki (T405410)
  • 20:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s8 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83582 and previous config saved to /var/cache/conftool/dbconfig/20251002-200354-ladsgroup.json
  • 20:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s7 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83581 and previous config saved to /var/cache/conftool/dbconfig/20251002-200143-ladsgroup.json
  • 19:59 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
  • 19:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s7 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83580 and previous config saved to /var/cache/conftool/dbconfig/20251002-195426-ladsgroup.json
  • 19:49 samtar@deploy2002: Finished scap sync-world: Backport for EventStreamConfig and stream registration for watchlist click tracking (T401575) (duration: 10m 46s)
  • 19:44 samtar@deploy2002: samtar: Continuing with sync
  • 19:44 samtar@deploy2002: samtar: Backport for EventStreamConfig and stream registration for watchlist click tracking (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:38 samtar@deploy2002: Started scap sync-world: Backport for EventStreamConfig and stream registration for watchlist click tracking (T401575)
  • 19:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s5 and s6 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83579 and previous config saved to /var/cache/conftool/dbconfig/20251002-193217-ladsgroup.json
  • 19:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s5 and s6 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83578 and previous config saved to /var/cache/conftool/dbconfig/20251002-192928-ladsgroup.json
  • 19:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s2 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83577 and previous config saved to /var/cache/conftool/dbconfig/20251002-192726-ladsgroup.json
  • 19:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s2 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83576 and previous config saved to /var/cache/conftool/dbconfig/20251002-191918-ladsgroup.json
  • 19:14 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
  • 19:11 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 19:08 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 18:58 ladsgroup@deploy2002: Finished scap sync-world: Backport for db-production: Enable shuffle sharding (T405087) (duration: 22m 32s)
  • 18:53 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:41 ladsgroup@deploy2002: ladsgroup: Backport for db-production: Enable shuffle sharding (T405087) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:35 ladsgroup@deploy2002: Started scap sync-world: Backport for db-production: Enable shuffle sharding (T405087)
  • 18:27 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
  • 17:50 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:44 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:43 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
  • 17:40 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:40 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:25 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Repool services in Eqiad following DC switchover (T399891) - T399891
  • 17:03 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Repool services in Eqiad following DC switchover (T399891) - T399891
  • 16:42 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: Repool Eqiad following DC switchover (T399891), T399891]
  • 16:42 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: Repool Eqiad following DC switchover (T399891), T399891]
  • 15:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:52 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:52 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox
  • 15:51 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:51 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:51 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:50 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:46 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:46 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2035.codfw.wmnet
  • 15:42 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2035.codfw.wmnet
  • 15:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:36 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:31 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:31 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:30 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:12 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 15:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:58 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr1-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
  • 14:58 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr1-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
  • 14:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 14:36 topranks: reset PIC 0/1 on cr1-eqiad to set port speed for port 5 T402588
  • 14:36 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cr[1-2]-eqiad,ssw1-e1-eqiad with reason: reset PIC 0/1 in cr1-eqiad to set port 5 speed
  • 14:28 topranks: drain link from cr1-eqiad <-> ssw1-e1-eqiad to allow PIC card reboot on cr1-eqiad T402588
  • 14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 14:25 taavi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'k8s.svc.toolsbeta.eqiad1.wikimedia.cloud$' on eqiad recursors
  • 14:25 taavi@cumin1002: START - Cookbook sre.dns.wipe-cache 'k8s.svc.toolsbeta.eqiad1.wikimedia.cloud$' on eqiad recursors
  • 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 14:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 14:17 topranks: drain transport circuit cr1-eqiad <-> cr1-codfw to allow for PIC card reboot on cr1-eqiad T402588
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
  • 14:10 tgr_: UTC afternoon deploys done
  • 14:10 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
  • 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
  • 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 14:08 tgr@deploy2002: Finished scap sync-world: Backport for Enable JWT session cookies on group1 (T399631) (duration: 17m 41s)
  • 14:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
  • 14:04 tgr@deploy2002: tgr: Continuing with sync
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2046.codfw.wmnet
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2046.codfw.wmnet
  • 13:58 tgr@deploy2002: tgr: Backport for Enable JWT session cookies on group1 (T399631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2046.codfw.wmnet
  • 13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
  • 13:51 tgr@deploy2002: Started scap sync-world: Backport for Enable JWT session cookies on group1 (T399631)
  • 13:47 jforrester@deploy2002: Finished scap sync-world: Backport for Revert^2 "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (duration: 11m 39s)
  • 13:44 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:44 moritzm: failover Ganeti master in eqiad to ganeti1048
  • 13:43 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:42 jforrester@deploy2002: jforrester: Continuing with sync
  • 13:42 jforrester@deploy2002: jforrester: Backport for Revert^2 "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:41 jhancock@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2035']
  • 13:39 jayme@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2035.codfw.wmnet with reason: Hardware failure
  • 13:35 jforrester@deploy2002: Started scap sync-world: Backport for Revert^2 "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator"
  • 13:34 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for session: Lookup authenticated store first before anon store (T402808), session: Lookup authenticated store first before anon store (T402808) (duration: 12m 56s)
  • 13:29 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
  • 13:27 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Backport for session: Lookup authenticated store first before anon store (T402808), session: Lookup authenticated store first before anon store (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:23 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
  • 13:21 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for session: Lookup authenticated store first before anon store (T402808), session: Lookup authenticated store first before anon store (T402808)
  • 13:17 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox
  • 13:16 dani@deploy2002: Finished scap sync-world: Backport for Update reader foundational survey on enwiki (T405410) (duration: 11m 54s)
  • 13:11 dani@deploy2002: dani: Continuing with sync
  • 13:11 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
  • 13:10 dani@deploy2002: dani: Backport for Update reader foundational survey on enwiki (T405410) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:10 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
  • 13:04 dani@deploy2002: Started scap sync-world: Backport for Update reader foundational survey on enwiki (T405410)
  • 12:57 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2035']
  • 12:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2046.codfw.wmnet
  • 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
  • 12:32 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:31 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
  • 12:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 12:10 moritzm: failover Ganeti master in codfw to ganeti2048
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 12:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2028 - Depool es2028.codfw.wmnet to then clone it to es2051.codfw.wmnet - fceratto@cumin1002
  • 12:06 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2028 - Depool es2028.codfw.wmnet to then clone it to es2051.codfw.wmnet - fceratto@cumin1002
  • 12:06 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2028.codfw.wmnet onto es2051.codfw.wmnet
  • 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 11:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 11:45 stevemunene@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on druid[1007-1008].eqiad.wmnet with reason: Decommissioning druid_public hosts
  • 11:40 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:39 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:35 moritzm: failover Ganeti master in drmrs02 to ganeti6002
  • 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 11:21 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:20 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:19 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:19 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:18 moritzm: installing postgresql security updates on netboxdb nodes
  • 11:17 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6003.drmrs.wmnet
  • 11:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6003.drmrs.wmnet
  • 11:12 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
  • 11:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:04 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 11:02 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:02 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 10:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:57 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
  • 10:52 zabe@deploy2002: Finished scap sync-world: Backport for Revert "RevisionStore: Find identical revisions without using rev_sha1" (duration: 11m 06s)
  • 10:48 moritzm: failover Ganeti master in drmrs01 to ganeti6001
  • 10:48 zabe@deploy2002: zabe: Continuing with sync
  • 10:47 zabe@deploy2002: zabe: Backport for Revert "RevisionStore: Find identical revisions without using rev_sha1" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:41 zabe@deploy2002: Started scap sync-world: Backport for Revert "RevisionStore: Find identical revisions without using rev_sha1"
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 10:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 10:15 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 10:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
  • 10:11 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:02 moritzm: installing OpenSSL security updates on trixie/bookworm
  • 10:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 09:59 moritzm: failover Ganeti master in eqsin to ganeti5007
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 09:17 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.21 refs T405677
  • 09:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
  • 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 09:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2051.codfw.wmnet with reason: Setting up new ES host
  • 09:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
  • 08:55 awight@deploy2002: Finished scap sync-world: Backport for Revert "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (T406185 T397401 T401682), UX changes for reference context item (T404690), Nasty fix for main ref change in main+details (T406002) (duration: 48m 54s)
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 08:43 awight@deploy2002: awight, hashar: Continuing with sync
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
  • 08:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
  • 08:35 hashar@deploy2002: Finished deploy [gerrit/gerrit@3ef5714]: Add a banner for a Gerrit switch over maintenance - T387833 (duration: 00m 12s)
  • 08:35 hashar@deploy2002: Started deploy [gerrit/gerrit@3ef5714]: Add a banner for a Gerrit switch over maintenance - T387833
  • 08:35 hashar@deploy2002: deploy aborted: Add a banner for a Gerrit switch over maintenance - T387833 (duration: 00m 00s)
  • 08:35 hashar@deploy2002: Started deploy [gerrit/gerrit@3ef5714]: Add a banner for a Gerrit switch over maintenance - T387833
  • 08:34 awight@deploy2002: awight, hashar: Backport for Revert "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (T406185 T397401 T401682), UX changes for reference context item (T404690), Nasty fix for main ref change in main+details (T406002) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verif
  • 08:16 brouberol@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-druid1007.eqiad.wmnet with reason: Hosts are being decommissioned
  • 08:10 moritzm: failover Ganeti master in ulsfo to ganeti4008
  • 08:06 awight@deploy2002: Started scap sync-world: Backport for Revert "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (T406185 T397401 T401682), UX changes for reference context item (T404690), Nasty fix for main ref change in main+details (T406002)
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 08:05 root@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host wikikube-worker2035.codfw.wmnet
  • 08:02 root@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2035.codfw.wmnet
  • 07:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 07:54 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 07:45 hashar@deploy2002: Finished scap sync-world: Backport for Add abusefilter-modify-restricted to enwiki EFM (T405999) (duration: 15m 40s)
  • 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 07:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
  • 07:41 hashar@deploy2002: eggroll97, hashar: Continuing with sync
  • 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
  • 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 07:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:36 hashar@deploy2002: eggroll97, hashar: Backport for Add abusefilter-modify-restricted to enwiki EFM (T405999) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 07:29 hashar@deploy2002: Started scap sync-world: Backport for Add abusefilter-modify-restricted to enwiki EFM (T405999)
  • 07:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
  • 07:19 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:07 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:07 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 07:06 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 06:26 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Commons (T403510), Disable wmgUseMdotRouting on id, fr, de, es, ru, and ja.wikipedia (T403510) (duration: 23m 01s)
  • 06:21 krinkle@deploy2002: krinkle: Continuing with sync
  • 06:09 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on Commons (T403510), Disable wmgUseMdotRouting on id, fr, de, es, ru, and ja.wikipedia (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:03 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on Commons (T403510), Disable wmgUseMdotRouting on id, fr, de, es, ru, and ja.wikipedia (T403510)
  • 03:43 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes # T402967
  • 02:55 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes # T402967
  • 02:27 musikanimal@deploy2002: Finished scap sync-world: Backport for Enable debug logging for CommunityRequests (T402967) (duration: 13m 47s)
  • 02:22 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 02:20 musikanimal@deploy2002: musikanimal: Backport for Enable debug logging for CommunityRequests (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 02:13 musikanimal@deploy2002: Started scap sync-world: Backport for Enable debug logging for CommunityRequests (T402967)
  • 02:02 musikanimal@deploy2002: Finished scap sync-world: Backport for FocusAreaStore: use virtual DB connection when counting wishes (T402967) (duration: 12m 25s)
  • 01:57 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 01:56 musikanimal@deploy2002: musikanimal: Backport for FocusAreaStore: use virtual DB connection when counting wishes (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:50 musikanimal@deploy2002: Started scap sync-world: Backport for FocusAreaStore: use virtual DB connection when counting wishes (T402967)
  • 01:16 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 38s)
  • 01:02 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 01:02 musikanimal@deploy2002: Finished scap sync-world: Backport for WishStore: don't use virtual domain when querying for actor ID (T402967) (duration: 11m 14s)
  • 00:57 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 00:57 musikanimal@deploy2002: musikanimal: Backport for WishStore: don't use virtual domain when querying for actor ID (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:50 musikanimal@deploy2002: Started scap sync-world: Backport for WishStore: don't use virtual domain when querying for actor ID (T402967)
  • 00:29 musikanimal@deploy2002: Finished scap sync-world: Backport for Increase timeout for MessageIndex lock (T402967) (duration: 13m 30s)
  • 00:22 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 00:22 musikanimal@deploy2002: musikanimal: Backport for Increase timeout for MessageIndex lock (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:15 musikanimal@deploy2002: Started scap sync-world: Backport for Increase timeout for MessageIndex lock (T402967)

2025-10-01

  • 23:16 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2017.codfw.wmnet with OS bullseye
  • 23:14 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 22:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 22:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 22:52 bvibber@deploy2002: Finished scap sync-world: Backport for Add ReaderExperiments extension (T404398), Deploy ReaderExperiments to Beta cluster (T404398), Enable ReaderExperiments on Beta (T404398), Load ReaderExperiments extension in CommonSettings-labs.php (T404398) (duration: 40m 32s)
  • 22:40 bvibber@deploy2002: egardner, bvibber: Continuing with sync
  • 22:39 bvibber@deploy2002: egardner, bvibber: Backport for Add ReaderExperiments extension (T404398), Deploy ReaderExperiments to Beta cluster (T404398), Enable ReaderExperiments on Beta (T404398), Load ReaderExperiments extension in CommonSettings-labs.php (T404398) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes
  • 22:13 TimStarling: migrating wishes to CommunityRequests with migrateFromGadget.php
  • 22:12 bvibber@deploy2002: Started scap sync-world: Backport for Add ReaderExperiments extension (T404398), Deploy ReaderExperiments to Beta cluster (T404398), Enable ReaderExperiments on Beta (T404398), Load ReaderExperiments extension in CommonSettings-labs.php (T404398)
  • 22:08 tstarling@deploy2002: Finished scap sync-world: Backport for Enable CommunityRequests on metawiki (T402967), metawiki: Configure permissions for CommunityRequests (T402967) (duration: 10m 42s)
  • 22:04 tstarling@deploy2002: musikanimal, tstarling: Continuing with sync
  • 22:02 tstarling@deploy2002: musikanimal, tstarling: Backport for Enable CommunityRequests on metawiki (T402967), metawiki: Configure permissions for CommunityRequests (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:57 tstarling@deploy2002: Started scap sync-world: Backport for Enable CommunityRequests on metawiki (T402967), metawiki: Configure permissions for CommunityRequests (T402967)
  • 21:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 21:56 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:39 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:36 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:35 jforrester@deploy2002: Finished scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator (T397401 T401682) (duration: 09m 39s)
  • 21:31 jforrester@deploy2002: jforrester: Continuing with sync
  • 21:30 jforrester@deploy2002: jforrester: Backport for Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator (T397401 T401682) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:25 jforrester@deploy2002: Started scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator (T397401 T401682)
  • 21:18 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:17 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:17 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:16 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:15 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:15 tstarling@deploy2002: Finished scap sync-world: Backport for Configure CommunityRequests virtual domain (T402967) (duration: 07m 36s)
  • 21:15 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:11 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:11 tstarling@deploy2002: tstarling: Continuing with sync
  • 21:10 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:10 tstarling@deploy2002: tstarling: Backport for Configure CommunityRequests virtual domain (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:10 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:09 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:07 tstarling@deploy2002: Started scap sync-world: Backport for Configure CommunityRequests virtual domain (T402967)
  • 21:07 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:06 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:05 arlolra@deploy2002: Finished scap sync-world: Backport for Revert "Add parsoid support in ProofreadPage extension" (duration: 09m 47s)
  • 21:00 arlolra@deploy2002: arlolra: Continuing with sync
  • 20:59 arlolra@deploy2002: arlolra: Backport for Revert "Add parsoid support in ProofreadPage extension" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:55 arlolra@deploy2002: Started scap sync-world: Backport for Revert "Add parsoid support in ProofreadPage extension"
  • 20:51 derick@deploy2002: Finished scap sync-world: Backport for Revert^2 "session: Enable MultiBackendSessionStore on `group1` wikis" (duration: 12m 46s)
  • 20:46 derick@deploy2002: d3r1ck01, derick: Continuing with sync
  • 20:44 derick@deploy2002: d3r1ck01, derick: Backport for Revert^2 "session: Enable MultiBackendSessionStore on `group1` wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 derick@deploy2002: Started scap sync-world: Backport for Revert^2 "session: Enable MultiBackendSessionStore on `group1` wikis"
  • 20:34 derick@deploy2002: Finished scap sync-world: Backport for session: Handle an edge-case in MultiBackendSessionStore::set() (T402808), session: Handle an edge-case in MultiBackendSessionStore::set() (T402808) (duration: 12m 57s)
  • 20:30 derick@deploy2002: derick, d3r1ck01: Continuing with sync
  • 20:27 derick@deploy2002: derick, d3r1ck01: Backport for session: Handle an edge-case in MultiBackendSessionStore::set() (T402808), session: Handle an edge-case in MultiBackendSessionStore::set() (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:21 derick@deploy2002: Started scap sync-world: Backport for session: Handle an edge-case in MultiBackendSessionStore::set() (T402808), session: Handle an edge-case in MultiBackendSessionStore::set() (T402808)
  • 19:49 mutante: cloud
  • 19:13 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Enable A/B test for frwiki (T405239) (duration: 26m 24s)
  • 19:11 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:11 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:08 kharlan@deploy2002: kharlan: Continuing with sync
  • 18:53 kharlan@deploy2002: kharlan: Backport for hCaptcha: Enable A/B test for frwiki (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:46 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Enable A/B test for frwiki (T405239)
  • 18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.21 refs T405677
  • 16:39 swfrench@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 16:34 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:33 swfrench@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 16:23 kharlan@deploy2002: Finished scap sync-world: Backport for SimpleCaptcha::canSkipCaptcha: Remove unneeded Config parameter, CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), [[gerrit:1192923|Sim
  • 16:19 kharlan@deploy2002: kharlan: Continuing with sync
  • 16:17 kharlan@deploy2002: kharlan: Backport for SimpleCaptcha::canSkipCaptcha: Remove unneeded Config parameter, CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), [[gerrit:1192923|SimpleCaptcha::canSk
  • 16:15 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:10 kharlan@deploy2002: Started scap sync-world: Backport for SimpleCaptcha::canSkipCaptcha: Remove unneeded Config parameter, CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), [[gerrit:1192923|Simp
  • 16:07 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:07 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:57 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:56 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:51 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:49 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:49 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:35 claime: Finished eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 15:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 15:34 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 15:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:33 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 15:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 15:33 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 15:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:31 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 15:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 15:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:30 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:28 cgoubert@deploy2002: Finished scap sync-world: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703 (duration: 03m 16s)
  • 15:27 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 15:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repool db1259 after maint T401906', diff saved to https://phabricator.wikimedia.org/P83573 and previous config saved to /var/cache/conftool/dbconfig/20251001-152620-ladsgroup.json
  • 15:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 15:25 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 15:25 cgoubert@deploy2002: Started scap sync-world: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 15:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 15:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:23 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 15:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 15:22 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:22 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 15:20 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 15:20 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 15:20 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 15:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 15:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 15:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 15:18 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 15:18 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 15:18 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 15:18 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:17 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
  • 15:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
  • 15:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 15:16 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 15:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:16 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:15 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 15:07 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 15:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 15:04 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:04 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:49 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:45 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 14:45 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:43 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:43 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:41 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:41 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:41 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:41 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:40 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:40 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:40 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:40 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 14:37 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 14:34 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 14:33 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 14:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 14:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 14:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 14:31 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 14:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 14:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 14:30 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 14:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 14:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 14:28 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 14:28 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 14:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 14:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 14:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:25 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:25 cgoubert@deploy2002: Started scap sync-world: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 14:25 cgoubert@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703 (duration: 201m 05s)
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 14:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:23 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 14:22 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 14:22 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 14:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 14:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 14:20 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 14:20 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 14:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 14:18 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:18 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 14:18 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 14:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 14:16 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=thumbor.*,name=codfw
  • 14:16 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=swift.*,name=eqiad
  • 14:16 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=thumbor.*,name=eqiad
  • 14:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 14:15 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 14:14 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 14:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:13 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 14:12 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 14:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 14:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 14:09 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 14:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 14:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:06 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 14:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: apply
  • 14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T401906)', diff saved to https://phabricator.wikimedia.org/P83572 and previous config saved to /var/cache/conftool/dbconfig/20251001-140538-fceratto.json
  • 14:05 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 14:04 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: apply
  • 14:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1259 (T401906)', diff saved to https://phabricator.wikimedia.org/P83571 and previous config saved to /var/cache/conftool/dbconfig/20251001-140422-fceratto.json
  • 14:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1259.eqiad.wmnet with reason: Maintenance
  • 14:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T401906)', diff saved to https://phabricator.wikimedia.org/P83570 and previous config saved to /var/cache/conftool/dbconfig/20251001-140400-fceratto.json
  • 14:03 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:02 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 14:01 cgoubert@cumin1003: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=toolhub.*
  • 14:00 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 13:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 13:56 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=wdqs2016\.codfw\.wmnet
  • 13:53 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 13:51 jelto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 239 hosts with reason: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 13:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P83569 and previous config saved to /var/cache/conftool/dbconfig/20251001-134852-fceratto.json
  • 13:46 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 13:44 SandraEbele_: Deployed refinery-source using jenkins(weekly deployment train)
  • 13:44 cgoubert@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-ctrl[1001-1004].eqiad.wmnet
  • 13:44 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl[1001-1004].eqiad.wmnet
  • 13:35 cgoubert@cumin1003: END (FAIL) - Cookbook sre.k8s.wipe-cluster (exit_code=99) Wipe the K8s cluster wikikube-eqiad: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 13:35 cgoubert@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:34 cgoubert@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:34 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:33 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:33 cgoubert@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P83568 and previous config saved to /var/cache/conftool/dbconfig/20251001-133344-fceratto.json
  • 13:33 cgoubert@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:33 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:31 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:31 cgoubert@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:30 cgoubert@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:30 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 13:30 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:30 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:30 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:29 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:28 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:28 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:24 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:24 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:24 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:23 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T401906)', diff saved to https://phabricator.wikimedia.org/P83566 and previous config saved to /var/cache/conftool/dbconfig/20251001-131836-fceratto.json
  • 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T401906)', diff saved to https://phabricator.wikimedia.org/P83565 and previous config saved to /var/cache/conftool/dbconfig/20251001-131719-fceratto.json
  • 13:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 13:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 13:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T401906)', diff saved to https://phabricator.wikimedia.org/P83564 and previous config saved to /var/cache/conftool/dbconfig/20251001-131639-fceratto.json
  • 13:13 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 13:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repool db1172 after upgrade T406008', diff saved to https://phabricator.wikimedia.org/P83563 and previous config saved to /var/cache/conftool/dbconfig/20251001-131033-ladsgroup.json
  • 13:07 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1258* gradually with 4 steps - Work done
  • 13:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P83561 and previous config saved to /var/cache/conftool/dbconfig/20251001-130131-fceratto.json
  • 12:56 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=thumbor.*,name=eqiad
  • 12:53 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=swift.*,name=eqiad
  • 12:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1172 for upgrade T406008', diff saved to https://phabricator.wikimedia.org/P83559 and previous config saved to /var/cache/conftool/dbconfig/20251001-125120-ladsgroup.json
  • 12:50 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1172.eqiad.wmnet with reason: Upgrade to 10.11
  • 12:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P83558 and previous config saved to /var/cache/conftool/dbconfig/20251001-124622-fceratto.json
  • 12:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T401906)', diff saved to https://phabricator.wikimedia.org/P83556 and previous config saved to /var/cache/conftool/dbconfig/20251001-123115-fceratto.json
  • 12:31 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=swift.*,name=eqiad
  • 12:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T401906)', diff saved to https://phabricator.wikimedia.org/P83555 and previous config saved to /var/cache/conftool/dbconfig/20251001-122959-fceratto.json
  • 12:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 12:29 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=thumbor.*,name=eqiad
  • 12:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T401906)', diff saved to https://phabricator.wikimedia.org/P83554 and previous config saved to /var/cache/conftool/dbconfig/20251001-122936-fceratto.json
  • 12:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 12:21 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1258* gradually with 4 steps - Work done
  • 12:21 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1258.eqiad.wmnet
  • 12:19 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 12:19 mvernon@cumin2002: END (ERROR) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=97) rolling restart_daemons on A:swift-fe-eqiad
  • 12:19 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 12:15 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1258 - Upgrading db1258.eqiad.wmnet
  • 12:15 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1258 - Upgrading db1258.eqiad.wmnet
  • 12:15 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1258.eqiad.wmnet
  • 12:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P83552 and previous config saved to /var/cache/conftool/dbconfig/20251001-121429-fceratto.json
  • 12:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1258 T406116', diff saved to https://phabricator.wikimedia.org/P83551 and previous config saved to /var/cache/conftool/dbconfig/20251001-121339-ladsgroup.json
  • 12:12 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 12:11 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 12:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 12:08 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Promote db1255 to x3 primary T406116', diff saved to https://phabricator.wikimedia.org/P83550 and previous config saved to /var/cache/conftool/dbconfig/20251001-120629-ladsgroup.json
  • 12:06 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 12:06 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:06 Amir1: Starting x3 eqiad failover from db1258 to db1255 - T406116
  • 12:05 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:04 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set db1255 with weight 0 T406116', diff saved to https://phabricator.wikimedia.org/P83549 and previous config saved to /var/cache/conftool/dbconfig/20251001-120140-ladsgroup.json
  • 12:00 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 16 hosts with reason: Primary switchover x3 T406116
  • 11:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P83548 and previous config saved to /var/cache/conftool/dbconfig/20251001-115922-fceratto.json
  • 11:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:58 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:48 cgoubert@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster wikikube-eqiad: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T401906)', diff saved to https://phabricator.wikimedia.org/P83547 and previous config saved to /var/cache/conftool/dbconfig/20251001-114414-fceratto.json
  • 11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T401906)', diff saved to https://phabricator.wikimedia.org/P83546 and previous config saved to /var/cache/conftool/dbconfig/20251001-114259-fceratto.json
  • 11:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 11:42 hnowlan: manually bumped thumbor replicas in codfw to 140
  • 11:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 11:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T401906)', diff saved to https://phabricator.wikimedia.org/P83545 and previous config saved to /var/cache/conftool/dbconfig/20251001-114214-fceratto.json
  • 11:41 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=thumbor.*,name=eqiad
  • 11:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:39 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:37 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:37 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:35 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:35 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:29 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=swift.*,name=eqiad
  • 11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P83544 and previous config saved to /var/cache/conftool/dbconfig/20251001-112707-fceratto.json
  • 11:25 Amir1: dropping two unused tables in phabricator db (T403542)
  • 11:18 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=thumbor.*,name=codfw
  • 11:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P83542 and previous config saved to /var/cache/conftool/dbconfig/20251001-111159-fceratto.json
  • 11:05 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=toolhub.*
  • 11:04 cgoubert@cumin1003: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool toolhub in eqiad: maintenance
  • 11:04 cgoubert@cumin1003: START - Cookbook sre.discovery.service-route depool toolhub in eqiad: maintenance
  • 11:03 cgoubert@deploy2002: Locking from deployment [ALL REPOSITORIES]: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 11:03 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:03 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:03 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:02 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:02 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 11:00 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 10:59 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 10:59 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 10:59 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 10:59 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 10:58 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 10:58 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:58 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T401906)', diff saved to https://phabricator.wikimedia.org/P83541 and previous config saved to /var/cache/conftool/dbconfig/20251001-105652-fceratto.json
  • 10:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T401906)', diff saved to https://phabricator.wikimedia.org/P83540 and previous config saved to /var/cache/conftool/dbconfig/20251001-105538-fceratto.json
  • 10:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 10:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T401906)', diff saved to https://phabricator.wikimedia.org/P83539 and previous config saved to /var/cache/conftool/dbconfig/20251001-105514-fceratto.json
  • 10:55 claime: Starting eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 10:45 hashar@deploy2002: Finished scap sync-world: Backport for Revert "Replace LoginNotify::getInstance with service injection" (T406094) (duration: 13m 47s)
  • 10:40 hashar@deploy2002: hashar, dreamyjazz: Continuing with sync
  • 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P83538 and previous config saved to /var/cache/conftool/dbconfig/20251001-104006-fceratto.json
  • 10:36 hashar@deploy2002: hashar, dreamyjazz: Backport for Revert "Replace LoginNotify::getInstance with service injection" (T406094) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:31 hashar@deploy2002: Started scap sync-world: Backport for Revert "Replace LoginNotify::getInstance with service injection" (T406094)
  • 10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P83537 and previous config saved to /var/cache/conftool/dbconfig/20251001-102458-fceratto.json
  • 10:11 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:11 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:11 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:10 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T401906)', diff saved to https://phabricator.wikimedia.org/P83536 and previous config saved to /var/cache/conftool/dbconfig/20251001-100951-fceratto.json
  • 10:09 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:08 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T401906)', diff saved to https://phabricator.wikimedia.org/P83535 and previous config saved to /var/cache/conftool/dbconfig/20251001-100837-fceratto.json
  • 10:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T401906)', diff saved to https://phabricator.wikimedia.org/P83534 and previous config saved to /var/cache/conftool/dbconfig/20251001-100814-fceratto.json
  • 09:59 kharlan@deploy2002: Finished scap sync-world: Backport for CreateAccount: Fix server side logging of CAPTCHA class (T405239), CreateAccount: Fix server side logging of CAPTCHA class (T405239) (duration: 15m 47s)
  • 09:54 kharlan@deploy2002: kharlan: Continuing with sync
  • 09:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P83533 and previous config saved to /var/cache/conftool/dbconfig/20251001-095306-fceratto.json
  • 09:50 kharlan@deploy2002: kharlan: Backport for CreateAccount: Fix server side logging of CAPTCHA class (T405239), CreateAccount: Fix server side logging of CAPTCHA class (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:48 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 09:44 kharlan@deploy2002: Started scap sync-world: Backport for CreateAccount: Fix server side logging of CAPTCHA class (T405239), CreateAccount: Fix server side logging of CAPTCHA class (T405239)
  • 09:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P83532 and previous config saved to /var/cache/conftool/dbconfig/20251001-093758-fceratto.json
  • 09:28 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:28 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T401906)', diff saved to https://phabricator.wikimedia.org/P83531 and previous config saved to /var/cache/conftool/dbconfig/20251001-092251-fceratto.json
  • 09:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T401906)', diff saved to https://phabricator.wikimedia.org/P83530 and previous config saved to /var/cache/conftool/dbconfig/20251001-092136-fceratto.json
  • 09:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T401906)', diff saved to https://phabricator.wikimedia.org/P83529 and previous config saved to /var/cache/conftool/dbconfig/20251001-092112-fceratto.json
  • 09:17 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:17 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:14 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:14 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:12 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:11 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:06 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:06 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P83528 and previous config saved to /var/cache/conftool/dbconfig/20251001-090604-fceratto.json
  • 08:57 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 08:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P83527 and previous config saved to /var/cache/conftool/dbconfig/20251001-085056-fceratto.json
  • 08:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T401906)', diff saved to https://phabricator.wikimedia.org/P83526 and previous config saved to /var/cache/conftool/dbconfig/20251001-083549-fceratto.json
  • 08:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T401906)', diff saved to https://phabricator.wikimedia.org/P83525 and previous config saved to /var/cache/conftool/dbconfig/20251001-083435-fceratto.json
  • 08:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 08:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T401906)', diff saved to https://phabricator.wikimedia.org/P83524 and previous config saved to /var/cache/conftool/dbconfig/20251001-083412-fceratto.json
  • 08:19 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 08:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P83523 and previous config saved to /var/cache/conftool/dbconfig/20251001-081905-fceratto.json
  • 08:13 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 08:10 Emperor: restart swift on ms-fe2012 T360913
  • 08:08 bwojtowicz@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P83522 and previous config saved to /var/cache/conftool/dbconfig/20251001-080357-fceratto.json
  • 07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T401906)', diff saved to https://phabricator.wikimedia.org/P83521 and previous config saved to /var/cache/conftool/dbconfig/20251001-074850-fceratto.json
  • 07:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T401906)', diff saved to https://phabricator.wikimedia.org/P83520 and previous config saved to /var/cache/conftool/dbconfig/20251001-074736-fceratto.json
  • 07:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 07:10 kharlan@deploy2002: Finished scap sync-world: Backport for CreateAccount: Track interactions with the captchaWord field (T394744), CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239) (duration: 14m 09s)
  • 07:05 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:02 kharlan@deploy2002: kharlan: Backport for CreateAccount: Track interactions with the captchaWord field (T394744), CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:55 kharlan@deploy2002: Started scap sync-world: Backport for CreateAccount: Track interactions with the captchaWord field (T394744), CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239)
  • 06:40 kharlan@deploy2002: Finished scap sync-world: Backport for CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239), CreateAccount: Track interactions with the captchaWord field (T394744) (duration: 22m 34s)
  • 06:35 kharlan@deploy2002: kharlan: Continuing with sync
  • 06:22 kharlan@deploy2002: kharlan: Backport for CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239), CreateAccount: Track interactions with the captchaWord field (T394744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:17 kharlan@deploy2002: Started scap sync-world: Backport for CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239), CreateAccount: Track interactions with the captchaWord field (T394744)
  • 04:54 TimStarling: on x1 metawiki creating tables for CommunityRequests
  • 02:31 musikanimal@deploy2002: Finished scap sync-world: Backport for AbstractRenderer: fix extistence dependency on Votes subpage (duration: 12m 19s)
  • 02:26 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 02:26 musikanimal@deploy2002: musikanimal: Backport for AbstractRenderer: fix extistence dependency on Votes subpage synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 02:19 musikanimal@deploy2002: Started scap sync-world: Backport for AbstractRenderer: fix extistence dependency on Votes subpage
  • 01:52 musikanimal@deploy2002: Finished scap sync-world: Backport for Call WikiPage::doPurge to try and clear cache after language is set (T404748) (duration: 10m 47s)
  • 01:47 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 01:46 musikanimal@deploy2002: musikanimal: Backport for Call WikiPage::doPurge to try and clear cache after language is set (T404748) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:41 musikanimal@deploy2002: Started scap sync-world: Backport for Call WikiPage::doPurge to try and clear cache after language is set (T404748)
  • 01:28 musikanimal@deploy2002: Finished scap sync-world: Backport for migrateFromGadget: add a few more missing transformations (T405826 T404138 T404234) (duration: 10m 53s)
  • 01:23 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 01:22 musikanimal@deploy2002: musikanimal: Backport for migrateFromGadget: add a few more missing transformations (T405826 T404138 T404234) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:17 musikanimal@deploy2002: Started scap sync-world: Backport for migrateFromGadget: add a few more missing transformations (T405826 T404138 T404234)
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 33s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov1007.eqiad.wmnet with OS bookworm
  • 00:00 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Wikidata (T403510) (duration: 13m 23s)


Other archives

See Server Admin Log/Archives.