Prepare our custom installer and the base layer for Trixie
Closed, ResolvedPublic

Description

We'll probably start with alpha2

Trixie was fully released on 2025-08-09.

Event Timeline

Change #1134187 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] uwsgi: trixie support

https://gerrit.wikimedia.org/r/1134187

Change #1134188 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] wmflib: postgresql_version add trixie

https://gerrit.wikimedia.org/r/1134188

Change #1134189 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] ruby: move to .exist?

https://gerrit.wikimedia.org/r/1134189

Change #1134188 merged by Filippo Giunchedi:

[operations/puppet@production] wmflib: postgresql_version add trixie

https://gerrit.wikimedia.org/r/1134188

Change #1134187 merged by Filippo Giunchedi:

[operations/puppet@production] uwsgi: trixie support

https://gerrit.wikimedia.org/r/1134187

Change #1134189 merged by Filippo Giunchedi:

[operations/puppet@production] ruby: move to .exist?

https://gerrit.wikimedia.org/r/1134189

Change #1138694 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add d-i config for Trixie

https://gerrit.wikimedia.org/r/1138694

Change #1138697 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add pxelinux config for Trixie

https://gerrit.wikimedia.org/r/1138697

Change #1138698 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/cookbooks@master] Add trixie to the list of supported OSes

https://gerrit.wikimedia.org/r/1138698

Change #1138694 merged by Muehlenhoff:

[operations/puppet@production] Add d-i config for Trixie

https://gerrit.wikimedia.org/r/1138694

Change #1138697 merged by Muehlenhoff:

[operations/puppet@production] Add pxelinux config for Trixie

https://gerrit.wikimedia.org/r/1138697

Change #1138698 merged by Muehlenhoff:

[operations/cookbooks@master] Add trixie to the list of supported OSes

https://gerrit.wikimedia.org/r/1138698

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host sretest1001.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host sretest1001.eqiad.wmnet with OS trixie executed with errors:

  • sretest1001 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202504251153_jmm_583340_sretest1001.out, asking the operator what to do
    • First Puppet run failed and the operator aborted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console sretest1001.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Change #1139037 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add trixie to pbuilder setup

https://gerrit.wikimedia.org/r/1139037

Change #1139037 merged by Muehlenhoff:

[operations/puppet@production] Add trixie to pbuilder setup

https://gerrit.wikimedia.org/r/1139037

Mentioned in SAL (#wikimedia-operations) [2025-04-29T07:23:27Z] <moritzm> imported debdeploy 0.0.99.14-1+deb13u1 to apt.wikimedia.org/main for trixie-wikimedia T391083

Mentioned in SAL (#wikimedia-operations) [2025-04-29T07:50:08Z] <moritzm> copied wmf-certificates 1~20230906-1 from bookworm-wikimedia to trixie-wikimedia T391083

Mentioned in SAL (#wikimedia-operations) [2025-04-29T07:53:06Z] <moritzm> copied cadvisor 0.44.0+ds1-1~wmf1 from bookworm-wikimedia to trixie-wikimedia T391083

Change #1141585 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Install linux-sysctl-defaults on trixie

https://gerrit.wikimedia.org/r/1141585

Change #1141585 merged by Muehlenhoff:

[operations/puppet@production] Install linux-sysctl-defaults on trixie

https://gerrit.wikimedia.org/r/1141585

Change #1143703 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Stop installing prometheus-node-exporter on Trixie

https://gerrit.wikimedia.org/r/1143703

Mentioned in SAL (#wikimedia-operations) [2025-05-09T09:50:42Z] <moritzm> imported debmonitor-client 0.4.0-3+deb13u1 for trixie-wikimedia T391083

So the error for the debmonitor client is due to the fact that /etc/os-release has no VERSION_ID line yet, so the client reports "unknown" as the OS and, as a result, the server errors out with:

400 OS name 'unknown' is not valid: {'name': ['The OS name needs to follow the following pattern: ^(Debian( \\d\\d)?|Ubuntu( \\d\\d\\.\\d\\d)?)$']}

Once the client reports a proper OS name, the OS should be created automatically in the DB; if not, we can always add it manually.
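
To illustrate the failure mode, here is a minimal sketch in Python. It is not the actual debmonitor client or server code: the helper names (parse_os_release, os_name) and the sample os-release contents are assumptions made up for this example. Only the pattern is taken from the error message quoted above.

```python
import re

# Pattern copied from the server error message quoted above.
OS_NAME_PATTERN = re.compile(r"^(Debian( \d\d)?|Ubuntu( \d\d\.\d\d)?)$")

# Illustrative os-release contents (assumed, not copied from a real host).
OS_RELEASE_WITHOUT_VERSION_ID = """\
PRETTY_NAME="Debian GNU/Linux trixie/sid"
NAME="Debian GNU/Linux"
VERSION_CODENAME=trixie
ID=debian
"""

OS_RELEASE_WITH_VERSION_ID = """\
PRETTY_NAME="Debian GNU/Linux 13 (trixie)"
NAME="Debian GNU/Linux"
VERSION_ID="13"
VERSION_CODENAME=trixie
ID=debian
"""


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def os_name(fields):
    """Hypothetical helper: build e.g. 'Debian 13', or fall back to 'unknown'."""
    if "NAME" not in fields or "VERSION_ID" not in fields:
        return "unknown"
    distro = fields["NAME"].split()[0]
    return f"{distro} {fields['VERSION_ID']}"


def is_valid_os_name(name):
    """Return True if the name would pass the server-side pattern."""
    return OS_NAME_PATTERN.match(name) is not None


if __name__ == "__main__":
    for sample in (OS_RELEASE_WITHOUT_VERSION_ID, OS_RELEASE_WITH_VERSION_ID):
        name = os_name(parse_os_release(sample))
        print(name, is_valid_os_name(name))
```

Under these assumptions, the sample without VERSION_ID yields the name "unknown", which the pattern rejects, while the sample with VERSION_ID yields "Debian 13", which the pattern accepts.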

This is now fixed in Debian unstable and should progress to testing soon:
https://tracker.debian.org/news/1645404/accepted-base-files-138-source-into-unstable/

I manually installed that version on sretest1001 and that made debmonitor-client run fine: https://debmonitor.wikimedia.org/hosts/sretest1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2025-05-13T07:54:00Z] <moritzm> imported python-wmflib 1.3.1+deb13u1 to trixie-wikimedia T391083

Change #1143703 merged by Muehlenhoff:

[operations/puppet@production] Stop installing prometheus-ethtool-exporter on Trixie

https://gerrit.wikimedia.org/r/1143703

Mentioned in SAL (#wikimedia-operations) [2025-05-13T08:04:07Z] <moritzm> imported python-wmflib 1.3.1+deb13u1 to trixie-wikimedia T391083

Change #1145097 had a related patch set uploaded (by Volans; author: Volans):

[operations/software/pywmflib@master] Add support for Python 3.13 and Debian Trixie

https://gerrit.wikimedia.org/r/1145097

Mentioned in SAL (#wikimedia-operations) [2025-05-13T08:12:03Z] <moritzm> copied prometheus-rsyslog-exporter 1.0.0+git20221110-1 from bookworm-wikimedia to trixie-wikimedia T391083

Change #1145100 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Stop installing dstat on Trixie

https://gerrit.wikimedia.org/r/1145100

Change #1145097 merged by jenkins-bot:

[operations/software/pywmflib@master] Add support for Python 3.13 and Debian Trixie

https://gerrit.wikimedia.org/r/1145097

Change #1145100 merged by Muehlenhoff:

[operations/puppet@production] Stop installing dstat on Trixie

https://gerrit.wikimedia.org/r/1145100

Mentioned in SAL (#wikimedia-operations) [2025-05-13T09:38:02Z] <moritzm> imported confd 0.16.0-1+deb13u0 to trixie-wikimedia T391083

There's some augeas-related output spam on Puppet runs, already reported as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1098696

Change #1145187 had a related patch set uploaded (by Volans; author: Volans):

[operations/puppet@production] debdeploy: add support for Debian Trixie

https://gerrit.wikimedia.org/r/1145187

Change #1145187 merged by Volans:

[operations/puppet@production] debdeploy: add support for Debian Trixie

https://gerrit.wikimedia.org/r/1145187

Change #1145833 had a related patch set uploaded (by Volans; author: Volans):

[operations/software/debmonitor-client@master] Add support for trixie

https://gerrit.wikimedia.org/r/1145833

Change #1145833 merged by jenkins-bot:

[operations/software/debmonitor-client@master] Add support for trixie

https://gerrit.wikimedia.org/r/1145833

Mentioned in SAL (#wikimedia-operations) [2025-05-23T10:31:02Z] <moritzm> importing ferm 2.5.1-4+wmf13u1 T391083

Change #1152246 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] standard_packages: Handle dnsutils/bind9-dnsutils correctly across all supported OSes

https://gerrit.wikimedia.org/r/1152246

Change #1152246 merged by Muehlenhoff:

[operations/puppet@production] standard_packages: Handle dnsutils/bind9-dnsutils correctly across all OSes

https://gerrit.wikimedia.org/r/1152246

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin1003 for host sretest1003.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin1003 for host sretest1003.eqiad.wmnet with OS trixie executed with errors:

  • sretest1003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202507110802_jmm_1240797_sretest1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console sretest1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Mentioned in SAL (#wikimedia-operations) [2025-07-11T09:25:15Z] <moritzm> imported perccli for trixie-wikimedia T391083

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin1003 for host sretest1003.eqiad.wmnet with OS trixie

Change #1168145 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] late-command: Check whether qemu_fw_cfg.ko is present

https://gerrit.wikimedia.org/r/1168145

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin1003 for host sretest1003.eqiad.wmnet with OS trixie executed with errors:

  • sretest1003 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console sretest1003.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Change #1168145 merged by Muehlenhoff:

[operations/puppet@production] late-command: Check whether qemu_fw_cfg.ko is present

https://gerrit.wikimedia.org/r/1168145

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin1003 for host sretest1003.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin1003 for host sretest1003.eqiad.wmnet with OS trixie completed:

  • sretest1003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202507111238_jmm_1269185_sretest1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Installations with Trixie are now possible; they directly install the backport of Puppet 7, and all known issues affecting the Puppet base classes have been fixed. I'm keeping this bug open to rebase the installer image onto newer release candidates, but Trixie can now be used for pilot installations.

Change #1177335 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] debian: Add trixie as a valid codename

https://gerrit.wikimedia.org/r/1177335

Change #1177335 merged by Majavah:

[operations/puppet@production] debian: Add trixie as a valid codename

https://gerrit.wikimedia.org/r/1177335

Change #1178604 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] wmflib: add 8.4 as a valid PHP version string, for trixie support

https://gerrit.wikimedia.org/r/1178604

Change #1178604 merged by Dzahn:

[operations/puppet@production] wmflib: add 8.4 as a valid PHP version string, for trixie support

https://gerrit.wikimedia.org/r/1178604

MoritzMuehlenhoff claimed this task.

Trixie had its initial stable release on Aug 9, and the installer and base system work fine. Closing this task; all further role adaptations can happen via separate, new bugs.

Mentioned in SAL (#wikimedia-operations) [2025-10-08T11:09:47Z] <moritzm> imported megacli into thirdparty/hwraid (upstream repo doesn't cover trixie yet, copied over from bookworm) T391083