Cleanup collaboration-services WMCS hiera config
Open, Medium, Public

Description

A lot of Hiera config is duplicated across our devtools and gitlab-runners projects. The Hiera config is scattered across cloud.yaml, multiple common.yaml files in the projects, host-specific YAML files and ad-hoc configuration in Horizon. Some Hiera config was added multiple times because of confusion caused by sync delays on the dedicated puppetmaster.

This should all be consolidated and de-duplicated to make it less confusing and easier to maintain. It should also make it clearer where WMCS config differs from production.
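As a rough illustration of the de-duplication step, here is a minimal Python sketch that reports keys defined in more than one place. It assumes the YAML files and the Horizon config have already been parsed into plain dicts; the key names, values and the `find_duplicate_keys` helper are hypothetical and not taken from the actual repo.

```python
from collections import defaultdict


def find_duplicate_keys(layers):
    """Map each key defined in more than one layer to {layer_name: value}.

    `layers` maps a layer name (for example a file path) to the dict of
    Hiera keys parsed from that layer.
    """
    seen = defaultdict(dict)
    for layer_name, data in layers.items():
        for key, value in data.items():
            seen[key][layer_name] = value
    return {key: where for key, where in seen.items() if len(where) > 1}


# Hypothetical data: key names and values are invented for illustration.
layers = {
    "cloud.yaml": {"example::docker_gc::interval": "5m"},
    "devtools/common.yaml": {
        "example::docker_gc::interval": "5m",
        "example::service_ensure": "running",
    },
    "horizon (devtools)": {"example::service_ensure": "running"},
}

duplicates = find_duplicate_keys(layers)
for key, where in sorted(duplicates.items()):
    # Identical values are safe to drop from all but one layer;
    # conflicting values need a decision about which one is intended.
    same = len({repr(v) for v in where.values()}) == 1
    label = "identical" if same else "conflicting"
    print(f"{key}: {label} in {', '.join(where)}")
```

Keys reported as identical are candidates for removal from the more specific layer, while conflicting ones show where WMCS config intentionally (or accidentally) overrides a broader default.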

Remove one of the puppetmasters in devtools (probably puppetmaster-1004 but double check if it's used somewhere).

Event Timeline

I agree! Things that are valid for all projects should be in cloud.yaml, things common to all instances in a project should be in the project's common.yaml, and host-specific files should only contain things that are really specific to that one host.

In my opinion nothing should permanently be in Horizon web UI, aside from very short-term testing. If something is to stay it should be moved to the repo.

The counterargument is that it makes things harder for users who don't have +2 in puppet, but for our team this should not matter.
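The layering described in this comment can be sketched as a first-match lookup from most specific to least specific. This is a simplified Python model, not the real Hiera hierarchy used in WMCS (which has more levels); the `lookup` function and all key names and values are hypothetical.

```python
def lookup(key, host_data, project_data, cloud_data):
    """Return (value, source) from the most specific layer that defines `key`."""
    for source, data in (
        ("host", host_data),
        ("project common.yaml", project_data),
        ("cloud.yaml", cloud_data),
    ):
        if key in data:
            return data[key], source
    raise KeyError(key)


# Hypothetical keys and values, invented for illustration.
cloud = {
    "example::ntp_server": "ntp.example.org",   # valid for all projects
    "example::service_ensure": "stopped",       # cloud-wide default
}
project = {"example::service_ensure": "running"}  # common to the project
host = {"example::listen_port": 9001}             # specific to one host

result_service = lookup("example::service_ensure", host, project, cloud)
result_ntp = lookup("example::ntp_server", host, project, cloud)
result_port = lookup("example::listen_port", host, project, cloud)
```

With this ordering a value only needs to exist at the broadest level where it is true, and anything repeated further down with the same value is redundant.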

Change #1133992 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] hiera: cleanup gitlab-runner docker gc settings

https://gerrit.wikimedia.org/r/1133992

Change #1133996 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] hiera: cleanup some gerrit and etherpad hiera values

https://gerrit.wikimedia.org/r/1133996

Change #1133996 merged by Dzahn:

[operations/puppet@production] hiera: cleanup some gerrit and etherpad hiera values

https://gerrit.wikimedia.org/r/1133996

Change #1133992 merged by Dzahn:

[operations/puppet@production] hiera: cleanup gitlab-runner docker gc settings

https://gerrit.wikimedia.org/r/1133992

Change #1134723 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] hieradata: add etherpad service_ensure key to devtools project level

https://gerrit.wikimedia.org/r/1134723

Change #1134723 merged by Dzahn:

[operations/puppet@production] hieradata: add etherpad service_ensure key to devtools project level

https://gerrit.wikimedia.org/r/1134723

Change #1135114 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] cloud: re-add gitlab runner docker_gc Hiera settings in cloud.yaml

https://gerrit.wikimedia.org/r/1135114

Change #1135114 merged by Dzahn:

[operations/puppet@production] cloud: re-add gitlab runner docker_gc Hiera settings in cloud.yaml

https://gerrit.wikimedia.org/r/1135114

I also noticed devtools has two puppetmasters: puppetmaster-1003 and puppetmaster-1004. This should also be reduced to one puppetmaster.

It's a whole saga. Also see T382960#10565214, which has a table showing which instance is using which master.

From there, see T382960#10565250, T382960#10494065, etc.

In summary, the options are:

  • switch all clients currently using -1003 to use -1004, verify things are not broken, shut down -1003
  • verify all clients using -1003 are ok and the expired CA cert issue is gone, shut down -1004 (but -1004 runs a newer version of the base image that is not deprecated yet, unlike -1003)
  • for each client, question why it's using a local puppetmaster at all and not just the centralized puppetmaster of cloud. Switch those that don't need a local puppetmaster to the central puppetmaster. If that happens to be all of them, stop having ANY local puppetmasters and never deal with stuff like in the linked ticket again
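To weigh these options it helps to count how many clients each local puppetmaster actually serves. Below is a minimal Python sketch, assuming the instance-to-master table has been copied into a dict by hand. Only deploy-1006 and the two puppetmaster names come from this task; the other instance names, the "central" label and the `clients_per_master` helper are hypothetical.

```python
from collections import Counter


def clients_per_master(instance_to_master, masters):
    """Count how many instances point at each of the given masters."""
    counts = Counter(instance_to_master.values())
    return {master: counts.get(master, 0) for master in masters}


LOCAL_MASTERS = ["puppetmaster-1003", "puppetmaster-1004"]
CENTRAL = "central"  # placeholder label for the project-independent master

# Hypothetical mapping: apart from deploy-1006 the entries are invented.
instance_to_master = {
    "deploy-1006": "puppetmaster-1003",
    "example-runner-01": "puppetmaster-1003",
    "example-host-02": CENTRAL,
}

usage = clients_per_master(instance_to_master, LOCAL_MASTERS)
unused = [master for master, count in usage.items() if count == 0]
print(usage)
print(unused)
```

A master that ends up in `unused` can be deleted without migrating anything, and the remaining clients form the list to review one by one for the third option.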

Thanks for the summary! I'd also prefer not to have a local puppetmaster at all. But at least the GitLab runners have a sort-of secret token to register with the GitLab test server. We could try putting the token into Hiera (with an obvious comment so we don't trigger the security team) and see if that causes any issues. In theory, anyone could register a new runner on the GitLab test instance with this token. But the risk is limited and mostly comes down to "we might have to clean up and delete some runners" if it gets out of hand.

So I'd be open to trying a switch back to the default puppetserver in devtools for the runners. In the gitlab-runners WMCS project, we should keep the token secret (meaning in a dedicated puppetmaster).

As a test I tried switching deploy-1006 from puppetmaster-1003 to puppetmaster-1004.

It immediately failed with "certificate verify failed [self signed certificate in certificate chain for CN=Puppet CA".

So unfortunately that is not solved easily yet.

Both are puppet7, puppetserver 7.9.5-2+deb12u1.

Given that no clients currently use puppetmaster-1004 and we want to move away from local puppetmasters where possible, I would say let's just delete puppetmaster-1004 again.

Mentioned in SAL (#wikimedia-cloud) [2025-06-13T18:13:18Z] <mutante> - deleted puppetmaster-1004 after confirming no instance uses it, only puppetmaster-1003 is used and they are the same version (T390948)

We could try putting the token into Hiera (with an obvious comment so we don't trigger the security team) and see if that causes any issues. In theory, anyone could register a new runner on the GitLab test instance with this token. But the risk is limited and mostly comes down to "we might have to clean up and delete some runners" if it gets out of hand.

So I'd be open to trying a switch back to the default puppetserver in devtools for the runners. In the gitlab-runners WMCS project, we should keep the token secret (meaning in a dedicated puppetmaster).

Sounds like a plan! I see one of the runners is already switched back.

Also I deleted puppetmaster-1004. So we have some quota back as well.

Getting back to the origin of this ticket: I think we need to first decide what we WANT to use. Do we prefer only using the repo and keeping the Horizon web UI Hiera clean? Or do we prefer to move it all to the web UI? Since we all dislike the mix, I guess it has to be one of those.

I am biased towards "repo only". The downside of that is usually that users without root/+2 in Gerrit can't change the values easily, but for the devtools project specifically that could be ok. I am aware of testers using services here, but less so of them changing Hiera values for that.

LSobanski triaged this task as Medium priority. Sep 15 2025, 3:35 PM
LSobanski moved this task from Work in Progress to Backlog on the collaboration-services board.

Change #1190690 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] devtools: clean gitlab hiera data

https://gerrit.wikimedia.org/r/1190690

Change #1190690 merged by Jelto:

[operations/puppet@production] devtools: clean gitlab hiera data

https://gerrit.wikimedia.org/r/1190690