
dcaro (David Caro)
SRE & amateur yak shaver

User Details

User Since
Nov 2 2020, 11:59 AM (250 w, 4 d)
Availability
Available
IRC Nick
dcaro
LDAP User
David Caro
MediaWiki User
DCaro (WMF)

Recent Activity

Yesterday

dcaro created T402572: [components-api] handle non-passed arguments and defaults consistently.
Thu, Aug 21, 5:20 PM · Toolforge (Toolforge iteration 23)
dcaro created T402569: [jobs-api] handle non-passed arguments and defaults consistently.
Thu, Aug 21, 5:17 PM · Toolforge (Toolforge iteration 23)
dcaro triaged T402568: [components-api] Queue builds when the build queue is full as High priority.
Thu, Aug 21, 5:07 PM · Toolforge (Toolforge iteration 23)
dcaro created T402568: [components-api] Queue builds when the build queue is full.
Thu, Aug 21, 5:06 PM · Toolforge (Toolforge iteration 23)
dcaro added a comment to T401851: [components-api,beta] Image should only be build once when re-used in components.

A simpler option is to do the queueing on the components-api side; that's probably easier right now (and does not rule out the other solutions). I'll create a subtask for that.

Thu, Aug 21, 5:05 PM · Toolforge (Toolforge iteration 23)
dcaro added a comment to T401172: [jobs-api] make job status an enum, with clearly defined states.

After a live discussion, we agreed to the above with the following minor changes:

Thu, Aug 21, 5:00 PM · Toolforge (Toolforge iteration 23), User-Raymond_Ndibe, cloud-services-team
dcaro added a comment to T401875: [Build service] latest builder has old PHP.

I think we can try updating the 'latest-versions' builder image to add that support; I still have to run a full battery of tests and such.

Thu, Aug 21, 4:08 PM · cloud-services-team, Toolforge
dcaro added a comment to T400957: Job not restarting despite liveness probe failures.

If you have a reproducer, that would be great. If not, and it happens again, leaving it in place for us to inspect would be great; otherwise, the output of something like kubectl get deployment/itwiki-draftbot-continuous -o yaml, kubectl describe deployment/itwiki-draftbot-continuous and kubectl get events -o yaml might be helpful for post-debugging.

Thu, Aug 21, 4:05 PM · Toolforge, cloud-services-team
dcaro triaged T401648: [components-api] exclude defaults when getting deployment as Medium priority.
Thu, Aug 21, 3:32 PM · Patch-For-Review, Toolforge (Toolforge iteration 23)
dcaro triaged T401851: [components-api,beta] Image should only be build once when re-used in components as High priority.
Thu, Aug 21, 3:31 PM · Toolforge (Toolforge iteration 23)
dcaro triaged T401374: [components-api] bump the openapi version on every change as Medium priority.
Thu, Aug 21, 3:31 PM · Toolforge (Toolforge iteration 23)
dcaro triaged T401893: [components-api] Allow reusing another component build as High priority.
Thu, Aug 21, 3:31 PM · Patch-For-Review, Toolforge (Toolforge iteration 23)
dcaro triaged T401894: [builds-api] Allow queuing builds as Medium priority.
Thu, Aug 21, 3:31 PM · Toolforge (Toolforge iteration 23)
dcaro changed the status of T401994: [components-api] support port protocol in config from Open to In Progress.
Thu, Aug 21, 3:30 PM · Patch-For-Review, Toolforge (Toolforge iteration 23)
dcaro assigned T401994: [components-api] support port protocol in config to Raymond_Ndibe.
Thu, Aug 21, 3:30 PM · Patch-For-Review, Toolforge (Toolforge iteration 23)
dcaro changed the status of T402377: [k8s,infra] Upgrade toolsbeta to Uwubernetes 1.30, a subtask of T362869: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30), from Open to In Progress.
Thu, Aug 21, 3:30 PM · Toolforge (Toolforge iteration 23), Patch-For-Review, cloud-services-team
dcaro changed the status of T402377: [k8s,infra] Upgrade toolsbeta to Uwubernetes 1.30 from Open to In Progress.
Thu, Aug 21, 3:29 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro moved T402378: [k8s,infra] Upgrade tools to Uwubernetes 1.30 from In Progress to Next Up on the Toolforge (Toolforge iteration 23) board.
Thu, Aug 21, 3:29 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro changed the status of T402378: [k8s,infra] Upgrade tools to Uwubernetes 1.30, a subtask of T362869: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30), from Open to In Progress.
Thu, Aug 21, 3:29 PM · Toolforge (Toolforge iteration 23), Patch-For-Review, cloud-services-team
dcaro changed the status of T402378: [k8s,infra] Upgrade tools to Uwubernetes 1.30 from Open to In Progress.
Thu, Aug 21, 3:29 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro added a comment to T402516: [cookbook,ceph] bootstrap_and_add ceph cookbook failed to add a new single osd 66 on host cloudcephosd1004.

Note that the osd was actually added and it's getting data in, but it did not clear the osd flags,

Thu, Aug 21, 1:00 PM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro created T402516: [cookbook,ceph] bootstrap_and_add ceph cookbook failed to add a new single osd 66 on host cloudcephosd1004.
Thu, Aug 21, 12:59 PM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

Manually zapping /dev/sdb on cloudcephosd1004, as the depool_and_destroy cookbook did not do it (see T402515: [cookbook,ceph] depool_and_destroy ceph cookbook failed to destroy a single osd):

root@cloudcephosd1004:~# ls -la /var/lib/ceph/osd/ceph-66/block 
lrwxrwxrwx 1 ceph ceph 93 Aug 21 03:30 /var/lib/ceph/osd/ceph-66/block -> /dev/ceph-62e49003-b3e0-4ecb-acbc-b82348164434/osd-block-06f40a8e-5b3c-4478-af57-739e819bddee
Thu, Aug 21, 12:56 PM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro created T402515: [cookbook,ceph] depool_and_destroy ceph cookbook failed to destroy a single osd.
Thu, Aug 21, 12:52 PM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro triaged T402499: [ceph] 2025-08-21 ceph issues bringing new osds up as High priority.
Thu, Aug 21, 12:16 PM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro updated the task description for T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.
Thu, Aug 21, 10:40 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

ceph-osd@69 came up ok too, only 66 is left down

Thu, Aug 21, 10:39 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

ceph-osd@68 came up ok

Thu, Aug 21, 10:36 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

ceph-osd@67 came up ok

Thu, Aug 21, 10:33 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

Hmm.... before crashing, it starts checking old peers:

Aug 21 09:55:36 cloudcephosd1004 ceph-osd[173450]: 2025-08-21T09:55:36.717+0000 7fcf28c85700  1 osd.66 pg_epoch: 72700590 pg[8.86b( v 72700589'93196338 (72700063'93192576,72700589'93196338] local-lis/les=72700588/72700589 n=4992 ec=25427536/271059 lis/c=72700588/72700588 les/c/f=72700589/72700589/0 sis=72700590) [66,161,319]/[66,319,117] r=0 lpr=72700590 pi=[72700588,72700590)/1 crt=72700589'93196338 lcod 0'0 mlcod 0'0 remapped mbc={}] start_peering_interval up [66,319] -> [66,161,319], acting [66,319,117] -> [66,319,117], acting_primary 66 -> 66, up_primary 66 -> 66, role 0 -> 0, features acting 4540138312169291775 upacting 4540138290693341183
Thu, Aug 21, 10:02 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
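
The osd.66 log line in the comment above ends with two feature masks, "features acting" and "upacting", and the crash reported in the 9:46 AM comment is an assertion on the up+acting mask. A minimal Python sketch for comparing the two masks bit by bit follows. The two numbers are copied from the log line; the helper name set_bits and the variable names are made up for illustration, and the mapping from bit positions to named Ceph features is not in the log, so it is not assumed here.

```python
# Feature masks copied verbatim from the osd.66 log line above.
ACTING_FEATURES = 4540138312169291775
UPACTING_FEATURES = 4540138290693341183


def set_bits(mask: int) -> list[int]:
    """Return the positions of the bits set in mask, lowest first."""
    return [pos for pos in range(mask.bit_length()) if (mask >> pos) & 1]


# Bits advertised by the acting set but absent from the up+acting set.
missing_from_upacting = ACTING_FEATURES & ~UPACTING_FEATURES
# Bits present in up+acting but not in acting.
extra_in_upacting = UPACTING_FEATURES & ~ACTING_FEATURES

print(f"acting:   {ACTING_FEATURES:#018x}")
print(f"upacting: {UPACTING_FEATURES:#018x}")
print("bit positions missing from upacting:", set_bits(missing_from_upacting))
print("bit positions only in upacting:", set_bits(extra_in_upacting))
```

For these two values, the up+acting mask is a strict subset of the acting mask and lacks bit positions 16, 20, 32 and 34. That is consistent with at least one peer in the up set advertising fewer features than the rest, but it does not by itself say which peer or which features.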
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

Still failing, the time it takes is Aug 21 09:54:26 cloudcephosd1004 systemd[1]: ceph-osd@66.service: Consumed 57.214s CPU time.

Thu, Aug 21, 9:54 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

I have extended the systemd unit start timeout to 5 min to see if that helps, though I don't think it is even reaching the 1m30s default :fingerscrossed:

Thu, Aug 21, 9:54 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

hmm... I wonder if it has cached cloudcephosd1042 as still being on the old v14 version, and when running check_prior_readable_down_osds it finds that old version and crashes? (and at some point that cache gets cleared and then it starts)

Thu, Aug 21, 9:49 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

first try to start ceph-osd@66 failed with the same error:

Aug 21 09:44:47 cloudcephosd1004 ceph-osd[168404]: ceph-osd: ./src/osd/PeeringState.cc:1255: bool PeeringState::check_prior_readable_down_osds(const OSDMapRef&): Assertion `HAVE_FEATURE(upacting_features, SERVER_OCTOPUS)' failed.
Thu, Aug 21, 9:46 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

Insisting a bit on starting ceph-osd@65 seemed to get it up and running, maybe there's some "start timeout"?

Thu, Aug 21, 9:43 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro updated the task description for T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.
Thu, Aug 21, 9:26 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

The lost pings are towards 1004, and the current sources of the lost pings are cloudcephosd1043/44/47; they are not yet in the cluster, so that should not be an issue.

Thu, Aug 21, 9:23 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added a comment to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.

Looking at the Grafana dashboards, I noticed that there's a relatively high loss of jumbo frames:

[Image: image.png]

Thu, Aug 21, 9:11 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro updated subscribers of T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.
Thu, Aug 21, 9:02 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro updated the task description for T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.
Thu, Aug 21, 9:02 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro added projects to T402499: [ceph] 2025-08-21 ceph issues bringing new osds up: Cloud-VPS, cloud-services-team (FY2025/26-Q1).
Thu, Aug 21, 8:58 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS
dcaro created T402499: [ceph] 2025-08-21 ceph issues bringing new osds up.
Thu, Aug 21, 8:57 AM · cloud-services-team (FY2025/26-Q1), Cloud-VPS

Wed, Aug 20

dcaro added a comment to T401875: [Build service] latest builder has old PHP.

It will depend on the buildpack itself: some buildpacks allow installing a wider range of versions for the language of choice, some only support a small range, and some also let you try to install any available version (dotnet iirc) and only fail later if there are any incompatibilities.

Wed, Aug 20, 5:33 PM · cloud-services-team, Toolforge
dcaro added a comment to T358496: [toolforge,storage] Provide per-tool access to cloud-vps object storage.

I wasn't advocating we implement the protocol, that'd be unwise. I was just questioning the underlying assumption. Do our users really need direct access to s3 buckets, or just a place to put stuff that is not in NFS?

Wed, Aug 20, 3:40 PM · Toolforge, Patch-For-Review, cloud-services-team
dcaro triaged T402378: [k8s,infra] Upgrade tools to Uwubernetes 1.30 as High priority.
Wed, Aug 20, 2:50 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro triaged T402377: [k8s,infra] Upgrade toolsbeta to Uwubernetes 1.30 as High priority.
Wed, Aug 20, 2:50 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro triaged T401917: [build service] failure due to transient issue as Medium priority.
Wed, Aug 20, 2:50 PM · cloud-services-team, Toolforge
dcaro triaged T401075: Support installing packages from non-upstream repo and/or build pack for C/C++code as Medium priority.
Wed, Aug 20, 2:50 PM · cloud-services-team, Toolforge
dcaro triaged T401993: [components-api] Add a "description" field to the deployment as Medium priority.
Wed, Aug 20, 2:29 PM · Toolforge, cloud-services-team
dcaro added a comment to T358496: [toolforge,storage] Provide per-tool access to cloud-vps object storage.

What am I missing? Why is this a bad approach? The upside is that all the problems of managing multiple auth tokens go away. We just do things the same way we currently do them in toolforge.

Wed, Aug 20, 12:05 PM · Toolforge, Patch-For-Review, cloud-services-team
dcaro added a comment to T358496: [toolforge,storage] Provide per-tool access to cloud-vps object storage.

For some reason we don't seem to be discussing the possibility of making one toolforge object store and having a toolforge-storage to group and manage objects belonging to each tool. This seems more consistent with what a platform as a service is, less_flexibility+auto_management. If a tool needs access to its own s3 bucket complete with keys and everything, aren't they better off creating an openstack project, etc?

Our users don't need to know anything about buckets or tokens or whatever. They just need to know they can store objects and retrieve them safely, preferably via toolforge alone. Anything else outside of toolforge seems out of scope for what toolforge is about.

Wed, Aug 20, 10:12 AM · Toolforge, Patch-For-Review, cloud-services-team
dcaro updated the task description for T402378: [k8s,infra] Upgrade tools to Uwubernetes 1.30.
Wed, Aug 20, 9:33 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro created T402378: [k8s,infra] Upgrade tools to Uwubernetes 1.30.
Wed, Aug 20, 9:32 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro updated the task description for T402377: [k8s,infra] Upgrade toolsbeta to Uwubernetes 1.30.
Wed, Aug 20, 9:30 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro created T402377: [k8s,infra] Upgrade toolsbeta to Uwubernetes 1.30.
Wed, Aug 20, 9:29 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro updated the task description for T362869: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30).
Wed, Aug 20, 9:27 AM · Toolforge (Toolforge iteration 23), Patch-For-Review, cloud-services-team
dcaro updated the task description for T362869: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30).
Wed, Aug 20, 8:18 AM · Toolforge (Toolforge iteration 23), Patch-For-Review, cloud-services-team

Tue, Aug 19

dcaro closed T401922: [jobs-api,jobs-cli] when creating a filelog based job, filelog-stderr gets populated with *.out file as Resolved.
Tue, Aug 19, 9:37 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro closed T402194: toolforge jobs load does not update jobs when image is changed as Resolved.

This is deployed already, feel free to reopen if you still see the issue.

Tue, Aug 19, 9:36 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro updated the task description for T350687: [harbor] Move harbor data to object storage service.
Tue, Aug 19, 8:08 AM · Toolforge (Toolforge iteration 23), cloud-services-team (FY2025/26-Q1), User-Raymond_Ndibe, Goal
dcaro added a comment to T350687: [harbor] Move harbor data to object storage service.

Not all projects are currently replicated, on tools-harbor-2:

[Image: image.png]

Tue, Aug 19, 8:07 AM · Toolforge (Toolforge iteration 23), cloud-services-team (FY2025/26-Q1), User-Raymond_Ndibe, Goal
dcaro updated the task description for T350687: [harbor] Move harbor data to object storage service.
Tue, Aug 19, 8:06 AM · Toolforge (Toolforge iteration 23), cloud-services-team (FY2025/26-Q1), User-Raymond_Ndibe, Goal

Mon, Aug 18

dcaro updated the task description for T362869: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30).
Mon, Aug 18, 5:07 PM · Toolforge (Toolforge iteration 23), Patch-For-Review, cloud-services-team
dcaro moved T402194: toolforge jobs load does not update jobs when image is changed from In Review to In Progress on the Toolforge (Toolforge iteration 23) board.
Mon, Aug 18, 5:05 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro triaged T401922: [jobs-api,jobs-cli] when creating a filelog based job, filelog-stderr gets populated with *.out file as Medium priority.
Mon, Aug 18, 4:49 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro claimed T401922: [jobs-api,jobs-cli] when creating a filelog based job, filelog-stderr gets populated with *.out file.
Mon, Aug 18, 4:48 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro triaged T402194: toolforge jobs load does not update jobs when image is changed as Medium priority.
Mon, Aug 18, 4:48 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro changed the status of T402194: toolforge jobs load does not update jobs when image is changed from Open to In Progress.
Mon, Aug 18, 4:43 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro added a comment to T401868: [components-api,beta] Config not updated from remote source.

This is weird, as both calls are to the same exact endpoint, so it's not likely a change in behavior between calls.

Mon, Aug 18, 4:29 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro added a comment to T401917: [build service] failure due to transient issue.

This might also be alleviated by having a caching proxy of sorts to avoid always hitting external services (that would also speed up some processes).

Mon, Aug 18, 9:35 AM · cloud-services-team, Toolforge
dcaro added a comment to T401917: [build service] failure due to transient issue.

Talking out loud a bit here :)

Mon, Aug 18, 9:34 AM · cloud-services-team, Toolforge
dcaro added a comment to T402032: https://api.svc.toolforge.org endpoint given in OpenAPI spec returns 403 forbidden errors.

I don't see anything on https://wikitech.wikimedia.org/wiki/Help:Toolforge/API about OAuth authentication.

Mon, Aug 18, 9:04 AM · Patch-For-Review, cloud-services-team, Toolforge

Fri, Aug 15

dcaro added a comment to T402032: https://api.svc.toolforge.org endpoint given in OpenAPI spec returns 403 forbidden errors.

Yep, that is the external endpoint, for which certificate-based auth is not allowed; the other is internal, for which it works. If you were using token OAuth, or hitting a non-authed endpoint, it would work. We should probably add that to the spec though, if it's not there (haven't checked).

Fri, Aug 15, 4:47 PM · Patch-For-Review, cloud-services-team, Toolforge
dcaro created T401994: [components-api] support port protocol in config.
Fri, Aug 15, 9:51 AM · Patch-For-Review, Toolforge (Toolforge iteration 23)
dcaro created T401993: [components-api] Add a "description" field to the deployment.
Fri, Aug 15, 9:23 AM · Toolforge, cloud-services-team
dcaro merged T401963: !log automated deployments so that a tool’s SAL records system changes into T393169: [components-api] optionally log deployments to SAL automatically.
Fri, Aug 15, 9:18 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro merged task T401963: !log automated deployments so that a tool’s SAL records system changes into T393169: [components-api] optionally log deployments to SAL automatically.
Fri, Aug 15, 9:18 AM · Toolforge, cloud-services-team

Thu, Aug 14

dcaro moved T401922: [jobs-api,jobs-cli] when creating a filelog based job, filelog-stderr gets populated with *.out file from Backlog to Toolforge iteration 23 on the Toolforge board.
Thu, Aug 14, 1:56 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro created T401922: [jobs-api,jobs-cli] when creating a filelog based job, filelog-stderr gets populated with *.out file.
Thu, Aug 14, 1:56 PM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro added a comment to T401830: [loki] persist build logs for each tool on their loki namespace.

@dcaro Given a specific pod in the image-build namespace, is there some label that is guaranteed to link it to a particular tool-$NAME namespace?

Thu, Aug 14, 1:45 PM · cloud-services-team, Toolforge
dcaro closed T394787: [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support, a subtask of T362869: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30), as Resolved.
Thu, Aug 14, 9:45 AM · Toolforge (Toolforge iteration 23), Patch-For-Review, cloud-services-team
dcaro closed T394787: [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support as Resolved.
Thu, Aug 14, 9:45 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro moved T401868: [components-api,beta] Config not updated from remote source from Next Up to In Progress on the Toolforge (Toolforge iteration 23) board.
Thu, Aug 14, 9:45 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro changed the status of T401868: [components-api,beta] Config not updated from remote source, a subtask of T393564: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features, from Open to In Progress.
Thu, Aug 14, 9:44 AM · Toolforge (Toolforge iteration 21), cloud-services-team (FY2024/2025-Q3-Q4), Goal, User-dcaro, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, Epic
dcaro changed the status of T401868: [components-api,beta] Config not updated from remote source from Open to In Progress.
Thu, Aug 14, 9:44 AM · Toolforge (Toolforge iteration 23), cloud-services-team
dcaro created T401894: [builds-api] Allow queuing builds.
Thu, Aug 14, 9:43 AM · Toolforge (Toolforge iteration 23)
dcaro created T401893: [components-api] Allow reusing another component build.
Thu, Aug 14, 9:42 AM · Patch-For-Review, Toolforge (Toolforge iteration 23)
dcaro added a comment to T401851: [components-api,beta] Image should only be build once when re-used in components.

Two ideas come right away to me:

Thu, Aug 14, 9:31 AM · Toolforge (Toolforge iteration 23)
dcaro added a comment to T314729: [jobs-cli,components-api] Provide YAML schema file for toolforge-jobs definition files.

fyi. The config schema for tool configuration was created in T397724: [components-api] Provide a standalone version of tool config schema

Thu, Aug 14, 9:20 AM · cloud-services-team, Toolforge, User-Raymond_Ndibe
dcaro moved T398285: Decision request - Reuse toolforge user tools central logging for toolforge infrastructure logging from Next Up to Done on the Toolforge (Toolforge iteration 23) board.
Thu, Aug 14, 9:16 AM · Toolforge (Toolforge iteration 23), cloud-services-team, Cloud Services Proposals
dcaro added a comment to T401851: [components-api,beta] Image should only be build once when re-used in components.

I'm actually not sure what would happen if I set different refs for the same repo here... I assume the last build wins and updates the latest tag (it doesn't matter in my use case, but that could be unexpectedly interesting).

Thu, Aug 14, 9:12 AM · Toolforge (Toolforge iteration 23)
dcaro closed T401846: [jobs-api] buildservice-based jobs stopped prefixing the command with launcher as Resolved.
Thu, Aug 14, 9:05 AM · Toolforge (Toolforge iteration 23), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, cloud-services-team (FY2025/26-Q1), User-dcaro
dcaro added a comment to T401880: Ensure unique machine-id across Cloud VPS VMs.

On the NFS side, I checked the dbus ids (/var/lib/dbus/machine-id) and they are all different, and the nfs-client ids are empty, so it should be using the default ("Linux NFS " + hostname). If we need to change that, this explains it a bit: https://docs.kernel.org/filesystems/nfs/client-identifier.html#selecting-an-appropriate-client-identifier

Thu, Aug 14, 9:05 AM · Cloud-VPS, cloud-services-team

Wed, Aug 13

dcaro created T401850: [jobs-api,logs-api] When listing logs without --follow, the logs are sorted first by pod, then by timestamp.
Wed, Aug 13, 5:21 PM · cloud-services-team, Toolforge
dcaro updated the task description for T401846: [jobs-api] buildservice-based jobs stopped prefixing the command with launcher.
Wed, Aug 13, 4:19 PM · Toolforge (Toolforge iteration 23), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, cloud-services-team (FY2025/26-Q1), User-dcaro
dcaro triaged T401846: [jobs-api] buildservice-based jobs stopped prefixing the command with launcher as High priority.
Wed, Aug 13, 4:18 PM · Toolforge (Toolforge iteration 23), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, cloud-services-team (FY2025/26-Q1), User-dcaro
dcaro created T401846: [jobs-api] buildservice-based jobs stopped prefixing the command with launcher.
Wed, Aug 13, 4:18 PM · Toolforge (Toolforge iteration 23), Cloud-Services-Worktype-Unplanned, Cloud-Services-Origin-User, cloud-services-team (FY2025/26-Q1), User-dcaro
dcaro added a comment to T401172: [jobs-api] make job status an enum, with clearly defined states.

Unsure what you meant by system logs in this context. Is the plan to do this in logs-api instead? If that's the case, I'd rather begin with that instead of doing it on jobs-api and having everything discarded later.

Wed, Aug 13, 2:16 PM · Toolforge (Toolforge iteration 23), User-Raymond_Ndibe, cloud-services-team
dcaro created T401830: [loki] persist build logs for each tool on their loki namespace.
Wed, Aug 13, 1:55 PM · cloud-services-team, Toolforge
dcaro added a comment to T401172: [jobs-api] make job status an enum, with clearly defined states.

@Raymond_Ndibe the list of statuses here is just a proposal, to be discussed/refined, so that's the first part of the task.

Wed, Aug 13, 11:49 AM · Toolforge (Toolforge iteration 23), User-Raymond_Ndibe, cloud-services-team