Skip to content

Conversation

cstyan
Copy link
Contributor

@cstyan cstyan commented Aug 21, 2025

The flake here had two causes:

  1. related to usage of time.Now() in MustWaitForProvisionersAvailable and
  2. the fact that UpdateProvisionerLastSeenAt can not use a time that is further in the past than the current LastSeenAt time

Previously the test here was calling coderdtest.MustWaitForProvisionersAvailable which was using time.Now rather than the next tick time like the real hasProvisionersAvailable function does. Additionally, when using UpdateProvisionerLastSeenAt the underlying db query enforces that the time we're trying to set LastSeenAt to cannot be older than the current value.

I was able to reliably reproduce the flake by executing both the UpdateProvisionerLastSeenAt call and tickCh <- next in their own goroutines, the former with a small sleep to reliably ensure we'd trigger the autobuild before we set the LastSeenAt time. That's when I also noticed that coderdtest.MustWaitForProvisionersAvailable was using time.Now instead of the tick time. When I updated that function to take in a tick time + added a 2nd call to UpdateProvisionerLastSeenAt to set an original non-stale time, we could then never get the test to pass because the later call to set the stale time would not actually modify LastSeenAt. On top of that, calling the provisioner daemons closer in the middle of the function doesn't really do anything of value in this test.

The fix for the flake is to keep the go routines, ensuring there would be a flake if there was not a relevant fix, but to include the fix which is to ensure that we explicitly wait for the provisioner to be stale before passing the time to tickCh.

related to usage of time.Now() in MustWaitForProvisionerAvailable and
the fact that UpdateProvisionerLastSeenAt can not use a time that is
further in the past than the current LastSeenAt time

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Comment on lines 1737 to 1738
require.NoError(t, err, "Error getting provisioner for workspace")
require.Equal(t, p.LastSeenAt.Time.UnixNano(), staleTime.UnixNano())
Copy link
Member

@ethanndickson ethanndickson Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be mindful of the fact you can't call require methods in a goroutine other than the main test goroutine. They all call FailNow, and from the documentation:

FailNow must be called from the goroutine running the test or benchmark function, not from other goroutines created during the test. Calling FailNow does not stop those other goroutines.

https://pkg.go.dev/testing#F.FailNow

You can call assert functions, however!

cstyan added 3 commits August 22, 2025 17:25
Signed-off-by: Callum Styan <callumstyan@gmail.com>
provisioner

Signed-off-by: Callum Styan <callumstyan@gmail.com>
transaction does not go through immediately

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Comment on lines +1660 to +1680
require.Eventually(t, func() bool {
// Use the same logic as hasValidProvisioner but expect false
provisionerDaemons, err := db.GetProvisionerDaemonsByOrganization(ctx, database.GetProvisionerDaemonsByOrganizationParams{
OrganizationID: workspace.OrganizationID,
WantTags: tags,
})
if err != nil {
return false
}

// Check if NO provisioners are active (all are stale or gone)
for _, pd := range provisionerDaemons {
if pd.LastSeenAt.Valid {
age := checkTime.Sub(pd.LastSeenAt.Time)
if age <= provisionerdserver.StaleInterval {
return false // Found an active provisioner, keep waiting
}
}
}
return true // No active provisioners found
}, testutil.WaitMedium, testutil.IntervalFast)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest using methods in testutil/chan.go and passing in a parent context in the method signature

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants