Prevent panic on zero DNS negative TTL during backoff #3888

s-starostin · 2025-04-29T07:09:50Z

This change addresses a panic that can occur in the control plane client's DNS resolution backoff logic within linkerd/app/core/src/control.rs.

Problem:

When resolving the control plane address, if a dns::ResolveError occurs and carries a negative TTL (returned by negative_ttl()), that TTL is used to schedule the next resolution attempt via tokio::time::interval(ttl).

However, if negative_ttl() returns Some(Duration::ZERO), the zero duration passed to time::interval causes a panic:

thread 'main' panicked at linkerd/app/core/src/control.rs:87:49:
`period` must be non-zero.
   0:     0x563f974dd323 - <unknown>
   1:     0x563f9680ed2c - <unknown>
   2:     0x563f974b030d - <unknown>
   3:     0x563f974de99e - <unknown>
   4:     0x563f974de564 - <unknown>
   5:     0x563f974df5c5 - <unknown>
   6:     0x563f974eecc9 - <unknown>
   7:     0x563f974eec96 - <unknown>
   8:     0x563f96737dfa - <unknown>
   9:     0x563f974f116c - <unknown>
  10:     0x563f97233597 - <unknown>
  11:     0x563f9715cc1a - <unknown>
  12:     0x563f9727a651 - <unknown>
  13:     0x563f971a91af - <unknown>
  14:     0x563f96c5a01c - <unknown>
  15:     0x563f967902b3 - <unknown>
  16:     0x563f96c56ba5 - <unknown>
  17:     0x7fc1ccd2624a - <unknown>
  18:     0x7fc1ccd26305 - __libc_start_main
  19:     0x563f9674f711 - <unknown>
  20:                0x0 - <unknown>

This panic was observed in production environments, particularly during restarts or issues with the linkerd-destination service. When the proxy sidecar panicked due to this error, it resulted in service unavailability for meshed applications, requiring manual restarts of deployments to recover connectivity.

Solution:

This commit introduces a minimum backoff duration (min_duration = 100ms) for cases where a negative TTL is provided by the DNS resolver. It uses std::cmp::max(ttl, min_duration) to ensure that the duration passed to time::interval is never zero.

This prevents the panic and ensures the proxy gracefully handles zero TTLs by applying a minimal delay before the next resolution attempt, improving resilience during control plane discovery issues.
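For illustration, the clamping described above can be sketched using only the standard library. This is a minimal sketch, not the actual diff: the names MIN_BACKOFF and backoff_period are hypothetical, and the real change lives inline in control.rs, where the clamped value is passed to time::interval.

```rust
use std::cmp::max;
use std::time::Duration;

/// Hypothetical constant mirroring the 100ms minimum described above.
const MIN_BACKOFF: Duration = Duration::from_millis(100);

/// Hypothetical helper: given the optional negative TTL from a DNS error,
/// return a period that is never shorter than MIN_BACKOFF (and so never zero).
/// `None` is passed through unchanged, since no TTL was provided.
fn backoff_period(negative_ttl: Option<Duration>) -> Option<Duration> {
    negative_ttl.map(|ttl| max(ttl, MIN_BACKOFF))
}
```

With this helper, backoff_period(Some(Duration::ZERO)) yields Some(100ms), while a TTL longer than the minimum, such as 5s, is returned unchanged.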

Signed-off-by: StarostinSY <sergejj.starostin@vitech.team>

cratelyn · 2025-04-29T15:19:21Z

hi @s-starostin, can you confirm what version of the proxy you are currently using?

the line number included in the panic message above, thread 'main' panicked at linkerd/app/core/src/control.rs:87:49, no longer points to a line of code that could panic, as of today:

linkerd2-proxy/linkerd/app/core/src/control.rs

Line 87 in f657cea

impl Metrics {

i believe this issue may have already been fixed in #3807, which also added a lower-bound TTL when refreshing DNS records, further down in the linkerd-dns and linkerd-dns-resolve components.

s-starostin · 2025-04-29T23:29:01Z

Hello,
Ah, yes - we're currently on v2.214.0.
I noticed that block still looked unchanged and wanted to suggest a fix, but if it's already been addressed, then never mind.

s-starostin requested a review from a team as a code owner April 29, 2025 07:09

prevent panic on zero DNS negative TTL

f7f06ca

Signed-off-by: StarostinSY <sergejj.starostin@vitech.team>

s-starostin force-pushed the fix-prevent-interval-panic-zero-ttl branch from 5793629 to f7f06ca Compare April 29, 2025 07:12

s-starostin closed this Apr 29, 2025
