Skip to content

[grid] Improve readTimeout in handle session between Router and Node #16163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 19, 2025

Conversation

VietND96
Copy link
Member

@VietND96 VietND96 commented Aug 12, 2025

User description

🔗 Related Issues

💥 What does this PR do?

HttpClient cache issue described in #12978.
However, the HttpClient created still uses the default ClientConfig
In order to improve the connection between Router and Node for routing commands, Router will fetch the session timeout that Node exposed to set readTimeout in ClientConfig before creating HttpClient (if no session timeout found, silent fallback to default of ClientConfig).

FINE logs as below

16:25:36.597 DEBUG [JdkHttpClient.execute0] - Executing request: (GET) /se/grid/node/status

16:25:36.647 DEBUG [JdkHttpClient.execute0] - Ending request (GET) /se/grid/node/status in 140ms

16:25:36.649 DEBUG [HandleSession.fetchNodeSessionTimeout] - Fetched session timeout from node status (read timeout: 300 seconds) for http://172.18.0.12:5555

16:25:36.661 DEBUG [JdkHttpClient.execute0] - Ending request (GET) /se/grid/node/status in 64ms

🔧 Implementation Notes

💡 Additional Considerations

🔄 Types of changes

  • Cleanup (formatting, renaming)
  • Bug fix (backwards compatible)
  • New feature (non-breaking change which adds functionality and tests!)
  • Breaking change (fix or feature that would cause existing functionality to change)

PR Type

Enhancement


Description

  • Improve Router-Node connection timeout handling

  • Fetch session timeout from Node status endpoint

  • Configure HttpClient readTimeout dynamically

  • Add error handling for timeout configuration


Diagram Walkthrough

flowchart LR
  Router["Router"] -- "GET /status" --> Node["Node"]
  Node -- "NodeStatus with sessionTimeout" --> Router
  Router -- "Configure HttpClient with timeout" --> HttpClient["HttpClient"]
  HttpClient -- "Execute session commands" --> Node
Loading

File Walkthrough

Relevant files
Enhancement
HandleSession.java
Dynamic timeout configuration for Router-Node communication

java/src/org/openqa/selenium/grid/router/HandleSession.java

  • Add method to fetch session timeout from Node status endpoint
  • Replace hardcoded ClientConfig with dynamic timeout configuration
  • Add JSON parsing for NodeStatus response
  • Include error handling with fallback to default timeout
+51/-2   

Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1234 - Partially compliant

Compliant requirements:

  • None

Non-compliant requirements:

  • Investigate/regress behavior where click() on a link with javascript in href no longer triggers JS in Selenium 2.48.x compared to 2.47.1 on Firefox.
  • Provide a fix or mitigation ensuring click() triggers href javascript as before.
  • Validate specifically with Firefox.

Requires further human verification:

  • N/A

5678 - Partially compliant

Compliant requirements:

  • None

Non-compliant requirements:

  • Diagnose intermittent "ConnectFailure (Connection refused)" when instantiating multiple ChromeDriver instances on Ubuntu.
  • Ensure stable ChromeDriver instantiation without connection refused errors after the first instance.
  • Potentially adjust driver/client connection handling or timeouts if relevant.

Requires further human verification:

  • N/A
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

If the Node /status payload structure changes (e.g., "node" moved or absent), the parser may not set a timeout and silently fall back, potentially masking configuration issues. Consider stricter validation and logging at INFO/WARNING when "node" or timeout is missing.

private ClientConfig fetchNodeSessionTimeout(URI uri) {
  ClientConfig config = ClientConfig.defaultConfig().baseUri(uri).withRetries();
  Duration sessionTimeout = config.readTimeout();
  try (HttpClient httpClient = httpClientFactory.createClient(config)) {
    HttpRequest statusRequest = new HttpRequest(GET, "/status");
    HttpResponse res = httpClient.execute(statusRequest);
    Reader reader = reader(res);
    Json JSON = new Json();
    JsonInput in = JSON.newInput(reader);
    in.beginObject();
    // Skip everything until we find "value"
    while (in.hasNext()) {
      if ("value".equals(in.nextName())) {
        in.beginObject();
        while (in.hasNext()) {
          if ("node".equals(in.nextName())) {
            NodeStatus nodeStatus = in.read(NodeStatus.class);
            sessionTimeout = nodeStatus.getSessionTimeout();
            LOG.fine(
                "Fetched session timeout from node status (read timeout: "
                    + sessionTimeout.toSeconds()
                    + " seconds) for "
                    + uri);
          } else {
            in.skipValue();
          }
        }
        in.endObject();
      } else {
        in.skipValue();
      }
    }
  } catch (Exception e) {
    LOG.fine(
        "Use default from ClientConfig (read timeout: "
            + config.readTimeout().toSeconds()
            + " seconds) for "
            + uri);
  }
  config = config.readTimeout(sessionTimeout);
  return config;
Resource Management

The temporary HttpClient created in fetchNodeSessionTimeout uses the same base URI and retries; ensure retries and read timeout values here cannot deadlock or significantly delay session acquisition. Consider a shorter dedicated timeout for the status call.

private ClientConfig fetchNodeSessionTimeout(URI uri) {
  ClientConfig config = ClientConfig.defaultConfig().baseUri(uri).withRetries();
  Duration sessionTimeout = config.readTimeout();
  try (HttpClient httpClient = httpClientFactory.createClient(config)) {
    HttpRequest statusRequest = new HttpRequest(GET, "/status");
    HttpResponse res = httpClient.execute(statusRequest);
    Reader reader = reader(res);
    Json JSON = new Json();
    JsonInput in = JSON.newInput(reader);
    in.beginObject();
    // Skip everything until we find "value"
    while (in.hasNext()) {
      if ("value".equals(in.nextName())) {
        in.beginObject();
        while (in.hasNext()) {
          if ("node".equals(in.nextName())) {
            NodeStatus nodeStatus = in.read(NodeStatus.class);
            sessionTimeout = nodeStatus.getSessionTimeout();
            LOG.fine(
                "Fetched session timeout from node status (read timeout: "
                    + sessionTimeout.toSeconds()
                    + " seconds) for "
                    + uri);
          } else {
            in.skipValue();
          }
        }
        in.endObject();
      } else {
        in.skipValue();
      }
    }
  } catch (Exception e) {
    LOG.fine(
        "Use default from ClientConfig (read timeout: "
            + config.readTimeout().toSeconds()
            + " seconds) for "
            + uri);
  }
  config = config.readTimeout(sessionTimeout);
  return config;
Logging Level

Both success and failure paths log at FINE; operational visibility might benefit from INFO when falling back to default read timeout to aid diagnosing timeout issues in production.

} catch (Exception e) {
  LOG.fine(
      "Use default from ClientConfig (read timeout: "
          + config.readTimeout().toSeconds()
          + " seconds) for "
          + uri);
}

@selenium-ci selenium-ci added B-grid Everything grid and server related C-java Java Bindings labels Aug 12, 2025
Copy link
Contributor

qodo-merge-pro bot commented Aug 12, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Bound status probe timeout
Suggestion Impact:The commit refactored the status fetching logic and removed retries from the initial ClientConfig, but more importantly, it changed how the node status is retrieved and then set the read timeout based on the node's session timeout. While it did not explicitly set a fixed short timeout (e.g., 5s) for the status request, it separated the initial config (without retries) to fetch status and then applied the timeout afterward. This aligns with the suggestion’s intent to prevent long hangs during the status probe by not using the main configured timeout/retries for the probe.

code diff:

@@ -253,44 +251,17 @@
   }
 
   private ClientConfig fetchNodeSessionTimeout(URI uri) {
-    ClientConfig config = ClientConfig.defaultConfig().baseUri(uri).withRetries();
+    ClientConfig config = ClientConfig.defaultConfig().baseUri(uri);
     Duration sessionTimeout = config.readTimeout();
     try (HttpClient httpClient = httpClientFactory.createClient(config)) {
-      HttpRequest statusRequest = new HttpRequest(GET, "/status");
+      HttpRequest statusRequest = new HttpRequest(GET, "/se/grid/node/status");
       HttpResponse res = httpClient.execute(statusRequest);
-      Reader reader = reader(res);
-      Json JSON = new Json();
-      JsonInput in = JSON.newInput(reader);
-      in.beginObject();
-      // Skip everything until we find "value"
-      while (in.hasNext()) {
-        if ("value".equals(in.nextName())) {
-          in.beginObject();
-          while (in.hasNext()) {
-            if ("node".equals(in.nextName())) {
-              NodeStatus nodeStatus = in.read(NodeStatus.class);
-              sessionTimeout = nodeStatus.getSessionTimeout();
-              LOG.fine(
-                  "Fetched session timeout from node status (read timeout: "
-                      + sessionTimeout.toSeconds()
-                      + " seconds) for "
-                      + uri);
-            } else {
-              in.skipValue();
-            }
-          }
-          in.endObject();
-        } else {
-          in.skipValue();
-        }
+      NodeStatus nodeStatus = Values.get(res, NodeStatus.class);
+      if (nodeStatus != null) {
+        sessionTimeout = nodeStatus.getSessionTimeout();
       }
-    } catch (Exception e) {
-      LOG.fine(
-          "Use default from ClientConfig (read timeout: "
-              + config.readTimeout().toSeconds()
-              + " seconds) for "
-              + uri);
-    }
+    }
+    LOG.fine("Set read timeout: " + sessionTimeout.toSeconds() + " seconds for " + uri);
     config = config.readTimeout(sessionTimeout);
     return config;

Add a short read timeout specifically for the /status probe to avoid hanging on
an unresponsive node and blocking session creation. Use a bounded timeout (e.g.,
a few seconds) independent from the main client timeout.

java/src/org/openqa/selenium/grid/router/HandleSession.java [258-293]

-try (HttpClient httpClient = httpClientFactory.createClient(config)) {
+try (HttpClient httpClient = httpClientFactory.createClient(
+    ClientConfig.defaultConfig()
+        .baseUri(uri)
+        .readTimeout(Duration.ofSeconds(5))
+        .withRetries())) {
   HttpRequest statusRequest = new HttpRequest(GET, "/status");
   HttpResponse res = httpClient.execute(statusRequest);
-  Reader reader = reader(res);
-  Json JSON = new Json();
-  JsonInput in = JSON.newInput(reader);
-  in.beginObject();
-  // Skip everything until we find "value"
-  while (in.hasNext()) {
-    if ("value".equals(in.nextName())) {
+  if (res.getStatus() == 200) {
+    try (Reader rdr = reader(res); JsonInput in = new Json().newInput(rdr)) {
       in.beginObject();
       while (in.hasNext()) {
-        if ("node".equals(in.nextName())) {
-          NodeStatus nodeStatus = in.read(NodeStatus.class);
-          sessionTimeout = nodeStatus.getSessionTimeout();
-          ...
+        String name = in.nextName();
+        if ("value".equals(name)) {
+          in.beginObject();
+          while (in.hasNext()) {
+            String inner = in.nextName();
+            if ("node".equals(inner)) {
+              NodeStatus nodeStatus = in.read(NodeStatus.class);
+              Duration parsed = nodeStatus.getSessionTimeout();
+              if (parsed != null && !parsed.isNegative() && !parsed.isZero()) {
+                sessionTimeout = parsed;
+              }
+            } else {
+              in.skipValue();
+            }
+          }
+          in.endObject();
         } else {
           in.skipValue();
         }
       }
-      in.endObject();
-    } else {
-      in.skipValue();
     }
   }
 } catch (Exception e) {
-  LOG.fine(
-      "Use default from ClientConfig (read timeout: "
-          + config.readTimeout().toSeconds()
-          + " seconds) for "
-          + uri);
+  LOG.fine("Use default from ClientConfig (read timeout: "
+      + config.readTimeout().toSeconds() + " seconds) for " + uri);
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion addresses a potential performance issue where fetching the node status could hang indefinitely if the node is unresponsive, by proposing a separate, short timeout for this specific request, which is a critical improvement for stability.

Medium
Validate parsed session timeout

Guard against null or non-positive timeouts from the parsed status to avoid
setting an invalid read timeout. Fallback to the existing configuration when the
parsed value is missing or zero/negative.

java/src/org/openqa/selenium/grid/router/HandleSession.java [270-280]

 if ("node".equals(in.nextName())) {
   NodeStatus nodeStatus = in.read(NodeStatus.class);
-  sessionTimeout = nodeStatus.getSessionTimeout();
-  LOG.fine(
-      "Fetched session timeout from node status (read timeout: "
-          + sessionTimeout.toSeconds()
-          + " seconds) for "
-          + uri);
+  Duration parsed = nodeStatus.getSessionTimeout();
+  if (parsed != null && !parsed.isNegative() && !parsed.isZero()) {
+    sessionTimeout = parsed;
+    LOG.fine("Fetched session timeout from node status (read timeout: "
+        + sessionTimeout.toSeconds() + " seconds) for " + uri);
+  } else {
+    LOG.fine("Invalid or missing session timeout from node status for " + uri + ", using default.");
+  }
 } else {
   in.skipValue();
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that the session timeout received from the node could be invalid (null, zero, or negative), and adds necessary validation to prevent setting an invalid timeout on the HttpClient.

Medium
Possible issue
Close parser and validate response
Suggestion Impact:The commit removed manual Reader/JsonInput parsing and used Values.get(res, NodeStatus.class), which implicitly avoids the resource leak and parsing complexity. It also changed the endpoint. While it did not explicitly add status/content-type checks, switching to Values.get addresses safe parsing and resource handling.

code diff:

@@ -253,44 +251,17 @@
   }
 
   private ClientConfig fetchNodeSessionTimeout(URI uri) {
-    ClientConfig config = ClientConfig.defaultConfig().baseUri(uri).withRetries();
+    ClientConfig config = ClientConfig.defaultConfig().baseUri(uri);
     Duration sessionTimeout = config.readTimeout();
     try (HttpClient httpClient = httpClientFactory.createClient(config)) {
-      HttpRequest statusRequest = new HttpRequest(GET, "/status");
+      HttpRequest statusRequest = new HttpRequest(GET, "/se/grid/node/status");
       HttpResponse res = httpClient.execute(statusRequest);
-      Reader reader = reader(res);
-      Json JSON = new Json();
-      JsonInput in = JSON.newInput(reader);
-      in.beginObject();
-      // Skip everything until we find "value"
-      while (in.hasNext()) {
-        if ("value".equals(in.nextName())) {
-          in.beginObject();
-          while (in.hasNext()) {
-            if ("node".equals(in.nextName())) {
-              NodeStatus nodeStatus = in.read(NodeStatus.class);
-              sessionTimeout = nodeStatus.getSessionTimeout();
-              LOG.fine(
-                  "Fetched session timeout from node status (read timeout: "
-                      + sessionTimeout.toSeconds()
-                      + " seconds) for "
-                      + uri);
-            } else {
-              in.skipValue();
-            }
-          }
-          in.endObject();
-        } else {
-          in.skipValue();
-        }
+      NodeStatus nodeStatus = Values.get(res, NodeStatus.class);
+      if (nodeStatus != null) {
+        sessionTimeout = nodeStatus.getSessionTimeout();
       }
-    } catch (Exception e) {
-      LOG.fine(
-          "Use default from ClientConfig (read timeout: "
-              + config.readTimeout().toSeconds()
-              + " seconds) for "
-              + uri);
-    }
+    }
+    LOG.fine("Set read timeout: " + sessionTimeout.toSeconds() + " seconds for " + uri);
     config = config.readTimeout(sessionTimeout);
     return config;

Ensure the JsonInput is always closed to avoid resource leaks. Additionally,
verify the HTTP response status and content type before parsing to prevent
parsing errors on non-200 responses or non-JSON bodies.

java/src/org/openqa/selenium/grid/router/HandleSession.java [255-296]

 private ClientConfig fetchNodeSessionTimeout(URI uri) {
   ClientConfig config = ClientConfig.defaultConfig().baseUri(uri).withRetries();
   Duration sessionTimeout = config.readTimeout();
   try (HttpClient httpClient = httpClientFactory.createClient(config)) {
     HttpRequest statusRequest = new HttpRequest(GET, "/status");
     HttpResponse res = httpClient.execute(statusRequest);
-    Reader reader = reader(res);
-    Json JSON = new Json();
-    JsonInput in = JSON.newInput(reader);
-    in.beginObject();
-    // Skip everything until we find "value"
-    while (in.hasNext()) {
-      if ("value".equals(in.nextName())) {
+    if (res.getStatus() == 200 && res.getHeader("Content-Type") != null && res.getHeader("Content-Type").contains("application/json")) {
+      try (Reader rdr = reader(res); JsonInput in = new Json().newInput(rdr)) {
         in.beginObject();
         while (in.hasNext()) {
-          if ("node".equals(in.nextName())) {
-            NodeStatus nodeStatus = in.read(NodeStatus.class);
-            sessionTimeout = nodeStatus.getSessionTimeout();
-            LOG.fine(
-                "Fetched session timeout from node status (read timeout: "
-                    + sessionTimeout.toSeconds()
-                    + " seconds) for "
-                    + uri);
+          String name = in.nextName();
+          if ("value".equals(name)) {
+            in.beginObject();
+            while (in.hasNext()) {
+              String inner = in.nextName();
+              if ("node".equals(inner)) {
+                NodeStatus nodeStatus = in.read(NodeStatus.class);
+                sessionTimeout = nodeStatus.getSessionTimeout();
+                LOG.fine("Fetched session timeout from node status (read timeout: "
+                    + sessionTimeout.toSeconds() + " seconds) for " + uri);
+              } else {
+                in.skipValue();
+              }
+            }
+            in.endObject();
           } else {
             in.skipValue();
           }
         }
-        in.endObject();
-      } else {
-        in.skipValue();
       }
+    } else {
+      LOG.fine("Non-OK or non-JSON status response from " + uri + " for /status, using default read timeout.");
     }
   } catch (Exception e) {
-    LOG.fine(
-        "Use default from ClientConfig (read timeout: "
-            + config.readTimeout().toSeconds()
-            + " seconds) for "
-            + uri);
+    LOG.fine("Use default from ClientConfig (read timeout: "
+        + config.readTimeout().toSeconds() + " seconds) for " + uri);
   }
-  config = config.readTimeout(sessionTimeout);
-  return config;
+  return config.readTimeout(sessionTimeout);
 }

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential resource leak by not closing the JsonInput and Reader, and improves robustness by adding a check for the HTTP response status and content type before parsing.

Medium
  • Update

@VietND96
Copy link
Member Author

@joerg1985, ignore the change to bring back Cache builder in session handler. Here I just add to fetch session timeout and set to ClientConfig. Can you review it makes sense?
@diemol, Can you also review this, it would be better if we can deliver this release.

@joerg1985
Copy link
Member

@VietND96 i did not look too deep into the code, but my first approach would be returning the timeout (and other config) to the response the node does receive when registering it self to the grid. In this case we should have no need for fallbacks or additional execption handling.

What happens in case someone did set a different timeout to the node via the cli? Do we need to handle this? I am not sure ...

@VietND96
Copy link
Member Author

@joerg1985, actually Distributor also gets the node session timeout to set session timeout in GridModel.
There is an implement in NodeStatus itself, it always has session timeout (set by user via CLI option or default 300s). So I can remove the additional exception handling here.

What happens in case someone did set a different timeout to the node via the cli

As I mentioned above on NodeStatus, timeout would be different between Node via its config. By implementing this, we will have a unified timeout between sessions, the connection of the router to a particular Node.

Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
@VietND96
Copy link
Member Author

@joerg1985 I fixed one of your review in another PR

We should call /se/grid/node/status to get only the status this will allow to use org.openqa.selenium.grid.web.Values.get(...); to read the NodeStatus response.

The usage of org.openqa.selenium.grid.web.Values.get(...); require response a Map start with key value. However, /se/grid/node/status does not respond the same contract, so I updated NodeStatus under key value. I searched around, and it looks like nowhere uses /se/grid/node/status, so it would be fine for backward compatibility.

@VietND96 VietND96 requested review from joerg1985 and diemol August 17, 2025 16:27
@VietND96 VietND96 added this to the Selenium 4.36 milestone Aug 18, 2025
Copy link
Member

@joerg1985 joerg1985 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the try catch is not needed, but this should not hurt.

ps: The test i mentioned we should add is allready in place NettyServerTest.doesInterruptPending.

@VietND96
Copy link
Member Author

@joerg1985 after checking again, I removed the catch block, and added null check since Values.get() might return null when not 200 or error response.

@VietND96 VietND96 merged commit 55f02a9 into trunk Aug 19, 2025
31 checks passed
@VietND96 VietND96 deleted the router-cache-2 branch August 19, 2025 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-grid Everything grid and server related C-java Java Bindings Review effort 2/5
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants