Skip to content

Conversation

varun-edachali-dbx
Copy link
Contributor

@varun-edachali-dbx varun-edachali-dbx commented Jun 27, 2025

What type of PR is this?

  • Feature

Description

Enhance the current, simplified SEA Http Client by introducing retries. Most of the auth related functionality is pulled from the Thrift backend. All of the retry-related tests are expected to pass since they represent user-facing functionality (in terms of exceptions to be thrown in particular cases).

We continue using a low-level connection pool instead of using the higher level requests module because the latter does not have first-class support for passing a password to a client certificate, which we want to support. There are workarounds involving subclassing the HttpAdapter and invoking init_poolmanager, but the docstring of the method states that it should not be invoked from client code, so we do not consider it a long term solution.

The exceptions thrown by the SEA and Thrift backends are not completely the same (yet), because this PR passes None to the context and session id of the RequestError object. This misparity will be logged and addressed later. Here, the context is expected to have the following keys as per the existing class definition:

class RequestError(OperationalError):
    """Thrown if there was a error during request to the server.
    Its context will have the following keys:
    "method": The RPC method name that failed
    "session-id": The Thrift session guid
    "query-id": The Thrift query guid (if available)
    "http-code": HTTP response code to RPC request (if available)
    "error-message": Error message from the HTTP headers (if available)
    "original-exception": The Python level original exception
    "no-retry-reason": Why the request wasn't retried (if available)
    "bounded-retry-delay": The maximum amount of time an error will be retried before giving up
    "attempt": current retry number / maximum number of retries
    "elapsed-seconds": time that has elapsed since first attempting the RPC request
    """

    pass

How?

In _make_request(), the client sets up the retry context before making the HTTP request:

  • Determines the operation type (EXECUTE_STATEMENT, GET_OPERATION_STATUS, etc.) based on the API path and HTTP method
  • Sets the command type on the DatabricksRetryPolicy - this influences retry decisions (e.g., EXECUTE_STATEMENT is only retried for 429/503 status codes)
  • Starts the retry timer to track total request duration across all retry attempts
  • Makes the HTTP request through urllib3 with the retry policy attached

The retry policy then handles all retry logic automatically - determining when to retry based on status codes and command types, calculating backoff delays, and enforcing attempt/duration limits. If retries are exhausted, urllib3 raises a MaxRetryError which gets caught and wrapped in a RequestError.

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

Design Doc

varun-edachali-dbx and others added 30 commits June 18, 2025 05:46
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
* [squash from exec-sea] bring over execution phase changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess test

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add docstring

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remvoe exec func in sea backend

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess files

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess models

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess sea backend tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* cleanup

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* re-introduce get_schema_desc

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove SeaResultSet

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* clean imports and attributes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* pass CommandId to ExecResp

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove changes in types

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add back essential types (ExecResponse, from_sea_state)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix fetch types

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* excess imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff by maintaining logs

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix int test types

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* [squashed from exec-sea] init execution func

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove irrelevant changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove ResultSetFilter functionality

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove more irrelevant changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove more irrelevant changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* even more irrelevant changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove sea response as init option

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* exec test example scripts

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* [squashed from sea-exec] merge sea stuffs

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess removed docstring

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess changes in backend

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove accidentally removed _get_schema_desc

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove unnecessary init with sea_response tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* rmeove unnecessary changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* improved models and filters from cloudfetch-sea branch

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* filters stuff (align with JDBC)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* backend from cloudfetch-sea

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove filtering, metadata ops

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* raise NotImplementedErrror for metadata ops

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* change to valid table name

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary changes

covered by #588

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* simplify test module

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* logging -> debug level

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* change table name in log

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary backend cahnges

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-needed GetChunksResponse

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-needed GetChunksResponse

only relevant in Fetch phase

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce code duplication in response parsing

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce code duplication

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* more clear docstrings

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* introduce strongly typed ChunkInfo

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove is_volume_operation from response

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add is_volume_op and more ResultData fields

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add test scripts

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* Revert "Merge branch 'exec-models-sea' into exec-phase-sea"

This reverts commit be1997e, reversing
changes made to 37813ba.

* change logging level

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove excess changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove _get_schema_bytes (for now)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* redundant comments

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove fetch phase methods

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce code repetititon + introduce gaps after multi line pydocs

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove unused imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* move description extraction to helper func

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* formatting (black)

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add more unit tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* streamline unit tests

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* test getting the list of allowed configurations

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* reduce diff

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* house constants in enums for readability and immutability

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add note on hybrid disposition

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove redundant note on arrow_schema_bytes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove invalid import

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add strong typing for manifest in _extract_description

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove un-necessary column skipping

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* remove parsing in backend

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* fix: convert sea statement id to CommandId type

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* make polling interval a separate constant

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* align state checking with Thrift implementation

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* update unit tests according to changes

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add unit tests for added methods

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add spec to description extraction docstring, add strong typing to params

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

* add strong typing for backend parameters arg

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

---------

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
…nd forward refs, remove some unused imports

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Co-authored-by: jayant <167047871+jayantsing-db@users.noreply.github.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
…ions

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
This reverts commit dabba55, reversing
changes made to dd7dc6a.

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
This reverts commit 3a999c0, reversing
changes made to a1f9b9c.
This reverts commit dabba55, reversing
changes made to dd7dc6a.
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
@databricks databricks deleted a comment from github-actions bot Jul 28, 2025
@databricks databricks deleted a comment from github-actions bot Jul 28, 2025
Copy link
Contributor

@jayantsing-db jayantsing-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments

@jayantsing-db
Copy link
Contributor

The exceptions thrown by the SEA and Thrift backends are not completely the same (yet), because this PR passes None to the context and session id of the RequestError object. This misparity will be logged and addressed later.

Could you add details on this? Is there a follow-up PR? what is the context and session id?

Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>
Copy link
Contributor

@jayantsing-db jayantsing-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@databricks databricks deleted a comment from github-actions bot Aug 4, 2025
@varun-edachali-dbx varun-edachali-dbx merged commit 3b0c882 into main Aug 4, 2025
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants