Skip to content

How to insert into VARIANT column? #681

@excavator-matt

Description

@excavator-matt

As we all know, Databricks has a VARIANT data type that is more performant than standard JSON strings for nested data. However, I don't understand how I am supposed to insert into such a column using databricks-sql-connector==4.0.5 though.

If you want to write to a PostgreSQL JSON column in psycopg2, you use json dump. This trick did not seem to work here. I also tried custom escaped JSON and simply dict, but also no.

import databricks.sql
import json

# Replace with your Databricks SQL warehouse details
server_hostname = "<SERVER_HOSTNAME>"
http_path = "<HTTP_PATH>"
access_token = "<ACCESS_TOKEN>"

# Example data to insert
content = {
    "age": 29,
    "city": "New York"
}
data = [
    (1, "Alice", json.dumps(content)),
    2, "Bob", None), # (2, "Bob", data) # Raises error
    (3, "Charlie", """{"age": 29, "city": "New York"}"""),
]

# Connect to Databricks SQL
with databricks.sql.connect(
    server_hostname=server_hostname,
    http_path=http_path,
    access_token=access_token
) as connection:
    with connection.cursor() as cursor:
        # Insert data into Delta table
        cursor.execute("CREATE OR REPLACE TABLE test_catalog.ad_hoc.variant_test (id INT, name STRING, content VARIANT) USING DELTA")
        cursor.executemany(
            "INSERT INTO prod_catalog.ad_hoc.variant_test (id, name, content) VALUES (?, ?, ?)",
            data
        )

If I query the resulting table,

SELECT *, content:age, parse_json("{\"age\": 29, \"city\": \"New York\"}") FROM ad_hoc.variant_test ORDER BY id;

1 and 3 succeeds, but it seems to end up as a "flat"STRING (schema_of_variant(content)) instead of OBJECT<age: BIGINT, city: STRING>.

Image

Attempt 2 (insert a dict) raises the error

databricks.sql.exc.ServerOperationError: [DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve "content" due to data type mismatch: cannot cast "MAP<VOID, VOID>" to "VARIANT". SQLSTATE: 42K09; line 1 pos 0

Could we add documentation for how this is supposed to work? I guess MAP, STRUCT and VARIANT all map to Python type dict in some sense. It is only that VARIANT don't have a predefined schema like MAP or STRUCT, and unlike MAP can be nested?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions