Skip to content

JSON (output) support for sqlite3 #137974

@ssmycelium

Description

@ssmycelium

Feature or enhancement

Proposal:

Here's some code/benchmark that demonstrates the concept.

import sqlite3, json
import time
import subprocess

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        # col[0] is the column name
        d[col[0]] = row[idx]
    return d

def get_data_to_json():
    conn = sqlite3.connect("/home/user/database.sqlite")
    conn.row_factory = dict_factory
    c = conn.cursor()
    c.execute("SELECT * FROM server")
    rst = c.fetchall() # rst is a list of dict
    return rst

start = time.time()
rst = get_data_to_json()
print(f"normal data load: {time.time() - start}")

start = time.time()
out = subprocess.run(["sqlite3", "--readonly", "/home/user/database.sqlite", ".mode json", "select * from server"], check=True, capture_output=True).stdout
print(f"sqlite3 command: {time.time() - start}")

start = time.time()
loadit = json.loads(out)
print(f"load json from command: {time.time() - start}")
start = time.time()
json.dumps(loadit)
print(f"dump loaded json: {time.time() - start}")
$ python benchmark.py 
normal data load: 0.7838420867919922
sqlite3 command: 0.19471359252929688
load json from command: 0.21944022178649902
dump loaded json: 0.16810369491577148

I recently found out that sqlite3's CLI has a JSON mode. This JSON mode is considerably faster than any Python option that I'm aware of.

There's lots of cases where this might be useful. Let's say you have an API that needs to dump a decent bit of data from your database. There's no reason to load the JSON, then dump it to send it. In this case, it's about 5x faster to let sqlite dump the JSON and pass that along.

What's crazy is that it's even 2x faster to call sqlite3 and load JSON, than it is to fetchall().

I am not sure how the sqlite interface works and if this would require work on sqlite's part, but to me this is a substantial improvement.

I imagine this might have an interface like:

session.exec(query, mode=json) or session.exec(query, mode=dict)

Curious what you all think.

Thanks!

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Update: There's now a discussion for it: https://discuss.python.org/t/json-output-support-for-sqlite3/103048

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Projects

    Status

    Done

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions