the dbm docs are vague about what encoding is used when a str is stored #122996

@cameron-simpson

Documentation

The documentation for the dbm module, e.g. at https://docs.python.org/3.13/library/dbm.html, includes an example storing keys and values of type str. About strings, the documentation says:

Key and values are always stored as [bytes](https://docs.python.org/3.13/library/stdtypes.html#bytes).
This means that when strings are used they are implicitly converted
to the default encoding before being stored.

It is not at all clear to me what "the default encoding" means. For example, one might assume it is the encoding returned by locale.getencoding(), but I think not. Looking at the dbm.sqlite3 code, one sees CAST(? AS BLOB) as the insert parameter placeholder. That suggests to me that the encoding is whatever the database is using, which is not apparent to me from looking at the code. And I imagine that the other dbm backends may make other, different choices for the default encoding.
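One way to see what a given backend actually does is to store a str and read the raw bytes back. The following is a minimal sketch, not a statement about every backend: it uses dbm.dumb only because that backend is always available, and the helper name stored_bytes is made up for illustration. Other backends would need to be probed separately.

```python
import dbm.dumb
import os
import tempfile


def stored_bytes(text: str) -> bytes:
    """Store `text` as a value in a throwaway dbm.dumb database
    and return the bytes that come back when it is read."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe")
        with dbm.dumb.open(path, "c") as db:
            db["key"] = text   # str is converted implicitly by the backend
            return db["key"]   # values always come back as bytes


raw = stored_bytes("caf\u00e9")
# Compare against candidate encodings to see which one the backend used.
for candidate in ("utf-8", "latin-1"):
    print(candidate, raw == "caf\u00e9".encode(candidate))
```

With dbm.dumb the comparison should match UTF-8, since that backend encodes str with UTF-8 in its pure-Python code. The same probe would have to be run against dbm.sqlite3, dbm.gnu and dbm.ndbm to learn what they do.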

Ideally I'd like:

  • an explicit statement about how the encoding is chosen if that is possible, or a statement that this is backend and possibly current-locale dependent otherwise
  • a statement that users should probably always do their own conversion to bytes before storing values if they want control
  • possibly an optional encoding parameter for the dbm.open calls: if unset, keep the current (vague but historically compatible) behaviour; if provided, explicitly catch str values in __setitem__ and convert them using the supplied encoding

I can probably make a PR for the second and third items, and the "backend dependent" flavour of the first one.
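To make the second and third items concrete, here is a rough sketch of doing the conversion explicitly in user code. The EncodingDbm wrapper and the roundtrip helper are hypothetical names invented for this illustration; they are not part of the dbm module, and the proposed encoding parameter for dbm.open does not exist today. dbm.dumb is used only so the example runs everywhere.

```python
import dbm.dumb
import os
import tempfile


class EncodingDbm:
    """Hypothetical wrapper: encode str keys and values explicitly
    so the backend never has to pick an encoding itself."""

    def __init__(self, db, encoding="utf-8"):
        self._db = db
        self._encoding = encoding

    def _to_bytes(self, item):
        # Catch str explicitly; bytes pass through unchanged.
        if isinstance(item, str):
            return item.encode(self._encoding)
        return item

    def __setitem__(self, key, value):
        self._db[self._to_bytes(key)] = self._to_bytes(value)

    def __getitem__(self, key):
        return self._db[self._to_bytes(key)]


def roundtrip(text, encoding):
    """Store `text` through the wrapper and return the raw stored bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.dumb.open(os.path.join(tmp, "demo"), "c") as raw:
            db = EncodingDbm(raw, encoding=encoding)
            db["name"] = text
            return db["name"]


print(roundtrip("caf\u00e9", "latin-1"))
print(roundtrip("caf\u00e9", "utf-8"))
```

Because the backend only ever sees bytes here, the stored form depends solely on the encoding the caller chose, which is the control the second item asks for.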


Labels: docs (Documentation in the Doc dir), extension-modules (C modules in the Modules dir), stdlib (Python modules in the Lib dir), triaged (The issue has been accepted as valid by a triager), type-feature (A feature request or enhancement)
