unified runtime type checking for our json data #3285

@d-v-b

Reading array metadata documents means consuming JSON-formatted data and type-checking it. We do this in a clunky way right now: for each field f, we essentially have a parse_f function that checks whether the input data is compatible with f.
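To make the redundancy concrete, here is a rough sketch of what the per-field style looks like. The function names, fields, and error messages below are hypothetical illustrations, not copied from the codebase; the point is only that each function repeats the same "compare against the allowed values, else raise" shape.

```python
from typing import Any, Literal


# Hypothetical per-field parsers: same logic, different constants.
def parse_zarr_format(data: Any) -> Literal[3]:
    if data == 3:
        return 3
    raise ValueError(f"Invalid value for 'zarr_format'. Expected 3, got {data!r}.")


def parse_node_type(data: Any) -> Literal["array"]:
    if data == "array":
        return "array"
    raise ValueError(f"Invalid value for 'node_type'. Expected 'array', got {data!r}.")
```

Both functions are really the same check against a `Literal` annotation, which is what a single annotation-driven routine could replace.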

A big improvement would be unifying the parse_x functions into something with the following form:

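from __future__ import annotations  # keeps the pseudo return annotations from being evaluated

from typing import get_args

class UsefulException(Exception):
    # placeholder for an exception with an informative message
    ...

def check_literal(value, type_annotation) -> type_annotation:
    # return the value if it is one of the values allowed by the Literal annotation
    if value in get_args(type_annotation):
        return value
    raise UsefulException

def parse_union(value, type_annotation) -> type_annotation:
    # check if the value is in the union type, return it if so
    ...

def parse_tuple(value, type_annotation) -> type_annotation:
    # check if the value is consistent with the tuple type annotation, return the input as a tuple if so
    ...

...

def parse_json(value, type_annotation) -> type_annotation:
    # categorize the type annotation into Mapping, tuple, Sequence, union, literal, and call out to the relevant
    # parsing routine
    # return data
    ...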

i.e., functions that take a value and a type annotation, and return data assignable to that type annotation or raise a useful exception. In my thinking these are not strict type checks, because these functions are allowed to transform the input, e.g. parse_tuple([1,2,3], tuple[int, int, int]) would return (1,2,3).

This would remove a lot of redundant code. We would keep the scope narrow by only concerning ourselves with the types relevant for creating array metadata documents, namely:

  • primitive types None, str, int, float, bool
  • Sequences (they should come out as tuples)
  • unions
  • TypedDict (essential for handling JSON form of dtypes, codecs, chunk grids, and metadata itself)
  • Mapping[str, T], where T is any of the other types in this list

I have had LLMs whip up implementations of this on like 3 separate occasions, and each time it wasn't more than a few hundred LOC, so I think this would not be a huge maintenance burden.

Metadata

Labels: good-first-issue (Good place to get started as a new contributor), help wanted (Issue could use help from someone with familiarity on the topic)
