Optimize getitem with empty chunks #3379

TomAugspurger · 2025-08-17T18:36:11Z

This optimizes our read performance by using NDBuffer.empty instead of NDBuffer.create with a fill value. All chunks are filled, either with the values from the Store if that chunk was initialized, or the fill value if the Store returns None (the
chunk wasn't initialized). By not memsetting values that will be overwritten anyway, things will be a bit faster.

I didn't see any documentation about CodecPipeline in the docs. Technically, users might see a behavior change, if the user's codec pipeline happened to rely on receiving an array filled with the fill_value. Do we know of anyone writing / using custom codec pipelines?

Here's a mini benchmark:

import asyncio
import statistics
import time

import zarr.api.asynchronous
import zarr.storage


async def main():
    store = zarr.storage.MemoryStore()
    array = await zarr.api.asynchronous.create_array(
        store, shape=(125_000_000,), dtype="float64", chunks=(125_000,), fill_value=1.0
    )
    times = []
    for _ in range(10):
        t0 = time.monotonic()
        await array.getitem(slice(None))
        t1 = time.monotonic()
        times.append(t1 - t0)

    print("Unitialized", statistics.median(times))

    await array.setitem(slice(None), 2.0)
    times = []
    for _ in range(10):
        t0 = time.monotonic()
        await array.getitem(slice(None))
        t1 = time.monotonic()
        times.append(t1 - t0)
    print("Initialized", statistics.median(times))


if __name__ == "__main__":
    asyncio.run(main())

main:

❯ python bench.py
Unitialized 0.6652364450274035
Initialized 0.9079880800272804

HEAD:

❯ python bench.py
Unitialized 0.5070544500194956
Initialized 0.8395943949872162

Closes #3368

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.rst
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

codecov · 2025-08-17T18:48:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.55%. Comparing base (c2c4e21) to head (349a651).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3379   +/-   ##
=======================================
  Coverage   94.55%   94.55%           
=======================================
  Files          79       79           
  Lines        9447     9447           
=======================================
  Hits         8933     8933           
  Misses        514      514

Files with missing lines	Coverage Δ
src/zarr/abc/codec.py	`95.06% <ø> (ø)`
src/zarr/codecs/sharding.py	`94.47% <100.00%> (ø)`
src/zarr/core/array.py	`97.10% <100.00%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

TomAugspurger · 2025-08-18T18:26:15Z

@ilan-gold do you know if the zarrs-python codec pipeline relies on the out_buffer being an ndarray initialized with the fill_value? Somewhere around https://github.com/zarr-developers/zarr-python/pull/3379/files#diff-f488098fc197721bdc0ec9afd3ec04dc63a5bf1ca34078727306d6ee91c837f7L1366 is where the codec pipeline's read method is called.

ilan-gold · 2025-08-19T11:16:26Z

Hmmm I'm not sure here. I think we do the same as the zarr codec pipeline:

All chunks are filled, either with the values from the Store if that chunk was initialized, or the fill value if the Store returns None (the
chunk wasn't initialized).

i.e., this description matches the code https://github.com/zarrs/zarrs-python/blob/main/src/lib.rs#L342-L361 so I think we're good. Unless there were some other failure mode I could test, this looks reasonable (and our tests pass against this branch).

Optimize getitem with empty chunks

8808f91

d-v-b approved these changes Aug 18, 2025

View reviewed changes

Merge branch 'main' into tom/use-empty

aa274b1

Merge branch 'main' into tom/use-empty

349a651

d-v-b enabled auto-merge (squash) August 19, 2025 11:17

d-v-b merged commit 4b26501 into zarr-developers:main Aug 19, 2025
30 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Optimize getitem with empty chunks #3379

Optimize getitem with empty chunks #3379

Uh oh!

TomAugspurger commented Aug 17, 2025 •

edited

Loading

Uh oh!

codecov bot commented Aug 17, 2025 •

edited

Loading

Uh oh!

TomAugspurger commented Aug 18, 2025

Uh oh!

ilan-gold commented Aug 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Optimize getitem with empty chunks #3379

Optimize getitem with empty chunks #3379

Uh oh!

Conversation

TomAugspurger commented Aug 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Aug 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

TomAugspurger commented Aug 18, 2025

Uh oh!

ilan-gold commented Aug 19, 2025

Uh oh!

Uh oh!

Uh oh!

TomAugspurger commented Aug 17, 2025 •

edited

Loading

codecov bot commented Aug 17, 2025 •

edited

Loading