-
-
Notifications
You must be signed in to change notification settings - Fork 356
chore/handle numcodecs codecs #3376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3376 +/- ##
==========================================
+ Coverage 94.69% 94.73% +0.04%
==========================================
Files 79 81 +2
Lines 9520 9708 +188
==========================================
+ Hits 9015 9197 +182
- Misses 505 511 +6
🚀 New features to boost your workflow:
|
src/zarr/codecs/_numcodecs.py
Outdated
|
||
def _encode(self, chunk_data: Buffer, prototype: BufferPrototype) -> Buffer: | ||
encoded = self._codec.encode(chunk_data.as_array_like()) | ||
if isinstance(encoded, np.ndarray): # Required for checksum codecs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we know statically which are checksum codecs without the isinstance check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n.b., this was copy + pasted from numcodecs, but I think the answer is "no"
src/zarr/codecs/_numcodecs.py
Outdated
codec_name: str | ||
codec_config: dict[str, JSON] | ||
|
||
def __init_subclass__(cls, *, codec_name: str | None = None, **kwargs: Any) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would a codec definition look like without this magic? I'd be fine with repeating a few things if it meant we could avoid this (and IIUC some of the complexity in __repr__
and __init__
would go away too?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that I have rebased #3332 off of this PR, I am 100% going to demagic these codecs in that effort.
src/zarr/codecs/_numcodecs.py
Outdated
def __init__(self, **codec_config: JSON) -> None: | ||
super().__init__(**codec_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this do?
> Can you explain the different cases here? Spinning this question out into the main thread -- from me, the general answer to questions like this will be "no", since I am only copy+pasting stuff from numcodecs. I haven't spent too much time figuring out what this code is doing. I do think @normanrz and @TomNicholas might be able to answer some of these questions though. |
Ah, I didn't realize this was mostly from numcodecs. I think that moots most of my comments aside from where in the public API we put these. |
yeah I should have made more clear that this is nearly all directly copy + pasted from |
…v-b/zarr-python into chore/handle-numcodecs-codecs
…ore/handle-numcodecs-codecs
I think this is ready to go in (and it's necessary for #3332) |
and important recent addition: I moved all of the invocations of |
@d-v-b have you tested this with different numcodecs versions to make sure there's no unexpected issues with clobbering of codec registration? |
I'm don't think I expect any behavior to depend on numcodecs versions, happy to be corrected though. My understanding is that the codecs in we have a test that checks our compatibility with these codecs, defined as dicts. In main these tests will pick up the codec class from numcodecs, but in this PR the version of the codec defined in zarr python is used instead. Is that kind of thing you are worried about? |
This PR brings in all the codecs defined in
numcodecs.zarr3
. After this PR is merged, we can safely replace thenumcodecs.zarr3
module with reexports from zarr python, or removenumcodecs.zarr3
entirely, thereby fixing our circular dependency problem.This PR also changes the default config to ensure that the locally-defined codecs take priority over the same codec found in the numcodecs registry.