
ENH: add a casting option 'same_value' and use it in np.astype #29129


Open: wants to merge 30 commits into base: main

Conversation

@mattip (Member) commented Jun 5, 2025

Users have asked for an option to raise an error when a cast overflows. This is hard to do with the NEP-50 based casting, which only offers blanket rules for casting one dtype to another, without taking the actual values into account. This PR adds a new 'same_value' option for the casting kwarg (implemented in PyArray_CastingConverter), and extends the casting loop functions to raise a ValueError when same_value casting is requested and a value would be changed by the cast. So far this is only implemented for ndarray.astype, i.e. np.array([1000]).astype(np.int8, casting='same_value') will now raise an error.
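A minimal sketch of the intended user-facing behavior, assuming a build that includes this PR:

import numpy as np

a = np.array([1000])

# Default astype casting is 'unsafe': the value silently wraps on overflow.
print(a.astype(np.int8))            # e.g. array([-24], dtype=int8)

# With casting='same_value' (this PR), the overflow is detected and rejected.
try:
    a.astype(np.int8, casting='same_value')
except ValueError as exc:
    print('raised:', exc)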

Performance: since the PR touches the assignment deep in the inner loop, I checked early on that it does not impact performance when same_value casting is not used. The loop pseudocode now looks like this; when compiled with -O3 (via a gcc pragma specific to the loop), the compiler seems to be smart enough that the if condition does not impact performance.

int same_value_casting = <check some condition>
type2 dst_value; type1 src_value;
while (<condition>) {
    <setup dst_value, src_value>
    dst_value = (cast)src_value;
    if (same_value_casting) {
        <do some checking>;
    }
    <iterate>;
}
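As an ad-hoc sanity check of the no-overhead claim from Python (the PR itself relies on the spin bench comparison shown further down; the array size and dtypes here are arbitrary):

import timeit
import numpy as np

a = np.arange(1_000_000, dtype=np.int64)

# Exercises the default casting path, which should be unaffected by the new
# branch in the inner loop.
t = timeit.timeit(lambda: a.astype(np.int32), number=200)
print(f"int64 -> int32 astype: {t / 200 * 1e6:.1f} us per call")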

Protection: This PR only changes ndarray.astype; future PRs may implement it in more places. I tried to guard the places that use PyArray_CastingConverter so that they raise an error if 'same_value' is passed in:

  • array_datetime_as_string (by disallowing 'same_value' in can_cast_datetime64_units)
  • array_copyto
  • array_concatenate (by disallowing 'same_value' in PyArray_ConcatenateInto)
  • array_einsum (by disallowing 'same_value' in PyArray_EinsteinSum)
  • ufunc_generic_fastcall

'same_value' is allowed in

  • array_astype (the whole goal of this PR)
  • array_can_cast_safely
  • npyiter_init (I am pretty sure that is OK?)
  • NpyIter_NestedIters (I am pretty sure that is OK?)

Testing: I added tests for ndarray.astype() covering all combinations of built-in types, and also some tests that casting='same_value' is properly rejected where it is not supported.
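For illustration, a test of the kind described might look roughly like this (the dtype pairs and test name are hypothetical, not the actual tests added by the PR):

import numpy as np
import pytest

@pytest.mark.parametrize("dtype_in, dtype_out, value", [
    (np.int64, np.int8, 1000),   # too large to represent in int8
    (np.int64, np.uint8, -1),    # negative values cannot be represented in uint8
])
def test_same_value_rejects_value_change(dtype_in, dtype_out, value):
    a = np.array([value], dtype=dtype_in)
    with pytest.raises(ValueError):
        a.astype(dtype_out, casting='same_value')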

TODO:

  • disallow astype(..., casting='same_value') with datetime64 or user-defined dtypes?
  • rerun benchmarks against main

@seberg (Member) left a comment

It is pretty cool that we can do it! That said, I would prefer some more tinkering with the approach.

Basically, I think making it work everywhere is not nearly as hard as it may seem, so I think we have to at least try.

I think the only good way to do this is that the new cast level(s) are reported by the loops (via resolve_descriptors) and integrated into PyArray_MinCastSafety. By doing so, we will achieve two things:

  1. Things should just work everywhere, as long as you also set the context.flag correctly everywhere (which should not be too much work).
  2. I am very sure that things are currently broken for datatypes that don't implement this flag yet, and unfortunately that is not great. If a user does casting="same_value", a "safe" cast is fine, but we need to reject "unsafe" and "same_kind" casts.

So, I think we really need something for 2, and I honestly think the easiest way to solve it is this same path that gives us full support everywhere.
(And yes, I don't love having two "same_value" levels, but I doubt it can be helped.)

(All other things are just nitpicks at this time.)

@@ -144,6 +150,11 @@ typedef struct {
#define NPY_METH_contiguous_indexed_loop 9
#define _NPY_METH_static_data 10

/*
* Constants for same_value casting
Member

This would be a constant for special array-method return values, I think (i.e. not just applicable to casts).

Is it really worth it, compared to just keeping this inside the inner loop? Yes, you need to grab the GIL there, but on the plus side, you could actually consider reporting the error.

Member Author

I thought the point of having a return value from the inner loop function is so we can use it to report errors. In general I prefer a code style that tries as much as possible to separate programming metaphors: keep the Python C-API calls separate from the "pure" C functions.

@seberg (Member) Jun 6, 2025

I can be convinced to add this. But it is public API and we call these loops from a lot of places so we need machinery to make sure that all of these places give the same errors.

That may be a great refactor! E.g. we could have a single function that does HandleArrayMethodError("name", method_result, method_flags).
But I guess without such a refactor it feels very bolted on to me, because whether or not we check for this return depends on where we call/expect it.

EDIT: I.e. basically, wherever we have PyUFunc_GiveFloatingpointErrors() we would also pass the actual return value and do the needed logic there.

Member Author

I will try to do this as a separate PR.

cast_info.context.flags |= NPY_SAME_VALUE_CASTING;
PyErr_SetString(PyExc_NotImplementedError, "'same_value' casting not implemented yet");
return -1;
}
Member

I think we really shouldn't need any of these checks.

What you need to do is modify:

PyArray_MinCastSafety()

to correctly propagate the new cast level. If we can correctly propagate cast levels, everything else will just drop out for free. Although, you also have to report the correct cast level in the cast loops (that is, in the *_resolve_descriptors functions).

What I am not quite sure about is how same-value and same-kind interact, unfortunately.
For the user this doesn't matter, but internally we do the opposite: a cast reports "this cast is SAFETY", which is why I suggested a flag, but maybe it is just two levels: same-value-unsafe and same-value-same-kind.

Before you think what madness begot Sebastian... the two levels would not be exposed to Python users at all! The user would just use same_value (translating to same-value-unsafe). But a cast will be able to report same-value-same-kind, so that both can_cast(x, y, "same_value") and can_cast(x, y, "same_kind") are valid.

(I might be tempted to make this a "flag", in the sense of making sure there is one bit involved that isn't used by other cast levels.)

Member Author

I am not sure I want to expand the scope of this PR to support other things. Here are the other functions that take a casting kwarg via PyArray_CastingConverter; which of these do you imagine should support same_value in this PR?

  • datetime_as_string
  • copyto
  • concatenate
  • einsum
  • can_cast (agreed, it should "support" same_value like it does unsafe)
  • npyiter_init, NpyIter_NestedIters (??? anyhow, should NOT support it in this PR)
  • convert_ufunc_arguments, py_resolve_dtypes_generic (should NOT support it in this PR)

Member

I think I was asking for basically two things:

  1. Fix the MinCastSafety logic and return the (unfortunately two) "same_value" casting levels from the resolve_dtypes functions. This:
    • Fixes user-dtype and other-dtype support.
    • Is the only non "just bolt it on" way to fix np.can_cast (which this will automatically fix).
  2. We should pass cast safety to PyArray_GetDTypeTransferFunction so that we can correctly initialize the context there already. This would:
    • Fix any problems with chained casts (this is a lingering bug, but dunno if it is worth much in practice; casting a structured dtype with one field is probably broken with "same_value" without this).
    • I honestly think that if you do this, you will solve everything else. (except I suggest to not do that magic return in this PR, because doing it right would probably touch more code than doing this right.)

Now 1. seems just necessary to me, to be honest?
2. seems hard to avoid long-term, because we want to support the level in more places, I think, and also because of lingering small bugs.

I suspect that doing 2. is mostly easy. The one thing that is probably annoying is threading it through the internal/recursive calls to PyArray_GetDTypeTransferFunction.

convert_ufunc_arguments, py_resolve_dtypes_generic (should NOT support it in this PR)

Well, while I agree it is not important for them to do so, it would also be completely fine if it drops out. I don't think there is an API reason to dislike it.

Member

I would actually be happy to try and split out 1. into its own PR, although we would have to revert it if we don't follow up. I.e. add support for can_cast() and correct reporting for it, but just ignore it in practice when used?

(Just to keep this more focused, and since I think it's very unlikely we won't see this to its end.)

Member Author

Where does this stand? I think you fixed 1 in the PR to this PR.

Member

@mattip I don't understand: you rebased away my PR to this PR and just kept a tight check within astype() itself for builtin numerical casts instead.

I thought that was likely intentional, but maybe it was not?

Which means e.g. that:

In [4]: a.astype(str, casting="same_value")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 a.astype(str, casting="same_value")

ValueError: 'same_value' casting only supported on built-in numerical dtypes

In [5]: a.astype(str, casting="safe")
Out[5]: array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], dtype='<U21')

Which yeah, I don't really like as an indefinite solution, both due to the above behavior and because it means user dtypes can't possibly support this level.

Except for casts to bools a safe cast would be expected to be a valid same-value cast also, I think. (Casts to bool are a bit fun, because we use them as a definition for __nonzero__.)
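For context on the bool remark, a plain illustration (not code from the PR) of why casts to bool do not round-trip:

import numpy as np

a = np.array([0, 1, 2])
b = a.astype(bool)           # truthiness, i.e. the __nonzero__/__bool__ definition
print(b)                     # [False  True  True]
print(b.astype(a.dtype))     # [0 1 1] -- the value 2 does not round-trip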

@seberg added the "56 - Needs Release Note" label (needs an entry in doc/release/upcoming_changes) on Jun 5, 2025
@jorenham (Member) left a comment

could you add this option to

_CastingKind: TypeAlias = L["no", "equiv", "safe", "same_kind", "unsafe"]
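Presumably the updated alias would then read as follows (a sketch; the exact stub location is whatever the line above comes from, and the ordering matches what the mypy_primer diff at the end of this thread reports):

from typing import Literal as L, TypeAlias

_CastingKind: TypeAlias = L["no", "equiv", "safe", "same_kind", "same_value", "unsafe"]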

@mattip (Member Author) commented Jun 5, 2025

It seems I touched something in the float -> complex astype. Running

$ spin bench --compare -t bench_ufunc.NDArrayAsType.time_astype

gives me

| Change | Before [d52bccb] | After [23dda18] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| + | 2.60±0.04μs | 6.45±0.05μs | 2.48 | bench_ufunc.NDArrayAsType.time_astype(('float32', 'complex64')) |
| + | 4.08±0.01μs | 9.21±0.3μs | 2.26 | bench_ufunc.NDArrayAsType.time_astype(('float64', 'complex128')) |
| + | 3.99±0.04μs | 8.62±0.04μs | 2.16 | bench_ufunc.NDArrayAsType.time_astype(('float32', 'complex128')) |
| + | 15.8±0.4μs | 17.8±0.2μs | 1.13 | bench_ufunc.NDArrayAsType.time_astype(('float16', 'float32')) |
| + | 16.0±0.1μs | 18.0±0.3μs | 1.13 | bench_ufunc.NDArrayAsType.time_astype(('float16', 'int64')) |
| + | 2.14±0μs | 2.39±0.03μs | 1.11 | bench_ufunc.NDArrayAsType.time_astype(('int32', 'float32')) |
| + | 18.0±0.2μs | 19.9±0.2μs | 1.1 | bench_ufunc.NDArrayAsType.time_astype(('float16', 'complex128')) |
| - | 2.20±0.03μs | 1.98±0.04μs | 0.9 | bench_ufunc.NDArrayAsType.time_astype(('int16', 'int32')) |

@seberg (Member) commented Jun 6, 2025

It seems I touched something in the float -> complex astype.

Maybe the compiler just didn't decide to lift the branch for some reason? I have to say that complex -> real casts wouldn't be my biggest concern; there is this weird ComplexWarning on them (in some branches at least), for a reason...

@mattip (Member Author) commented Jun 15, 2025

The linting failures are from #29197; the PR to fix them is #29210.

The OpenSUSE failure seems to be due to network failures downloading a package.

@mattip (Member Author) commented Jul 21, 2025

Somehow this caused test_nditer.py::test_iter_copy_casts to fail for [e-f], [e-d], [e-F], [e-D], [e-g], [F-e], [D-e], [G-e], but only on the debug build, which would suggest I touched something in the half casting.

Edit: it was due to the debug build exposing a missing initialization of context.flags

@mattip (Member Author) commented Jul 22, 2025

CI is passing

@mattip (Member Author) commented Jul 22, 2025

Hmm. I wonder what we should do with casting float-to-int or float-to-smaller-float? I think the second example should raise, but what about the third?

>>> np.array([1.0, 2.0, 100.0]).astype('int64', casting='same_value')
array([  1,   2, 100])
>>> np.array([1.2, 2.45, 100.0]).astype('int64', casting='same_value')
array([  1,   2, 100])

>>> a = np.array([1.2, 2.45, 3.14156])
>>> a == a.astype(np.half, casting='same_value')
[False, False, False]

@seberg (Member) commented Jul 23, 2025

I had a list of 5-6 different possible definitions, but I am skipping that for now. I am tempted to go with your definition that a == a.astype(new_dtype, casting='same_value') must be true (one caveat below).
(This implies that a.astype(new_dtype, casting='same_value').astype(a.dtype) round-trips, but it is stronger, since == requires that the two dtypes are promotable in the context of comparing.)

Unless we go all out and say that a same_value cast must also be a same_kind one and disallow float-to-integer casts entirely here, the float value must clearly be integral.
That allows the use case of safely using a float array as an integer one. If this is not desired, the best solution may be an "is_integral" check for the dtype.

  • One additional subtlety: a float value like 2e60 lacks the mantissa precision to distinguish neighbouring integral values. Even though a == a.astype(new_dtype, casting='same_value') is true, one could consider it unsafe from this perspective.
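A plain-Python illustration of that subtlety (just float/int arithmetic, not a NumPy cast):

x = 2e60

# The equality/round-trip definition is satisfied ...
assert x == float(int(x))

# ... yet at this magnitude the float cannot distinguish neighbouring integers:
# the exact integer plus one maps back to the very same float.
assert float(int(x) + 1) == x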

It may actually be worth bringing it up on the mailing list and meeting and sketching at least the main alternatives. (I don't usually expect much input, but sketching the alternatives tends to clarify things a bit either way.)

I'll try to look over the recent changes here soon, but ping me here or on slack if I forget!

@mattip (Member Author) commented Jul 23, 2025

I think it makes sense to do the most conservative thing and require "ability to accurately round-trip" for 'same_value' in this first implementation.

@seberg (Member) commented Jul 24, 2025

I think it makes sense to do the most conservative thing and require "ability to accurately round-trip" for 'same_value' in this first implementation.

I suppose we have at least a brief window where we could still broaden the definition slightly.
There is one use-case where a different definition might be nice, and that is if we were to consider moving towards this cast mode as a default for output parameters/assignments:

arr += something
arr[...] = other_array

That is currently same-kind I think. In that context I think it is best to forgive all float inaccuracies (including over and underflows).

But, I don't want to get lost too much in this. Unless we are sure that we are only interested in one of these two use-cases for floats (or float to int), it seems fine to just focus on one now and add the other if we want to later.

(I suppose I wouldn't mind asking around if there is a preference about the insufficient mantissa for float -> int casts, but it is a detail. -- but more of a sanity check because I don't have a strong gut feeling.)


@mattip (Member Author) commented Jul 28, 2025

@jorenham I added the new value as requested. There is some message about mypy_primer, do I need to do more to fix typing?


@jorenham (Member)

@jorenham I added the new value as requested. There is some message about mypy_primer, do I need to do more to fix typing?

Primer is just reporting that there are some error messages that changed, so nothing to worry about.


@mattip (Member Author) commented Aug 3, 2025

CI is passing

@seberg (Member) left a comment

Looks fine, with the caveat that we only support numerical casts in NumPy.
I am not actually sure about bools, but making them not round-trip (current state) seems probably OK, if just to say that a safe cast is generally also a same-kind one... and we consider that a safe cast (because it is basically defined as __nonzero__()).

I don't love getting rid of logic to make can_cast work (at least indefinitely so), but OK, should just let others give an opinion once.

@@ -227,6 +227,8 @@ typedef enum {
NPY_SAME_KIND_CASTING=3,
/* Allow any casts */
NPY_UNSAFE_CASTING=4,
/* Allow any casts, check that no values overflow/change */
NPY_SAME_VALUE_CASTING=5,
Member

Now that it really works exactly for builtin numeric types only, do you care for making it public on the C-API?

I still expect the only reasonable way to make this public will be to have the two states, which somewhat makes me want to give this a value of 8+4=12.
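To illustrate the "one bit layered on top of the enum" idea with the values suggested here (the SAME_VALUE_FLAG name below is made up for the illustration; the plain enum values are the ones quoted above):

# Existing NPY_CASTING levels.
NPY_NO_CASTING, NPY_EQUIV_CASTING, NPY_SAFE_CASTING = 0, 1, 2
NPY_SAME_KIND_CASTING, NPY_UNSAFE_CASTING = 3, 4

# Reserve a bit that no plain level uses, e.g. 8, so that 8 + 4 = 12 means
# "unsafe at the dtype level, but values must be preserved".
SAME_VALUE_FLAG = 8
NPY_SAME_VALUE_CASTING = SAME_VALUE_FLAG | NPY_UNSAFE_CASTING   # == 12

base_level = NPY_SAME_VALUE_CASTING & ~SAME_VALUE_FLAG          # 4, i.e. unsafe
same_value_requested = bool(NPY_SAME_VALUE_CASTING & SAME_VALUE_FLAG)
print(base_level, same_value_requested)                         # 4 True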

.. c:enumerator:: NPY_SAME_VALUE_CASTING

Allow any cast, but error if any values change during the cast. Currently
supported only in ``ndarray.astype(..., casting='same_value')``
Member

If we make this public on the C-API, I think we should be clear that this is only supported by cast-safety requests and not for defining a cast safety.
Should also add .. versionadded:: ... with a note that on previous NumPy versions it will behave the same as "unsafe".
(Or you guard the define with NPY_FEATURE_VERSION, or both. I would also be fine with just a note that it is effectively ignored on older versions.)

Member Author

I added a versionadded and gated the new values with FEATURE_VERSION

Member

I forgot yesterday. If you pull out my changes, then we either keep this as a flaggable value (e.g. 8 or 16), or we don't make it public API yet, IMO.

(Which I don't care to be strict about; we could make it _... with a code comment.)

"'same_value' casting only supported on built-in numerical dtypes");
return -1;
}
cast_info.context.flags |= NPY_SAME_VALUE_CASTING;
Member

If this wasn't effectively private API right now, I would be opposed to re-using the NPY_SAME_VALUE_CASTING flag in a context where all the other cast levels don't make any sense (but other flags probably do in the future).

(I would like it if we could note that somewhere. Probably here, since this is where the error for unsupported casts happens.)


@@ -367,7 +367,7 @@ PyArray_LegacyCanCastTypeTo(PyArray_Descr *from, PyArray_Descr *to,
* field; recurse just in case the single field is itself structured.
*/
if (!PyDataType_HASFIELDS(to) && !PyDataType_ISOBJECT(to)) {
-    if (casting == NPY_UNSAFE_CASTING &&
+    if ((casting == NPY_UNSAFE_CASTING || (casting == NPY_SAME_VALUE_CASTING)) &&
Member

Actually looking closer: I am very sure all of this is dead code and you don't have to modify this function at all.
I think I would slightly prefer to just not do this, but OTOH, it doesn't really matter either way and it looks probably correct in a sense.

(I half suspect that this code has been dead for ~15 years, but I am not 100% sure; I probably didn't modify it because just not touching the code is the easiest way to avoid accidental regressions.)

Member Author

I don't understand: you rebased away my PR to this PR and just kept a tight check within astype() itself for builtin numerical casts instead.

That was a mistake, sorry, thanks for catching it. I added the changeset back to the PR.

Member Author

I am very sure all of this is dead code and you don't have to modify this function at all.

Should we do a little PR to delete it?

Member

Yeah, maybe. We just need to make sure to turn all the paths we don't actually use into a hard error, without touching the other ones.
(We only use the function with inputs of NPY_SAME_KIND_CASTING and NPY_SAFE_CASTING.)

Member

@mattip I'll make an iteration tomorrow or so. But if you prefer, we could also just put this in as is and make sure to follow up with my changes in a different PR.

(Since it adds public API and also should allow supporting can_cast, it may be a bit better to do one or two iterations and this is pretty big.)

Member Author

But if you prefer, we could also just put this in as is and make sure to follow up with my changes in a different PR.

Which changes? I can open a follow-on issue...

Member

I mean my changes. I want them, but thought it might be easier to split out.

Anyway, let me go through and then do an issue. Since can-cast should just work now, we should probably activate it, but it doesn't need to be here.


@mattip removed the "56 - Needs Release Note" label on Aug 9, 2025

Diff from mypy_primer, showing the effect of this PR on type check results on a corpus of open source code:

spark (https://github.com/apache/spark)
- python/pyspark/ml/functions.py:244: note:     def [_ScalarT: generic[Any]] vstack(tup: Sequence[_SupportsArray[dtype[_ScalarT]] | _NestedSequence[_SupportsArray[dtype[_ScalarT]]]], *, dtype: None = ..., casting: Literal['no', 'equiv', 'safe', 'same_kind', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
+ python/pyspark/ml/functions.py:244: note:     def [_ScalarT: generic[Any]] vstack(tup: Sequence[_SupportsArray[dtype[_ScalarT]] | _NestedSequence[_SupportsArray[dtype[_ScalarT]]]], *, dtype: None = ..., casting: Literal['no', 'equiv', 'safe', 'same_kind', 'same_value', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
- python/pyspark/ml/functions.py:244: note:     def [_ScalarT: generic[Any]] vstack(tup: Sequence[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]], *, dtype: type[_ScalarT] | dtype[_ScalarT] | _SupportsDType[dtype[_ScalarT]], casting: Literal['no', 'equiv', 'safe', 'same_kind', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
+ python/pyspark/ml/functions.py:244: note:     def [_ScalarT: generic[Any]] vstack(tup: Sequence[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]], *, dtype: type[_ScalarT] | dtype[_ScalarT] | _SupportsDType[dtype[_ScalarT]], casting: Literal['no', 'equiv', 'safe', 'same_kind', 'same_value', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
- python/pyspark/ml/functions.py:244: note:     def vstack(tup: Sequence[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]], *, dtype: type[Any] | dtype[Any] | _SupportsDType[dtype[Any]] | tuple[Any, Any] | list[Any] | _DTypeDict | str | None = ..., casting: Literal['no', 'equiv', 'safe', 'same_kind', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[Any]]
+ python/pyspark/ml/functions.py:244: note:     def vstack(tup: Sequence[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]], *, dtype: type[Any] | dtype[Any] | _SupportsDType[dtype[Any]] | tuple[Any, Any] | list[Any] | _DTypeDict | str | None = ..., casting: Literal['no', 'equiv', 'safe', 'same_kind', 'same_value', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[Any]]

@mattip (Member Author) commented Aug 18, 2025

In discussions in #29558 and offline, @seberg suggested exposing same_value as a flag layered on top of the NPY_CASTING enums and adjusting the casting resolution appropriately. I made that change in f4e657a. CI is passing, except for the SIMD runs, which were fixed by #29585.
