MAINT,ENH: Possible improvements related to buffered iteration #28018

gh-27883 moved the buffering setup to the beginning of iterator construction to avoid duplicate work later on (and to simplify things a lot).

There are a few related improvements that can be made now:
* The `FixedStridesArray` is now always identical to the actual "inner strides".  There are two things we can do here:
  1. Internally, stop using the fixed strides (and use the strides directly) as a slight simplification/optimization.
  2. We could advertise this fact and promise not to change it back (I don't _think_ there is much of a reason to consider changing back, but I am not 100% sure).
* We now know clearly at setup time if we need any buffering at all.
  * In principle, we could skip all buffer setup (an overhead cost) and use a faster `iternext()` function.  There are two things to keep in mind:
    1. The `BUFFERED` flag is used to reject some API calls; we must keep rejecting those calls even if we don't need to buffer internally.
    2. The iterator struct is laid out differently when buffering, so simply unsetting the flag would break offsets (i.e. we may need a new flag to just skip steps).
* The buffered `iternext()` always uses `goto_iterindex`.  This function is heavyweight.  During normal iteration, advancing the iterator is much easier and faster (mainly because there is no need to look at all dimensions).
* The buffer setup is currently unable to realize that it may be beneficial to use a "reduce style" iteration (a double loop) even when it is not required, because doing so may mean that fewer operands need to be buffered. [discussion](https://github.com/numpy/numpy/pull/27883#discussion_r1865800046)
* The code tries to guess the best buffer size, but this method is crude and can probably be improved.  A small constraint is that we should err on the side of larger buffers (or we have to deal with floating point precision changes with float16 in the einsum tests).
  The code optimizes "overheads" (very crudely), which dominate for small buffers; for largish buffers, the buffer copy itself might also make a difference.
  I am sure this can be better, but it is OK if it isn't ideal.
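
To illustrate the `goto_iterindex` point above, here is a toy model in plain Python. It is only a sketch and not NumPy's code: the function names, the list of coordinates, and the C-order layout are assumptions made for illustration, and the real iterator also updates data pointers and (when buffered) steps in buffer-sized chunks rather than single elements. The idea it shows is that jumping to a flat index has to visit every dimension, while a plain increment usually touches only the innermost one.

```python
def goto_iterindex(shape, iterindex):
    """Recompute all coordinates from a flat index (hypothetical model).

    Every dimension is visited, whatever the previous position was.
    """
    coords = [0] * len(shape)
    # C order: the last axis varies fastest, so peel it off first.
    for axis in reversed(range(len(shape))):
        iterindex, coords[axis] = divmod(iterindex, shape[axis])
    return coords


def advance(shape, coords):
    """Step to the next element in place (hypothetical model).

    Returns False once the iteration is exhausted. Assumes a non-empty shape.
    """
    for axis in reversed(range(len(shape))):
        coords[axis] += 1
        if coords[axis] < shape[axis]:
            return True   # common case: only the innermost axis was touched
        coords[axis] = 0  # carry into the next outer axis
    return False


# Both approaches visit the same coordinates in the same order.
shape = (2, 3, 4)
coords = [0, 0, 0]
for iterindex in range(1, 2 * 3 * 4):
    assert advance(shape, coords)
    assert coords == goto_iterindex(shape, iterindex)
assert not advance(shape, coords)
```

In this model `advance` returns after a single comparison for most steps, whereas `goto_iterindex` always performs one `divmod` per dimension.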
