Open
Description
gh-27883 moved setup of buffering to the beginning of the iterator construction to avoid duplicate work later on (and just simplify things a lot).
There are a few related improvements that can be done now:
- The `FixedStridesArray` is now always identical to the actual "inner strides". There are two things we can do here:
  - Internally, just stop using the fixed strides (and use the strides immediately) as a slight simplification/optimization.
  - We could advertise this fact, promising not to change it back (I don't think there is much of a reason to consider changing back, but I am not 100% sure).
- We now know clearly at setup time if we need any buffering at all.
  - We could skip all buffer setup (overhead cost) and use a faster `iternext()` function in principle. There are two things to keep in mind:
    - The `BUFFERED` flag is used to reject some API calls; we must keep rejecting the API even if we don't need to buffer internally.
    - The iterator struct is different when buffering, so simply unsetting the flag would break offsets (i.e. we may need a new flag to just skip steps).
- The buffered `iternext()` always uses `goto_iterindex`. This function is heavyweight. During normal iteration, advancing the iterator is much easier and faster (mainly, no need to look at all dimensions).
- The buffer setup is currently unable to realize that it may be beneficial to use a "reduce style" iteration (a double loop) even when not required, because doing so may mean fewer operands need to be buffered. (discussion)
- The code tries to guess the best buffer size, but this method is crude and can probably be improved. A small constraint is that we should err a bit on the side of larger buffers (or we have to deal with floating point precision changes with float16 in the einsum tests).
The code optimizes "overheads" (very crudely). For small buffers the overheads dominate, but for largish buffers the buffer copy itself might also make a difference.
I am sure this can be better, but it is OK if it isn't ideal.
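
To illustrate the `goto_iterindex` point above, here is a toy model in plain Python. It is not NumPy's C code: the function names (`coords_from_flat_index`, `advance_one`, `compare_work`) are hypothetical, and it assumes a simple C-order iteration over a fixed shape. It only counts how many dimensions each style of repositioning has to inspect. How often the real buffered iterator repositions is not modeled.

```python
from typing import List, Sequence, Tuple


def coords_from_flat_index(index: int, shape: Sequence[int]) -> Tuple[List[int], int]:
    """'Goto' style: rebuild every coordinate from a flat index.

    Returns the coordinates and the number of dimensions inspected,
    which is always len(shape).
    """
    coords = [0] * len(shape)
    inspected = 0
    # Peel off coordinates starting from the innermost (last) axis.
    for axis in range(len(shape) - 1, -1, -1):
        index, coords[axis] = divmod(index, shape[axis])
        inspected += 1
    return coords, inspected


def advance_one(coords: List[int], shape: Sequence[int]) -> int:
    """Incremental style: bump the innermost axis, carrying only when needed.

    Modifies coords in place and returns the number of dimensions inspected.
    """
    inspected = 0
    for axis in range(len(shape) - 1, -1, -1):
        inspected += 1
        coords[axis] += 1
        if coords[axis] < shape[axis]:
            return inspected  # no carry, the outer axes are untouched
        coords[axis] = 0  # carry into the next outer axis
    return inspected  # wrapped around past the last element


def compare_work(shape: Sequence[int]) -> Tuple[int, int]:
    """Total dimensions inspected by each style over one full pass."""
    size = 1
    for n in shape:
        size *= n
    goto_work = 0
    step_work = 0
    coords = [0] * len(shape)
    for index in range(1, size):
        expected, inspected = coords_from_flat_index(index, shape)
        goto_work += inspected
        step_work += advance_one(coords, shape)
        assert coords == expected  # both styles must agree on the position
    return goto_work, step_work


if __name__ == "__main__":
    print(compare_work((4, 5, 6)))  # (357, 141)
```

For the shape `(4, 5, 6)`, the "goto" style inspects all three dimensions on each of the 119 steps, while the incremental style mostly touches only the innermost one. The gap widens as the number of dimensions grows.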