The host task launching a kernel should yield repeatedly while waiting for the kernel

The runtime already has the wiring in place to make this happen: https://github.com/chapel-lang/chapel/blob/main/runtime/src/chpl-gpu.c#L162-L177

However, as noted in the comment there, basic performance tests showed that yielding was not beneficial. @mppf pointed out that we can probably take a hybrid approach: busy-wait (keep polling the stream without yielding) for roughly 1000 iterations while the stream is not ready, and only then start yielding between checks, instead of yielding after every check on the stream.
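A minimal sketch of the spin-then-yield idea, assuming a generic readiness check and yield hook. All names here (`wait_hybrid`, `ready_fn`, `yield_fn`, `fake_stream`, `fake_ready`, `default_yield`) are hypothetical and are not the Chapel runtime's API; POSIX `sched_yield()` stands in for the tasking layer's yield, and `fake_stream` simulates a stream that becomes ready after a fixed number of queries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical readiness check: returns true once the stream's work is done.
   A real runtime would wrap a driver-level stream query here. */
typedef bool (*ready_fn)(void *ctx);

/* Hypothetical yield hook; a real runtime would call its tasking layer. */
typedef void (*yield_fn)(void);

/* Stand-in yield using POSIX sched_yield(). */
static void default_yield(void) { sched_yield(); }

/* Poll is_ready until it returns true. The first spin_limit failed checks
   busy-wait; every failed check after that yields before polling again.
   spin_limit == 0 reproduces "yield after every check".
   Returns the number of yields performed. */
static size_t wait_hybrid(ready_fn is_ready, void *ctx,
                          size_t spin_limit, yield_fn yield) {
  assert(is_ready != NULL && yield != NULL);
  size_t failed_checks = 0;
  size_t yields = 0;
  while (!is_ready(ctx)) {
    failed_checks++;
    if (failed_checks > spin_limit) {
      yield();   /* past the spin budget: let other tasks run */
      yields++;
    }            /* otherwise: busy-wait, poll again immediately */
  }
  return yields;
}

/* Hypothetical simulated stream: not ready for the first N queries. */
struct fake_stream { size_t queries_until_ready; };

static bool fake_ready(void *ctx) {
  struct fake_stream *s = ctx;
  if (s->queries_until_ready == 0) return true;
  s->queries_until_ready--;
  return false;
}
```

With a spin budget of 1000, a kernel that finishes within 1000 polls never yields, so short kernels keep the low latency of pure busy-waiting, while a long-running kernel stops monopolizing the host task and lets other tasks make progress. The budget of 1000 is the rough figure from the discussion above, not a tuned value.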

That makes good sense to me, and it is something that can be tried for better performance when host/device overlap is important.

GitHub issue #26905