The host task launching a kernel should yield repeatedly while waiting for the kernel #26905

@e-kayrakli

The runtime already has the wiring in place to make that happen: https://github.com/chapel-lang/chapel/blob/main/runtime/src/chpl-gpu.c#L162-L177

However, as noted in the comment there, basic performance tests showed that yielding wasn't beneficial. @mppf pointed out that we can probably take a hybrid approach: busy-wait for 1000 or so iterations while the stream is not ready, and only then yield, instead of yielding after every check on the stream.

That makes good sense to me, and it is something that can be tried for better performance when host/device overlap is important.
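
To make the idea concrete, here is a rough sketch of the spin-then-yield loop. It is not the runtime's code: `wait_hybrid`, `ready_fn`, and `fake_stream` are hypothetical names, `sched_yield` stands in for the runtime's task yield, and the readiness callback stands in for a non-blocking stream query. The spin limit of 1000 is just the ballpark figure mentioned above, not a tuned value.

```c
#include <assert.h>
#include <sched.h>    /* sched_yield: stand-in for the runtime's task yield */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical readiness callback; in the runtime this would be a
   non-blocking query on the GPU stream. */
typedef bool (*ready_fn)(void *ctx);

/* Poll is_ready until it returns true. After every spin_limit
   consecutive failed checks, yield once and restart the spin count.
   Returns the number of yields performed. */
static size_t wait_hybrid(ready_fn is_ready, void *ctx, size_t spin_limit) {
  assert(is_ready != NULL);
  size_t yields = 0;
  size_t spins = 0;
  while (!is_ready(ctx)) {
    if (++spins >= spin_limit) {
      sched_yield();  /* let other host tasks run */
      yields++;
      spins = 0;
    }
  }
  return yields;
}

/* Fake stream for illustration only: reports "not ready" for a fixed
   number of polls, then "ready". */
typedef struct { size_t polls_left; } fake_stream;

static bool fake_stream_ready(void *ctx) {
  fake_stream *s = ctx;
  if (s->polls_left == 0) return true;
  s->polls_left--;
  return false;
}
```

With `spin_limit` set to 1 this degenerates to yielding after every check, so the same loop could be used to compare the two strategies in a performance test.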
