Qualcomm AI Engine Direct - Avoid duplicated external ION memory registration #13421
base: main
Conversation
…#13373) Summary: Hardcode the embedding/head dims, which are fixed, and use -1 for the sequence length dimension when reshaping; this allows us to quantize the graph once and export it for different sequence lengths. Differential Revision: D80181012
…stration Summary: `inverse_ion_registered_map_` stores the mapping from an external ION buffer to a registered memory handle; it is only populated by `RegisterIonMem`. I don't quite understand the use case for `RegisterCustomMem` and `PreRegisterCustomMemHandle`, so I didn't attempt to change the existing `registered_map_`. The goal is to allow the same external ION buffer to be used for inputs of different graphs: it is registered for the first graph, and the same handle is reused for the other graphs. Without this, duplicated-registration errors are logged on every inference. By external ION buffer I'm referring to buffers allocated outside the immediate surroundings of the inference code, without using any helpers from the QNN backend. Rollback Plan: Differential Revision: D80267040
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13421
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (3 Unrelated Failures) As of commit ce8fbd3 with merge base 9cfb684:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D80267040
Thank you for the PR.
@@ -31,6 +31,16 @@ Error QnnMemManager::RegisterIonMem(
tensor_wrapper->GetDataType(),
QNN_MEM_TYPE_ION,
{{mem_fd}}};
if (auto it = inverse_ion_registered_map_.find(mem_ptr);
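The diff above is truncated, so here is a minimal conceptual sketch of the dedup idea it implements: look the buffer up by base address first and only register it with QNN if no handle is cached. The class and method names (`QnnMemManagerSketch`, `GetOrRegister`, `RegisterWithQnn`) are placeholders, not the PR's actual code; only `inverse_ion_registered_map_` and `mem_ptr` come from the visible diff.

```cpp
#include <unordered_map>

// Conceptual sketch only; names other than inverse_ion_registered_map_ and
// mem_ptr are placeholders rather than the PR's actual code.
class QnnMemManagerSketch {
 public:
  // Reuse the handle for an already-registered external ION buffer, or
  // register it once and cache the result keyed on its base address.
  void* GetOrRegister(void* mem_ptr) {
    if (auto it = inverse_ion_registered_map_.find(mem_ptr);
        it != inverse_ion_registered_map_.end()) {
      return it->second;  // handle registered for an earlier graph
    }
    void* handle = RegisterWithQnn(mem_ptr);  // stands in for the real QNN registration call
    inverse_ion_registered_map_.emplace(mem_ptr, handle);
    return handle;
  }

 private:
  void* RegisterWithQnn(void* mem_ptr);  // placeholder declaration
  std::unordered_map<void*, void*> inverse_ion_registered_map_;
};
```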
I don't think using mem_ptr as the key is sufficient to determine which handle you need. There could be two tensors that share the same base mem_ptr but have different shapes and data types. Therefore, they shouldn't share the same mem_handle.
I suggest using CustomMemTensorInfo to wrap tensor information as the key. For custom_mem, set it to the same value as tensor_addr, and set pos to zero.
Here's a brief explanation of the custom memory use case: QNN HTP provides several memory types. In Qnn Executorch, we support both QNN_MEM_TYPE_ION and QNN_MEM_TYPE_CUSTOM. The main difference between them is the relationship between the file descriptor and memory handles. You can find more details in the QNN documentation (Tutorials -> Advanced).
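For illustration, here is a rough sketch of the kind of composite key this suggestion implies. The field set only approximates `CustomMemTensorInfo` rather than reproducing the backend's actual definition (which may also already ship a hash specialization), and the hash below is purely illustrative.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Approximation of the idea only: key the cache on tensor identity, not on
// the raw base pointer. The real CustomMemTensorInfo has its own fields.
struct IonTensorKey {
  void* tensor_addr;               // base address of the shared buffer
  size_t pos;                      // offset; zero for the plain ION case
  uint32_t rank;                   // tensor rank
  std::array<uint32_t, 8> dims{};  // dimensions, zero-padded
  int dtype;                       // data type tag (e.g. a Qnn_DataType_t value)

  bool operator==(const IonTensorKey& o) const {
    return tensor_addr == o.tensor_addr && pos == o.pos && rank == o.rank &&
           dims == o.dims && dtype == o.dtype;
  }
};

struct IonTensorKeyHash {
  size_t operator()(const IonTensorKey& k) const {
    size_t h = std::hash<void*>{}(k.tensor_addr);
    auto mix = [&h](size_t v) { h ^= v + 0x9e3779b9u + (h << 6) + (h >> 2); };
    mix(k.pos);
    mix(k.rank);
    mix(static_cast<size_t>(k.dtype));
    for (uint32_t d : k.dims) {
      mix(d);
    }
    return h;
  }
};

// Two tensors that share a base pointer but differ in shape or dtype now get
// distinct entries, so they no longer share a memory handle.
using HandleCache = std::unordered_map<IonTensorKey, void*, IonTensorKeyHash>;
```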



@shewu-quic Sorry for the late reply.

> Are you saying there are multiple graphs, which would mean multiple methods, and that you would like some I/O tensors to be shared?

Yes.

> Also, is the external buffer you mentioned not allocated by the QnnExecuTorchAllocCustomMem API?

No. ION and mapping the same physical memory on both the CPU and the HTP are not specific to QNN; it can be done directly with rpcmem, and AFAIK once that's done the FastRPC driver will automatically skip copying the data in those buffers. Does QNN's registration bring any additional benefits? ION allocation can happen outside the immediate surroundings of the model inference code, and it's not feasible to make every allocation site depend on QNN/ExecuTorch on the remote chance that the allocated memory will be used for inference.
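For concreteness, a sketch of what such an "external" allocation typically looks like, using the Hexagon SDK's rpcmem helpers directly with no QNN or ExecuTorch involvement at the allocation site. The heap id and flag constants are the usual rpcmem.h defaults and should be checked against the SDK in use.

```cpp
#include <cstddef>

#include "rpcmem.h"  // Hexagon SDK helper; not part of QNN or ExecuTorch

// Shared memory visible to both the CPU and the HTP; FastRPC can pass it
// across the boundary without copying. Nothing here depends on the QNN
// backend, which is why the allocation site may know nothing about inference.
void* alloc_shared(size_t nbytes) {
  return rpcmem_alloc(RPCMEM_HEAP_ID_SYSTEM, RPCMEM_DEFAULT_FLAGS,
                      static_cast<int>(nbytes));
}

void free_shared(void* ptr) {
  rpcmem_free(ptr);
}
```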
> Does QNN's registration bring any additional benefits?

We also allocate using rpcmem. You can use QnnExecuTorchAllocCustomMem and QnnExecuTorchFreeCustomMem to manage ION buffers. For QNN, we need to clarify whether an address refers to an ION buffer or a client buffer, so we fill in the Qnn_MemDescriptor_t information and register the buffer with QNN.

Can I check: don't you return here? Regarding external ION allocation, in the original design we assumed the user would use QNN ExecuTorch to allocate the shared buffer, and we make sure this shared buffer is provided by the rpc lib, not from other sources.
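As a contrast with the external-allocation case above, a sketch of the flow the original design assumed, where the buffer comes from the backend's own helpers. The declarations below reflect QnnExecuTorch.h as I recall it and should be verified against the actual header.

```cpp
#include <cstddef>

// Declarations as recalled from QnnExecuTorch.h; verify against your checkout.
void* QnnExecuTorchAllocCustomMem(size_t bytes, size_t alignment);
void QnnExecuTorchFreeCustomMem(void* buffer_ptr);

void run_with_backend_managed_buffer(size_t nbytes) {
  // Because the backend performed the allocation, it knows the address is an
  // rpcmem/ION buffer rather than a plain client buffer, and can fill in
  // Qnn_MemDescriptor_t and register it with QNN on the user's behalf.
  void* buf = QnnExecuTorchAllocCustomMem(nbytes, /*alignment=*/64);

  // ... bind `buf` to the method's I/O tensors and run inference ...

  QnnExecuTorchFreeCustomMem(buf);
}
```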