Qualcomm AI Engine Direct - Avoid duplicated external ION memory registration #13421

Open · wants to merge 2 commits into main
Conversation

sxu (Contributor) commented Aug 14, 2025

Summary:
`inverse_ion_registered_map_` stores the mapping from an external ION buffer to a registered memory handle; it is only populated by `RegisterIonMem`. I don't quite understand the use case for `RegisterCustomMem` and `PreRegisterCustomMemHandle`, so I didn't attempt to change the existing `registered_map_`. The goal is to allow the same external ION buffer to be used as an input to different graphs: it is registered for the first graph, and the same handle is then reused for the other graphs. Without this, duplicated-registration errors are logged on every inference.

By "external ION buffer" I'm referring to buffers allocated outside the immediate surroundings of the inference code, without using any helpers from the QNN backend.
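
For illustration, here is a minimal sketch of the caching idea described above, with deliberately simplified signatures (the real `QnnMemManager::RegisterIonMem` takes a tensor wrapper and returns an `Error`; the actual lookup line appears in the diff further down):

```cpp
// Illustrative sketch only, not the real QnnMemManager.
#include <cstdint>
#include <unordered_map>

using Qnn_MemHandle_t = void*;  // opaque handle in the real QNN headers

class QnnMemManagerSketch {
 public:
  // Registers the ION buffer with QNN only the first time mem_ptr is
  // seen; later graphs using the same buffer reuse the cached handle.
  Qnn_MemHandle_t RegisterIonMem(void* mem_ptr, int32_t mem_fd) {
    if (auto it = inverse_ion_registered_map_.find(mem_ptr);
        it != inverse_ion_registered_map_.end()) {
      return it->second;  // same external buffer, another graph's input
    }
    Qnn_MemHandle_t handle = DoQnnRegistration(mem_ptr, mem_fd);
    inverse_ion_registered_map_.emplace(mem_ptr, handle);
    return handle;
  }

 private:
  // Stand-in for filling Qnn_MemDescriptor_t and calling QnnMem_register.
  Qnn_MemHandle_t DoQnnRegistration(void* mem_ptr, int32_t /*mem_fd*/) {
    return mem_ptr;
  }
  // External ION buffer pointer -> registered QNN memory handle.
  std::unordered_map<void*, Qnn_MemHandle_t> inverse_ion_registered_map_;
};
```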

Rollback Plan:

Differential Revision: D80267040

sxu added 2 commits August 14, 2025 09:09
…#13373)

Summary:

Hardcode the embedding/head dims, which are fixed, and use -1 for the sequence-length dimension when reshaping; this allows us to quantize the graph once and export it for different sequence lengths.

Differential Revision: D80181012
…stration

pytorch-bot (bot) commented Aug 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13421

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit ce8fbd3 with merge base 9cfb684:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 14, 2025
facebook-github-bot (Contributor) commented:
This pull request was exported from Phabricator. Differential Revision: D80267040

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track of your work and include it in the next release notes.

To add a label, you can comment to pytorchbot; for example:
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

shewu-quic (Collaborator) commented:
Thank you for the PR.
I'm trying to understand your scenario. Are you saying there are multiple graphs, which would mean multiple methods, and that some I/O tensors should be shared between them?
Also, is the external buffer you mentioned not allocated by the `QnnExecuTorchAllocCustomMem` API?

```diff
@@ -31,6 +31,16 @@ Error QnnMemManager::RegisterIonMem(
       tensor_wrapper->GetDataType(),
       QNN_MEM_TYPE_ION,
       {{mem_fd}}};
+  if (auto it = inverse_ion_registered_map_.find(mem_ptr);
```
shewu-quic (Collaborator) commented on the diff:

I don't think using `mem_ptr` as the key is sufficient to determine which handle you need. There could be two tensors that share the same base `mem_ptr` but have different shapes and data types; therefore, they shouldn't share the same `mem_handle`.

I suggest using `CustomMemTensorInfo` to wrap the tensor information as the key. For `custom_mem`, set it to the same value as `tensor_addr`, and set `pos` to zero (see the sketch after this comment).

Here's a brief explanation of the custom memory use case: QNN HTP provides several memory types. In QNN ExecuTorch, we support both `QNN_MEM_TYPE_ION` and `QNN_MEM_TYPE_CUSTOM`. The main difference between them is the relationship between the file descriptor and memory handles. You can find more details in the QNN documentation (Tutorials -> Advanced).

[Screenshots from the QNN documentation describing the ION and custom memory registration flows]
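
A rough sketch of keying on the full tensor description, as suggested above; the field names follow those mentioned in the comment, while the exact `CustomMemTensorInfo` definition lives in the QNN ExecuTorch headers:

```cpp
// Hypothetical key type mirroring the suggestion above; not the real
// CustomMemTensorInfo.
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

struct TensorKeySketch {
  void* custom_mem;     // per the suggestion: same value as tensor_addr
  void* tensor_addr;    // base pointer of the tensor
  size_t pos;           // per the suggestion: zero
  size_t tensor_bytes;  // distinguishes tensors sharing a base pointer
  uint32_t dtype_tag;   // stand-in for the real data type enum

  bool operator<(const TensorKeySketch& o) const {
    return std::tie(custom_mem, tensor_addr, pos, tensor_bytes, dtype_tag) <
        std::tie(o.custom_mem, o.tensor_addr, o.pos, o.tensor_bytes,
                 o.dtype_tag);
  }
};

// Two tensors with the same base pointer but different sizes or dtypes
// now map to different handles:
using HandleMap = std::map<TensorKeySketch, void*>;
```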

sxu (Contributor, Author) commented Aug 20, 2025:
@shewu-quic Sorry for the late reply.

> Are you saying there are multiple graphs, which would mean multiple methods, and that some I/O tensors should be shared between them?

Yes.

> Also, is the external buffer you mentioned not allocated by the `QnnExecuTorchAllocCustomMem` API?

No. ION, i.e. mapping the same physical memory on both the CPU and the HTP, is not specific to QNN; it can be done directly using rpcmem, and AFAIK once that's done the FastRPC driver will automatically skip copying the data in those buffers. Does QNN's registration bring any additional benefits? ION allocation can happen outside the immediate surroundings of the model inference code, so it's not feasible to make every allocation site depend on QNN/ExecuTorch on the remote chance that the allocated memory will be used for inference.
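
For context, here is roughly how such an external ION buffer can be created with no QNN involvement at all, using the `rpcmem` API from the Hexagon SDK (the constant values shown are assumptions based on common defaults; verify against `rpcmem.h` in your SDK version):

```cpp
// Allocate a CPU/HTP-shareable ION buffer straight from the FastRPC
// runtime (libcdsprpc.so), with no QNN/ExecuTorch dependency.
#include <cstddef>

extern "C" {
void* rpcmem_alloc(int heapid, unsigned int flags, int size);
void rpcmem_free(void* po);
int rpcmem_to_fd(void* po);
}

// Assumed values of RPCMEM_HEAP_ID_SYSTEM / RPCMEM_DEFAULT_FLAGS; check
// rpcmem.h in your Hexagon SDK.
constexpr int kHeapIdSystem = 25;
constexpr unsigned int kDefaultFlags = 1;  // cached

void* AllocSharedBuffer(size_t bytes, int* out_fd) {
  void* ptr = rpcmem_alloc(kHeapIdSystem, kDefaultFlags,
                           static_cast<int>(bytes));
  if (ptr != nullptr && out_fd != nullptr) {
    *out_fd = rpcmem_to_fd(ptr);  // fd later passed to RegisterIonMem
  }
  return ptr;  // caller eventually releases it with rpcmem_free(ptr)
}
```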

shewu-quic (Collaborator) commented:

> Does QNN's registration bring any additional benefits?

We also allocate using rpcmem. You can use `QnnExecuTorchAllocCustomMem` and `QnnExecuTorchFreeCustomMem` to manage the ION buffer. For QNN, we need to clarify whether an address refers to an ION buffer or a client buffer, so we fill in the information for `Qnn_MemDescriptor_t` and register the buffer with QNN.

May I also check your plan regarding external ION allocation? In the original design, we assumed the user would use QNN ExecuTorch to allocate the shared buffer, so that we can make sure the shared buffer is provided by the rpc lib and not from other sources.
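
For reference, a sketch of that registration step, matching the descriptor initializer visible in the diff above (exact struct fields and the `QnnMem_register` signature should be checked against `QnnMem.h` in the QNN SDK):

```cpp
// Sketch: describe an ION buffer to QNN and obtain a memory handle.
// Assumes the QNN SDK headers; field usage mirrors the diff above.
#include "QnnMem.h"

Qnn_MemHandle_t RegisterWithQnn(
    Qnn_ContextHandle_t context,
    uint32_t* dims,
    uint32_t rank,
    Qnn_DataType_t data_type,
    int32_t mem_fd) {
  Qnn_MemDescriptor_t desc = QNN_MEM_DESCRIPTOR_INIT;
  desc.memShape = {rank, dims, nullptr};
  desc.dataType = data_type;
  desc.memType = QNN_MEM_TYPE_ION;  // vs. QNN_MEM_TYPE_CUSTOM
  desc.ionInfo.fd = mem_fd;         // fd provided by the rpc lib
  Qnn_MemHandle_t handle = nullptr;
  if (QNN_SUCCESS !=
      QnnMem_register(context, &desc, /*numDescriptors=*/1, &handle)) {
    return nullptr;  // registration failed
  }
  return handle;
}
```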

Labels: CLA Signed, fb-exported, topic: not user facing