Qualcomm AI Engine Direct - Avoid duplicated external ION memory registration #13421

Open · wants to merge 2 commits into main
Conversation

sxu (Contributor) commented Aug 14, 2025

Summary:
`inverse_ion_registered_map_` stores the mapping from an external ION buffer to a registered memory handle; it is only populated by `RegisterIonMem`. I don't quite understand the use case for `RegisterCustomMem` and `PreRegisterCustomMemHandle`, so I didn't attempt to change the existing `registered_map_`. The goal is to allow the same external ION buffer to be used as an input to different graphs: it is registered for the first graph, and the same handle is then reused for the other graphs. Without this, duplicated-registration errors are logged on every inference.

By "external ION buffer" I'm referring to buffers allocated outside the immediate surroundings of the inference code, without using any helpers from the QNN backend.
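
For illustration, here is a minimal sketch of the caching idea described above, with deliberately simplified signatures (the real `QnnMemManager::RegisterIonMem` takes a tensor wrapper and returns an `Error`; the actual lookup line appears in the diff further down):

```cpp
// Illustrative sketch only, not the real QnnMemManager.
#include <cstdint>
#include <unordered_map>

using Qnn_MemHandle_t = void*;  // opaque handle in the real QNN headers

class QnnMemManagerSketch {
 public:
  // Registers the ION buffer with QNN only the first time mem_ptr is
  // seen; later graphs using the same buffer reuse the cached handle.
  Qnn_MemHandle_t RegisterIonMem(void* mem_ptr, int32_t mem_fd) {
    if (auto it = inverse_ion_registered_map_.find(mem_ptr);
        it != inverse_ion_registered_map_.end()) {
      return it->second;  // same external buffer, another graph's input
    }
    Qnn_MemHandle_t handle = DoQnnRegistration(mem_ptr, mem_fd);
    inverse_ion_registered_map_.emplace(mem_ptr, handle);
    return handle;
  }

 private:
  // Stand-in for filling Qnn_MemDescriptor_t and calling QnnMem_register.
  Qnn_MemHandle_t DoQnnRegistration(void* mem_ptr, int32_t /*mem_fd*/) {
    return mem_ptr;
  }
  // External ION buffer pointer -> registered QNN memory handle.
  std::unordered_map<void*, Qnn_MemHandle_t> inverse_ion_registered_map_;
};
```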

Rollback Plan:

Differential Revision: D80267040

sxu added 2 commits August 14, 2025 09:09
…#13373)

Summary:

Hardcode the embedding/head dims, which are fixed, and use -1 for the sequence-length dimension when reshaping; this allows us to quantize the graph once and export it for different sequence lengths.

Differential Revision: D80181012
…stration

pytorch-bot (bot) commented Aug 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13421

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit ce8fbd3 with merge base 9cfb684:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 14, 2025
facebook-github-bot (Contributor) commented:
This pull request was exported from Phabricator. Differential Revision: D80267040

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track of your work and include it in the next release notes.

To add a label, you can comment to pytorchbot; for example:
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

shewu-quic (Collaborator) commented:
Thank you for the PR.
I'm trying to understand your scenario. Are you saying there are multiple graphs, which would mean multiple methods, and that some I/O tensors should be shared between them?
Also, is the external buffer you mentioned not allocated by the `QnnExecuTorchAllocCustomMem` API?

```diff
@@ -31,6 +31,16 @@ Error QnnMemManager::RegisterIonMem(
       tensor_wrapper->GetDataType(),
       QNN_MEM_TYPE_ION,
       {{mem_fd}}};
+  if (auto it = inverse_ion_registered_map_.find(mem_ptr);
```
shewu-quic (Collaborator) commented on the diff:

I don't think using `mem_ptr` as the key is sufficient to determine which handle you need. There could be two tensors that share the same base `mem_ptr` but have different shapes and data types; therefore, they shouldn't share the same `mem_handle`.

I suggest using `CustomMemTensorInfo` to wrap the tensor information as the key. For `custom_mem`, set it to the same value as `tensor_addr`, and set `pos` to zero (see the sketch after this comment).

Here's a brief explanation of the custom memory use case: QNN HTP provides several memory types. In QNN ExecuTorch, we support both `QNN_MEM_TYPE_ION` and `QNN_MEM_TYPE_CUSTOM`. The main difference between them is the relationship between the file descriptor and memory handles. You can find more details in the QNN documentation (Tutorials -> Advanced).

[Screenshots from the QNN documentation describing the ION and custom memory registration flows]
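
A rough sketch of keying on the full tensor description, as suggested above; the field names follow those mentioned in the comment, while the exact `CustomMemTensorInfo` definition lives in the QNN ExecuTorch headers:

```cpp
// Hypothetical key type mirroring the suggestion above; not the real
// CustomMemTensorInfo.
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

struct TensorKeySketch {
  void* custom_mem;     // per the suggestion: same value as tensor_addr
  void* tensor_addr;    // base pointer of the tensor
  size_t pos;           // per the suggestion: zero
  size_t tensor_bytes;  // distinguishes tensors sharing a base pointer
  uint32_t dtype_tag;   // stand-in for the real data type enum

  bool operator<(const TensorKeySketch& o) const {
    return std::tie(custom_mem, tensor_addr, pos, tensor_bytes, dtype_tag) <
        std::tie(o.custom_mem, o.tensor_addr, o.pos, o.tensor_bytes,
                 o.dtype_tag);
  }
};

// Two tensors with the same base pointer but different sizes or dtypes
// now map to different handles:
using HandleMap = std::map<TensorKeySketch, void*>;
```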

sxu (Contributor, Author) commented Aug 20, 2025:
@shewu-quic Sorry for the late reply.

> Are you saying there are multiple graphs, which would mean multiple methods, and that some I/O tensors should be shared between them?

Yes.

> Also, is the external buffer you mentioned not allocated by the `QnnExecuTorchAllocCustomMem` API?

No. ION, i.e. mapping the same physical memory on both the CPU and the HTP, is not specific to QNN; it can be done directly using rpcmem, and AFAIK once that's done the FastRPC driver will automatically skip copying the data in those buffers. Does QNN's registration bring any additional benefits? ION allocation can happen outside the immediate surroundings of the model inference code, so it's not feasible to make every allocation site depend on QNN/ExecuTorch on the remote chance that the allocated memory will be used for inference.
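
For context, here is roughly how such an external ION buffer can be created with no QNN involvement at all, using the `rpcmem` API from the Hexagon SDK (the constant values shown are assumptions based on common defaults; verify against `rpcmem.h` in your SDK version):

```cpp
// Allocate a CPU/HTP-shareable ION buffer straight from the FastRPC
// runtime (libcdsprpc.so), with no QNN/ExecuTorch dependency.
#include <cstddef>

extern "C" {
void* rpcmem_alloc(int heapid, unsigned int flags, int size);
void rpcmem_free(void* po);
int rpcmem_to_fd(void* po);
}

// Assumed values of RPCMEM_HEAP_ID_SYSTEM / RPCMEM_DEFAULT_FLAGS; check
// rpcmem.h in your Hexagon SDK.
constexpr int kHeapIdSystem = 25;
constexpr unsigned int kDefaultFlags = 1;  // cached

void* AllocSharedBuffer(size_t bytes, int* out_fd) {
  void* ptr = rpcmem_alloc(kHeapIdSystem, kDefaultFlags,
                           static_cast<int>(bytes));
  if (ptr != nullptr && out_fd != nullptr) {
    *out_fd = rpcmem_to_fd(ptr);  // fd later passed to RegisterIonMem
  }
  return ptr;  // caller eventually releases it with rpcmem_free(ptr)
}
```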

shewu-quic (Collaborator) commented:

> Does QNN's registration bring any additional benefits?

We also allocate using rpcmem. You can use `QnnExecuTorchAllocCustomMem` and `QnnExecuTorchFreeCustomMem` to manage the ION buffer. For QNN, we need to clarify whether an address refers to an ION buffer or a client buffer, so we fill in the information for `Qnn_MemDescriptor_t` and register the buffer with QNN.

May I also check your plan regarding external ION allocation? In the original design, we assumed the user would use QNN ExecuTorch to allocate the shared buffer, so that we can make sure the shared buffer is provided by the rpc lib and not from other sources.
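
For reference, a sketch of that registration step, matching the descriptor initializer visible in the diff above (exact struct fields and the `QnnMem_register` signature should be checked against `QnnMem.h` in the QNN SDK):

```cpp
// Sketch: describe an ION buffer to QNN and obtain a memory handle.
// Assumes the QNN SDK headers; field usage mirrors the diff above.
#include "QnnMem.h"

Qnn_MemHandle_t RegisterWithQnn(
    Qnn_ContextHandle_t context,
    uint32_t* dims,
    uint32_t rank,
    Qnn_DataType_t data_type,
    int32_t mem_fd) {
  Qnn_MemDescriptor_t desc = QNN_MEM_DESCRIPTOR_INIT;
  desc.memShape = {rank, dims, nullptr};
  desc.dataType = data_type;
  desc.memType = QNN_MEM_TYPE_ION;  // vs. QNN_MEM_TYPE_CUSTOM
  desc.ionInfo.fd = mem_fd;         // fd provided by the rpc lib
  Qnn_MemHandle_t handle = nullptr;
  if (QNN_SUCCESS !=
      QnnMem_register(context, &desc, /*numDescriptors=*/1, &handle)) {
    return nullptr;  // registration failed
  }
  return handle;
}
```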

Labels: CLA Signed, fb-exported, topic: not user facing