Introduce ORTSessionMixin and enable general io binding (works for diffusers as well) #2234
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…ConditionalGeneration with positional arguments.
Force-pushed from 420f59c to 3d7b976
Pull Request Overview
This PR extends the IO binding functionality for multi-part pipelines by introducing new mixin classes (ORTSessionMixin and ORTSessionsWrapper) and refactoring existing tests and utility functions for both diffusion and non-diffusion workflows.
- Updates tests for onnxruntime modeling and diffusion pipelines to use updated function names and parameters.
- Refactors utility and exporter modules to adopt new patterns for provider and session handling, and removes deprecated or redundant code.
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/onnxruntime/test_modeling.py | Removed deprecated parameters and updated assertions to reflect changes in the IO binding interface and model naming. |
| tests/onnxruntime/test_diffusion.py | Renamed helper functions and added tests for IO binding functionality with diffusion pipelines. |
| optimum/utils/constant.py | Added and reorganized constants used across the codebase. |
| optimum/onnxruntime/utils.py | Refactored provider handling and added a new helper for inferring a session's dtype. |
| optimum/onnxruntime/modeling_decoder.py | Updated deprecated code patterns to use instance configuration attributes consistently and improved IO binding calls. |
| Other files (e.g., optimization.py, exporters tasks, workflows) | Similar refactoring to align with the new multi-part pipeline design and clean up unused code. |
Comments suppressed due to low confidence (3)
tests/onnxruntime/test_modeling.py:140
- The `use_io_binding` parameter has been removed in favor of a default behavior. Please ensure that this API change is clearly documented in the migration guide to avoid confusion among users.
`model = model_cls.from_pretrained(self.LOCAL_MODEL_PATH, use_cache=False, use_io_binding=False)`
optimum/onnxruntime/modeling_decoder.py:249
- Updating the condition to use `self.config` instead of relying on a local `config` variable improves maintainability. Confirm that similar updates are applied consistently throughout the file.
`if self.config.model_type != "gpt_bigcode":`
optimum/onnxruntime/utils.py:446
- [nitpick] Consider adding tests and a detailed docstring example for `get_dtype_from_session` to verify that the correct torch dtype is returned based on the ONNX input/output types.
`def get_dtype_from_session(session: ort.InferenceSession) -> torch.dtype:`
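For context, a minimal sketch of what such a dtype-inference helper could look like; only the name and signature come from the diff above, the body here is an illustrative assumption, not the PR's actual code:

```python
import onnxruntime as ort
import torch

# hypothetical mapping from ONNX tensor type strings to torch dtypes;
# the actual implementation in this PR may cover more types
ONNX_TO_TORCH_DTYPE = {
    "tensor(float16)": torch.float16,
    "tensor(float)": torch.float32,
    "tensor(double)": torch.float64,
}

def get_dtype_from_session(session: ort.InferenceSession) -> torch.dtype:
    # return the first floating-point dtype found among the session's
    # inputs/outputs, falling back to float32
    for node in list(session.get_inputs()) + list(session.get_outputs()):
        if node.type in ONNX_TO_TORCH_DTYPE:
            return ONNX_TO_TORCH_DTYPE[node.type]
    return torch.float32
```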
LGTM! Thanks a lot for the great PR @IlyasMoutawwakil 🔥
optimum/onnxruntime/base.py (Outdated)
self.output_dtypes = {output.name: output.type for output in session.get_outputs()}

self.model_path = Path(session._model_path)
self.model_name = self.model_path.name
yes, talking about `model_name`, which seems to be used in the tests (and can easily be extracted from `model_path`), so we can remove it imo; not specific to this PR though, so feel free to close!
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
…timum into ort-session-mixin
… cuda device, with specific dtype, etc)
What does this PR do?
This PR further extends the IO binding feature to multi-part pipelines and enables it more generally through two classes:

- `ORTSessionMixin`: should be used whenever a class has an underlying inference session. It enables inference with and without IO binding, and handles the torch/onnxruntime interface (provider/device/dtype).
- `ORTSessionsWrapper`: should be used when a class has many models/parts (like encoder-decoders or diffusion pipelines). It unifies the interface to these parts without compromising flexibility: for example, one can set `pipeline.use_io_binding` to True/False, which applies it to all parts, but one can also target a specific part differently via `pipeline.unet.use_io_binding` (see the usage sketch after this list). Parts can also be on different devices.
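A minimal usage sketch of the per-pipeline and per-part toggles described above; the model id and `from_pretrained` arguments are illustrative assumptions, not part of this PR:

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

# load a multi-part ONNX diffusion pipeline (illustrative model id)
pipeline = ORTStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", export=True
)

# enable IO binding for every part of the pipeline at once...
pipeline.use_io_binding = True

# ...or override it for a single part
pipeline.unet.use_io_binding = False
```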
I also added a special feature to diffusion pipelines, specifically the UNet/Transformer components: they reuse the same input tensor as the output buffer (instead of creating a new one). This makes each diffusion step more of an in-place operation, which should (theoretically) make it faster, or at least use less memory.
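Conceptually, this buffer reuse amounts to binding the same device tensor as both input and output through onnxruntime's IO binding API. A rough sketch under assumed input/output names and shapes (`sample`/`out_sample`), not the PR's actual code:

```python
import numpy as np
import torch

# `session` is an existing ort.InferenceSession for a UNet-like model whose
# output has the same shape and dtype as its main input
latents = torch.zeros((1, 4, 64, 64), dtype=torch.float16, device="cuda")

binding = session.io_binding()
binding.bind_input(
    "sample", device_type="cuda", device_id=0,
    element_type=np.float16, shape=tuple(latents.shape),
    buffer_ptr=latents.data_ptr(),
)
# (other model inputs such as timestep/encoder_hidden_states omitted for brevity)

# bind the SAME tensor as the output buffer, making the step (almost) in-place
binding.bind_output(
    "out_sample", device_type="cuda", device_id=0,
    element_type=np.float16, shape=tuple(latents.shape),
    buffer_ptr=latents.data_ptr(),
)
session.run_with_iobinding(binding)
```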
Before submitting
Who can review?