enable auto-round quantization model #6226
Conversation
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
…com/WeiweiZhang1/sglang into enable_autoround_quantization_model
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Please kindly take a look when you are free.
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Thank you for your thorough review. I've updated the code based on your comments. Any additional feedback or suggestions? @AniZpZ
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
No more concerns from me.
@AniZpZ Hi, would it be possible for you to take a look at the unit test failures and help identify whether any of them are related to this PR? Thank you in advance!
I think it is OK. Please fix the lint issues introduced by resolving the conflicts.
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Great! I've addressed the lint issue.
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
…able_autoround_quantization_model
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
This PR adds support for models quantized by AutoRound (GitHub, paper).
AutoRound delivers significantly higher accuracy at extremely low bit-widths (e.g., 2-bit) and offers broader compatibility across models (LLMs and VLMs), quantization formats, and configurations. You can check out our github/paper or this blog post.
AutoRound has been integrated into vLLM, pytorch/ao, and Hugging Face Transformers. Several Hugging Face Spaces offer models quantized with AutoRound, including OPEA, Kaitchup, and fbaldassarri.
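For reference, here is a minimal sketch of how an AutoRound-exported checkpoint could be served through sglang's offline engine once this PR is in. The repo id is a placeholder (any AutoRound INT4 checkpoint, e.g. from the OPEA space, would do), and the quantization settings are assumed to be auto-detected from the checkpoint's quantization_config rather than passed explicitly:

```python
# Minimal sketch, not taken from this PR: serve an AutoRound-quantized
# checkpoint with sglang's offline engine. The repo id below is a
# placeholder; the quantization method is assumed to be read from the
# checkpoint's quantization_config at load time.
import sglang as sgl

llm = sgl.Engine(model_path="OPEA/Qwen2.5-7B-Instruct-int4-autoround")  # hypothetical repo id

outputs = llm.generate(
    ["What does AutoRound quantization do?"],
    {"temperature": 0.0, "max_new_tokens": 64},
)
for out in outputs:
    print(out["text"])

llm.shutdown()
```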
Known issues
Mixed bits support is limited
Mixed-bit quantization is currently limited. Since vLLM fuses layers (e.g., QKV), applying different bit-widths to components within the same fused layer can lead to incompatibility issues.
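To make the failure mode concrete, here is a hedged sketch of the kind of per-layer recipe that hits this limitation. The layer names follow the usual Llama/Qwen-style module naming and the layer_config parameter name follows recent auto-round releases; neither is taken from this PR, and the model is just a small placeholder:

```python
# Sketch only: a mixed-bit recipe that fused layers cannot represent.
# Once the serving engine fuses q_proj/k_proj/v_proj into a single QKV
# projection, a layer quantized at 8 bits cannot share that fused kernel
# with siblings quantized at 4 bits.
from auto_round import AutoRound
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

layer_config = {
    "model.layers.0.self_attn.q_proj": {"bits": 8},  # differs from its fused siblings below
    "model.layers.0.self_attn.k_proj": {"bits": 4},
    "model.layers.0.self_attn.v_proj": {"bits": 4},
}

autoround = AutoRound(
    model, tokenizer, bits=4, group_size=128, sym=True, layer_config=layer_config
)
autoround.quantize()
autoround.save_quantized("./mixed-bit-ckpt", format="auto_round")
```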
Quantized MoE model support is limited
Qwen3-30B-A3B: accuracy is close to zero. The GPTQ format fails with 'Capture CUDA graph failed: Apply router weight on input is not supported for fused Marlin MoE method'; for the AWQ format, symmetric quantization reports 'KeyError: 'model.layers.13.mlp.experts.w2_qzeros'', and asymmetric quantization also produces accuracy close to zero.
deepseek-moe-16b-base: 'ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.' The same issue exists for both the AWQ and GPTQ formats.
Quantized VLM support is limited
Qwen2.5-VL-7B: the auto_round:auto_gptq format has accuracy close to zero, and the GPTQ model fails with 'The output size is not aligned with the quantized weight shape'. The auto_round:auto_awq and AWQ formats work fine.