
enable auto-round quantization model #6226


Open
wants to merge 56 commits into
base: main
from



@WeiweiZhang1 WeiweiZhang1 commented May 12, 2025

This PR adds support for models quantized by AutoRound (GitHub, paper).

AutoRound delivers significantly higher accuracy at extremely low bit-widths (e.g., 2-bit) and offers broader compatibility across models (LLMs and VLMs), quantization formats, and configurations. You can check out our github/paper or this blog post.

AutoRound has been integrated into vLLM, pytorch/ao, and Hugging Face Transformers. Several Hugging Face Spaces offer models quantized with AutoRound, including OPEA, Kaitchup, and fbaldassarri.

Known issues
Mixed-bit support is limited
Mixed-bit quantization is currently limited. Since vLLM fuses layers (e.g., QKV), applying different bit-widths to components within the same fused layer can lead to incompatibility issues.
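
To make the fused-layer constraint concrete, here is a minimal sketch of a pre-flight check that flags fused layers whose member projections were assigned different bit-widths. The `FUSED_GROUPS` mapping, the `find_mixed_bit_conflicts` helper, and the `layer_bits` layout (full layer name mapped to bit-width) are hypothetical and chosen for illustration only; they are not part of this PR or of any library's API.

```python
from typing import Dict, List

# Hypothetical fused groups: projections that a serving engine may merge
# into a single layer. The names are illustrative assumptions.
FUSED_GROUPS: Dict[str, List[str]] = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}


def find_mixed_bit_conflicts(
    layer_bits: Dict[str, int], default_bits: int = 4
) -> List[str]:
    """Return the fused-layer names whose members use different bit-widths.

    layer_bits maps a full layer name (e.g. "model.layers.0.self_attn.q_proj")
    to its bit-width. Members that are not listed fall back to default_bits.
    """
    # Group the leaf names by their parent module.
    by_parent: Dict[str, Dict[str, int]] = {}
    for name, bits in layer_bits.items():
        parent, _, leaf = name.rpartition(".")
        by_parent.setdefault(parent, {})[leaf] = bits

    conflicts: List[str] = []
    for parent, leaves in sorted(by_parent.items()):
        for fused_name, members in FUSED_GROUPS.items():
            # Skip fused groups that this parent module does not contain.
            if not any(member in leaves for member in members):
                continue
            widths = {leaves.get(member, default_bits) for member in members}
            if len(widths) > 1:
                conflicts.append(f"{parent}.{fused_name}")
    return conflicts


example = {
    "model.layers.0.self_attn.q_proj": 8,
    "model.layers.0.self_attn.k_proj": 4,
    "model.layers.0.self_attn.v_proj": 4,
    "model.layers.0.mlp.gate_proj": 4,
    "model.layers.0.mlp.up_proj": 4,
}
print(find_mixed_bit_conflicts(example))
```

In this example only the attention block is reported, because `q_proj` uses 8 bits while `k_proj` and `v_proj` use 4. A mixed-bit recipe that keeps every member of a fused group at the same width would pass the check.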

Quantized MoE model support is limited
Qwen3-30B-A3B: accuracy is close to zero. The gptq format fails with "Capture CUDA graph failed: Apply router weight on input is not supported forfused Marlin MoE method". With the awq format, symmetric quantization raises "KeyError: 'model.layers.13.mlp.experts.w2_qzeros'", and asymmetric quantization also gives accuracy close to zero.

deepseek-moe-16b-base: fails with "ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size". The same issue exists for the awq and gptq formats.
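
The alignment error above typically arises when the per-rank slice of a weight's input dimension no longer contains a whole number of quantization groups. The following is a minimal sketch of that arithmetic, assuming group-wise quantization along the input dimension and an even split across tensor-parallel ranks. The `shard_is_aligned` helper and the sizes used are illustrative assumptions, not values or code taken from this PR.

```python
def shard_is_aligned(input_size: int, tp_size: int, group_size: int) -> bool:
    """Check whether each tensor-parallel shard holds whole quantization groups.

    input_size: full input dimension of the layer
    tp_size:    number of tensor-parallel ranks the dimension is split across
    group_size: number of input channels that share one scale/zero-point
    """
    if input_size % tp_size != 0:
        # The dimension cannot even be split evenly across ranks.
        return False
    per_rank = input_size // tp_size
    return per_rank % group_size == 0


# Illustrative sizes only: 1408 = 11 * 128, so one rank is fine,
# but two ranks get 704 channels each, which is 5.5 groups of 128.
for tp in (1, 2):
    print(tp, shard_is_aligned(1408, tp, 128))
```

Under these assumptions, lowering the tensor parallel size or quantizing with a smaller group size that divides the per-rank slice are the two ways to restore alignment.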

Quantized VLM support is limited
Qwen2.5-VL-7B: with the auto_round:auto_gptq format, accuracy is close to zero. The gptq model fails with "The output size is not aligned with the quantized weight shape". The auto_round:auto_awq and awq formats work correctly.

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
@WeiweiZhang1 WeiweiZhang1 marked this pull request as draft May 13, 2025 01:33
@WeiweiZhang1 WeiweiZhang1 marked this pull request as ready for review May 21, 2025 11:20
@WeiweiZhang1 WeiweiZhang1 requested a review from BBuf as a code owner May 21, 2025 11:20
@WeiweiZhang1
Author

Please kindly review this when you are free.

@WeiweiZhang1
Author

How can I start the test CI process? BTW please kindly have a review when you are free. Thanks!
@BBuf @HaiShaw @Ying1123 @ch-wan @ispobock @merrymercy @zhyncs

@mingfeima mingfeima added the intel label Jun 3, 2025
@WeiweiZhang1
Author

Thank you for your thorough review. I've updated the code based on your comments. Any additional feedback or suggestions? @AniZpZ

WeiweiZhang1 and others added 2 commits July 11, 2025 03:01
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
@AniZpZ
Collaborator

AniZpZ commented Jul 11, 2025

> Thank you for your thorough review. I've updated the code based on your comments. Any additional feedback or suggestions? @AniZpZ

no more concerns from me

@wenhuach21

@AniZpZ Hi, would it be possible for you to take a look at the unit test failures and help identify whether any of them are related to this PR?
We're not very familiar with SGLang, and I noticed similar failures have also appeared in other PRs.

Thank you in advance!

@AniZpZ
Collaborator

AniZpZ commented Jul 14, 2025

> @AniZpZ Hi, would it be possible for you to take a look at the unit test failures and help identify whether any of them are related to this PR? We're not very familiar with SGLang, and I noticed similar failures have also appeared in other PRs.
>
> Thank you in advance!

I think it is OK. Please fix the lint errors introduced by resolving the conflicts.

@WeiweiZhang1
Author

> > @AniZpZ Hi, would it be possible for you to take a look at the unit test failures and help identify whether any of them are related to this PR? We're not very familiar with SGLang, and I noticed similar failures have also appeared in other PRs.
> > Thank you in advance!
>
> I think it is OK. Please fix the lint errors introduced by resolving the conflicts.

Great! I've addressed the lint issue.

@wenhuach21

@AniZpZ @yinfan98 @zhyncs Thank you for your kind review and support. If there are any remaining issues, please let us know. Otherwise, could you kindly help with the merge? Thanks!

@WeiweiZhang1
Author

@AniZpZ @zhyncs If there are any remaining issues, please let me know. Otherwise, could you help with the merge? Thanks!

AniZpZ and others added 5 commits July 25, 2025 10:49

7 participants