
[megatron] support export lora to_mcore #5445

Merged: 3 commits merged into modelscope:main on Aug 19, 2025

Conversation

Jintao-Huang (Collaborator)

No description provided.

gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the capability to export LoRA (Low-Rank Adaptation) models, specifically those loaded as Megatron Core adapters, into a full Megatron Core model format. This enhances the model conversion utilities by allowing users to consolidate fine-tuned LoRA weights into a base Megatron Core model for deployment or further use. Additionally, it includes several robustness improvements and minor code cleanups across the codebase.
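
For readers unfamiliar with the mechanics, exporting a LoRA adapter into a full model amounts to folding the low-rank update into each base weight matrix. The sketch below is a generic illustration of that idea (assuming standard LoRA factors lora_A, lora_B and a scaling constant), not the code in this pull request; Megatron Core layers additionally shard their weights across tensor-parallel ranks.

    import torch

    def merge_lora(base_weight, lora_A, lora_B, scaling):
        """Fold a rank-r LoRA update into the base weight: W' = W + scaling * (B @ A)."""
        return base_weight + scaling * (lora_B @ lora_A)

    # Toy shapes: a 16x16 base weight with rank-4 LoRA factors.
    W = torch.randn(16, 16)
    A = torch.randn(4, 16)   # lora_A: (r, in_features)
    B = torch.randn(16, 4)   # lora_B: (out_features, r)
    assert merge_lora(W, A, B, scaling=2.0).shape == W.shape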

Highlights

  • LoRA to Megatron Core Export: The convert_mcore2hf function has been extended to support saving the Megatron Core model when mcore_adapters are present and the to_mcore flag is set, effectively enabling the export of LoRA-adapted models to the Megatron Core format.
  • Conditional Model Preparation: The prepare_model_template function now conditionally applies model adaptations, improving robustness by preventing operations on a None model.
  • Megatron Core Debugging Enhancement: The __repr__ method of TELinear (Transformer Engine Linear) modules has been patched to include the Tensor Parallelism (TP) size, aiding debugging and inspection of Megatron Core models; a hypothetical sketch of such a patch follows this list.
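
As a rough illustration of the third highlight, a __repr__ patch like the one described could look as follows; the names TELinear and tp_size here are assumptions taken from the description above, not the PR's actual code.

    def patch_repr(module_cls, tp_size):
        """Monkey-patch module_cls.__repr__ so printed modules also report the TP size."""
        original_repr = module_cls.__repr__

        def __repr__(self):
            return f'{original_repr(self)} (tp_size={tp_size})'

        module_cls.__repr__ = __repr__

    # Hypothetical usage; in practice TELinear comes from Transformer Engine.
    # patch_repr(TELinear, tp_size=4)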

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for exporting LoRA adapters to the mcore format, which involves merging LoRA weights with the base model and saving the result as a new mcore model. The changes are primarily in swift/megatron/utils/convert.py to handle this new export path, with related adjustments in swift/llm/export/export.py to trigger the new functionality and in swift/llm/infer/utils.py to accommodate model loading scenarios. My review focuses on the clarity of the new logic and potential issues with handling concurrent export flags.

Comment on lines +226 to +250
if args.to_hf:
    megatron_model_meta.convert_mcore2hf(hf_model, mg_model)
    if args.test_convert_precision:
        test_convert_precision(hf_model, mg_model, template)
    del mg_model
    logger.info('Successfully transferred MG model weights to HF model.')
    ckpt_dir = megatron_args.load if megatron_args.adapter_load is None else megatron_args.adapter_load
    save_checkpoint(
        hf_model,
        processor,
        args.output_dir,
        safe_serialization=args.safe_serialization,
        model_dirs=[ckpt_dir, args.model_dir],
        max_shard_size=args.max_shard_size,
        additional_saved_files=hf_model.model_meta.additional_saved_files)
    logger.info(f'Successfully saved HF model weights in `{args.output_dir}`.')
elif args.to_mcore:
    if args.thread_count is None:
        checkpoint_size = sum(get_n_params_grads(mg_model)[0]) * torch.finfo(args.torch_dtype).bits // 8e9
        args.thread_count = max(math.ceil(checkpoint_size / 10), 2)  # 10GB
    patch_torch_dist_shard(args.thread_count)

    args.save_args()
    mg_save_checkpoint(1, [mg_model], None, None, 0)
    logger.info(f'Successfully saved Megatron model weights in `{args.output_dir}`.')
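
The pre-existing thread_count heuristic quoted above estimates the checkpoint size in gigabytes from the parameter count and dtype width, then sizes thread_count as roughly one thread per 10 GB with a floor of 2. A back-of-the-envelope check, assuming a 7B-parameter model saved in bf16:

    import math

    n_params = 7e9          # assumed parameter count
    bits_per_param = 16     # bf16
    checkpoint_size_gb = n_params * bits_per_param / 8e9       # 14.0 GB
    thread_count = max(math.ceil(checkpoint_size_gb / 10), 2)
    print(checkpoint_size_gb, thread_count)                    # 14.0 2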

Severity: high

The use of elif args.to_mcore: makes the to_hf and to_mcore operations mutually exclusive. If a user provides both --to_hf and --to_mcore flags, only the to_hf operation will be performed. This might not be the intended behavior. If both operations should be supported in a single run, this elif should be an if.

However, changing elif to if will cause a NameError because del mg_model is inside the if args.to_hf: block.

To support both flags concurrently, you could refactor this section to perform both conversions if requested, and then delete the model as suggested below.

    if args.to_hf:
        megatron_model_meta.convert_mcore2hf(hf_model, mg_model)
        if args.test_convert_precision:
            test_convert_precision(hf_model, mg_model, template)
        logger.info('Successfully transferred MG model weights to HF model.')
        ckpt_dir = megatron_args.load if megatron_args.adapter_load is None else megatron_args.adapter_load
        save_checkpoint(
            hf_model,
            processor,
            args.output_dir,
            safe_serialization=args.safe_serialization,
            model_dirs=[ckpt_dir, args.model_dir],
            max_shard_size=args.max_shard_size,
            additional_saved_files=hf_model.model_meta.additional_saved_files)
        logger.info(f'Successfully saved HF model weights in `{args.output_dir}`.')
    if args.to_mcore:
        if args.thread_count is None:
            checkpoint_size = sum(get_n_params_grads(mg_model)[0]) * torch.finfo(args.torch_dtype).bits // 8e9
            args.thread_count = max(math.ceil(checkpoint_size / 10), 2)  # 10GB
        patch_torch_dist_shard(args.thread_count)

        args.save_args()
        mg_save_checkpoint(1, [mg_model], None, None, 0)
        logger.info(f'Successfully saved Megatron model weights in `{args.output_dir}`.')
    del mg_model

@@ -32,12 +32,12 @@ def run(self):
            export_to_ollama(args)
        elif args.to_cached_dataset:
            export_cached_dataset(args)
        elif args.to_hf or args.mcore_adapters and args.to_mcore:

Severity: medium

The boolean logic args.to_hf or args.mcore_adapters and args.to_mcore can be hard to read due to operator precedence. While and has higher precedence than or, adding parentheses would make the intent much clearer and prevent potential misinterpretations.

Suggested change
- elif args.to_hf or args.mcore_adapters and args.to_mcore:
+ elif args.to_hf or (args.mcore_adapters and args.to_mcore):
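
For reference, Python binds 'and' more tightly than 'or', so the existing expression and the parenthesized suggestion evaluate identically; the parentheses only make the grouping explicit. A quick check over all flag combinations:

    from itertools import product

    for to_hf, mcore_adapters, to_mcore in product([False, True], repeat=3):
        original = to_hf or mcore_adapters and to_mcore
        explicit = to_hf or (mcore_adapters and to_mcore)
        assert original == explicit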

Jintao-Huang force-pushed the support_lora_to_mcore branch from e510a6b to 997cd2c on August 19, 2025 at 02:46.
Jintao-Huang merged commit 9a6ac83 into modelscope:main on Aug 19, 2025.
2 of 4 checks passed
Labels: None yet
Projects: None yet
2 participants