[megatron] support export lora to_mcore #5445
Conversation
Summary of Changes
Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces the capability to export LoRA (Low-Rank Adaptation) models, specifically those loaded as Megatron Core adapters, into a full Megatron Core model format. This enhances the model conversion utilities by allowing users to consolidate fine-tuned LoRA weights into a base Megatron Core model for deployment or further use. Additionally, it includes several robustness improvements and minor code cleanups across the codebase.
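For intuition, consolidating LoRA weights into a base model amounts to folding the low-rank update back into each dense weight matrix. A minimal sketch of that arithmetic (the shapes and the alpha/r scaling convention here are illustrative, not the PR's actual code):

import torch

# Standard LoRA merge: W_merged = W + (alpha / r) * B @ A
d_out, d_in, r, alpha = 16, 16, 4, 8
W = torch.randn(d_out, d_in)   # frozen base weight
A = torch.randn(r, d_in)       # LoRA down-projection
B = torch.zeros(d_out, r)      # LoRA up-projection (zero-initialized)
W_merged = W + (alpha / r) * (B @ A)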
Highlights
- LoRA to Megatron Core Export: The convert_mcore2hf function has been extended to save the Megatron Core model when mcore_adapters are present and the to_mcore flag is set, enabling the export of LoRA-adapted models to the Megatron Core format.
- Conditional Model Preparation: The prepare_model_template function now applies model adaptations conditionally, improving robustness by preventing operations on a None model.
- Megatron Core Debugging Enhancement: The __repr__ method of TELinear (Transformer Engine Linear) modules has been patched to include the Tensor Parallelism (TP) size, aiding debugging and inspection of Megatron Core models; see the sketch below.
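The __repr__ patch from the last highlight follows the usual monkey-patching pattern. Below is a minimal runnable sketch of the same idea, demonstrated on torch.nn.Linear rather than TELinear (which requires Transformer Engine); the tp_size attribute is a hypothetical stand-in for however the real module exposes its TP size:

import torch.nn as nn

_original_repr = nn.Linear.__repr__

def _repr_with_tp(self):
    # 'tp_size' is a hypothetical attribute, used here only for illustration.
    tp_size = getattr(self, 'tp_size', 1)
    return f'{_original_repr(self)} (tp_size={tp_size})'

nn.Linear.__repr__ = _repr_with_tp
print(nn.Linear(4, 8))  # Linear(in_features=4, out_features=8, bias=True) (tp_size=1)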
Code Review
This pull request adds support for exporting LoRA adapters to the mcore format, which involves merging the LoRA weights with the base model and saving the result as a new mcore model. The changes are primarily in swift/megatron/utils/convert.py to handle the new export path, with related adjustments in swift/llm/export/export.py to trigger the new functionality and in swift/llm/infer/utils.py to accommodate model loading scenarios. My review focuses on the clarity of the new logic and potential issues with handling concurrent export flags.
if args.to_hf:
    megatron_model_meta.convert_mcore2hf(hf_model, mg_model)
    if args.test_convert_precision:
        test_convert_precision(hf_model, mg_model, template)
    del mg_model
    logger.info('Successfully transferred MG model weights to HF model.')
    ckpt_dir = megatron_args.load if megatron_args.adapter_load is None else megatron_args.adapter_load
    save_checkpoint(
        hf_model,
        processor,
        args.output_dir,
        safe_serialization=args.safe_serialization,
        model_dirs=[ckpt_dir, args.model_dir],
        max_shard_size=args.max_shard_size,
        additional_saved_files=hf_model.model_meta.additional_saved_files)
    logger.info(f'Successfully saved HF model weights in `{args.output_dir}`.')
elif args.to_mcore:
    if args.thread_count is None:
        checkpoint_size = sum(get_n_params_grads(mg_model)[0]) * torch.finfo(args.torch_dtype).bits // 8e9
        args.thread_count = max(math.ceil(checkpoint_size / 10), 2)  # 10GB
    patch_torch_dist_shard(args.thread_count)

    args.save_args()
    mg_save_checkpoint(1, [mg_model], None, None, 0)
    logger.info(f'Successfully saved Megatron model weights in `{args.output_dir}`.')
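As a quick sanity check on the thread_count heuristic above (roughly one I/O thread per 10 GB of checkpoint, with a floor of two), here is the arithmetic for a hypothetical 7B-parameter model saved in bf16:

import math

n_params = 7e9                                    # hypothetical 7B-parameter model
bits = 16                                         # bf16
checkpoint_gb = n_params * bits // 8e9            # 14.0 GB
thread_count = max(math.ceil(checkpoint_gb / 10), 2)  # ceil(1.4) = 2 -> 2 threads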
The use of elif args.to_mcore: makes the to_hf and to_mcore operations mutually exclusive. If a user provides both the --to_hf and --to_mcore flags, only the to_hf operation will be performed, which might not be the intended behavior. If both operations should be supported in a single run, this elif should be an if.

However, changing elif to if will cause a NameError, because del mg_model is inside the if args.to_hf: block.

To support both flags concurrently, you could refactor this section to perform both conversions if requested, and then delete the model afterwards, as suggested below.
if args.to_hf:
    megatron_model_meta.convert_mcore2hf(hf_model, mg_model)
    if args.test_convert_precision:
        test_convert_precision(hf_model, mg_model, template)
    logger.info('Successfully transferred MG model weights to HF model.')
    ckpt_dir = megatron_args.load if megatron_args.adapter_load is None else megatron_args.adapter_load
    save_checkpoint(
        hf_model,
        processor,
        args.output_dir,
        safe_serialization=args.safe_serialization,
        model_dirs=[ckpt_dir, args.model_dir],
        max_shard_size=args.max_shard_size,
        additional_saved_files=hf_model.model_meta.additional_saved_files)
    logger.info(f'Successfully saved HF model weights in `{args.output_dir}`.')
if args.to_mcore:
    if args.thread_count is None:
        checkpoint_size = sum(get_n_params_grads(mg_model)[0]) * torch.finfo(args.torch_dtype).bits // 8e9
        args.thread_count = max(math.ceil(checkpoint_size / 10), 2)  # 10GB
    patch_torch_dist_shard(args.thread_count)
    args.save_args()
    mg_save_checkpoint(1, [mg_model], None, None, 0)
    logger.info(f'Successfully saved Megatron model weights in `{args.output_dir}`.')
del mg_model
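Note the design choice in this refactor: del mg_model moves after both branches because the to_mcore path still needs the model for mg_save_checkpoint, so the reference is only released once both conversions have finished.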
@@ -32,12 +32,12 @@ def run(self):
            export_to_ollama(args)
        elif args.to_cached_dataset:
            export_cached_dataset(args)
        elif args.to_hf or args.mcore_adapters and args.to_mcore:
The boolean logic args.to_hf or args.mcore_adapters and args.to_mcore can be hard to read due to operator precedence. While and has higher precedence than or, adding parentheses would make the intent much clearer and prevent potential misinterpretations.
Suggested change:
-        elif args.to_hf or args.mcore_adapters and args.to_mcore:
+        elif args.to_hf or (args.mcore_adapters and args.to_mcore):
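For the record, and does bind tighter than or in Python, so the parenthesized form is behavior-preserving; a quick exhaustive check:

# `and` binds tighter than `or`, so these expressions are always equal:
for a in (False, True):
    for b in (False, True):
        for c in (False, True):
            assert (a or b and c) == (a or (b and c))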
Force-pushed from e510a6b to 997cd2c.
No description provided.