Pulse · modelscope/ms-swift · GitHub

August 20, 2025 – August 23, 2025

Overview

16 Active pull requests

63 Active issues

1 Release published by 1 person

v3.7.2 Patch release v3.7.2
published Aug 21, 2025

15 Pull requests merged by 3 people

[bugfix] fix from_dict
#5506 merged Aug 23, 2025
fix seed
#5503 merged Aug 22, 2025
support intern-s1
#5500 merged Aug 22, 2025
fix ulysses
#5501 merged Aug 22, 2025
Add seed oss
#5499 merged Aug 22, 2025
[megatron] Support deepseek v3.1
#5498 merged Aug 22, 2025
fix think template prepend nothink_prefix
#5492 merged Aug 22, 2025
[bugfix] fix sp & loss_scale
#5497 merged Aug 22, 2025
[template] refactor extra_kwargs
#5491 merged Aug 22, 2025
[train] support Ovis2.5 padding_free
#5486 merged Aug 21, 2025
[bugfix] fix grpoargs check server_base_url
#5483 merged Aug 21, 2025
support deepseek-V3.1 & add no_think_prefix for hybrid thinking models
#5463 merged Aug 21, 2025
Fix test bugs
#5484 merged Aug 21, 2025
[megatron] Fix ref_adapter_load
#5480 merged Aug 21, 2025
[bugfix] fix megatron load/finetune
#5481 merged Aug 21, 2025

1 Pull request opened by 1 person

[WIP] [megatron] support multimodal model
#5502 opened Aug 22, 2025

49 Issues closed by 7 people

Some problems about loading Janus-Pro - traceback : Signal 11 (SIGSEGV) received by PID xxx
#4134 closed Aug 23, 2025
GTX Cards Support?
#4171 closed Aug 23, 2025
Can multiple loss scales be passed in?
#4175 closed Aug 23, 2025
DPO training on multiple nodes and GPUs: DeepSpeed causes GPU memory to grow steadily until it runs out and training fails.
#4191 closed Aug 23, 2025
How to save the few best-performing checkpoints
#4538 closed Aug 22, 2025
Is a qwen2.5-vl grounding task that also includes classification supported?
#4614 closed Aug 22, 2025
Code raises an error when per_device_train_batch_size is increased
#4858 closed Aug 22, 2025
Is there a reinforcement learning example for training function calling?
#5434 closed Aug 22, 2025
Multi-node, multi-GPU Megatron training of Qwen3-30B-A3B-Instruct-2507
#5458 closed Aug 22, 2025
Add report_to parameter support for swanlab in Megatron-SWIFT training
#4212 closed Aug 22, 2025
Is fine-tuning Janus-pro for text2image supported?
#4231 closed Aug 22, 2025
Training is very slow, but the NPU compute sits idle most of the time
#4233 closed Aug 22, 2025
CorDA on ms-swift
#4239 closed Aug 22, 2025
Questions about the agent-grpo dataset and training
#5490 closed Aug 21, 2025
How to register a custom multimodal model
#5489 closed Aug 21, 2025
How should a new data format be defined?
#5394 closed Aug 21, 2025
How to set the system prompt
#5431 closed Aug 21, 2025
Qwen2.5-VL-3B runs out of GPU memory during inference on 2 A100 GPUs
#4471 closed Aug 21, 2025
Is training data with GIF images supported?
#5475 closed Aug 21, 2025
Problems exporting a qwen2.5-vl-3b LoRA model
#4511 closed Aug 21, 2025
Datasets already downloaded from Hugging Face are downloaded again during fine-tuning
#4510 closed Aug 21, 2025
Index error when fine-tuning LLama-omni on audio
#4101 closed Aug 21, 2025
Cannot train on multiple GPUs on a single server
#4334 closed Aug 21, 2025
After LoRA fine-tuning and merging, lmdeploy inference takes twice as long as with Qwen2.5-VL-7B-Instruct. Why?
#4609 closed Aug 21, 2025
vllm does not support the fine-tuned qwen2.5-omni model
#4542 closed Aug 21, 2025
Client cannot connect after deploy
#4879 closed Aug 21, 2025
How to use existing assistant content as part of the prompt
#4978 closed Aug 21, 2025
How to resample data during training?
#5050 closed Aug 21, 2025
Hangs during inference
#5148 closed Aug 21, 2025
reranker 0.6B training hangs every time it reaches the evaluation step
#5254 closed Aug 21, 2025
How to fine-tune a large model on an OBB dataset
#5251 closed Aug 21, 2025
examples/infer/demo_lora.py fails with the vllm backend
#5252 closed Aug 21, 2025
Support for accelerating multimodal training
#5263 closed Aug 21, 2025
Error during fp8 quantization of Qwen2.5-VL-7B
#5286 closed Aug 21, 2025
packing should log something when it encounters over-length samples; handling them silently leaves users with no idea what happened
#5300 closed Aug 21, 2025
How to customize plugins externally and how to use them?
#5304 closed Aug 21, 2025
Does ms-swift support GRPO fine-tuning of gpt-oss models and other MoEs?
#5460 closed Aug 21, 2025
Error when using swift eval with vllm 0.10
#5301 closed Aug 21, 2025
`ms-swift` Is Not Available on Conda Forge
#5376 closed Aug 21, 2025
Can custom settings be implemented quickly for models derived from qwen2.5_VL?
#5317 closed Aug 21, 2025
How to change the default cross-entropy loss for sequence classification
#5342 closed Aug 21, 2025
The swift script does not build after installing
#5351 closed Aug 21, 2025
Megatron-format datasets
#5349 closed Aug 21, 2025
How to customize the wandb project name and the current experiment name?
#5366 closed Aug 21, 2025
Is the tokenizer wrapped inside the engine?
#5380 closed Aug 21, 2025
reward model dataset inference
#4864 closed Aug 21, 2025
Qwen2.5-omni GRPO training runs out of memory (OOM)
#4739 closed Aug 21, 2025
After KTO training, predictions differ greatly between using the LoRA and using the merged weights
#4215 closed Aug 21, 2025
Suggestion: print some parameter information during training
#4216 closed Aug 21, 2025

14 Issues opened by 13 people

Thinking behavior during qwen3 SFT
#5507 opened Aug 23, 2025
Single-node multi-GPU training hangs, always at the same point
#5505 opened Aug 23, 2025
Could you provide an SFT training template for math reasoning?
#5504 opened Aug 22, 2025
Update Wechat QR Code plz
#5496 opened Aug 22, 2025
Abnormal multi-GPU training speed after changing InternVL's image num_image_token
#5495 opened Aug 22, 2025
Pretraining a custom model
#5494 opened Aug 22, 2025
grpo cannot save the images folder in the output.
#5493 opened Aug 22, 2025
Suggestion: integrate the CHORD framework proposed by the Trinity-RFT team at Tongyi Lab
#5488 opened Aug 21, 2025
Could someone explain how grpo internal works under the hood?
#5487 opened Aug 21, 2025
Freezing the LLM fails when train_type is full
#5485 opened Aug 21, 2025
Setting the report_to parameter
#5482 opened Aug 21, 2025
Are there plans to support fine-tuning Voxtral models?
#5479 opened Aug 21, 2025
Does --packing in SFT training cut samples apart and break their contextual integrity?
#5478 opened Aug 21, 2025
How infonce loss is computed when world_size > 1
#5477 opened Aug 21, 2025

20 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[model] support glm-4.5 agent
#5305 commented on Aug 22, 2025 • 1 new comment
[BREAKING] Refactor Scheduler and GRPOTrainer for Flexible Multi-Turn Training
#5307 commented on Aug 22, 2025 • 0 new comments
After LoRA fine-tuning Qwen3-Embedding-0.6B with the swift framework and merging the LoRA, results are the same as without fine-tuning
#5450 commented on Aug 23, 2025 • 0 new comments
Error when fine-tuning the qwen3-embedding-0.6b model
#5088 commented on Aug 23, 2025 • 0 new comments
Qwen3-235b-a22b-instruct outputs strange characters after LoRA fine-tuning
#5258 commented on Aug 22, 2025 • 0 new comments
When training Qwen3-30B-A3B-Instruct-2507 with DeepSpeed ZeRO-3, the progress bar hangs after the model and data are loaded
#5400 commented on Aug 22, 2025 • 0 new comments
GKD training was interrupted; how to load a checkpoint and resume training
#5449 commented on Aug 22, 2025 • 0 new comments
KIMI VL SFT ERROR
#5218 commented on Aug 22, 2025 • 0 new comments
support train bert from scratch?
#4195 commented on Aug 22, 2025 • 0 new comments
How to use vllmengine in a custom reward model?
#4327 commented on Aug 22, 2025 • 0 new comments
How to use custom chat template defined in jinja template format for SFT training
#5443 commented on Aug 21, 2025 • 0 new comments
Does embedding model training support MRL?
#5447 commented on Aug 21, 2025 • 0 new comments
Support evaluating independently on multiple validation sets during training
#5467 commented on Aug 21, 2025 • 0 new comments
Tensor alignment problem when training a classification model on multiple GPUs
#5469 commented on Aug 21, 2025 • 0 new comments
Quantizing a qwen2.5-vl model with swift fails: 'Qwen2_5_VLModel' object has no attribute 'layers'
#5472 commented on Aug 21, 2025 • 0 new comments
swift infer output differs greatly from the merged model's output: swift infer produces thinking, but the merged model does not think
#5196 commented on Aug 21, 2025 • 0 new comments
qwen3 reranker model training
#5256 commented on Aug 21, 2025 • 0 new comments
About Qwen2.5 VL finetuning
#5280 commented on Aug 21, 2025 • 0 new comments
For ovis models, how do I correctly pass MAX_PARTITION? Setting it via an environment variable has no effect when launching with vllm serve
#4881 commented on Aug 21, 2025 • 0 new comments
Support for LLaVA-OV-chat series models
#4187 commented on Aug 21, 2025 • 0 new comments