Pulse · modelscope/ms-swift · GitHub

August 16, 2025 – August 23, 2025

Overview

35 Active pull requests

153 Active issues

1 Release published by 1 person

v3.7.2 Patch release v3.7.2
published Aug 21, 2025

34 Pull requests merged by 3 people

fix seed
#5503 merged Aug 22, 2025
support intern-s1
#5500 merged Aug 22, 2025
fix ulysses
#5501 merged Aug 22, 2025
Add seed oss
#5499 merged Aug 22, 2025
[megatron] Support deepseek v3.1
#5498 merged Aug 22, 2025
fix think template prepend nothink_prefix
#5492 merged Aug 22, 2025
[bugfix] fix sp & loss_scale
#5497 merged Aug 22, 2025
[template] refactor extra_kwargs
#5491 merged Aug 22, 2025
[train] support Ovis2.5 padding_free
#5486 merged Aug 21, 2025
[bugfix] fix grpoargs check server_base_url
#5483 merged Aug 21, 2025
support deepseek-V3.1 & add no_think_prefix for hybrid thinking models
#5463 merged Aug 21, 2025
Fix test bugs
#5484 merged Aug 21, 2025
[megatron] Fix ref_adapter_load
#5480 merged Aug 21, 2025
[bugfix] fix megatron load/finetune
#5481 merged Aug 21, 2025
[grpo] fix apply template to tool call dataset
#5471 merged Aug 20, 2025
[bugfix] compat vllm 0.10.1
#5474 merged Aug 20, 2025
[bugfix] fix vllm qwen2_5_vl
#5473 merged Aug 20, 2025
[megatron] Support dpo adapters
#5451 merged Aug 20, 2025
fix paired metrics
#5468 merged Aug 20, 2025
[rlhf] support ref_adapters
#5459 merged Aug 19, 2025
add gemma3-270m
#5454 merged Aug 19, 2025
[train] support dpo/kto/grpo adapters
#5452 merged Aug 19, 2025
[megatron] support lora router
#5437 merged Aug 19, 2025
[megatron] support export lora to_mcore
#5445 merged Aug 19, 2025
[train] support target_parameters
#5340 merged Aug 18, 2025
[model] Support ovis2.5
#5426 merged Aug 18, 2025
[bugfix] fix megatron pp4 max_epochs
#5432 merged Aug 18, 2025
[docs] update docs base64
#5425 merged Aug 18, 2025
update rope_scaling
#5421 merged Aug 18, 2025
Update grounding docs
#5419 merged Aug 17, 2025
[megatron] support qwen3_thinking
#5417 merged Aug 17, 2025
fix vllm embedding
#5413 merged Aug 17, 2025
[bugfix] fix megatron convert
#5416 merged Aug 17, 2025
update swift image
#5412 merged Aug 16, 2025

1 Pull request opened by 1 person

[WIP] [megatron] support multimodal model
#5502 opened Aug 22, 2025

108 Issues closed by 17 people

Some problems about loading Janus-Pro - traceback : Signal 11 (SIGSEGV) received by PID xxx
#4134 closed Aug 23, 2025
GTX Cards Support?
#4171 closed Aug 23, 2025
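Can multiple loss scales be passed in?
#4175 closed Aug 23, 2025
DPO training on multiple nodes and GPUs: DeepSpeed makes GPU memory grow gradually until it runs out and training fails.
#4191 closed Aug 23, 2025
How to save the few best-performing checkpoints
#4538 closed Aug 22, 2025
Is a qwen2.5-vl grounding task that also includes classification supported?
#4614 closed Aug 22, 2025
Code raises an error when per_device_train_batch_size is increased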
#4858 closed Aug 22, 2025
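Is there a reinforcement learning example for training function calling?
#5434 closed Aug 22, 2025
Multi-node multi-GPU Megatron training of Qwen3-30B-A3B-Instruct-2507
#5458 closed Aug 22, 2025
Add report_to parameter support for swanlab in Megatron-SWIFT training
#4212 closed Aug 22, 2025
Is fine-tuning Janus-pro text2image supported?
#4231 closed Aug 22, 2025
Training is very slow, but NPU compute is mostly idle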
#4233 closed Aug 22, 2025
CorDA on ms-swift
#4239 closed Aug 22, 2025
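Question about agent-grpo datasets and training
#5490 closed Aug 21, 2025
How to register a custom multimodal model
#5489 closed Aug 21, 2025
How should a new data format be defined?
#5394 closed Aug 21, 2025
How to set the system prompt
#5431 closed Aug 21, 2025
Qwen2.5-VL-3B inference runs out of GPU memory on 2 A100 cards
#4471 closed Aug 21, 2025
Is GIF image training data supported?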
#5475 closed Aug 21, 2025
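Problem exporting a LoRA model of qwen2.5-vl-3b
#4511 closed Aug 21, 2025
Datasets downloaded from huggingface have to be downloaded again during finetuning
#4510 closed Aug 21, 2025
Index error when fine-tuning LLama-omni on audio
#4101 closed Aug 21, 2025
Unable to train on multiple GPUs on a single server
#4334 closed Aug 21, 2025
After LoRA fine-tuning and merging, lmdeploy inference takes twice as long as Qwen2.5-VL-7B-Instruct. Why?
#4609 closed Aug 21, 2025
vllm does not support the fine-tuned qwen2.5-omni model
#4542 closed Aug 21, 2025
Client cannot connect after deploy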
#4879 closed Aug 21, 2025
How to use existing assistant content as a part of prompt
#4978 closed Aug 21, 2025
How to resample data during training?
#5050 closed Aug 21, 2025
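Inference hangs
#5148 closed Aug 21, 2025
reranker 0.6B training hangs each time it reaches the validation stage
#5254 closed Aug 21, 2025
How to fine-tune a large model on an OBB dataset
#5251 closed Aug 21, 2025
examples/infer/demo_lora.py raises an error with the vllm backend
#5252 closed Aug 21, 2025
Support for multimodal training acceleration
#5263 closed Aug 21, 2025
Qwen2.5-VL-7B fp8 quantization error
#5286 closed Aug 21, 2025
Packing should print some information when it hits over-length samples; handling them silently leaves the user uninformed
#5300 closed Aug 21, 2025
How to customize plugins externally and how to use them?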
#5304 closed Aug 21, 2025
Does ms-swift support GRPO fine tuning of gpt oss models and other MOE's
#5460 closed Aug 21, 2025
Error when using swift eval with vllm 0.10
#5301 closed Aug 21, 2025
`ms-swift` Is Not Available on Conda Forge
#5376 closed Aug 21, 2025
Can custom settings for models derived from qwen2.5_VL be implemented quickly?
#5317 closed Aug 21, 2025
For sequence classification, how to change the default cross-entropy loss
#5342 closed Aug 21, 2025
The swift script not builds after installing
#5351 closed Aug 21, 2025
Megatron-format datasets
#5349 closed Aug 21, 2025
How to customize the wandb project name and current experiment name?
#5366 closed Aug 21, 2025
Is the tokenizer wrapped inside the engine?
#5380 closed Aug 21, 2025
reward model dataset inference
#4864 closed Aug 21, 2025
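Qwen2.5-omni GRPO training runs out of memory (OOM)
#4739 closed Aug 21, 2025
After KTO training, predictions differ greatly between using the LoRA adapter and using merged weights
#4215 closed Aug 21, 2025
Suggestion: print some parameter information during training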
#4216 closed Aug 21, 2025
[Bug]: [WARNING:swift] Please install the package: pip install "decord" -U
#4709 closed Aug 20, 2025
About loading deepspeed with resume_from_checkpoint
#4765 closed Aug 20, 2025
Small bug in the result returned by the similarity metric
#5446 closed Aug 20, 2025
Questions about the loss calculation during SFT
#5453 closed Aug 20, 2025
vllm support ascend, how use ms-swift deploy ascend base on vllm?
#4284 closed Aug 20, 2025
Does the ms-swift framework currently support fine-tuning on the 5090 GPU?
#5464 closed Aug 20, 2025
Support model_type='gpt-oss'
#5291 closed Aug 20, 2025
GSM8k evaluation fails
#4196 closed Aug 20, 2025
Ovis2 fine-tuning loss is 0.0 with images but normal without images
#5238 closed Aug 19, 2025
How to use a custom instruct when loading a custom dataset for SFT of Qwen3-Reranker-8B?
#5189 closed Aug 19, 2025
reward margin
#3603 closed Aug 19, 2025
Reusing one image across multi-turn dialogue
#5247 closed Aug 19, 2025
liger_kernel with Qwen3 Embedding fine-tuning - KeyError: 'last_hidden_state'
#5219 closed Aug 19, 2025
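Prompt engineering after fine-tuning with swift
#5260 closed Aug 19, 2025
Question about "Support deepspeed-AutoTP"
#5217 closed Aug 19, 2025
OOM when LoRA fine-tuning qwen2_5vl
#5199 closed Aug 19, 2025
Can qwen reranker training data include think content?
#5205 closed Aug 19, 2025
GPU not found in a multi-GPU environment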
#5162 closed Aug 19, 2025
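Inference fails with a core error
#5179 closed Aug 19, 2025
Use only accelerate data parallelism, without model parallelism
#5181 closed Aug 19, 2025
When resuming from a checkpoint, can the data continue from where it left off instead of starting over?
#5197 closed Aug 19, 2025
Does the multimodal reward model support a data format where positive and negative samples are images?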
#5201 closed Aug 19, 2025
I finetune the Omni thinker by LLaMA-Factory and ms-swift, but the inference effect of talker is affected.
#5071 closed Aug 19, 2025
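Supported hardware devices
#5092 closed Aug 19, 2025
Error when fine-tuning with multiple datasets
#5123 closed Aug 19, 2025
After sft-lora fine-tuning Qwen2.5-vl-7B, inference output is clearly only half complete. What is the cause?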
#5138 closed Aug 19, 2025
Prefix prompt for Embedding training
#4911 closed Aug 19, 2025
Question about the time cost of multimodal GRPO training of video models
#4943 closed Aug 19, 2025
About loss_type=None
#4937 closed Aug 19, 2025
Problems found when fine-tuning qwen3
#4939 closed Aug 19, 2025
DPO Qwen2.5VL issue
#4940 closed Aug 19, 2025
swift export OOM
#4963 closed Aug 19, 2025
Error when fine-tuning an embedding model
#5046 closed Aug 19, 2025
Support training with large multimodal datasets in cloud storage
#4179 closed Aug 19, 2025
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
#5439 closed Aug 18, 2025
GLM-4.1V-9B-Thinking fine-tuning environment
#4987 closed Aug 18, 2025
Model support: Ovis2.5
#5436 closed Aug 18, 2025
How to construct my fine-tuning dataset?
#4891 closed Aug 18, 2025
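After ms-swift sft, the model's config.json changed, so I cannot deploy the model directly with vllm
#4844 closed Aug 18, 2025
Incorrect padding information when fine-tuning the deepseek_coder model with GRPO
#4808 closed Aug 18, 2025
After setting packing_cache, the second training run did not read from the cache and packed the data again.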
#4803 closed Aug 18, 2025
test kimi vl thinking meet error!
#4780 closed Aug 18, 2025
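Error when AWQ-quantizing qwen2.5-vl-7b
#4828 closed Aug 18, 2025
AWQ quantization problem with qwen2.5-vl
#4762 closed Aug 18, 2025
swift rlhf --vllm_mode server: rollout fails with NCCL error
#5428 closed Aug 18, 2025
Multi-GPU parallel training with a locally loaded dataset stops at Init COMPLETE... and cannot enter the train stage
#4743 closed Aug 18, 2025
Numbering of multiple input images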
#4742 closed Aug 18, 2025
Does the packing feature block attention score between different samples?
#4736 closed Aug 18, 2025
Does agent inference still not support actual tool calls? See demo_agent.py
#4764 closed Aug 18, 2025
How to add early stop in ms swift
#4741 closed Aug 18, 2025
swift rollout hits OOM; its GPU memory requirement is higher than deploy
#5424 closed Aug 18, 2025
Error fine-tuning a DeepSeek model: AssertionError: noaux_tc not supported for training
#4737 closed Aug 18, 2025
Packing and lazy_tokenize are incompatible in v3.8.0
#5402 closed Aug 18, 2025
LoRA fine-tuning the qwen3 embedding model raises a find_unused_parameters warning
#4698 closed Aug 18, 2025
a question for rl
#4735 closed Aug 18, 2025
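After SFT on a regression task, loading the model fails when using vllm to accelerate inference. Is there a fix?
#4676 closed Aug 18, 2025
GRPO training on swift3.8.0dev raises TypeError: Qwen2_5_VLModel.forward() got an unexpected keyword argument 'pixel_values'
#5377 closed Aug 17, 2025
Is deploying a service with a base qwen2.5-VL plus multiple LoRAs currently supported?
#4153 closed Aug 17, 2025
During glm4.5/dsv3 agent training, the system prompt does not seem to be placed correctly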
#5414 closed Aug 16, 2025

45 Issues opened by 42 people

Single-node multi-GPU training hangs at the same place every time
#5505 opened Aug 23, 2025
Could an SFT training template for mathematical reasoning be provided?
#5504 opened Aug 22, 2025
Update Wechat QR Code plz
#5496 opened Aug 22, 2025
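Abnormal multi-GPU training speed after changing InternVL image num_image_token
#5495 opened Aug 22, 2025
Pre-training a custom model
#5494 opened Aug 22, 2025
GRPO cannot save the images folder in output.
#5493 opened Aug 22, 2025
Suggestion: adopt the CHORD framework proposed by the Tongyi Lab Trinity-RFT team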
#5488 opened Aug 21, 2025
Can someone explain the engineering principles under grpo internal?
#5487 opened Aug 21, 2025
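Freezing the LLM fails when train_type is full
#5485 opened Aug 21, 2025
report_to parameter settings
#5482 opened Aug 21, 2025
Are there plans to support fine-tuning of Voxtral series models?
#5479 opened Aug 21, 2025
Does --packing truncate samples in SFT training and break the contextual integrity of the training corpus?
#5478 opened Aug 21, 2025
Computation logic of infonce loss when world_size > 1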
#5477 opened Aug 21, 2025
Add support for vllm --compilation-config
#5476 opened Aug 20, 2025
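Quantizing a qwen2.5-vl model with swift raises 'Qwen2_5_VLModel' object has no attribute 'layers'
#5472 opened Aug 20, 2025
Fine-tuning InternVL3 with the swift framework
#5470 opened Aug 20, 2025
Tensor alignment problem when training a classification model on multiple GPUs
#5469 opened Aug 20, 2025
Support independent evaluation on multiple validation sets during training
#5467 opened Aug 20, 2025
qwen3-reranker fine-tuning data question
#5466 opened Aug 20, 2025
Does the eval framework support sleep mode?
#5465 opened Aug 20, 2025
AutoAWQ is no longer updated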
#5462 opened Aug 20, 2025
SGlang for GRPO rollout
#5461 opened Aug 19, 2025
Training multimodal models on B200
#5457 opened Aug 19, 2025
Error fine-tuning a multimodal model with base64 data
#5456 opened Aug 19, 2025
Does ms-swift support sequence parallel(up to 32k or 128k) for qwen2.5-vl model pretrain, thanks.
#5455 opened Aug 19, 2025
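After LoRA fine-tuning Qwen3-Embedding-0.6B with swift, results after lora merge are the same as without fine-tuning
#5450 opened Aug 19, 2025
GKD training was interrupted; how to load a checkpoint to resume training
#5449 opened Aug 19, 2025
How to customize the vision tower of a multimodal model
#5448 opened Aug 19, 2025
Does embedding model training support MRL?
#5447 opened Aug 19, 2025
Full fine-tuning of qwen2.5-3b with swift and the --deepspeed zero option increases GPU memory usage instead of reducing it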
#5444 opened Aug 19, 2025
How to use custom chat template defined in jinja template format for SFT training
#5443 opened Aug 18, 2025
How to set remove_unused_columns to false during inference
#5442 opened Aug 18, 2025
Question about Eval_loss and eval_acc trends
#5441 opened Aug 18, 2025
Merged qwen2.5-vl-7B model produced by swift3 export fails to load
#5440 opened Aug 18, 2025
Maintenance of ms-swift
#5438 opened Aug 18, 2025
ms-swift3.7.1 fails to convert hf to mcore format
#5435 opened Aug 18, 2025
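Intermittent errors when launching experiments repeatedly with the same configuration
#5433 opened Aug 18, 2025
LoRA-merged qwen2.5-vl-7B models differ between ms-swift 3.6.4 and 3.7: the model merged with 3.7 significantly outperforms the one merged with 3.6.4 on multi-slot grounding tasks
#5430 opened Aug 18, 2025
Would you consider supporting ernie VL series models?
#5429 opened Aug 18, 2025
Model support: internlm/Intern-S1-GGUF
#5427 opened Aug 18, 2025
Replacing the embedding model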
#5423 opened Aug 18, 2025
How can we use bnb quantisation
#5422 opened Aug 17, 2025
Custom multimodal dataset
#5420 opened Aug 17, 2025
Tracking down NaN in DPO loss computation
#5418 opened Aug 17, 2025
Request: support the native v3 agent template
#5415 opened Aug 16, 2025

25 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[BREAKING] Refactor Scheduler and GRPOTrainer for Flexible Multi-Turn Training
#5307 commented on Aug 22, 2025 • 10 new comments
[model] support glm-4.5 agent
#5305 commented on Aug 22, 2025 • 1 new comment
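Error when fine-tuning the qwen3-embedding-0.6b model
#5088 commented on Aug 23, 2025 • 0 new comments
Qwen3-235b-a22b-instruct outputs strange characters after LoRA fine-tuning
#5258 commented on Aug 22, 2025 • 0 new comments
When training Qwen3-30B-A3B-Instruct-2507 with deepspeed zero3, the training progress bar hangs after the model and data are loaded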
#5400 commented on Aug 22, 2025 • 0 new comments
KIMI VL SFT ERROR
#5218 commented on Aug 22, 2025 • 0 new comments
support train bert from scratch?
#4195 commented on Aug 22, 2025 • 0 new comments
How to use vllmengine in a custom reward model?
#4327 commented on Aug 22, 2025 • 0 new comments
swift infer output differs greatly from the merged model's output: swift infer produces thinking, the merged model cannot think
#5196 commented on Aug 21, 2025 • 0 new comments
qwen3 reranker model training
#5256 commented on Aug 21, 2025 • 0 new comments
About Qwen2.5 VL finetuning
#5280 commented on Aug 21, 2025 • 0 new comments
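For ovis models, how to pass MAX_PARTITION correctly? Setting it via an environment variable has no effect when launching with vllm serve
#4881 commented on Aug 21, 2025 • 0 new comments
Support for LLaVA-OV-chat series models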
#4187 commented on Aug 21, 2025 • 0 new comments
🍭[Roadmap] ms-swift3.6-3.8
#4561 commented on Aug 20, 2025 • 0 new comments
Fine-tuning qwen2.5_omni_3B with reinforcement learning produces repetitive output
#5370 commented on Aug 20, 2025 • 0 new comments
How to run inference with mindie on a model exported by swift export
#5399 commented on Aug 20, 2025 • 0 new comments
multiple node training slower than single node
#4291 commented on Aug 20, 2025 • 0 new comments
Customized evaluation
#4283 commented on Aug 19, 2025 • 0 new comments
ModuleNotFoundError: No module named 'torch.distributed.device_mesh'
#4092 commented on Aug 18, 2025 • 0 new comments
Support for the Intern-S1 model
#5356 commented on Aug 18, 2025 • 0 new comments
Eval_loss and eval_acc trends are inconsistent
#2663 commented on Aug 18, 2025 • 0 new comments
qwen3 training hangs at the use_logits_to_keep: True step
#4875 commented on Aug 18, 2025 • 0 new comments
How to add system or instructions in embedding training?
#4638 commented on Aug 18, 2025 • 0 new comments
python attr method cannot work if llm_prefix contains nested objects in _patch_sequence_classification
#4245 commented on Aug 18, 2025 • 0 new comments
Discussion: Does DFT really have practical effects?
#5386 commented on Aug 17, 2025 • 0 new comments