Insights: modelscope/ms-swift
Overview
1 Release published by 1 person
-
v3.7.2 Patch release v3.7.2
published
Aug 21, 2025
15 Pull requests merged by 3 people
-
[bugfix] fix from_dict
#5506 merged
Aug 23, 2025 -
fix seed
#5503 merged
Aug 22, 2025 -
support intern-s1
#5500 merged
Aug 22, 2025 -
fix ulysses
#5501 merged
Aug 22, 2025 -
Add seed oss
#5499 merged
Aug 22, 2025 -
[megatron] Support deepseek v3.1
#5498 merged
Aug 22, 2025 -
fix think template prepend nothink_prefix
#5492 merged
Aug 22, 2025 -
[bugfix] fix sp & loss_scale
#5497 merged
Aug 22, 2025 -
[template] refactor extra_kwargs
#5491 merged
Aug 22, 2025 -
[train] support Ovis2.5 padding_free
#5486 merged
Aug 21, 2025 -
[bugfix] fix grpoargs check server_base_url
#5483 merged
Aug 21, 2025 -
support deepseek-V3.1 & add no_think_prefix for hybrid thinking models
#5463 merged
Aug 21, 2025 -
Fix test bugs
#5484 merged
Aug 21, 2025 -
[megatron] Fix ref_adapter_load
#5480 merged
Aug 21, 2025 -
[bugfix] fix megatron load/finetune
#5481 merged
Aug 21, 2025
1 Pull request opened by 1 person
-
[WIP] [megatron] support multimodal model
#5502 opened
Aug 22, 2025
49 Issues closed by 7 people
-
Some problems about loading Janus-Pro - traceback : Signal 11 (SIGSEGV) received by PID xxx
#4134 closed
Aug 23, 2025 -
GTX Cards Support?
#4171 closed
Aug 23, 2025 -
Can multiple loss scales be passed in?
#4175 closed
Aug 23, 2025 -
DPO training, multi-node multi-GPU: DeepSpeed causes GPU memory to grow gradually until it runs out and training fails.
#4191 closed
Aug 23, 2025 -
How to save the few best-performing checkpoints
#4538 closed
Aug 22, 2025 -
Is a qwen2.5-vl grounding task that also includes classification supported?
#4614 closed
Aug 22, 2025 -
Code raises an error when per_device_train_batch_size is increased
#4858 closed
Aug 22, 2025 -
Is there a reinforcement learning example for training function call?
#5434 closed
Aug 22, 2025 -
Multi-node multi-GPU megatron training of Qwen3-30B-A3B-Instruct-2507
#5458 closed
Aug 22, 2025 -
Add report_to parameter to Megatron-SWIFT training to support swanlab
#4212 closed
Aug 22, 2025 -
Is fine-tuning Janus-pro text2image supported?
#4231 closed
Aug 22, 2025 -
Training is very slow, but the NPU compute is idle most of the time
#4233 closed
Aug 22, 2025 -
CorDA on ms-swift
#4239 closed
Aug 22, 2025 -
Question about agent-grpo datasets and training
#5490 closed
Aug 21, 2025 -
How to register a custom multimodal model
#5489 closed
Aug 21, 2025 -
How should a new data format be defined?
#5394 closed
Aug 21, 2025 -
How to set the system prompt
#5431 closed
Aug 21, 2025 -
Qwen2.5-VL-3B inference runs out of GPU memory on 2 a100 cards
#4471 closed
Aug 21, 2025 -
Are gif images supported as training data?
#5475 closed
Aug 21, 2025 -
Problem exporting the qwen2.5-vl-3b lora model
#4511 closed
Aug 21, 2025 -
Datasets downloaded from huggingface still need to be re-downloaded during finetuning
#4510 closed
Aug 21, 2025 -
Index error when fine-tuning LLama-omni on audio
#4101 closed
Aug 21, 2025 -
Unable to run single-server multi-GPU training
#4334 closed
Aug 21, 2025 -
After lora fine-tuning and merge, lmdeploy inference takes twice as long as Qwen2.5-VL-7B-Instruct. Why?
#4609 closed
Aug 21, 2025 -
vllm does not support the fine-tuned qwen2.5-omni model
#4542 closed
Aug 21, 2025 -
Client cannot connect after deploy
#4879 closed
Aug 21, 2025 -
How to use existing assistant content as a part of prompt
#4978 closed
Aug 21, 2025 -
How to resample data during training?
#5050 closed
Aug 21, 2025 -
Hangs during inference
#5148 closed
Aug 21, 2025 -
reranker 0.6B training hangs every time it reaches the validation stage
#5254 closed
Aug 21, 2025 -
How to fine-tune a large model on an obb dataset
#5251 closed
Aug 21, 2025 -
examples/infer/demo_lora.py raises an error with the vllm backend
#5252 closed
Aug 21, 2025 -
Support for multimodal training acceleration
#5263 closed
Aug 21, 2025 -
Qwen2.5-VL-7B fp8 quantization error
#5286 closed
Aug 21, 2025 -
packing should print some information when it hits over-length samples; handling them silently leaves the user with no idea what happened
#5300 closed
Aug 21, 2025 -
How to customize plugins externally and how to use them?
#5304 closed
Aug 21, 2025 -
Does ms-swift support GRPO fine tuning of gpt oss models and other MOE's
#5460 closed
Aug 21, 2025 -
Error when using swift eval with vllm 0.10
#5301 closed
Aug 21, 2025 -
`ms-swift` Is Not Available on Conda Forge
#5376 closed
Aug 21, 2025 -
On whether custom settings for qwen2.5_VL-derived models can be implemented quickly
#5317 closed
Aug 21, 2025 -
For sequence classification, how to change the default cross-entropy loss
#5342 closed
Aug 21, 2025 -
The swift script does not build after installing
#5351 closed
Aug 21, 2025 -
Megatron-format dataset
#5349 closed
Aug 21, 2025 -
How to customize the wandb project name and the current experiment name?
#5366 closed
Aug 21, 2025 -
Is the tokenizer wrapped inside the engine?
#5380 closed
Aug 21, 2025 -
reward model dataset inference
#4864 closed
Aug 21, 2025 -
Memory OOM during Qwen2.5-omni GRPO training
#4739 closed
Aug 21, 2025 -
After kto training, predictions differ greatly between using lora and using merged weights
#4215 closed
Aug 21, 2025 -
Suggestion: print some parameter information during training
#4216 closed
Aug 21, 2025
14 Issues opened by 13 people
-
think during qwen3 sft
#5507 opened
Aug 23, 2025 -
Single-node multi-GPU training hangs, at the same place every time
#5505 opened
Aug 23, 2025 -
Could an SFT training template for math reasoning be provided?
#5504 opened
Aug 22, 2025 -
Update Wechat QR Code plz
#5496 opened
Aug 22, 2025 -
Abnormal multi-GPU training speed after modifying InternVL image num_image_token
#5495 opened
Aug 22, 2025 -
Pre-training a custom model
#5494 opened
Aug 22, 2025 -
grpo cannot save the images folder in output.
#5493 opened
Aug 22, 2025 -
Suggestion: adopt the CHORD framework proposed by the Trinity-RFT team at Tongyi Lab
#5488 opened
Aug 21, 2025 -
Can an expert explain the engineering principles under grpo——internal?
#5487 opened
Aug 21, 2025 -
Freezing the llm fails when train_type is full
#5485 opened
Aug 21, 2025 -
Setting the report_to parameter
#5482 opened
Aug 21, 2025 -
Are there plans to support fine-tuning of Voxtral series models?
#5479 opened
Aug 21, 2025 -
Does --packing cut samples apart in SFT training and break the contextual integrity of the training data?
#5478 opened
Aug 21, 2025 -
Computation logic of infonce loss when world_size > 1
#5477 opened
Aug 21, 2025
20 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[model] support glm-4.5 agent
#5305 commented on
Aug 22, 2025 • 1 new comment -
[BREAKING] Refactor Scheduler and GRPOTrainer for Flexible Multi-Turn Training
#5307 commented on
Aug 22, 2025 • 0 new comments -
Fine-tuning Qwen3-Embedding-0.6B with lora in the swift framework: results after lora merge are the same as without fine-tuning
#5450 commented on
Aug 23, 2025 • 0 new comments -
Error when fine-tuning the qwen3-embedding-0.6b model
#5088 commented on
Aug 23, 2025 • 0 new comments -
Qwen3-235b-a22b-instruct outputs strange characters after lora fine-tuning
#5258 commented on
Aug 22, 2025 • 0 new comments -
When training Qwen3-30B-A3B-Instruct-2507 with deepspeed zero3, the training progress bar hangs after the model and data are loaded
#5400 commented on
Aug 22, 2025 • 0 new comments -
GKD training was interrupted; how to load a checkpoint to resume training
#5449 commented on
Aug 22, 2025 • 0 new comments -
KIMI VL SFT ERROR
#5218 commented on
Aug 22, 2025 • 0 new comments -
support train bert from scratch?
#4195 commented on
Aug 22, 2025 • 0 new comments -
How to use vllmengine in a custom reward model?
#4327 commented on
Aug 22, 2025 • 0 new comments -
How to use custom chat template defined in jinja template format for SFT training
#5443 commented on
Aug 21, 2025 • 0 new comments -
Does embedding model training support MRL?
#5447 commented on
Aug 21, 2025 • 0 new comments -
Support independent evaluation on multiple validation sets during training
#5467 commented on
Aug 21, 2025 • 0 new comments -
Tensor alignment issue when training a classification model on multiple GPUs
#5469 commented on
Aug 21, 2025 • 0 new comments -
Error when quantizing the qwen2.5-vl model with swift: 'Qwen2_5_VLModel' object has no attribute 'layers'
#5472 commented on
Aug 21, 2025 • 0 new comments -
swift infer results differ too much from the merged model's inference results: swift infer produces thinking, the merged model cannot think
#5196 commented on
Aug 21, 2025 • 0 new comments -
qwen3 reranker model training
#5256 commented on
Aug 21, 2025 • 0 new comments -
About Qwen2.5 VL finetuning
#5280 commented on
Aug 21, 2025 • 0 new comments -
For ovis models, how to pass MAX_PARTITION correctly? Setting it through an environment variable when launching with vllm serve has no effect
#4881 commented on
Aug 21, 2025 • 0 new comments -
Support for LLaVA-OV-chat series models
#4187 commented on
Aug 21, 2025 • 0 new comments