Insights: modelscope/ms-swift
Overview
1 Release published by 1 person
-
v3.7.2 Patch release v3.7.2
published
Aug 21, 2025
34 Pull requests merged by 3 people
-
fix seed
#5503 merged
Aug 22, 2025 -
support intern-s1
#5500 merged
Aug 22, 2025 -
fix ulysses
#5501 merged
Aug 22, 2025 -
Add seed oss
#5499 merged
Aug 22, 2025 -
[megatron] Support deepseek v3.1
#5498 merged
Aug 22, 2025 -
fix think template prepend nothink_prefix
#5492 merged
Aug 22, 2025 -
[bugfix] fix sp & loss_scale
#5497 merged
Aug 22, 2025 -
[template] refactor extra_kwargs
#5491 merged
Aug 22, 2025 -
[train] support Ovis2.5 padding_free
#5486 merged
Aug 21, 2025 -
[bugfix] fix grpoargs check server_base_url
#5483 merged
Aug 21, 2025 -
support deepseek-V3.1 & add no_think_prefix for hybrid thinking models
#5463 merged
Aug 21, 2025 -
Fix test bugs
#5484 merged
Aug 21, 2025 -
[megatron] Fix ref_adapter_load
#5480 merged
Aug 21, 2025 -
[bugfix] fix megatron load/finetune
#5481 merged
Aug 21, 2025 -
[grpo] fix apply template to tool call dataset
#5471 merged
Aug 20, 2025 -
[bugfix] compat vllm 0.10.1
#5474 merged
Aug 20, 2025 -
[bugfix] fix vllm qwen2_5_vl
#5473 merged
Aug 20, 2025 -
[megatron] Support dpo adapters
#5451 merged
Aug 20, 2025 -
fix paired metrics
#5468 merged
Aug 20, 2025 -
[rlhf] support ref_adapters
#5459 merged
Aug 19, 2025 -
add gemma3-270m
#5454 merged
Aug 19, 2025 -
[train] support dpo/kto/grpo adapters
#5452 merged
Aug 19, 2025 -
[megatron] support lora router
#5437 merged
Aug 19, 2025 -
[megatron] support export lora to_mcore
#5445 merged
Aug 19, 2025 -
[train] support target_parameters
#5340 merged
Aug 18, 2025 -
[model] Support ovis2.5
#5426 merged
Aug 18, 2025 -
[bugfix] fix megatron pp4 max_epochs
#5432 merged
Aug 18, 2025 -
[docs] update docs base64
#5425 merged
Aug 18, 2025 -
update rope_scaling
#5421 merged
Aug 18, 2025 -
Update grounding docs
#5419 merged
Aug 17, 2025 -
[megatron] support qwen3_thinking
#5417 merged
Aug 17, 2025 -
fix vllm embedding
#5413 merged
Aug 17, 2025 -
[bugfix] fix megatron convert
#5416 merged
Aug 17, 2025 -
update swift image
#5412 merged
Aug 16, 2025
1 Pull request opened by 1 person
-
[WIP] [megatron] support multimodal model
#5502 opened
Aug 22, 2025
108 Issues closed by 17 people
-
Some problems about loading Janus-Pro - traceback : Signal 11 (SIGSEGV) received by PID xxx
#4134 closed
Aug 23, 2025 -
GTX Cards Support?
#4171 closed
Aug 23, 2025 -
Can multiple loss scales be passed in?
#4175 closed
Aug 23, 2025 -
DPO training on multiple nodes and GPUs: DeepSpeed causes GPU memory to grow gradually until it runs out and training fails.
#4191 closed
Aug 23, 2025 -
How to save the several best-performing checkpoints
#4538 closed
Aug 22, 2025 -
qwen2.5-vl grounding task that also includes classification: is it supported?
#4614 closed
Aug 22, 2025 -
Code raises an error when per_device_train_batch_size is increased
#4858 closed
Aug 22, 2025 -
Is there a reinforcement learning example for training function calling?
#5434 closed
Aug 22, 2025 -
Multi-node multi-GPU Megatron training of Qwen3-30B-A3B-Instruct-2507
#5458 closed
Aug 22, 2025 -
Add report_to parameter support for swanlab in Megatron-SWIFT training
#4212 closed
Aug 22, 2025 -
Is fine-tuning Janus-pro's text2image supported?
#4231 closed
Aug 22, 2025 -
Training is very slow, yet NPU compute utilization is mostly idle
#4233 closed
Aug 22, 2025 -
CorDA on ms-swift
#4239 closed
Aug 22, 2025 -
Question about agent-grpo datasets and training
#5490 closed
Aug 21, 2025 -
How to register a custom multimodal model
#5489 closed
Aug 21, 2025 -
How should a new data format be defined?
#5394 closed
Aug 21, 2025 -
How to set the system prompt
#5431 closed
Aug 21, 2025 -
Qwen2.5-VL-3B inference runs out of GPU memory on 2 A100 GPUs
#4471 closed
Aug 21, 2025 -
Are GIF images supported as training data?
#5475 closed
Aug 21, 2025 -
Problem exporting the LoRA model of qwen2.5-vl-3b
#4511 closed
Aug 21, 2025 -
Datasets downloaded from huggingface are downloaded again during finetuning
#4510 closed
Aug 21, 2025 -
Index error when fine-tuning LLama-omni on audio
#4101 closed
Aug 21, 2025 -
Unable to train on multiple GPUs on a single server
#4334 closed
Aug 21, 2025 -
After LoRA fine-tuning and merging, lmdeploy inference takes twice as long as Qwen2.5-VL-7B-Instruct. Why?
#4609 closed
Aug 21, 2025 -
vllm does not support the fine-tuned qwen2.5-omni model
#4542 closed
Aug 21, 2025 -
Client cannot connect after deploy
#4879 closed
Aug 21, 2025 -
How to use existing assistant content as a part of prompt
#4978 closed
Aug 21, 2025 -
How to resample data during training?
#5050 closed
Aug 21, 2025 -
Inference hangs
#5148 closed
Aug 21, 2025 -
reranker 0.6B training hangs every time it reaches the validation stage
#5254 closed
Aug 21, 2025 -
How should an LLM be fine-tuned on an OBB dataset?
#5251 closed
Aug 21, 2025 -
examples/infer/demo_lora.py raises an error with the vllm backend
#5252 closed
Aug 21, 2025 -
Support for accelerating multimodal training
#5263 closed
Aug 21, 2025 -
Qwen2.5-VL-7B fp8 quantization error
#5286 closed
Aug 21, 2025 -
Packing should print some information when it encounters over-length samples; handling them silently leaves the user uninformed
#5300 closed
Aug 21, 2025 -
How to customize plugins externally and how to use them?
#5304 closed
Aug 21, 2025 -
Does ms-swift support GRPO fine tuning of gpt oss models and other MOE's
#5460 closed
Aug 21, 2025 -
Error when using swift eval with vllm 0.10
#5301 closed
Aug 21, 2025 -
`ms-swift` Is Not Available on Conda Forge
#5376 closed
Aug 21, 2025 -
Can custom settings for models derived from qwen2.5_VL be implemented quickly?
#5317 closed
Aug 21, 2025 -
For sequence classification, how to change the default cross-entropy loss
#5342 closed
Aug 21, 2025 -
The swift script not builds after installing
#5351 closed
Aug 21, 2025 -
Megatron-format datasets
#5349 closed
Aug 21, 2025 -
How to customize the wandb project name and the current experiment name?
#5366 closed
Aug 21, 2025 -
Is the tokenizer wrapped inside the engine?
#5380 closed
Aug 21, 2025 -
reward model dataset inference
#4864 closed
Aug 21, 2025 -
Qwen2.5-omni GRPO training runs out of memory (OOM)
#4739 closed
Aug 21, 2025 -
After KTO training, predictions differ greatly between using the LoRA and using the merged weights
#4215 closed
Aug 21, 2025 -
Suggestion: print some parameter information during training
#4216 closed
Aug 21, 2025 -
[Bug]: [WARNING:swift] Please install the package: pip install "decord" -U
#4709 closed
Aug 20, 2025 -
About loading deepspeed with resume_from_checkpoint
#4765 closed
Aug 20, 2025 -
A small bug in the results returned by the similarity metric
#5446 closed
Aug 20, 2025 -
Questions about the loss calculation during SFT
#5453 closed
Aug 20, 2025 -
vllm support ascend, how use ms-swift deploy ascend base on vllm?
#4284 closed
Aug 20, 2025 -
Does the ms-swift framework currently support fine-tuning on 5090 GPUs?
#5464 closed
Aug 20, 2025 -
Support model_type='gpt-oss'
#5291 closed
Aug 20, 2025 -
GSM8k evaluation fails
#4196 closed
Aug 20, 2025 -
Ovis2 fine-tuning loss is 0.0 with images, but loss trains normally without images
#5238 closed
Aug 19, 2025 -
How to use a custom instruct when loading a custom dataset for Qwen3-Reranker-8B SFT?
#5189 closed
Aug 19, 2025 -
reward margin
#3603 closed
Aug 19, 2025 -
Reusing a single image across multi-turn dialogue
#5247 closed
Aug 19, 2025 -
liger_kernel with Qwen3 Embedding fine-tuning - KeyError: 'last_hidden_state'
#5219 closed
Aug 19, 2025 -
Prompt engineering after fine-tuning with swift
#5260 closed
Aug 19, 2025 -
Question about Support deepspeed-AutoTP
#5217 closed
Aug 19, 2025 -
OOM when LoRA fine-tuning qwen2_5vl
#5199 closed
Aug 19, 2025 -
Can qwen reranker training data include think?
#5205 closed
Aug 19, 2025 -
GPUs not found in a multi-GPU environment
#5162 closed
Aug 19, 2025 -
Core error during inference
#5179 closed
Aug 19, 2025 -
Use only accelerate data parallelism, without model parallelism
#5181 closed
Aug 19, 2025 -
When resuming training from a checkpoint, can the data continue from where it left off instead of starting over?
#5197 closed
Aug 19, 2025 -
Does the multimodal reward model support a data format where positive and negative samples are images?
#5201 closed
Aug 19, 2025 -
Supported hardware devices
#5092 closed
Aug 19, 2025 -
Error when fine-tuning a model with multiple datasets
#5123 closed
Aug 19, 2025 -
After fine-tuning Qwen2.5-vl-7B with sft-lora, the inference output is clearly only half complete. What is the cause?
#5138 closed
Aug 19, 2025 -
Prefix prompt for Embedding training
#4911 closed
Aug 19, 2025 -
Question about the time cost of multimodal GRPO training for video models
#4943 closed
Aug 19, 2025 -
About loss_type=None
#4937 closed
Aug 19, 2025 -
Problems found when fine-tuning qwen3
#4939 closed
Aug 19, 2025 -
DPO Qwen2.5VL issue
#4940 closed
Aug 19, 2025 -
swift export OOM
#4963 closed
Aug 19, 2025 -
Error fine-tuning an embedding model
#5046 closed
Aug 19, 2025 -
Support training with large multimodal datasets in cloud storage
#4179 closed
Aug 19, 2025 -
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
#5439 closed
Aug 18, 2025 -
Fine-tuning environment for GLM-4.1V-9B-Thinking
#4987 closed
Aug 18, 2025 -
Model support for Ovis2.5
#5436 closed
Aug 18, 2025 -
How to construct my fine-tuning dataset?
#4891 closed
Aug 18, 2025 -
After ms-swift sft the model's config.json changed, so I cannot deploy the model directly with vllm
#4844 closed
Aug 18, 2025 -
Incorrect padding information when GRPO fine-tuning the deepseek_coder model
#4808 closed
Aug 18, 2025 -
After setting packing_cache, the second training run did not read from the cache and packed the data again.
#4803 closed
Aug 18, 2025 -
test kimi vl thinking meet error!
#4780 closed
Aug 18, 2025 -
Error in AWQ quantization of qwen2.5-vl-7b
#4828 closed
Aug 18, 2025 -
AWQ quantization problem with qwen2.5-vl
#4762 closed
Aug 18, 2025 -
swift rlhf --vllm_mode server, rollout error: NCCL error
#5428 closed
Aug 18, 2025 -
Multi-GPU parallel training with a locally loaded dataset stops at Init COMPLETE... and cannot enter the train stage
#4743 closed
Aug 18, 2025 -
Numbering of multiple input images
#4742 closed
Aug 18, 2025 -
Does the packing feature block attention score between different samples?
#4736 closed
Aug 18, 2025 -
Does agent inference still not support actual tool calls? See demo_agent.py
#4764 closed
Aug 18, 2025 -
How to add early stop to ms swift
#4741 closed
Aug 18, 2025 -
swift rollout hits OOM; GPU memory requirements are higher than for deploy
#5424 closed
Aug 18, 2025 -
Error fine-tuning DeepSeek model: AssertionError: noaux_tc not supported for training
#4737 closed
Aug 18, 2025 -
Packing and lazy_tokenize are incompatible in v3.8.0
#5402 closed
Aug 18, 2025 -
find_unused_parameters warning when LoRA fine-tuning the qwen3 embedding model
#4698 closed
Aug 18, 2025 -
a question for rl
#4735 closed
Aug 18, 2025 -
After SFT training a regression task, the model fails to load when using vllm to accelerate inference. Is there a fix?
#4676 closed
Aug 18, 2025 -
Is deploying a base qwen2.5-VL plus multiple LoRAs as a service currently supported?
#4153 closed
Aug 17, 2025 -
glm4.5/dsv3 agent training: the system prompt does not seem to be placed correctly
#5414 closed
Aug 16, 2025
45 Issues opened by 42 people
-
Single-node multi-GPU training hangs, always at the same place
#5505 opened
Aug 23, 2025 -
Can an SFT training template for math reasoning be provided?
#5504 opened
Aug 22, 2025 -
Update Wechat QR Code plz
#5496 opened
Aug 22, 2025 -
Abnormal multi-GPU training speed after modifying InternVL's image num_image_token
#5495 opened
Aug 22, 2025 -
Pre-training a custom model
#5494 opened
Aug 22, 2025 -
GRPO cannot save the images folder in output.
#5493 opened
Aug 22, 2025 -
Suggestion: introduce the CHORD framework proposed by the Tongyi Lab Trinity-RFT team
#5488 opened
Aug 21, 2025 -
Can anyone explain the engineering principles behind grpo internal?
#5487 opened
Aug 21, 2025 -
Freezing the LLM fails when train_type is full
#5485 opened
Aug 21, 2025 -
Setting the report_to parameter
#5482 opened
Aug 21, 2025 -
Are there plans to support fine-tuning the Voxtral model series?
#5479 opened
Aug 21, 2025 -
Does --packing in SFT training cut samples apart and break the contextual integrity of the training corpus?
#5478 opened
Aug 21, 2025 -
Computation logic of infonce loss when world_size > 1
#5477 opened
Aug 21, 2025 -
Add support for vllm --compilation-config
#5476 opened
Aug 20, 2025 -
swift quantization of qwen2.5-vl model fails with Qwen2_5_VLModel' object has no attribute 'layers'
#5472 opened
Aug 20, 2025 -
Fine-tuning InternVL3 with the swift framework
#5470 opened
Aug 20, 2025 -
Tensor alignment problem when training a classification model on multiple GPUs
#5469 opened
Aug 20, 2025 -
Support independent evaluation on multiple validation sets during training
#5467 opened
Aug 20, 2025 -
Question about qwen3-reranker fine-tuning data
#5466 opened
Aug 20, 2025 -
Does the eval framework support sleep mode?
#5465 opened
Aug 20, 2025 -
AutoAWQ is no longer updated
#5462 opened
Aug 20, 2025 -
SGlang for GRPO rollout
#5461 opened
Aug 19, 2025 -
Training multimodal models on B200
#5457 opened
Aug 19, 2025 -
Error fine-tuning a multimodal model with base64 data
#5456 opened
Aug 19, 2025 -
Does ms-swift support sequence parallel(up to 32k or 128k) for qwen2.5-vl model pretrain, thanks.
#5455 opened
Aug 19, 2025 -
After LoRA fine-tuning Qwen3-Embedding-0.6B with the swift framework, results after LoRA merge are the same as before fine-tuning
#5450 opened
Aug 19, 2025 -
GKD training was interrupted; how to load a checkpoint to resume training
#5449 opened
Aug 19, 2025 -
Multimodal models: how to customize the vision tower
#5448 opened
Aug 19, 2025 -
Does embedding model training support MRL?
#5447 opened
Aug 19, 2025 -
Full fine-tuning of the qwen2.5-3b model with swift and the --deepspeed zero argument increases GPU memory usage instead of reducing it
#5444 opened
Aug 19, 2025 -
How to use custom chat template defined in jinja template format for SFT training
#5443 opened
Aug 18, 2025 -
How to set remove_unused_columns to false during inference
#5442 opened
Aug 18, 2025 -
Question about the trends of Eval_loss and eval_acc
#5441 opened
Aug 18, 2025 -
Error loading the merged qwen2.5-vl-7B model produced by swift3 export
#5440 opened
Aug 18, 2025 -
Maintenance of ms-swift
#5438 opened
Aug 18, 2025 -
ms-swift3.7.1 fails to convert hf to mcore format
#5435 opened
Aug 18, 2025 -
The same configuration occasionally errors across repeated experiment launches
#5433 opened
Aug 18, 2025 -
Difference in LoRA-merged qwen2.5-vl-7B between ms-swift 3.6.4 and 3.7: the model merged with 3.7 significantly outperforms the one merged with 3.6.4 on multi-slot grounding tasks
#5430 opened
Aug 18, 2025 -
Is support for the ernie VL model series being considered?
#5429 opened
Aug 18, 2025 -
Model support: internlm/Intern-S1-GGUF
#5427 opened
Aug 18, 2025 -
Replacing the embedding model
#5423 opened
Aug 18, 2025 -
How can we use bnb quantisation
#5422 opened
Aug 17, 2025 -
Custom multimodal dataset
#5420 opened
Aug 17, 2025 -
Tracking down NaN in DPO loss computation
#5418 opened
Aug 17, 2025 -
Request: support the native v3 agent template
#5415 opened
Aug 16, 2025
25 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[BREAKING] Refactor Scheduler and GRPOTrainer for Flexible Multi-Turn Training
#5307 commented on
Aug 22, 2025 • 10 new comments -
[model] support glm-4.5 agent
#5305 commented on
Aug 22, 2025 • 1 new comment -
Error when fine-tuning the qwen3-embedding-0.6b model
#5088 commented on
Aug 23, 2025 • 0 new comments -
Qwen3-235b-a22b-instruct outputs strange characters after LoRA fine-tuning
#5258 commented on
Aug 22, 2025 • 0 new comments -
When training Qwen3-30B-A3B-Instruct-2507 with deepspeed zero3, the training progress bar hangs after the model and data finish loading
#5400 commented on
Aug 22, 2025 • 0 new comments -
KIMI VL SFT ERROR
#5218 commented on
Aug 22, 2025 • 0 new comments -
support train bert from scratch?
#4195 commented on
Aug 22, 2025 • 0 new comments -
How to use vllmengine in a custom reward model?
#4327 commented on
Aug 22, 2025 • 0 new comments -
swift infer results differ greatly from the merged model's inference output: swift infer shows thinking, while the merged model cannot think
#5196 commented on
Aug 21, 2025 • 0 new comments -
qwen3 reranker model training
#5256 commented on
Aug 21, 2025 • 0 new comments -
About Qwen2.5 VL finetuning
#5280 commented on
Aug 21, 2025 • 0 new comments -
For the ovis model, how to pass MAX_PARTITION correctly? Setting it via an environment variable when launching vllm serve has no effect
#4881 commented on
Aug 21, 2025 • 0 new comments -
Support for the LLaVA-OV-chat model series
#4187 commented on
Aug 21, 2025 • 0 new comments -
🍭[Roadmap] ms-swift3.6-3.8
#4561 commented on
Aug 20, 2025 • 0 new comments -
Repetitive output (parroting) when fine-tuning qwen2.5_omni_3B with reinforcement learning
#5370 commented on
Aug 20, 2025 • 0 new comments -
How to run inference with mindie on a model exported by swift export
#5399 commented on
Aug 20, 2025 • 0 new comments -
multiple node training slower than single node
#4291 commented on
Aug 20, 2025 • 0 new comments -
Customized evaluation
#4283 commented on
Aug 19, 2025 • 0 new comments -
ModuleNotFoundError: No module named 'torch.distributed.device_mesh'
#4092 commented on
Aug 18, 2025 • 0 new comments -
Support for the Intern-S1 model
#5356 commented on
Aug 18, 2025 • 0 new comments -
Eval_loss and eval_acc trends are inconsistent
#2663 commented on
Aug 18, 2025 • 0 new comments -
qwen3 training hangs indefinitely at the use_logits_to_keep: True step
#4875 commented on
Aug 18, 2025 • 0 new comments -
How to add system or instructions to embedding training?
#4638 commented on
Aug 18, 2025 • 0 new comments -
python attr method cannot work if llm_prefix contains nested objects in _patch_sequence_classification
#4245 commented on
Aug 18, 2025 • 0 new comments -
Discussion: Does DFT really have practical effects?
#5386 commented on
Aug 17, 2025 • 0 new comments