SGLang Daily Commit Summary

Created 2026-04-12 | Updated 2026-04-13 | Technology

SGLang Main Branch Commit Summary

Time range: UTC+8 2026-04-11 00:00 to 24:00 (UTC 2026-04-10 16:00 to 2026-04-11 16:00)
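
The UTC bounds in the time range can be checked with a short calculation. The following is a minimal sketch using only the Python standard library; the helper name `utc_window_for_local_day` is hypothetical and is not part of SGLang or of the tooling that produced this summary.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset, matching the time zone used in this summary.
UTC8 = timezone(timedelta(hours=8))


def utc_window_for_local_day(year, month, day, tz=UTC8):
    """Return (start, end) of one local calendar day, expressed in UTC.

    Hypothetical helper for illustration only.
    """
    start_local = datetime(year, month, day, tzinfo=tz)
    end_local = start_local + timedelta(days=1)  # 24:00 of the same local day
    return (
        start_local.astimezone(timezone.utc),
        end_local.astimezone(timezone.utc),
    )


start, end = utc_window_for_local_day(2026, 4, 11)
print(start.isoformat())  # 2026-04-10T16:00:00+00:00
print(end.isoformat())    # 2026-04-11T16:00:00+00:00
```

Bounds like these could then be passed to `git log` through its `--since` and `--until` options, together with `--no-merges`, to list the non-merge commits in the window. Whether this summary was produced that way is an assumption.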

Total commits: 30 (excluding merge commits)

1. New Model Support

Commit Message	Summary	PR Link
MiniMax-M2.5 - Support dp attention, dp reduce scatter, FP4 all gather, AR fusion in prepare_attn (#20067)	Adds DP Attention, DP Reduce Scatter, FP4 All Gather, and AR Fusion support for the MiniMax-M2.5 model	PR #20067

2. Performance Optimization

Commit Message	Summary	PR Link
perf: precompute FA3 scheduler_metadata to eliminate per-layer prepare_varlen_num_blocks (#21104)	Precomputes FA3 scheduler_metadata, eliminating the per-layer prepare_varlen_num_blocks call	PR #21104
perf: enable inductor combo_kernels for horizontal fusion (#21977)	Enables inductor combo_kernels for horizontal fusion	PR #21977
[sgl] improve mamba_track_indices perf in specdec (#22380)	Improves the performance of mamba_track_indices in speculative decoding	PR #22380
Reduce GPU memory for MoE parallel groups (#22515)	Reduces GPU memory usage of MoE parallel groups	PR #22515
Add offline auto-tuning for LoRA CSGMV kernel (#20391)	Adds offline auto-tuning for the LoRA CSGMV kernel	PR #20391
cuda graph: adjust capture time num-non-padded-tokens to align capture with replay (#22404)	Adjusts num-non-padded-tokens at CUDA graph capture time so that capture matches replay	PR #22404
[VLM] GPU Image Preprocessing for Kimi-K2.5 (#22368)	Adds GPU image preprocessing for Kimi-K2.5 to speed up VLM inference	PR #22368

3. Bug Fixes

Commit Message	Summary	PR Link
fix: server crash when stop_token_ids contains null (#22175)	Fixes a server crash when stop_token_ids contains null	PR #22175
Fix tool call constrained decoding and parsing for models with native formats (#21593)	Fixes constrained decoding and parsing of tool calls for models with native tool-call formats	PR #21593
Fix multi_layer_eagle_worker_v2 draft extend selection, add chain style multi layer mtp test (#22340)	Fixes the draft extend selection logic in multi_layer_eagle_worker_v2 and adds a chain-style multi-layer MTP test	PR #22340
[sgl] fix using symmetric memory issues for attention_tp (#22286)	Fixes issues with using symmetric memory for attention_tp	PR #22286
[Diffusion][CI] Fix nunchaku unit test broken by #22365 (#22560)	Fixes the nunchaku unit test broken by #22365	PR #22560
[diffusion] CI: improve readability and fix bug of early-return (#22507)	Fixes an early-return bug in the diffusion CI and improves readability	PR #22507
[mem] Fix idle token_usage missing mamba_usage; add FIXME for naming (#22555)	Fixes idle token_usage not including mamba_usage and adds a FIXME about naming	PR #22555
fix: match est_time updates by backend, not just suite (#22563)	Matches est_time updates by backend rather than by suite alone	PR #22563
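
To make the first fix in the table concrete: a JSON `null` inside `stop_token_ids` arrives in Python as `None`, which breaks code that expects every entry to be an integer. The sketch below shows one defensive way to handle such input. It is not the code from PR #22175; the function name `clean_stop_token_ids` and the choice to drop null entries (rather than reject the request) are assumptions made for illustration.

```python
import json
from typing import Iterable, List, Optional


def clean_stop_token_ids(raw: Optional[Iterable[Optional[int]]]) -> List[int]:
    """Drop null entries so downstream code only ever sees integers.

    Hypothetical helper; the actual fix in SGLang may behave differently.
    """
    if raw is None:
        return []
    return [tok for tok in raw if tok is not None]


# A request body whose stop_token_ids list contains a JSON null.
payload = json.loads('{"stop_token_ids": [128001, null, 128009]}')
print(clean_stop_token_ids(payload["stop_token_ids"]))  # [128001, 128009]
```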

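4. New Arguments in server_args.py

No new command-line arguments were added to server_args.py during this period; there was one logic change:

Commit Message	Summary	PR Link
[MUSA][9/N] Add FA3 attention backend support through MATE (MUSA AI Tensor Engine) (#22051)	On the MUSA platform, the default page_size is now 64 (it remains 1 on other platforms)	PR #22051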
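
The platform-dependent default described above can be sketched as follows. This is an illustration only, not the code in server_args.py: the function name `resolve_page_size`, its parameters, and the rule that an explicitly supplied value takes precedence over the default are all assumptions. Only the two default values (64 on MUSA, 1 elsewhere) come from the summary.

```python
from typing import Optional

MUSA_DEFAULT_PAGE_SIZE = 64    # default on MUSA, per the summary above
GENERIC_DEFAULT_PAGE_SIZE = 1  # default on all other platforms


def resolve_page_size(user_value: Optional[int], is_musa: bool) -> int:
    """Pick the effective page size (hypothetical helper).

    Assumption: a value given explicitly by the user always wins.
    """
    if user_value is not None:
        return user_value
    return MUSA_DEFAULT_PAGE_SIZE if is_musa else GENERIC_DEFAULT_PAGE_SIZE


print(resolve_page_size(None, is_musa=True))   # 64
print(resolve_page_size(None, is_musa=False))  # 1
print(resolve_page_size(16, is_musa=True))     # 16
```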

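5. New Environment Variables

Environment Variable	Type	Description	Source PR
`SGLANG_MUSA_FA3_FORCE_UPDATE_METADATA`	EnvBool(False)	Forces MUSA FA3 to update its metadata	PR #22051
`SGLANG_LORA_CONFIG_DIR`	Path	Directory containing LoRA offline auto-tuning config files	PR #20391
`SGLANG_RECORD_STEP_TIME`	EnvBool (existing)	Records step time (newly used by the mem module)	PR #22554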
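
For readers unfamiliar with typed environment variables, the sketch below shows how a boolean flag and a path from the table might be read with the standard library. It does not reproduce SGLang's `EnvBool` implementation: the helpers `read_bool_env` and `read_path_env`, and the set of strings treated as true, are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Optional

# Assumption: which strings count as "true". SGLang's EnvBool may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def read_bool_env(name: str, default: bool = False) -> bool:
    """Read a boolean flag, falling back to `default` when unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


def read_path_env(name: str) -> Optional[Path]:
    """Read a directory path, or None when unset or empty."""
    raw = os.environ.get(name)
    return Path(raw) if raw else None


force_update = read_bool_env("SGLANG_MUSA_FA3_FORCE_UPDATE_METADATA", default=False)
lora_config_dir = read_path_env("SGLANG_LORA_CONFIG_DIR")
print(force_update, lora_config_dir)
```

With neither variable set, the flag falls back to `False`, matching the `EnvBool(False)` default listed in the table, and the path is `None`.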

6. Memory and Metrics

Commit Message	Summary	PR Link
[mem] Introduce PoolStats dataclass; unify pool metrics and token_usage (#22554)	Introduces the PoolStats dataclass, unifying pool metrics and token_usage accounting	PR #22554
[metrics] Add `PoolStats.update_scheduler_stats` to deduplicate metrics assignment (#22559)	Adds the PoolStats.update_scheduler_stats method to deduplicate metrics assignment	PR #22559

7. Tokenizer and Serving

Commit Message	Summary	PR Link
[tokenizer] improve non streaming request processing + some small fixes. (#20310)	Improves non-streaming request processing and includes several small fixes	PR #20310

8. Distributed Execution and Communication

Commit Message	Summary	PR Link
[sgl] _ATTN_TP and _ATTN_CP use message queue for broadcast on CPU (#22205)	_ATTN_TP and _ATTN_CP now use a message queue for CPU-side broadcast	PR #22205

9. CI / Infrastructure

Commit Message	Summary	PR Link
[CI] Add GB200 nightly perf regression pipeline (#22461)	Adds a nightly performance regression pipeline for GB200	PR #22461
feat: add weekly workflow to update CI test est_time values (#22545)	Adds a weekly workflow that automatically updates CI test est_time values	PR #22545
chore: update CI test est_time values (#22565)	Updates CI test est_time values (250 files)	PR #22565
fix: track est_time per suite instead of per backend (#22557)	Tracks est_time per suite instead of per backend	PR #22557
[misc] update CI_PERMISSIONS.json (#22570)	Updates the CI_PERMISSIONS.json permission configuration	PR #22570
Update CI_PERMISSIONS.json (#22465)	Updates the CI_PERMISSIONS.json permission configuration	PR #22465
Remove redundant test_page_size.py (#22571)	Removes the redundant test_page_size.py test file	PR #22571

10. Other Platform Support

Commit Message	Summary	PR Link
[MUSA][9/N] Add FA3 attention backend support through MATE (MUSA AI Tensor Engine) (#22051)	Adds FA3 attention backend support on the Moore Threads MUSA platform (913 lines added)	PR #22051
[AMD] Upgrade Aiter (#22264)	Upgrades the AMD Aiter dependency	PR #22264

11. Miscellaneous

Commit Message	Summary	PR Link
feat: update ModelExpress metadata API to SourceIdentity-based schema (#21222)	Updates the ModelExpress metadata API to a SourceIdentity-based schema	PR #21222

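Key Highlights

New Models

MiniMax-M2.5: adds support for advanced features such as DP Attention and FP4 All Gather

Performance Optimization

FA3 scheduler_metadata is precomputed, reducing per-layer overhead
inductor combo_kernels enabled for horizontal fusion
Offline auto-tuning for the LoRA CSGMV kernel
Reduced GPU memory for MoE parallel groups
GPU image preprocessing for Kimi-K2.5

Bug Fixes

Server crash when stop_token_ids contains null
Constrained decoding for tool calls
Draft extend selection in multi_layer_eagle_worker_v2
Symmetric memory usage for attention_tp

server_args.py Changes

Default page_size on the MUSA platform set to 64

New Environment Variables

SGLANG_MUSA_FA3_FORCE_UPDATE_METADATA: forces MUSA FA3 to update its metadata
SGLANG_LORA_CONFIG_DIR: directory for LoRA offline auto-tuning configs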

Author: John Doe

Link: http://example.com/2026/04/12/Sglang-%E6%AF%8F%E6%97%A5-Commit-%E6%80%BB%E7%BB%93/

Copyright Notice: All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.
