SGLang Daily Commit Summary (2026-04-20)

Date: 2026-04-20 00:00 to 24:00 (UTC+8)
Branch: main
Total commits: 20

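Yesterday's changes focused on the following areas:

Multi platform Plugin system: introduces a platform plugin mechanism that lets OOT (Out-of-Tree) platforms plug into SGLang. The core infrastructure covers the platform interface definition, plugin hook registration, and injection of server_args defaults.
Native gRPC support: adds the proto definitions for a native gRPC server, a Rust crate scaffold, and server_args integration controlled through environment variables.
StreamingSession core refactor: integrates StreamingSession deeply into UnifiedRadixCache, including moving the session module to its own directory and adding an always-on mode.
Diffusion model optimization: speeds up loading of image/video inputs for diffusion models and cleans up the LTX2.3 code.
Bugfixes and CI improvements: fixes several bugs, including the DeepEP compilation timeout, KV-Events publishing in CP mode, and an AMD TBO runtime error.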

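Change	Description
`enable_grpc` (instance attribute)	Read from the environment variable `SGLANG_ENABLE_GRPC`; not a CLI argument
`grpc_port` (instance attribute)	Read from the environment variable `SGLANG_GRPC_PORT`; defaults to `port + 10000` and must differ from `--port`
`--enable-streaming-session` help text	Help text updated: "SessionAwareCache" renamed to "StreamingSession"
`enable_two_batch_overlap` constraint	Removed the `attn_cp_size <= 1` restriction, so two batch overlap can also be used in CP mode
OOT platform default injection	server_args initialization now calls `current_platform.apply_server_args_defaults()`
OOT platform piecewise cuda graph	Automatically disabled when the OOT platform does not support piecewise cuda graph
OOT platform attention backend	OOT platforms can define their own default attention backend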
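
The gRPC rows above can be illustrated with a minimal sketch. It is not SGLang's actual code: the helper name `resolve_grpc_settings`, the set of accepted truthy strings, and the choice to raise `ValueError` on a port clash are all assumptions made for illustration. Only the two environment variable names, the `port + 10000` default, and the rule that the gRPC port must differ from `--port` come from the table.

```python
import os


def resolve_grpc_settings(port, env=None):
    """Hypothetical helper: derive (enable_grpc, grpc_port) from environment variables."""
    env = os.environ if env is None else env

    # Assumption: these strings count as "enabled"; the real parsing may differ.
    flag = env.get("SGLANG_ENABLE_GRPC", "").strip().lower()
    enable_grpc = flag in {"1", "true", "yes"}

    # Default taken from the table above: port + 10000 when the variable is unset.
    raw_port = env.get("SGLANG_GRPC_PORT")
    grpc_port = int(raw_port) if raw_port else port + 10000

    # The table states the gRPC port must differ from --port.
    # Assumption: the clash is only checked when gRPC is enabled.
    if enable_grpc and grpc_port == port:
        raise ValueError("SGLANG_GRPC_PORT must differ from --port")
    return enable_grpc, grpc_port


if __name__ == "__main__":
    print(resolve_grpc_settings(30000, {"SGLANG_ENABLE_GRPC": "1"}))
```

Passing `env` explicitly keeps the sketch easy to test without mutating the real process environment.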

Commit Message	Summary	PR Link
`Multi platform Plugin (#21388)`	Introduces a complete platform plugin system that lets OOT platforms plug into SGLang through an interface definition, hook registration, and injection of server_args defaults	PR #21388
`Support allreduce fusion with cp (#21249)`	Supports the allreduce fusion communication optimization in CP (Context Parallel) mode and removes the CP restriction on two batch overlap	PR #21249
`[diffusion] optimize: default to in-memory loading for URL/base64 image inputs (#23118)`	Makes in-memory loading the default for URL/base64 image inputs to diffusion models, improving performance	PR #23118
`[core] Always-on StreamingSession in UnifiedRadixCache (#23202)`	Enables always-on StreamingSession mode in UnifiedRadixCache, improving cache management for streaming sessions	PR #23202
`integrate streaming session into UnifiedRadixCache (#23145)`	Integrates StreamingSession into UnifiedRadixCache and completes the scheduler and cache initialization flow	PR #23145
`move session to python/sglang/srt/session (#23144)`	Moves session-related code out of mem_cache/managers into a dedicated session directory for better code organization	PR #23144
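
To show how a platform plugin might inject server_args defaults, here is a rough sketch under stated assumptions. Only the names `current_platform` and `apply_server_args_defaults()` appear in this summary. The `Platform` base class, its attributes, the `MyAcceleratorPlatform` subclass, and the `enable_piecewise_cuda_graph` field are hypothetical stand-ins and do not reflect the interface actually defined in PR #21388.

```python
from types import SimpleNamespace


class Platform:
    """Hypothetical platform interface; attribute names are illustrative only."""

    name = "base"
    supports_piecewise_cuda_graph = True
    default_attention_backend = None

    def apply_server_args_defaults(self, server_args):
        # Fill in a platform default only when the user did not choose a backend.
        if server_args.attention_backend is None and self.default_attention_backend:
            server_args.attention_backend = self.default_attention_backend
        # Turn piecewise cuda graph off on platforms that cannot support it.
        if not self.supports_piecewise_cuda_graph:
            server_args.enable_piecewise_cuda_graph = False


class MyAcceleratorPlatform(Platform):
    """Hypothetical out-of-tree platform shipped as a plugin."""

    name = "my_accel"
    supports_piecewise_cuda_graph = False
    default_attention_backend = "my_accel_attn"


# Stand-in for the platform object the plugin mechanism would select.
current_platform = MyAcceleratorPlatform()

# SimpleNamespace stands in for the real server_args object.
server_args = SimpleNamespace(attention_backend=None, enable_piecewise_cuda_graph=True)
current_platform.apply_server_args_defaults(server_args)
print(server_args.attention_backend, server_args.enable_piecewise_cuda_graph)
```

The design point this illustrates is that platform-specific defaults live with the platform rather than in scattered conditionals inside server_args initialization.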

Commit Message	Summary	PR Link
`[gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736)`	Adds the proto definitions for a native gRPC server, a Rust crate scaffold, and server_args and environment variable integration	PR #22736
`[gRPC] Pass --experimental_allow_proto3_optional to protoc in build.rs (#23226)`	Fixes protoc support for proto3 optional fields in the gRPC build	PR #23226

Commit Message	Summary	PR Link
`[diffusion] refactor: LTX2.3 code cleanup (#23207)`	Cleans up and refactors the pipeline code of the LTX2.3 diffusion model, simplifying the denoising flow	PR #23207

Commit Message	Summary	PR Link
`[Bugfix] Fix DeepEP timeout when compiling DeepGeMM in EP+DP+TP (#23185)`	Fixes the DeepEP timeout that occurred while compiling DeepGeMM under combined EP+DP+TP parallelism	PR #23185
`[KV-Events] Fix kv events events publishing for CP (#22983)`	Fixes KV-Events event publishing in CP (Context Parallel) mode	PR #22983
`[AMD] fix tbo runtime error when initializing metadata for cuda graph (#22598)`	Fixes a runtime error on AMD platforms when two batch overlap (TBO) initializes cuda graph metadata	PR #22598
`[AMD] Pin peft<0.19 in pyproject_other.toml to fix ROCm CI ImportError (#23161)`	Pins peft to a version below 0.19 to fix an ImportError in ROCm CI	PR #23161
`[Bugfix] Add missing http_worker_ipc in session error path (#22766)`	Fixes the session error path, which was missing http_worker_ipc	PR #22766
`wait for reap in kill_process_tree (#23213)`	Makes kill_process_tree wait for child processes to be reaped, avoiding zombie processes	PR #23213
`Fix test_modelopt_export using stale ModelConfig kwargs (#23214)`	Fixes a test that passed outdated keyword arguments to ModelConfig	PR #23214
`Revert "perf: optimize PCG inductor path for FP8 models (#21734)" (#23159)`	Reverts the PCG inductor path optimization for FP8 models because it introduced problems	PR #23159
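
The "wait for reap" fix in the table above rests on a general rule: a killed child stays in the process table as a zombie until its parent collects its exit status. The sketch below shows that pattern with the standard library only. It is not the code from PR #23213, and the helper name `kill_and_reap` is hypothetical.

```python
import subprocess
import sys


def kill_and_reap(proc, timeout=5.0):
    """Kill a child process, then wait for it so its exit status is collected."""
    if proc.poll() is None:  # the child is still running
        proc.kill()
    # wait() reaps the child; skipping it can leave a zombie entry on POSIX systems.
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    # Start a child that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    kill_and_reap(child)
    print("child reaped:", child.poll() is not None)
```

A timeout on the wait keeps the caller from blocking forever if the child cannot be terminated.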

Commit Message	Summary	PR Link
`[Refactor] Deduplicate NSA utils.py into cp_utils.py for context parallel (#22914)`	Deduplicates the NSA utility functions and merges them into cp_utils.py, so that context parallel utilities are managed in one place	PR #22914

Commit Message	Summary	PR Link
`[CI] Partition stage-a-test-cpu into 4 matrix shards (#23208)`	Splits the CPU test stage into 4 matrix shards to speed up CI	PR #23208
`[CI] Exclude diffusion-specific paths from main_package filter (#23053)`	Excludes diffusion-specific paths from the main_package filter to reduce unnecessary CI triggers	PR #23053
`fix(ci): repair path filters regressed by #21482 (#23201)`	Repairs the CI path filters that regressed in #21482 (covering the AMD, NPU, Xeon, and XPU platforms, among others)	PR #23201
`ci: run weekly est_time update on Monday using p90 of last 15 runs (#23120)`	Changes the weekly CI estimated-time update to run on Monday and use the p90 of the last 15 runs	PR #23120
`[AMD] Update AMD workflow name (#23245)`	Renames the AMD ROCm 7.2 CI workflow	PR #23245
`[AMD] Fix multimodal timeout issue : rocm7.2 PR Test (#23247)`	Fixes a multimodal timeout in the AMD ROCm 7.2 PR test	PR #23247
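
The "p90 of last 15 runs" estimate from PR #23120 can be sketched as follows. The function name `estimate_time` and the nearest-rank percentile definition are assumptions. The summary does not say which percentile method the CI script uses, and other methods (for example linear interpolation) would give slightly different values.

```python
def estimate_time(durations, window=15, pct=90):
    """Hypothetical estimator: nearest-rank percentile over the most recent runs."""
    recent = sorted(durations[-window:])  # keep only the last `window` runs
    if not recent:
        raise ValueError("no runs to estimate from")
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    # Integer arithmetic computes ceil(pct * n / 100) without floating-point error.
    rank = max(1, (pct * len(recent) + 99) // 100)
    return recent[rank - 1]


if __name__ == "__main__":
    history = list(range(1, 21))  # 20 runs taking 1..20 minutes, oldest first
    print(estimate_time(history))
```

Compared with a mean, a high percentile over a short recent window gives more headroom for slow runs while still tracking recent changes in test duration.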

No new model support was added yesterday.