SGLang Code Change Summary (UTC+8, 2026-04-06)

This document summarizes all commits that landed on the main branch of the SGLang project on April 6, 2026 (00:00 to 24:00, UTC+8), 24 commits in total.

Overview

Category	Commits	Key changes
New models / model enhancements	1	LTX2.3 video diffusion model
Performance optimizations / features	4	Ngram Spec external corpus + suffix automaton, Ngram anchor match state, gfx95 quant format caching, TRT-LLM router_logits dtype
New server_args.py arguments	3	`--speculative-ngram-external-corpus-path`, `--speculative-ngram-external-sam-budget`, `--speculative-ngram-external-corpus-max-tokens`
New environment variables	4	`SGLANG_DISAGG_STAGING_BUFFER`, `SGLANG_DISAGG_STAGING_BUFFER_SIZE_MB`, `SGLANG_DISAGG_STAGING_POOL_SIZE_MB`, `SGLANG_STAGING_USE_TORCH`
Bug fixes	5	router gemm sm103 hotfix, TRT-LLM MHA CUDA illegal address, hisparse LRU, hisparse minor fix, Spec V2 bug fixes
Refactoring / alignment	2	incremental streaming logprobs alignment, MatchState followup fixes
CI / tests	6	staging buffer CI, auth.py unit tests, function_call unit tests, tiktoken unit tests, diffusion diffusers backend, setuptools-scm version
PD Disaggregation	1	staging buffer CI test and documentation
Other	3	test coverage report, test skills update, removal of the cu13 docker workflow

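1. New Models and Model Enhancements

1.1 LTX2.3 video diffusion model

Adds support for the LTX2.3 video diffusion model, covering the full set of components: DiT config, adapter connector, VAE, pipeline config, and model overlay.

Commit Message	Summary	PR link
`[diffusion] model: support LTX2.3 (#22111)`	Adds LTX2.3 video diffusion model support, covering DiT, adapter connector, VAE, pipeline config, and entrypoint	PR #22111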

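2. Performance Optimizations and Features

2.1 Ngram speculative decoding: external corpus

Introduces external corpus support: a read-only SAM index is built from the corpus with a suffix automaton (SAM), which extends the match coverage of ngram speculative decoding beyond what the existing ngram source provides. This is the largest feature change of the day.

Commit Message	Summary	PR link
`[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425)`	Loads data from an external corpus and builds a suffix automaton (SAM), giving ngram speculative decoding a read-only index	PR #21425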
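
To make the idea concrete, here is a minimal sketch of a suffix automaton over integer token IDs. It is not the code from PR #21425: the class name `SuffixAutomaton`, the `match`/`draft` methods, and the "continue from the first occurrence" drafting rule are all assumptions made for illustration. The construction itself is the textbook online algorithm.

```python
from typing import Dict, List, Tuple


class SuffixAutomaton:
    """Illustrative read-only index over a token corpus (hypothetical API)."""

    def __init__(self, corpus: List[int]) -> None:
        self.corpus = list(corpus)
        self.next: List[Dict[int, int]] = [{}]  # transitions per state
        self.link: List[int] = [-1]             # suffix links
        self.length: List[int] = [0]            # longest substring per state
        self.first_end: List[int] = [-1]        # end index of first occurrence
        self.last = 0
        for tok in self.corpus:
            self._extend(tok)

    def _new_state(self, length: int, link: int,
                   trans: Dict[int, int], first_end: int) -> int:
        self.next.append(trans)
        self.link.append(link)
        self.length.append(length)
        self.first_end.append(first_end)
        return len(self.next) - 1

    def _extend(self, tok: int) -> None:
        # Standard online construction: add one token to the automaton.
        prev_len = self.length[self.last]
        cur = self._new_state(prev_len + 1, -1, {}, prev_len)
        p = self.last
        while p != -1 and tok not in self.next[p]:
            self.next[p][tok] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][tok]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone keeps q's transitions and first occurrence.
                clone = self._new_state(self.length[p] + 1, self.link[q],
                                        dict(self.next[q]), self.first_end[q])
                while p != -1 and self.next[p].get(tok) == q:
                    self.next[p][tok] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def match(self, context: List[int]) -> Tuple[int, int]:
        """Return (state, length) for the longest suffix of `context` in the corpus."""
        state, matched = 0, 0
        for tok in context:
            while state != 0 and tok not in self.next[state]:
                state = self.link[state]
                matched = self.length[state]
            if tok in self.next[state]:
                state = self.next[state][tok]
                matched += 1
            else:
                matched = 0
        return state, matched

    def draft(self, context: List[int], k: int) -> List[int]:
        """Propose up to k tokens that followed the first occurrence of the match."""
        state, matched = self.match(context)
        if matched == 0:
            return []
        start = self.first_end[state] + 1
        return self.corpus[start:start + k]
```

With the corpus `[1, 2, 3, 4, 2, 3, 5]`, the context `[9, 2, 3]` matches the suffix `[2, 3]`, and `draft([9, 2, 3], 2)` returns `[4, 2]`. The automaton is built once and only read afterwards, which is what makes a read-only index a natural fit for a static external corpus.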

2.2 Ngram anchor match state

Commit Message	Summary	PR link
`[Spec][Ngram] 5/N: Store and advance anchor match state across decode steps (#21243)`	Stores the anchor match state and advances it across decode steps, so ngram matching does not start over at every step	PR #21243
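
The benefit of keeping match state between decode steps can be shown with a deliberately simplified sketch. Everything here is hypothetical (the `MatchState` fields, the `advance` function, and the naive re-anchoring rule); it only illustrates the pattern of advancing a stored state by the newly decoded token instead of re-matching the whole context, and it does not reproduce the logic of PR #21243.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MatchState:
    pos: int = -1    # corpus index of the last matched token, -1 if unanchored
    length: int = 0  # number of trailing context tokens currently matched


def advance(corpus: List[int], state: MatchState, token: int) -> MatchState:
    """Advance the stored match by one newly decoded token."""
    nxt = state.pos + 1
    if state.pos >= 0 and nxt < len(corpus) and corpus[nxt] == token:
        # Fast path: the corpus continues with the same token, O(1) work.
        return MatchState(nxt, state.length + 1)
    # Slow path (simplification): re-anchor at the first occurrence of the
    # token, giving up any longer match that a full index could recover.
    for i, t in enumerate(corpus):
        if t == token:
            return MatchState(i, 1)
    return MatchState()
```

A decode loop would call `advance` once per accepted token and carry the returned state into the next step. A real implementation would re-anchor through an index such as a suffix automaton rather than a linear scan.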

2.3 AMD gfx95 quant format caching

Commit Message	Summary	PR link
`Cache gfx95 quant format detection in DeepseekV2DecoderLayer (#22143)`	Caches the result of gfx95 quant format detection so that DeepseekV2DecoderLayer does not repeat the detection	PR #22143
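
The general pattern behind this change, computing a detection result once and reusing it on every forward call, can be sketched as follows. The class `DecoderLayerSketch`, the `quant_method` key, and the detection logic are placeholders invented for this example; the actual fields used in `DeepseekV2DecoderLayer` are not shown in this summary.

```python
from functools import cached_property
from typing import Dict, List


class DecoderLayerSketch:
    """Hypothetical layer that caches a one-time format detection."""

    def __init__(self, quant_config: Dict[str, str]) -> None:
        self.quant_config = quant_config
        self.detect_calls = 0  # counts how often detection actually runs

    @cached_property
    def quant_format(self) -> str:
        # Placeholder detection logic; runs on first access only, after
        # which the result is stored on the instance.
        self.detect_calls += 1
        return self.quant_config.get("quant_method", "none")

    def forward(self, x: List[float]) -> List[float]:
        # The hot path reads the cached value instead of re-detecting.
        scale = 0.5 if self.quant_format == "fp8" else 1.0
        return [v * scale for v in x]
```

Because the quant format does not change after the layer is constructed, caching it removes repeated work from every forward pass without changing behavior.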

2.4 TRT-LLM router_logits dtype fix

Commit Message	Summary	PR link
`Tiny fix trtllm_fp8_per_tensor_scale_moe_wrapper router_logits dtype (#22006)`	Fixes the data type of router_logits in the TRT-LLM FP8 per-tensor scale MoE wrapper	PR #22006

3. New server_args.py Arguments

Three command-line arguments were added in this window, all for the external corpus feature of ngram speculative decoding:

Commit Message	Argument	Description	PR link
`[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425)`	`--speculative-ngram-external-corpus-path` (str, default None)	Path to the external corpus used to build the read-only SAM index	PR #21425
`[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425)`	`--speculative-ngram-external-sam-budget` (int, default 0)	Number of draft nodes reserved for the external SAM subtree; must be positive and less than `speculative_num_draft_tokens - 1`	PR #21425
`[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425)`	`--speculative-ngram-external-corpus-max-tokens` (int, default 10000000)	Upper bound on the number of tokens in the external corpus	PR #21425
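
The constraint on the SAM budget can be sketched with a standalone `argparse` parser. This is not SGLang's `server_args.py`: the helpers `build_parser` and `check_args` are hypothetical, the default of 8 for `--speculative-num-draft-tokens` is chosen only for the example, and the assumption that the budget check applies only when a corpus path is given is a guess that reconciles the default of 0 with the "must be positive" rule.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser()
    # Illustrative default; not taken from SGLang.
    p.add_argument("--speculative-num-draft-tokens", type=int, default=8)
    # Types and defaults below follow the table above.
    p.add_argument("--speculative-ngram-external-corpus-path",
                   type=str, default=None)
    p.add_argument("--speculative-ngram-external-sam-budget",
                   type=int, default=0)
    p.add_argument("--speculative-ngram-external-corpus-max-tokens",
                   type=int, default=10_000_000)
    return p


def check_args(args: argparse.Namespace) -> None:
    """Validate the budget (assumed to matter only when a corpus is set)."""
    if args.speculative_ngram_external_corpus_path is None:
        return
    budget = args.speculative_ngram_external_sam_budget
    limit = args.speculative_num_draft_tokens - 1
    if not 0 < budget < limit:
        raise ValueError(
            f"external SAM budget must satisfy 0 < budget < {limit}, got {budget}"
        )
```

With 8 draft tokens, a budget of 3 passes the check, while a budget of 7 is rejected because it is not strictly less than `speculative_num_draft_tokens - 1`.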

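4. New Environment Variables

Four environment variables related to the PD Disaggregation staging buffer were newly documented in this window:

Environment variable	Default	Description	Source commit
`SGLANG_DISAGG_STAGING_BUFFER`	`false`	Enables the GPU staging buffer for heterogeneous-TP KV transfer, for setups where prefill and decode use different TP/attention-TP sizes (non-MLA models only)	PR #21921
`SGLANG_DISAGG_STAGING_BUFFER_SIZE_MB`	`64`	Per-worker staging buffer size on the prefill side (MB), used to gather KV head slices before a bulk RDMA transfer	PR #21921
`SGLANG_DISAGG_STAGING_POOL_SIZE_MB`	`4096`	Total size of the ring buffer pool on the decode side (MB); it receives RDMA data from all prefill ranks, and a larger pool supports higher concurrency	PR #21921
`SGLANG_STAGING_USE_TORCH`	`false`	Forces the PyTorch gather/scatter fallback instead of the Triton fused kernel, for debugging	PR #21921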
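
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them from a mapping. The helper names and the set of strings accepted as "true" are assumptions; SGLang's own environment parsing may differ.

```python
import os
from typing import Dict, Mapping, Union


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    # Assumed truthy spellings; the real parser may accept others.
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes")


def read_staging_config(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Union[bool, int]]:
    """Collect the staging-buffer settings using the defaults from the table."""
    return {
        "enabled": _env_bool(env, "SGLANG_DISAGG_STAGING_BUFFER", False),
        "buffer_size_mb": int(env.get("SGLANG_DISAGG_STAGING_BUFFER_SIZE_MB", "64")),
        "pool_size_mb": int(env.get("SGLANG_DISAGG_STAGING_POOL_SIZE_MB", "4096")),
        "use_torch": _env_bool(env, "SGLANG_STAGING_USE_TORCH", False),
    }
```

Passing the environment as a parameter keeps the function easy to test; calling `read_staging_config()` with no argument reads the process environment.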

5. Bug Fixes

Commit Message	Summary	PR link
`[Hotfix] Fix router gemm on sm103 (#22134)`	Fixes a router GEMM crash on SM103 (Blackwell)	PR #22134
`fix: TRT-LLM MHA CUDA illegal address with EAGLE v2 + DP attention (#21649)`	Fixes a CUDA illegal address error in TRT-LLM MHA when EAGLE v2 is combined with DP attention	PR #21649
`fix hisparse LRU policy (#22170)`	Fixes the LRU cache eviction policy in hisparse	PR #22170
`Hisparse Minor Fix (#22131)`	Minor fix to hisparse	PR #22131
`[sgl] two potential spec_v2 bug fixes (#21589)`	Fixes two potential bugs in spec_v2 (in logits_processor and eagle_worker_v2)	PR #21589

6. Refactoring and Alignment

Commit Message	Summary	PR link
`Align incremental streaming logprobs with streamed output tokens (#21583)`	Aligns incremental streaming logprobs with the streamed output tokens so the two stay consistent	PR #21583
`[Spec][Ngram] Followup fixes for MatchState incremental advance (#22180)`	Follow-up fixes for MatchState incremental advance; moves the tests into the unit directory	PR #22180

7. CI / Tests

Commit Message	Summary	PR link
`Add staging buffer CI test and documentation for heterogeneous TP (#21921)`	Adds a CI test and documentation for the heterogeneous-TP staging buffer	PR #21921
`[CI] Add unit tests for srt/utils/auth.py (#21400)`	Adds unit tests for auth.py	PR #21400
`[CI] Add unit tests for function_call detectors (hermes, llama32, mistral) (#21399)`	Adds unit tests for the function_call detectors (hermes, llama32, mistral)	PR #21399
`[Test] Add unit tests for srt/tokenizer/tiktoken_tokenizer (#21107)`	Adds unit tests for tiktoken_tokenizer	PR #21107
`[diffusion] CI: apply diffusers backend in lora case (#22157)`	Uses the diffusers backend for the LoRA case in CI	PR #22157
`[Fix] Fix setuptools-scm version resolution for rc tags (#22165)`	Fixes setuptools-scm version resolution for rc tags	PR #22165

8. Other Changes

Commit Message	Summary	PR link
`Update test coverage report (#22190)`	Updates the test coverage report	PR #22190
`Update test skills and guide (#22189)`	Updates the test skills and guide documentation	PR #22189
`[Doc] Fix and improve DeepSeek V3.2/GLM-5 documentation (#22179)`	Fixes and improves the DeepSeek V3.2 and GLM-5 documentation	PR #22179
`[Misc] Remove unused cu13 docker release workflow (#22167)`	Removes the unused cu13 docker release workflow	PR #22167
`Update ci_auto_bisect.py to have streak 1 so that all failures will be reported (#22161)`	Sets streak to 1 in ci_auto_bisect.py so that all failures are reported	PR #22161
`Fix create_grammar_backend test calls with think_end_id (#22158)`	Fixes the think_end_id argument in create_grammar_backend test calls	PR #22158
`Fix ut module importing (#22176)`	Fixes unit test module importing	PR #22176

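Key Highlights

New models

LTX2.3: video diffusion model, with the full set of DiT, adapter connector, and VAE components

Performance optimizations

Ngram Spec external corpus + suffix automaton: builds a read-only SAM index from an external corpus to extend ngram speculative coverage
Ngram anchor match state: stores and advances the match state across decode steps
gfx95 quant format caching: avoids repeated detection in DeepseekV2DecoderLayer
TRT-LLM router_logits dtype fix: corrects the data type

New server_args.py arguments

--speculative-ngram-external-corpus-path: path to the external corpus
--speculative-ngram-external-sam-budget: draft node budget for the external SAM subtree
--speculative-ngram-external-corpus-max-tokens: maximum number of tokens in the external corpus

New environment variables

SGLANG_DISAGG_STAGING_BUFFER: enables the GPU staging buffer (heterogeneous TP)
SGLANG_DISAGG_STAGING_BUFFER_SIZE_MB: staging buffer size on the prefill side (default 64 MB)
SGLANG_DISAGG_STAGING_POOL_SIZE_MB: ring buffer pool size on the decode side (default 4096 MB)
SGLANG_STAGING_USE_TORCH: forces the PyTorch fallback path

Bug fixes

SM103 router GEMM: router GEMM hotfix on Blackwell
TRT-LLM MHA CUDA illegal address: fix for the EAGLE v2 + DP attention case
hisparse LRU policy: cache eviction fix
Spec V2: two potential bug fixes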