PYTORCH SHIPS TRITON CACHE HOT-LOAD AND PROFILER RESILIENCE FIXES

By RepoJournal · Filed 06:04 UTC on May 29, 2026 · About PyTorch

4 people shipped this

wwwjn @wwwjn 2 cited

ethansfng @ethansfng 1 cited

mergennachin @mergennachin 1 cited

AdrianLundell @AdrianLundell 1 cited

Inductor's FX graph cache now hot-loads Triton bundles directly from serialized cache artifacts, removing the wait for a later cache hit [1].

The cache hot-load fix [1] means compiled graphs emit Triton code and cubin files immediately on load instead of waiting for a second cache hit. This cuts time-to-first-inference on cached models and simplifies the emit pipeline. Separately, the PyTorch profiler now handles duplicate async-flow correlation IDs from active backends without crashing [2], allowing workloads to complete instead of aborting. Regional AOTI's submodule replacement now mutates the root ScriptModule in place [3], avoiding expensive cloning during compilation. Two further fixes address correctness gaps: the revert of the named tensor removal [4] restores vLLM and TPU compatibility after a lint-driven change broke downstream code, and the OpInfo tests against NumPy run on CPU again [5] after a logic error skipped them entirely.

On the executorch desk, QuantFusionPass adds shared fusion infrastructure for quantization patterns [6], GGUF-to-MLX export now supports Gemma 4 31B [7], and a new general Aten lowering pass [8] reuses single-op dialect replacements across backends.

TorchTitan's RL loop now batches episodes with configurable microbatch sizing [9], and the RoPE refactor enforces model-intrinsic sequence length limits as hard errors [10].
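To make the microbatch sizing idea concrete, here is a minimal sketch of splitting a list of episodes into fixed-size microbatches. It is an illustration only, not TorchTitan's Batcher: the function name `microbatches`, the parameter `microbatch_size`, and the string episodes are all hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def microbatches(episodes: Sequence[T], microbatch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `microbatch_size` episodes.

    Hypothetical helper for illustration; not part of TorchTitan.
    """
    if microbatch_size <= 0:
        raise ValueError("microbatch_size must be positive")
    for start in range(0, len(episodes), microbatch_size):
        # The final group may be smaller when the count does not divide evenly.
        yield list(episodes[start:start + microbatch_size])


# Placeholder episodes; a real RL loop would carry trajectories and rewards.
episodes = [f"episode-{i}" for i in range(5)]
batches = list(microbatches(episodes, microbatch_size=2))
```

With five episodes and a microbatch size of two, the sketch produces three groups, the last holding a single episode. Whether the real Batcher pads, drops, or keeps such a remainder is not stated in the source.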

Action items

→ Pull cache hot-load fix (pytorch#184953) into your inference pipeline - eliminates cache-hit compilation delay pytorch/pytorch [plan]
→ Update to profiler resilience patch (pytorch#184792) if workloads use ROCm or duplicate flow IDs pytorch/pytorch [plan]
→ If shipping vLLM or TPU code, verify compatibility with the revert of the named tensor removal (pytorch#173895) before next release pytorch/pytorch [monitor]
→ Review RoPE sequence length enforcement (torchtitan#3395) if custom models override max_sequence_length pytorch/torchtitan [monitor]
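
For the RoPE item, the behavior described in [10] amounts to rejecting a training sequence length that exceeds the model's own maximum. The following is a rough sketch of that kind of check, assuming a plain integer comparison. The function and parameter names are hypothetical and do not reflect TorchTitan's actual configuration API.

```python
def check_sequence_length(training_seq_len: int, model_max_seq_len: int) -> int:
    """Return the training sequence length if it fits within the model's limit.

    Hypothetical validation helper; the names are illustrative only.
    """
    if training_seq_len > model_max_seq_len:
        # A hard error rather than a silent clamp or a warning.
        raise ValueError(
            f"training sequence length {training_seq_len} exceeds the model's "
            f"max_sequence_length {model_max_seq_len}"
        )
    return training_seq_len


# A length within the limit passes through unchanged.
accepted = check_sequence_length(2048, 4096)
```

A configuration that previously relied on an oversized value being tolerated would, under a check like this, fail at startup instead of during training.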

References

[1] Hot-load Triton bundles from cache artifacts (#184953) pytorch/pytorch
[2] Make profiler resilient to duplicate flow start IDs (#184792) pytorch/pytorch
[3] [Regional AOTI] Mutate root ScriptModule in place in _replace_submodule_with_typecheck_pybind (#185321) pytorch/pytorch
[4] Revert "Remove named tensor (#173895)" pytorch/pytorch
[5] [test] Remove unintentional skip for OpInfo test against NumPy on CPU (#182999) pytorch/pytorch
[6] Add shared fusion infrastructure and QuantFusionPass (#19724) pytorch/executorch
[7] Add GGUF → MLX export support for Gemma 4 31B pytorch/executorch
[8] Add general Aten lowering pass pytorch/executorch
[9] [rl] Add Batcher in RL Loop pytorch/torchtitan
[10] RoPE refactor: Using model's max_sequence_length as the upper bound of Training.sequence_length pytorch/torchtitan

Quick answers

What shipped in PyTorch on May 29, 2026? Inductor's FX graph cache now hot-loads Triton bundles directly from serialized cache artifacts, removing the wait for a later cache hit [1]. In total, 80 commits and 30 pull requests landed.
Who contributed to PyTorch on May 29, 2026? Four developers shipped this update: ethansfng, mergennachin, AdrianLundell, and wwwjn.
What were the notable PyTorch updates? Hot-load Triton bundles from cache artifacts (#184953), Make profiler resilient to duplicate flow start IDs (#184792), and [Regional AOTI] Mutate root ScriptModule in place in _replace_submodule_with_typecheck_pybind (#185321).
