Who contributed to PyTorch on May 30, 2026?

1 developer shipped this update, including ethansfng.

What were the notable PyTorch updates?

[MLX][Gemma4] Add turbo quant support (#19866), Add fuse() to remaining QuantizationPatterns (#19727), and Add fuse() to QuantizationPatterns (#19726).

@pytorch

PyTorch and the broader machine-learning ecosystem


EXECUTORCH ADDS TURBOQUANT TO GEMMA4, PYTORCH PURGES FLAKY DYNAMO TESTS

By RepoJournal · Filed 06:03 UTC on May 30, 2026

1 person shipped this

ethansfng @ethansfng 2 cited

On ExecuTorch, Gemma 4 31B can now handle very long contexts with TurboQuant 4-bit KV cache compression, while PyTorch is ripping out the dynamo_eager and aot_eager integration tests that have been bleeding flakiness into trunk.

The ExecuTorch team shipped TurboQuant TQ4 support for the MLX backend [1], compressing full-attention KV caches from bf16 to 4-bit codebooks plus per-vector norms. This lets Gemma 4 31B-IT scale to very long contexts without touching sliding-window layers. The same team landed fuse() implementations across all remaining Cadence QuantizationPattern subclasses [2] [3] and enabled QuantFusionPass in the compiler pipeline [4], unifying quantization fusion logic across backends.

On the PyTorch side, the compiler team is nuking the dynamo_eager and aot_eager integration tests from inductor-periodic CI [5]; they have been a chronic source of flakiness without producing useful signal. Concurrently, the team is decoupling the aoti_cross_compile_for_windows shard from the main cuda13 test job [7], so Windows build breaks no longer take down an entire day of CUDA testing. The inductor team also fixed a C++ Most Vexing Parse bug in cpp_wrapper_cpu_array_ref [6] that was breaking thread_local declarations when constructors were involved. Dynamo tooling is getting hardened too: debug and repro utilities are being made device-agnostic [8] so non-CUDA accelerators can actually generate reproduction scripts.
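To make the "4-bit codebooks plus per-vector norms" idea concrete, here is a minimal pure-Python sketch of that general storage layout. It is not the code from [1]: the evenly spaced 16-entry codebook, the function names, and the nibble packing order are all assumptions chosen for illustration, and a real scheme would likely use tuned codebook levels and additional transforms.

```python
import math

# Hypothetical 16-entry codebook: evenly spaced levels in [-1, 1].
# This is an assumption for illustration, not the codebook used in [1].
CODEBOOK = [-1.0 + 2.0 * i / 15 for i in range(16)]


def quantize_vector(vec):
    """Return (norm, packed_bytes) for one key or value vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    # Scale to unit length so every coordinate lies in [-1, 1].
    unit = [0.0] * len(vec) if norm == 0.0 else [x / norm for x in vec]
    # Pick the nearest codebook entry for each coordinate (a 4-bit index).
    idx = [min(range(16), key=lambda k: abs(CODEBOOK[k] - u)) for u in unit]
    if len(idx) % 2:
        idx.append(0)  # pad so indices pair up; dropped again on decode
    # Pack two 4-bit indices into each byte, high nibble first.
    packed = bytes((idx[i] << 4) | idx[i + 1] for i in range(0, len(idx), 2))
    return norm, packed


def dequantize_vector(norm, packed, length):
    """Rebuild an approximate vector from its norm and packed indices."""
    idx = []
    for b in packed:
        idx.append(b >> 4)
        idx.append(b & 0x0F)
    return [norm * CODEBOOK[k] for k in idx[:length]]
```

Under these assumptions, a 128-dimensional vector takes 64 bytes of packed indices plus one stored norm, compared with 256 bytes in bf16. The head dimension of 128 is only an example, not a figure from the source.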

Action items

→ Review the new QuantFusionPass implementation if you maintain quantization backends pytorch/executorch [plan]
→ Skip dynamo_eager and aot_eager tests in your local inductor validation runs pytorch/pytorch [monitor]
→ Verify TurboQuant integration with your Gemma4 deployment pipeline pytorch/executorch [plan]

References

[1] [MLX][Gemma4] Add turbo quant support (#19866) pytorch/executorch
[2] Add fuse() to remaining QuantizationPatterns (#19727) pytorch/executorch
[3] Add fuse() to QuantizationPatterns (#19726) pytorch/executorch
[4] Enable QuantFusionPass in compiler pipeline (#19728) pytorch/executorch
[5] [CI] Nuke all the dynamo_eager and aot_eager integration tests (#185224) pytorch/pytorch
[6] [inductor] Fix C++ Most Vexing Parse in cpp_wrapper_cpu_array_ref (#185257) pytorch/pytorch
[7] Decouple aoti cross-compile shard from main cuda13 test job (#185680) pytorch/pytorch
[8] Make dynamo debug/repro utilities device-agnostic (#184851) pytorch/benchmark

Quick answers

What shipped in PyTorch on May 30, 2026?: On ExecuTorch, Gemma 4 31B can now handle very long contexts with TurboQuant 4-bit KV cache compression, while PyTorch is ripping out the dynamo_eager and aot_eager integration tests that have been bleeding flakiness into trunk. In total, 79 commits, 25 pull requests, and 2 releases landed.

