@pytorch

PyTorch and the broader machine-learning ecosystem

The Wire · Showcase

MPS CONVOLUTION FAST PATH LANDS, DYNAMO GAINS RICH COMPARISON OPS

By RepoJournal · Filed 06:04 UTC on May 28, 2026 · About PyTorch

3 people shipped this

SaoirseARM @SaoirseARM 1 cited

kirklandsign @kirklandsign 1 cited

digantdesai @digantdesai 1 cited

PyTorch shipped critical inference optimizations for Apple Silicon while torch.compile finally handles Python's comparison protocol the way CPython does.

The MPS backend now runs 3D convolutions on channels_last_3d inputs through the NDHWC fast path [1]. The change removes a guard that forced a slower fallback route and adds in-graph weight transposes for bf16 and fp16 kernels when the heuristic judges them worthwhile. Both the forward and backward passes honor layout propagation.

Dynamo's compile story also got cleaner: the team implemented CPython's tp_richcompare dispatch mechanism [2], so comparison operators now route through generic_richcompare with proper subclass priority and fallback behavior instead of trying to call __eq__ directly. Neither change is flashy, but both are the kind of foundational fix that makes real workloads faster.

Elsewhere, AOTriton was bumped to 0.12b [4] with breaking changes (the varlen LSE shape is now (H, Total_seqlen)) and expanded hardware support (gfx1100 and gfx1151 are out of experimental). The infra team wrapped up the EC2/OSDC shadow-traffic experiment [3], reverting the dual-route testing and returning to a single config. In ExecuTorch, ExecutorchRuntime, ExecutorchRuntimeException, and EValue were converted from Java to Kotlin [6], completing wave 2 of the Android SDK migration with careful JNI handling. The WebGPU runtime gained memory aliasing for intermediate tensors [7], and the Arm backend fixed nested control-flow partition checks [5].
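To make "subclass priority and fallback behavior" concrete, here is a minimal plain-Python sketch of the rich-comparison dispatch order that CPython applies to operators such as == and <. It is an illustration of the protocol only, not Dynamo's implementation; the function name rich_compare and the Base/Derived classes are hypothetical.

```python
# Sketch of CPython's rich-comparison dispatch order (illustrative only;
# `rich_compare`, `Base`, and `Derived` are hypothetical names).
_SWAPPED = {
    "__lt__": "__gt__", "__le__": "__ge__",
    "__gt__": "__lt__", "__ge__": "__le__",
    "__eq__": "__eq__", "__ne__": "__ne__",
}


def rich_compare(v, w, op):
    """Evaluate `v <op> w` where op is a dunder name such as "__eq__"."""
    tv, tw = type(v), type(w)
    swapped = _SWAPPED[op]
    tried_reflected = False

    # 1. Subclass priority: if w's type is a proper subclass of v's type,
    #    w's reflected method is tried first.
    if tv is not tw and issubclass(tw, tv):
        tried_reflected = True
        result = getattr(tw, swapped)(w, v)
        if result is not NotImplemented:
            return result

    # 2. The left operand's method.
    result = getattr(tv, op)(v, w)
    if result is not NotImplemented:
        return result

    # 3. The right operand's reflected method, unless already tried.
    if not tried_reflected:
        result = getattr(tw, swapped)(w, v)
        if result is not NotImplemented:
            return result

    # 4. Fallback: == and != degrade to identity; ordering raises.
    if op == "__eq__":
        return v is w
    if op == "__ne__":
        return v is not w
    raise TypeError(f"{op} not supported between {tv.__name__} and {tw.__name__}")


class Base:
    def __eq__(self, other):
        if not isinstance(other, Base):
            return NotImplemented
        return "Base.__eq__"


class Derived(Base):
    def __eq__(self, other):
        return "Derived.__eq__"


# The subclass wins even when it is the right-hand operand.
print(rich_compare(Base(), Derived(), "__eq__"))  # Derived.__eq__
# Both sides return NotImplemented, so == falls back to identity.
print(rich_compare(Base(), 5, "__eq__"))          # False
```

Calling __eq__ on the left operand directly would skip steps 1, 3, and 4, which is why a tracer that does so can disagree with eager Python on subclass-heavy code.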

Action items

→ Review MPS channels_last_3d Conv3d changes if you ship inference on Apple Silicon pytorch/pytorch [plan]
→ Audit code that relies on comparison operators under torch.compile; comparisons now follow CPython's rich-comparison dispatch pytorch/pytorch [monitor]
→ Test varlen attention if using AOTriton 0.12b; the LSE shape change is breaking pytorch/pytorch [immediate]
→ Review ExecutorchRuntime Kotlin conversion if maintaining Android SDK pytorch/executorch [plan]
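For readers reviewing the Conv3d item above, it may help to see what the channels_last_3d (NDHWC) layout means in terms of strides. The sketch below is plain Python with hypothetical helper names and does not touch the MPS code path; in PyTorch itself the layout is requested with tensor.to(memory_format=torch.channels_last_3d).

```python
# Illustrative stride arithmetic for a 5-D tensor with logical shape
# (N, C, D, H, W). Helper names are hypothetical.

def contiguous_strides(shape):
    """Strides (in elements) for the default NCDHW row-major layout."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_3d_strides(shape):
    """Strides for NDHWC storage, reported in logical (N, C, D, H, W) order."""
    n, c, d, h, w = shape
    # Channels vary fastest in memory, then W, then H, then D, then N.
    return (d * h * w * c, 1, h * w * c, w * c, c)


shape = (2, 3, 4, 5, 6)
print(contiguous_strides(shape))        # (360, 120, 30, 6, 1)
print(channels_last_3d_strides(shape))  # (360, 1, 90, 18, 3)
```

A channel stride of 1 is the signature of this layout: all channels of one voxel sit next to each other in memory, which is the arrangement an NDHWC kernel expects.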

References

[1] [MPS] Enable NDHWC+DHWIO fast path for Conv3d on channels_last_3d (#184612) pytorch/pytorch
[2] [dynamo] mimic tp_richcompare handling (#182759) pytorch/pytorch
[3] Wrap up OSDC/EC2 shadow-traffic experiment (#185181) pytorch/pytorch
[4] [ROCm] Bump AOTriton to 0.12b (#184288) pytorch/pytorch
[5] Arm backend: Fix nested control-flow partition checks pytorch/executorch
[6] Convert ExecuTorchRuntime, ExecutorchRuntimeException, EValue from Java to Kotlin (#19788) pytorch/executorch
[7] WebGPU: add memory aliasing for intermediate tensor buffers pytorch/executorch

Quick answers

What shipped in PyTorch on May 28, 2026?: PyTorch shipped critical inference optimizations for Apple Silicon while torch.compile finally handles Python's comparison protocol the way CPython does. In total, 73 commits and 28 pull requests landed.
Who contributed to PyTorch on May 28, 2026?: 3 developers shipped this update, including SaoirseARM, kirklandsign, and digantdesai.
What were the notable PyTorch updates?: [MPS] Enable NDHWC+DHWIO fast path for Conv3d on channels_last_3d (#184612), [dynamo] mimic tp_richcompare handling (#182759), and Wrap up OSDC/EC2 shadow-traffic experiment (#185181).

