MPS CONVOLUTION FAST PATH LANDS, DYNAMO GAINS RICH COMPARISON OPS
By RepoJournal
PyTorch shipped critical inference optimizations for Apple Silicon while torch.compile finally handles Python's comparison protocol the way CPython does.
The MPS backend now runs 3D convolutions on channels_last_3d inputs through the NDHWC fast path [1]. The change removes a guard that had forced these inputs onto slower fallback routes, and it adds in-graph weight transposes for bf16 and fp16 kernels when the heuristic judges them worthwhile. Both the forward and backward passes honor layout propagation.
Dynamo's comparison handling also got cleaner: the team implemented CPython's tp_richcompare dispatch mechanism [2], so comparison operators now route through generic_richcompare with proper subclass priority and fallback behavior instead of trying to call __eq__ directly. Neither change is flashy, but both are the kind of foundational fix that real workloads depend on.
AOTriton was bumped to 0.12b [4] with breaking changes (the varlen LSE shape is now (H, Total_seqlen)) and expanded hardware support (gfx1100 and gfx1151 are out of experimental). The infra team wrapped up the EC2/OSDC shadow-traffic experiments [3], reverting the dual-route testing and returning to a single config.
In ExecuTorch, ExecutorchRuntime, ExecutorchRuntimeException, and EValue were converted from Java to Kotlin [6], completing wave 2 of the Android SDK migration with careful JNI handling. The WebGPU runtime gained memory aliasing for intermediate tensors [7], and the Arm backend fixed nested control-flow partition checks [5].
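To make the rich-comparison change concrete, here is a minimal pure-Python sketch of the dispatch order CPython follows for comparison operators: the reflected method of a right-hand subclass is tried first, then the left operand's method, then the right operand's reflected method, and finally an identity fallback for == and !=. The function name rich_compare_sketch and the Base/Derived classes are hypothetical and exist only for illustration; this is not Dynamo's generic_richcompare, whose internals are not shown in the PR summary.

```python
# Reflected method for each comparison (a < b is retried as b > a, and so on).
_SWAPPED = {
    "__lt__": "__gt__", "__le__": "__ge__",
    "__gt__": "__lt__", "__ge__": "__le__",
    "__eq__": "__eq__", "__ne__": "__ne__",
}


def rich_compare_sketch(v, w, op):
    """Hypothetical model of CPython-style rich-comparison dispatch."""
    tv, tw = type(v), type(w)
    tried_reflected = False

    # 1. Subclass priority: if w's type is a proper subclass of v's type,
    #    w's reflected method gets the first chance to answer.
    if tv is not tw and issubclass(tw, tv):
        tried_reflected = True
        result = getattr(tw, _SWAPPED[op])(w, v)
        if result is not NotImplemented:
            return result

    # 2. The left operand's own method.
    result = getattr(tv, op)(v, w)
    if result is not NotImplemented:
        return result

    # 3. The right operand's reflected method, unless step 1 already tried it.
    if not tried_reflected:
        result = getattr(tw, _SWAPPED[op])(w, v)
        if result is not NotImplemented:
            return result

    # 4. Fallback: == and != degrade to identity; ordering raises TypeError.
    if op == "__eq__":
        return v is w
    if op == "__ne__":
        return v is not w
    raise TypeError(
        f"{op} not supported between instances of "
        f"{tv.__name__!r} and {tw.__name__!r}"
    )


class Base:
    def __lt__(self, other):
        return "Base.__lt__"


class Derived(Base):
    def __gt__(self, other):
        return "Derived.__gt__"
```

With these classes, rich_compare_sketch(Base(), Derived(), "__lt__") returns "Derived.__gt__", matching what Base() < Derived() returns in CPython. Code that only ever calls __eq__ directly would miss that subclass-first ordering, which is the behavior the Dynamo change is described as fixing.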
Action items
- Review the MPS channels_last_3d Conv3d changes if you ship inference on Apple Silicon · pytorch/pytorch [plan]
- Update code that compares objects under torch.compile to account for the new comparison semantics · pytorch/pytorch [monitor]
- Test varlen attention if you use AOTriton 0.12b; the LSE shape change is breaking · pytorch/pytorch [immediate]
- Review the ExecutorchRuntime Kotlin conversion if you maintain the Android SDK · pytorch/executorch [plan]
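For anyone reviewing the channels_last_3d item, it may help to recall what that layout means in memory. The sketch below computes element strides for a logical (N, C, D, H, W) tensor stored in the default contiguous order and in NDHWC order, which is the layout PyTorch produces with tensor.to(memory_format=torch.channels_last_3d). The helper names are hypothetical, the code deliberately avoids importing torch, and it says nothing about how the MPS kernels themselves are implemented.

```python
def contiguous_strides(shape):
    """Row-major strides, in elements, for a dense tensor of this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def channels_last_3d_strides(shape):
    """Strides for a logical (N, C, D, H, W) tensor stored as NDHWC."""
    n, c, d, h, w = shape
    # Lay the data out densely in physical N, D, H, W, C order...
    s_n, s_d, s_h, s_w, s_c = contiguous_strides((n, d, h, w, c))
    # ...then report the strides back in logical N, C, D, H, W order.
    return (s_n, s_c, s_d, s_h, s_w)


shape = (2, 3, 4, 5, 6)  # N, C, D, H, W
print(contiguous_strides(shape))        # (360, 120, 30, 6, 1)
print(channels_last_3d_strides(shape))  # (360, 1, 90, 18, 3)
```

The channel stride of 1 in the second result is what makes the layout "channels last": all channels of one voxel sit next to each other, which is the arrangement an NDHWC convolution path consumes without a prior transpose.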
References
- [1] [MPS] Enable NDHWC+DHWIO fast path for Conv3d on channels_last_3d (#184612) · pytorch/pytorch
- [2] [dynamo] mimic tp_richcompare handling (#182759) · pytorch/pytorch
- [3] Wrap up OSDC/EC2 shadow-traffic experiment (#185181) · pytorch/pytorch
- [4] [ROCm] Bump AOTriton to 0.12b (#184288) · pytorch/pytorch
- [5] Arm backend: Fix nested control-flow partition checks · pytorch/executorch
- [6] Convert ExecuTorchRuntime, ExecutorchRuntimeException, EValue from Java to Kotlin (#19788) · pytorch/executorch
- [7] WebGPU: add memory aliasing for intermediate tensor buffers · pytorch/executorch
FAQ
- What changed in PyTorch on May 28, 2026?
- PyTorch shipped critical inference optimizations for Apple Silicon while torch.compile finally handles Python's comparison protocol the way CPython does.
- What should PyTorch teams do about it?
- Review the MPS channels_last_3d Conv3d changes if you ship inference on Apple Silicon; update code that compares objects under torch.compile to account for the new comparison semantics; and test varlen attention if you use AOTriton 0.12b, because the LSE shape change is breaking.
- Which PyTorch repositories shipped on May 28, 2026?
- pytorch/pytorch, pytorch/executorch