RepoJournal
PyTorch

@pytorch

PyTorch and the broader machine-learning ecosystem

MPS CONVOLUTION FAST PATH LANDS, DYNAMO GAINS RICH COMPARISON OPS

By RepoJournal

PyTorch shipped critical inference optimizations for Apple Silicon while torch.compile finally handles Python's comparison protocol the way CPython does.

The MPS backend now routes 3D convolutions on channels_last_3d inputs through the NDHWC fast path [1]. The change removes a guard that had forced these convolutions onto slower fallback routes, and adds in-graph weight transposes for bf16 and fp16 kernels when a heuristic judges them worthwhile. Both the forward and backward passes honor layout propagation.

In the same window, Dynamo implemented CPython's tp_richcompare dispatch mechanism [2]. Comparison operators now route through generic_richcompare, with proper subclass priority and fallback behavior, instead of calling __eq__ directly. Neither change is flashy, but both are the kind of foundational fix that real workloads depend on.

AOTriton was bumped to 0.12b [4]. The release carries breaking changes (the varlen LSE shape is now (H, Total_seqlen)) and expands hardware support, moving gfx1100 and gfx1151 out of experimental status. The infra team wrapped up the EC2/OSDC shadow-traffic experiments [3], reverting the dual-route testing and returning to a single config.

In ExecuTorch, ExecuTorchRuntime, ExecutorchRuntimeException, and EValue were converted from Java to Kotlin [6], completing wave 2 of the Android SDK migration with careful JNI handling. The WebGPU runtime gained memory aliasing for intermediate tensors [7], and the Arm backend fixed nested control-flow partition checks [5].
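For readers unfamiliar with the comparison protocol mentioned above, here is a minimal pure-Python sketch of the dispatch order that CPython's rich-comparison machinery follows: a right operand whose type is a proper subclass of the left operand's type gets the first try with the reflected operator, then the left operand, then the right operand if it has not been tried, and finally an identity fallback for == and != or a TypeError for ordering. The names rich_compare, _SWAPPED, Base, Derived, and calls are hypothetical and exist only for illustration. This is not Dynamo's implementation, only an approximation of the CPython behavior it is described as mimicking.

```python
# Hypothetical sketch of CPython's rich-comparison dispatch order.
# Not Dynamo code; all names here are made up for illustration.

# Each operator maps to its reflected counterpart (a < b  <=>  b > a).
_SWAPPED = {
    "__lt__": "__gt__", "__gt__": "__lt__",
    "__le__": "__ge__", "__ge__": "__le__",
    "__eq__": "__eq__", "__ne__": "__ne__",
}


def rich_compare(v, w, op):
    """Evaluate `v <op> w` using the dispatch order described above."""
    swapped = _SWAPPED[op]
    tv, tw = type(v), type(w)
    checked_reverse = False

    # 1. Subclass priority: a proper subclass on the right goes first,
    #    using the reflected operator.
    if tv is not tw and issubclass(tw, tv):
        checked_reverse = True
        res = getattr(tw, swapped)(w, v)
        if res is not NotImplemented:
            return res

    # 2. The left operand's own method.
    res = getattr(tv, op)(v, w)
    if res is not NotImplemented:
        return res

    # 3. The right operand's reflected method, unless step 1 already tried it.
    if not checked_reverse:
        res = getattr(tw, swapped)(w, v)
        if res is not NotImplemented:
            return res

    # 4. Fallback: identity for == and !=, TypeError for ordering.
    if op == "__eq__":
        return v is w
    if op == "__ne__":
        return v is not w
    raise TypeError(f"{op} not supported between {tv.__name__} and {tw.__name__}")


calls = []  # records which methods ran, in order


class Base:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        calls.append("Base.__eq__")
        if not isinstance(other, Base):
            return NotImplemented
        return self.x == other.x


class Derived(Base):
    def __eq__(self, other):
        calls.append("Derived.__eq__")
        return NotImplemented  # always defers, so Base.__eq__ runs next


result = rich_compare(Base(1), Derived(1), "__eq__")
# Derived.__eq__ is consulted before Base.__eq__, even though Base is on the left.
```

Calling __eq__ on the left operand directly would skip the subclass-priority step, which is the kind of divergence from CPython the paragraph above describes.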

References

  [1] [MPS] Enable NDHWC+DHWIO fast path for Conv3d on channels_last_3d (#184612), pytorch/pytorch
  [2] [dynamo] mimic tp_richcompare handling (#182759), pytorch/pytorch
  [3] Wrap up OSDC/EC2 shadow-traffic experiment (#185181), pytorch/pytorch
  [4] [ROCm] Bump AOTriton to 0.12b (#184288), pytorch/pytorch
  [5] Arm backend: Fix nested control-flow partition checks, pytorch/executorch
  [6] Convert ExecuTorchRuntime, ExecutorchRuntimeException, EValue from Java to Kotlin (#19788), pytorch/executorch
  [7] WebGPU: add memory aliasing for intermediate tensor buffers, pytorch/executorch

FAQ

What changed in PyTorch on May 28, 2026?
PyTorch shipped critical inference optimizations for Apple Silicon while torch.compile finally handles Python's comparison protocol the way CPython does.
What should PyTorch teams do about it?
  • Review the MPS channels_last_3d Conv3d changes if you ship inference on Apple Silicon.
  • Check code that compares objects under torch.compile against the new comparison semantics, and update it where needed.
  • Test varlen attention if you use AOTriton 0.12b; the LSE shape change is breaking.
Which PyTorch repositories shipped on May 28, 2026?
pytorch/pytorch, pytorch/executorch
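
For the Conv3d review item above, it may help to recall what channels_last_3d means at the stride level. In PyTorch the layout is requested with tensor.to(memory_format=torch.channels_last_3d); the logical shape stays (N, C, D, H, W) while the data is stored in NDHWC order. The sketch below computes both stride patterns in plain Python so it runs without PyTorch installed. The helper names contiguous_strides and channels_last_3d_strides are hypothetical and are not part of any PyTorch API.

```python
# Hypothetical helpers illustrating NCDHW vs NDHWC (channels_last_3d) strides.
# Strides are in elements, for a tensor with logical shape (N, C, D, H, W).


def contiguous_strides(shape):
    """Row-major strides: the last dimension varies fastest."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_3d_strides(shape):
    """Strides when the same logical tensor is stored in NDHWC order."""
    n, c, d, h, w = shape
    # Channels vary fastest (stride 1), then W, then H, then D, then N.
    return (d * h * w * c, 1, h * w * c, w * c, c)


shape = (2, 3, 4, 5, 6)  # N=2, C=3, D=4, H=5, W=6
ncdhw = contiguous_strides(shape)
ndhwc = channels_last_3d_strides(shape)
print("contiguous (NCDHW):", ncdhw)
print("channels_last_3d (NDHWC):", ndhwc)
```

A channel stride of 1 is the signature of the channels-last layout: all channels of one voxel sit next to each other in memory, which is the arrangement an NDHWC kernel path expects.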
