RepoJournal
PyTorch

@pytorch

PyTorch and the broader machine-learning ecosystem

HALIDE FUSION BUG CRUSH, ROCM DEADLOCK FIXES ACROSS THE STACK

By RepoJournal

PyTorch's Halide backend had an in-place mutation fusion bug that could silently corrupt results on aliased reads, and it is not the only low-level hazard being addressed: ROCm deadlocks are being systematically hunted down across FBGEMM and core compute kernels.

The Halide in-place mutation fusion fix [1] addresses a subtle bug in which TailStrategy::ShiftInwards could overcompute output tiles and reread stale buffer values during vertically fused operations with transposed destinations. Because the corruption is silent, the fix matters to anyone using Halide autoscheduling on mutation-heavy workloads.

Meanwhile, the ROCm team is methodically closing off deadlock sources. In FBGEMM's compute_amax_and_quantize_kernel [2], some threads hit an early return before reaching barrier synchronization, a pattern that hangs HIP execution without an obvious error message. The same audit caught grid overflow issues in direct_mapped_lxu_cache_lookup_kernel [3], applying a standard cap on the grid size to prevent silent launch failures.

On the ExecuTorch side, the Arm backend is shipping incremental gains: TOSA dialect ARGMAX support [4], dim mapping helpers for shape-changing operators [5], and adaptive pooling decomposition [6].

TorchTitan shipped a checkpoint cleanup [7] that removes its dependency on the PyTorch distributed state_dict APIs, which matters for checkpoint portability. It also reverted a deterministic topk change [8] that broke internal numerics; the change will reland upstream once semantics stabilize. The MoE sequence parallelism bug fix [9] corrects token index placement when tensor, expert, and sequence parallelism run together, preventing misplaced tokens in large multi-axis parallel setups.
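The early-return-before-barrier pattern described for the FBGEMM kernel can be illustrated with a small CPU analogy. The sketch below is not the FBGEMM code: `run_block`, `num_active`, and the timeout are hypothetical names introduced here, Python's `threading.Barrier` stands in for `__syncthreads()`, and the timeout exists only so the demonstration terminates (a GPU barrier has no such escape hatch).

```python
import threading


def run_block(num_threads, num_active, early_return, timeout=0.5):
    """Simulate one thread block; return how many threads got past the barrier.

    Hypothetical CPU analogy: threading.Barrier stands in for __syncthreads(),
    and the timeout is only there so the demo finishes instead of hanging.
    """
    barrier = threading.Barrier(num_threads)
    passed = []
    passed_lock = threading.Lock()

    def worker(tid):
        has_work = tid < num_active
        if early_return and not has_work:
            # Buggy shape: this thread leaves before the barrier, so the
            # barrier can never collect all num_threads arrivals.
            return
        if has_work:
            pass  # per-thread work (for example a partial max) would go here
        try:
            barrier.wait(timeout=timeout)
        except threading.BrokenBarrierError:
            return  # the barrier timed out: the CPU stand-in for a hang
        with passed_lock:
            passed.append(tid)

    threads = [
        threading.Thread(target=worker, args=(tid,)) for tid in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(passed)


if __name__ == "__main__":
    print("early return before barrier:", run_block(8, 5, early_return=True))
    print("guarded work, shared barrier:", run_block(8, 5, early_return=False))
```

With an early return, the five threads that do have work wait for three arrivals that never come, so no thread gets past the barrier. When the out-of-range threads skip only the work and still reach the barrier, all eight pass. The usual remedy in a real kernel has the same shape: guard the work with the bounds check, not the barrier.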

Action items

  • Review Halide fusion behavior in production pipelines using transposed mutations
  • If running ROCm with FP4 quantization or split embeddings, pull FBGEMM deadlock fixes
  • TorchTitan users: validate checkpoint compatibility after state_dict API decoupling

References

  [1] Fix Halide inplace mutation fusion with aliased reads (#186121), pytorch/pytorch
  [2] Fix ROCm __syncthreads deadlock in compute_amax_and_quantize_kernel, pytorch/FBGEMM
  [3] Fix HIP grid overflow in direct_mapped_lxu_cache_lookup_kernel (#5882), pytorch/FBGEMM
  [4] Arm backend: Add TOSA dialect ARGMAX op, pytorch/executorch
  [5] Arm backend: Add dim mapping helpers, pytorch/executorch
  [6] Arm backend: Add adaptive pooling node visitors, pytorch/executorch
  [7] [Checkpointer] Remove the dependencies on PyTorch distributed state_dict APIs (#3623), pytorch/torchtitan
  [8] Revert "Add deterministic topk for MoE routing", pytorch/torchtitan
  [9] [Bug] Fix MoE SP token combine indices, pytorch/torchtitan

FAQ

What changed in PyTorch on June 13, 2026?
PyTorch's Halide backend had an in-place mutation fusion bug that could silently corrupt results on aliased reads, and it is not the only low-level hazard being addressed: ROCm deadlocks are being systematically hunted down across FBGEMM and core compute kernels.
What should PyTorch teams do about it?
  • Review Halide fusion behavior in production pipelines using transposed mutations
  • If running ROCm with FP4 quantization or split embeddings, pull FBGEMM deadlock fixes
  • TorchTitan users: validate checkpoint compatibility after state_dict API decoupling
Which PyTorch repositories shipped on June 13, 2026?
pytorch/pytorch, pytorch/FBGEMM, pytorch/executorch, pytorch/torchtitan
