What should PyTorch teams do about it?

• Review Halide fusion behavior in production pipelines that use transposed mutations
• If running ROCm with FP4 quantization or split embeddings, pull the FBGEMM deadlock fixes
• TorchTitan users: validate checkpoint compatibility after the state_dict API decoupling

Which PyTorch repositories shipped on June 13, 2026?

pytorch/pytorch, pytorch/FBGEMM, pytorch/executorch, pytorch/torchtitan

HALIDE FUSION BUG CRUSH, ROCM DEADLOCK FIXES ACROSS THE STACK

By RepoJournal · Filed 06:03 UTC on June 13, 2026 · About PyTorch

PyTorch's Halide backend had a critical in-place mutation fusion bug that could silently corrupt results on aliased reads, and it is not an isolated case: ROCm deadlocks are being hunted down systematically across FBGEMM and core compute kernels.

The Halide in-place mutation fusion fix [1] addresses a subtle but dangerous bug: TailStrategy::ShiftInwards could overcompute output tiles and reread stale buffer values during vertically fused operations with transposed destinations. The fix is production-critical for anyone using Halide autoscheduling on mutation-heavy workloads.

Meanwhile, the ROCm team is methodically closing off deadlock sources. FBGEMM's compute_amax_and_quantize_kernel [2] had threads taking early returns before barrier synchronization, a pattern that hangs HIP execution without an obvious error message. The same audit caught grid overflow issues in direct_mapped_lxu_cache_lookup_kernel [3] and applied canonical caps to prevent silent launch failures.

On the ExecuTorch side, the Arm backend is shipping incremental gains: TOSA dialect ARGMAX support [4], dim mapping helpers for shape-changing operators [5], and adaptive pooling decomposition [6].

TorchTitan shipped a checkpoint compatibility cleanup [7] that decouples it from the PyTorch distributed state_dict APIs, which matters for checkpoint portability. It also reverted a deterministic topk change [8] that broke internal numerics; the change will reland upstream once its semantics stabilize. The MoE sequence parallelism bug fix [9] corrects token index placement when tensor, expert, and sequence parallelism run together, preventing routing misplacement in large multi-axis parallel setups.
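To see why a ShiftInwards-style tail is risky for in-place updates, here is a minimal pure-Python sketch. It is a simplified 1D model, not the Halide backend code from [1], and it does not reproduce the transposed-destination case. The helper names (`shift_inwards_tiles`, `increment_out_of_place`, `increment_in_place`) are hypothetical. The only behavior taken as given is that ShiftInwards moves the last tile back so it stays inside the extent, which re-evaluates some points near the end.

```python
def shift_inwards_tiles(extent, tile):
    """Start offsets of each tile when the tail tile is shifted inwards.

    The last tile is moved back so it ends exactly at `extent`, which means
    it overlaps the tile before it whenever `extent % tile != 0`.
    """
    if extent < tile:
        raise ValueError("this sketch assumes extent >= tile")
    return [min(start, extent - tile) for start in range(0, extent, tile)]


def increment_out_of_place(src, tile):
    """out[i] = src[i] + 1, written to a separate buffer."""
    out = [0] * len(src)
    for start in shift_inwards_tiles(len(src), tile):
        for i in range(start, start + tile):
            out[i] = src[i] + 1  # recomputing is harmless: src never changes
    return out


def increment_in_place(buf, tile):
    """buf[i] = buf[i] + 1, reading and writing the same buffer."""
    for start in shift_inwards_tiles(len(buf), tile):
        for i in range(start, start + tile):
            buf[i] = buf[i] + 1  # overlap re-reads a value already overwritten
    return buf


data = list(range(10))
print("tile starts:   ", shift_inwards_tiles(len(data), 4))
print("out of place:  ", increment_out_of_place(data, 4))
print("in place:      ", increment_in_place(list(data), 4))
```

With an extent of 10 and a tile of 4, the tiles start at 0, 4, and 6, so indices 6 and 7 are evaluated twice. The out-of-place version is unaffected, while the in-place version increments those two elements twice and raises no error, which is the kind of silent corruption the item above describes.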
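The early-return-before-barrier pattern from the FBGEMM item can be illustrated with a CPU analogy. The sketch below is not the kernel from [2]: it uses Python's `threading.Barrier` as a stand-in for a block-wide GPU barrier, and the function `run_block` and its parameters are hypothetical. A timeout is added only so the demonstration terminates; a real GPU barrier has no timeout, so the remaining threads would simply hang.

```python
import threading


def run_block(n_threads, n_valid, early_return, timeout=0.5):
    """Simulate one thread block where only the first n_valid threads have work.

    early_return=True  -> out-of-range threads return before the barrier (bug).
    early_return=False -> every thread reaches the barrier; work is guarded.
    """
    barrier = threading.Barrier(n_threads)
    outcomes = [None] * n_threads

    def worker(tid):
        if early_return and tid >= n_valid:
            outcomes[tid] = "returned early"
            return  # bug pattern: this thread never arrives at the barrier
        if tid < n_valid:
            pass  # per-thread work would go here, guarded instead of returning
        try:
            barrier.wait(timeout=timeout)
            outcomes[tid] = "passed barrier"
        except threading.BrokenBarrierError:
            # The barrier never saw all n_threads arrive.
            outcomes[tid] = "stuck at barrier"

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


print("early return:", run_block(4, 3, early_return=True))
print("guarded work:", run_block(4, 3, early_return=False))
```

The usual remedy, shown in the second call, is to guard the work rather than the thread: every thread reaches the barrier, and only in-range threads do anything before it.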

