CUDA BACKWARD KERNEL BUG FIXED AS EXECUTORCH EXPANDS DYNAMIC SHAPE SUPPORT
By RepoJournal · Filed · About PyTorch
PyTorch shipped a critical fix for the avg_pool2d CUDA backward kernel, which computed wrong window coordinates for padded channels_last inputs, while ExecuTorch landed SymInt arithmetic ops to unlock dynamic shape inference on WebGPU.
The avg_pool2d_backward_out_cuda_frame_nhwc kernel recovered flat input coordinates without accounting for padding, so its window formulas operated on unpadded indices [1]. Any channels_last workflow that uses pooling with padding on CUDA was affected. Separately, PyTorch merged a scalar bias gradient fix [2] that had to wait for upstream buffer grad changes to land first. It unblocks a learnable scalar pattern in FlexAttention whose existing workarounds were too clunky for production. On the build side, print_sccache_stats now warns when sccache is optional but missing and fails loudly when it is expected [3], so engineers on Linux CI can tell whether the cache layer is actually present instead of hitting a silent failure.

ExecuTorch shipped SymInt arithmetic operations (add, sub, mul, floordiv) for dynamic shapes on WebGPU [6], letting mobile models express variable dimensions in compiled graphs without recompilation.

Infrastructure improvements landed across both repos. PyTorch hardened its CI with absolute action references for cross-repo compatibility [4] and extended the Scalar(long long) constructor guard to NetBSD and other LP64 BSDs whose builds were failing [5]. ExecuTorch also cached SwiftShader prebuilts to skip per-run compilation [7], cutting CI overhead for graphics-heavy test suites.
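To make the avg_pool2d coordinate issue [1] concrete, here is a minimal pure-Python sketch in one dimension. It is not the CUDA kernel: the function names are hypothetical, it assumes the pooling divisor always equals the kernel size (the count_include_pad=True behavior), and it only illustrates why a backward pass written in "gather" form has to shift each input index by the padding before working out which output windows cover it.

```python
def backward_scatter(grad_out, n, k, s, p):
    """Reference: every output window spreads grad / k over the inputs it covers."""
    grad_in = [0.0] * n
    for o, g in enumerate(grad_out):
        start = o * s - p  # window start in unpadded input coordinates
        for i in range(start, start + k):
            if 0 <= i < n:  # positions inside the padding receive no gradient
                grad_in[i] += g / k
    return grad_in


def backward_gather(grad_out, n, k, s, p, add_padding=True):
    """Gather form: every input sums the output windows that cover it."""
    grad_in = []
    for i in range(n):
        x = i + p if add_padding else i  # the shift into padded coordinates
        o_start = 0 if x < k else (x - k) // s + 1
        o_end = min(x // s + 1, len(grad_out))
        grad_in.append(sum(grad_out[o_start:o_end]) / k)
    return grad_in


# Hypothetical sizes: 4 inputs, kernel 2, stride 2, padding 1.
n, k, s, p = 4, 2, 2, 1
grad_out = [2.0, 4.0, 6.0]  # (n + 2*p - k) // s + 1 == 3 outputs

reference = backward_scatter(grad_out, n, k, s, p)
with_padding = backward_gather(grad_out, n, k, s, p)
without_padding = backward_gather(grad_out, n, k, s, p, add_padding=False)

print(reference)        # [1.0, 2.0, 2.0, 3.0]
print(with_padding)     # [1.0, 2.0, 2.0, 3.0]
print(without_padding)  # [1.0, 1.0, 2.0, 2.0]
```

With the padding shift, the gather form agrees with the scatter reference. Without it, each input looks up the wrong set of windows, which is the class of error the fix is described as addressing.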
Action items
- If using channels_last avg_pool2d with padding on CUDA, upgrade immediately to pick up the coordinate fix (pytorch/pytorch) [immediate]
- Review sccache configuration on Linux CI systems: the new warnings will surface gaps in cache setup (pytorch/pytorch) [plan]
- If shipping dynamic shapes on WebGPU, integrate the new SymInt arithmetic ops into your model export pipeline (pytorch/executorch) [plan]
- Monitor ExecuTorch CI for SwiftShader cache behavior and seed S3 on the next revision bump (pytorch/executorch) [monitor]
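For teams weighing the dynamic shapes item, the toy sketch below shows why the four SymInt operations (add, sub, mul, floordiv) are enough to express common shape formulas once and evaluate them per input. It is not the ExecuTorch or WebGPU API: SymExpr, sym, wrap, and evaluate are hypothetical names invented for this illustration, and the pooling-style output length formula is just an example.

```python
import operator
from dataclasses import dataclass

OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "floordiv": operator.floordiv,
}


@dataclass(frozen=True)
class SymExpr:
    """A tiny symbolic integer: a named symbol, a constant, or a binary op."""
    kind: str               # "sym", "const", or a key of OPS
    payload: object = None  # symbol name or constant value
    args: tuple = ()

    def __add__(self, other):
        return SymExpr("add", args=(self, wrap(other)))

    def __sub__(self, other):
        return SymExpr("sub", args=(self, wrap(other)))

    def __mul__(self, other):
        return SymExpr("mul", args=(self, wrap(other)))

    def __floordiv__(self, other):
        return SymExpr("floordiv", args=(self, wrap(other)))

    def evaluate(self, env):
        """Resolve the expression once concrete sizes are known."""
        if self.kind == "sym":
            return env[self.payload]
        if self.kind == "const":
            return self.payload
        lhs, rhs = (arg.evaluate(env) for arg in self.args)
        return OPS[self.kind](lhs, rhs)


def wrap(value):
    """Lift plain ints into constants so they can sit inside an expression."""
    return value if isinstance(value, SymExpr) else SymExpr("const", value)


def sym(name):
    return SymExpr("sym", name)


# Built once: output length for kernel 3, stride 2, padding 1.
seq_len = sym("seq_len")
out_len = (seq_len + 2 * 1 - 3) // 2 + 1

print(out_len.evaluate({"seq_len": 10}))  # 5
print(out_len.evaluate({"seq_len": 7}))   # 4
```

The expression is recorded a single time and then resolved for each concrete size, which is the general idea behind carrying variable dimensions in a compiled graph without recompiling.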
References
- [1] Fix avg_pool2d CUDA backward for channels_last inputs with padding (#188345) pytorch/pytorch
- [2] Fix from 20260702-pytorch-adhoc-6f35e0 (#188869) pytorch/pytorch
- [3] print_sccache_stats: warn if sccache optional, fail if expected (#188920) pytorch/pytorch
- [4] [CI] Use absolute action references in teardown-xpu for cross-repo compatibility (#188769) pytorch/pytorch
- [5] Add Scalar(long long) constructor guard for NetBSD and other LP64 BSDs (#188941) pytorch/pytorch
- [6] [ExecuTorch][WebGPU] SymInt arithmetic ops (add/sub/mul/floordiv) for dynamic shapes (#20712) pytorch/executorch
- [7] CI: cache SwiftShader prebuilt to skip per-run from-source build (#20203) pytorch/executorch
FAQ
- What changed in PyTorch on July 5, 2026?
- PyTorch shipped a critical fix for the avg_pool2d CUDA backward kernel, which computed wrong window coordinates for padded channels_last inputs, while ExecuTorch landed SymInt arithmetic ops to unlock dynamic shape inference on WebGPU.
- What should PyTorch teams do about it?
- If using channels_last avg_pool2d with padding on CUDA, upgrade immediately to pick up the coordinate fix. Review sccache configuration on Linux CI systems, since the new warnings will surface gaps in cache setup. If shipping dynamic shapes on WebGPU, integrate the new SymInt arithmetic ops into your model export pipeline.
- Which PyTorch repositories shipped on July 5, 2026?
- pytorch/pytorch, pytorch/executorch