Who contributed to PyTorch on May 24, 2026?

Four developers shipped this update: ethche, oulgen, calebmkim, and kirklandsign.

What were the notable PyTorch updates?

[lint] Cover ATen + autograd/serde generated C++ in clang-tidy (#184951), Replace deprecated std::aligned_storage in group/layer norm CUDA kernels (#184474), and Remove redundant/unreachable CMake code (#184861).

@pytorch

PyTorch and the broader machine-learning ecosystem

PYTORCH TIGHTENS BUILD STANDARDS WITH CLANG-TIDY COVERAGE AND C++20 MODERNIZATION

By RepoJournal · Filed 06:02 UTC on May 24, 2026 · About PyTorch

4 people shipped this

ethche @ethche 1 cited

oulgen @oulgen 1 cited

calebmkim @calebmkim 1 cited

kirklandsign @kirklandsign 1 cited

PyTorch is tightening code quality by extending clang-tidy coverage to generated C++ and replacing deprecated standard library facilities in its CUDA kernels.

The pytorch/pytorch team shipped two major hygiene wins overnight. First, clang-tidy now covers ATen and autograd/serde generated C++ [1], closing a gap where auto-generated code went unchecked. The linting setup drops the stale LineFilter exclusion and adds paths so that generated headers resolve properly. Codegen templates were tightened in parallel to emit cleaner output: concatenated namespaces, value-initialized primitives, and proper move semantics on ForwardRef classes. Second, the deprecated std::aligned_storage got ripped out of the group and layer norm CUDA kernels [2] in favor of alignas-based storage. std::aligned_storage is deprecated as of C++23, so the change keeps PyTorch ahead of the standard's deprecations.

The team also pruned dead CMake code [3], removing unreachable compiler checks and duplicate option declarations that had accumulated over time. On the C++ side, Dict.h now uses the spaceship operator (<=>) for comparisons [4], leveraging the C++20 support already in place.

Over in pytorch/helion, the attention backward pass got a boost from tensor descriptor support in epilogue subtiling [5], and the CuTe reduction fuser learned to handle wide-chunk shapes [6]. A new XSA kernel example landed [7], showing exclusive self-attention with fused epilogues.

pytorch/executorch fixed a transient linker failure [8]: llm_runner_helper.h was included, but extension_llm_runner wasn't declared as a CMake dependency.
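
For readers unfamiliar with the codegen cleanups named in [1], the sketch below shows what the three patterns generally look like in hand-written C++. It is illustrative only: the namespace demo::generated and the struct NodeInfo are hypothetical names, not PyTorch codegen output, and the actual template changes in the PR may differ.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not actual PyTorch codegen output.

// Concatenated namespace (C++17) instead of two nested namespace blocks.
namespace demo::generated {

struct NodeInfo {
  // Value-initialized primitives: "{}" gives 0 and 0.0 rather than
  // leaving the members with indeterminate values.
  int index{};
  double scale{};
  std::vector<std::string> names;

  // Take the argument by value and move it into the member, so an
  // rvalue argument is moved rather than copied.
  explicit NodeInfo(std::vector<std::string> n) : names(std::move(n)) {}
};

}  // namespace demo::generated
```

Linters typically flag the older forms of each pattern (nested namespace blocks, uninitialized primitive members, copying a by-value parameter), which is presumably why the templates were adjusted alongside the expanded coverage.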

Action items

→ Review generated C++ in your ATen extensions: it is now under clang-tidy coverage and stricter codegen rules pytorch/pytorch [plan]
→ Migrate CUDA kernel code that still uses std::aligned_storage to alignas-based storage before your next integration pytorch/pytorch [plan]
→ If using transducer_runner in ExecuTorch builds, pull the latest to get the extension_llm_runner CMake dependency fix pytorch/executorch [immediate]
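
For the std::aligned_storage migration in the action items above, a minimal host-side sketch of the usual alignas replacement follows. The names AlignedBytes, Float4, Float4Storage, and sum_via_aligned_buffer are hypothetical and chosen for illustration; the actual kernels in [2] may use a different layout or helper.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for std::aligned_storage<Size, Align>::type:
// a raw byte buffer whose alignment is requested with alignas.
template <std::size_t Size, std::size_t Align>
struct AlignedBytes {
  alignas(Align) unsigned char data[Size];
};

// Hypothetical 4-float vector type, similar in spirit to a vectorized load.
struct Float4 {
  float x, y, z, w;
};

// Before (deprecated in C++23):
//   using Float4Storage =
//       std::aligned_storage<sizeof(Float4), alignof(Float4)>::type;
using Float4Storage = AlignedBytes<sizeof(Float4), alignof(Float4)>;

static_assert(sizeof(Float4Storage) == sizeof(Float4),
              "buffer must be exactly large enough for one Float4");
static_assert(alignof(Float4Storage) == alignof(Float4),
              "buffer must be aligned like Float4");

// Construct a Float4 inside the raw buffer with placement new, then read it.
inline float sum_via_aligned_buffer() {
  Float4Storage buf;
  Float4* v = new (buf.data) Float4{1.0f, 2.0f, 3.0f, 4.0f};
  return v->x + v->y + v->z + v->w;
}
```

The static_assert lines are a cheap way to confirm that the new buffer keeps the size and alignment the old alias provided, which is the property a vectorized kernel depends on.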

References

[1] [lint] Cover ATen + autograd/serde generated C++ in clang-tidy (#184951) pytorch/pytorch
[2] Replace deprecated std::aligned_storage in group/layer norm CUDA kernels (#184474) pytorch/pytorch
[3] Remove redundant/unreachable CMake code (#184861) pytorch/pytorch
[4] spaceship operator for comparisons in Dict.h (#179224) pytorch/pytorch
[5] Support tensor_descriptor + epilogue subtile pytorch/helion
[6] Vec-aware two-pass load fusion for CuTe reductions pytorch/helion
[7] (Onboarding task) examples: add XSA (exclusive self-attention) kernel pytorch/helion
[8] Add extension_llm_runner to CMake deps (#19749) pytorch/executorch

Quick answers

What shipped in PyTorch on May 24, 2026?: PyTorch is tightening code quality by extending clang-tidy coverage to generated C++ and replacing deprecated standard library facilities in its CUDA kernels. In total, 26 commits and 5 pull requests landed.

CRITICAL OIDC INJECTION IN DOCS PREVIEW WORKFLOW PATCHED

PyTorch's docs-preview CI trusted fork-controlled artifacts in a context with token-write permissions, exposing the entire build pipeline to code injection.

python 66 shipped 1-min read

@pytorch 1 day ago

PYTORCH AUTOGRAD GETS 7% FASTER, AOTI FIXES SILENT FAILURES

Interned attribute names in autograd.Function shaved microseconds off the hot path while AOTI's scatter operations now properly report errors instead of silently corrupting results.

python 64 shipped 1-min read

@pytorch 4 days ago

DYNAMO REVERTS BREAKING CHANGE, EXECUTORCH CLEANS UP DEPRECATED TYPES

PyTorch reverted a Dynamo optimization that broke internal tests, while ExecuTorch is aggressively deprecating c10 shims in favor of standard library types.

python 91 shipped 1-min read

@pytorch 5 days ago

PYTORCH SHIPS BUILD FIX WHILE HELION TUNES H100 KERNELS TO DEFAULT

A critical build regression in cusparselt.cpp is now patched, while the kernel autotuner promotes its pointwise seed heuristic to production defaults on H100 and B200.

python 36 shipped 1-min read
