Who contributed to PyTorch on May 21, 2026?

4 developers shipped this update: jethroqti, norx1991, thcmbs, and lanluo-nvidia.

What were the notable PyTorch updates?

[Dynamo] Standardize call_function/call_method args to list[VariableTracker] (#183600), [Profiler] Expose ITraceActivity to Python for direct chrome trace ex… (#184273), and NXP backend: Remove `max_pool2d` maximum kernel size restriction. (#19688).


DYNAMO STANDARDIZES CORE API, PROFILER BYPASSES C++ SERIALIZATION

By RepoJournal · Filed 06:04 UTC on May 21, 2026 · About PyTorch

4 people shipped this: norx1991 (2 cited), jethroqti (1 cited), thcmbs (1 cited), lanluo-nvidia (1 cited)

PyTorch's compiler is standardizing its variable tracker arguments while the profiler cuts latency by streaming traces directly to disk.

The Dynamo team standardized the `call_function` and `call_method` signatures across 230+ call sites so that arguments are consistently typed as `list[VariableTracker]` rather than a mix of `Sequence` annotations [1]. This removes the type ambiguity that had forced runtime assertions.

In parallel, the profiler now exposes Kineto's `ITraceActivity` objects to Python via pybind [2]. This enables Chrome trace export from the Python side that bypasses C++ JSON serialization and writes events directly to disk with optional gzip compression, cutting both latency and memory overhead.

The ExecuTorch team shipped a flurry of backend wins. The NXP backend removed its MaxPool2D maximum kernel size restriction now that Neutron 3.1.1 supports it [3], and the Qualcomm HTP backend added runtime heap profiling on Android with pre- and post-context checkpoints [4].

Helion's Pallas backend gained a CPU interpret-mode CI job [5], plus fixes for `tile.index` broadcast indexing in load codegen [6] and a change that disables factory padding (which Triton needs but Pallas doesn't) while preserving concrete dims [7]. Build infrastructure bumped torch_tensorrt from 2.11 to 2.12 [8] and upgraded the Windows XPU support package to 2026.0 [9].
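The profiler change in [2] rests on the Chrome Trace Event Format: a JSON object whose `traceEvents` array holds one entry per event. The sketch below is a minimal illustration of streaming such a trace to disk, not the code from #184273. It assumes each activity has already been reduced to a plain record of name, start time and duration in microseconds, pid and tid; the attribute names the PR actually exposes on `ITraceActivity` are not described here, so the `Activity` tuple and the `write_chrome_trace` function are hypothetical. Events are serialized one at a time into an optionally gzip-compressed file instead of building the whole trace as a single JSON string first.

```python
import gzip
import json
from typing import Iterable, NamedTuple


class Activity(NamedTuple):
    """Hypothetical stand-in for a profiler activity (not the real ITraceActivity binding)."""
    name: str
    start_us: int  # start timestamp in microseconds
    dur_us: int    # duration in microseconds
    pid: int
    tid: int


def write_chrome_trace(activities: Iterable[Activity], path: str, compress: bool = True) -> int:
    """Stream activities into a Chrome Trace Event Format file, one event at a time.

    Returns the number of events written.
    """
    # Both gzip.open and the builtin open accept text mode plus an encoding.
    opener = gzip.open if compress else open
    count = 0
    with opener(path, "wt", encoding="utf-8") as f:
        f.write('{"traceEvents": [')
        for act in activities:
            event = {
                "name": act.name,
                "ph": "X",  # "complete" event: carries both a timestamp and a duration
                "ts": act.start_us,
                "dur": act.dur_us,
                "pid": act.pid,
                "tid": act.tid,
            }
            # Comma-separate events; only the current event is serialized in memory.
            if count:
                f.write(",")
            f.write(json.dumps(event))
            count += 1
        f.write("]}")
    return count
```

For example, `write_chrome_trace([Activity("aten::mm", 0, 120, 1, 1)], "trace.json.gz")` writes a one-event gzip-compressed trace. Streaming keeps memory use proportional to a single event rather than to the full serialized trace, and the `compress` flag mirrors the optional gzip step described above.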

Action items

→ Review the Dynamo type standardization if you maintain VariableTracker subclasses pytorch/pytorch [plan]
→ Use the Pallas interpret CI job to validate local changes without TPU hardware pytorch/helion [monitor]
→ Upgrade torch_tensorrt to 2.12 in your test environment pytorch/test-infra [plan]
→ Pin the XPU support package to 2026.0 on Windows CI if you run Intel GPU tests pytorch/test-infra [plan]

References

[1] [Dynamo] Standardize call_function/call_method args to list[VariableTracker] (#183600) pytorch/pytorch
[2] [Profiler] Expose ITraceActivity to Python for direct chrome trace ex… (#184273) pytorch/pytorch
[3] NXP backend: Remove `max_pool2d` maximum kernel size restriction. (#19688) pytorch/executorch
[4] Qualcomm AI Engine Direct - heap profiling at runtime with HTP backend pytorch/executorch
[5] [Pallas] Add Pallas interpret CI job (revival of #1938) pytorch/helion
[6] [Pallas] Support tile.index broadcast indexing in load codegen pytorch/helion
[7] [Pallas] Disable factory padding and preserve concrete dims pytorch/helion
[8] promote torch_tensorrt from 2.11 to 2.12 pytorch/test-infra
[9] [BE] Upgrade XPU support package to 2026.0 in Windows CICD (#8103) pytorch/test-infra

Quick answers

What shipped in PyTorch on May 21, 2026?: PyTorch's compiler is standardizing its variable tracker arguments while the profiler cuts latency by streaming traces directly to disk. In total, 101 commits, 38 pull requests, and 1 release landed.

