Who contributed to PyTorch on May 17, 2026?

Three developers shipped this update: fulvius31, choijon5, and SherlockNoMad.

What were the notable PyTorch updates?

Reject NestedTensor inputs in flex_attention (#183516), Add batching rule for count_nonzero (#183860), and [BE][MacOS] Suppress deprecated declarations warnings (#183927).

@pytorch

PyTorch and the broader machine-learning ecosystem


FLEX_ATTENTION TIGHTENS, VMAP GAINS VECTORIZATION, MACOS CLEARS WARNINGS

By RepoJournal · Filed 06:02 UTC on May 17, 2026 · About PyTorch

3 people shipped this

choijon5 @choijon5 3 cited

fulvius31 @fulvius31 1 cited

SherlockNoMad @SherlockNoMad 1 cited

PyTorch tightened flex_attention validation and shipped the missing vmap batching rule for count_nonzero, while macOS builds finally silence the cascade of deprecated-declarations warnings.

The flex_attention operator now explicitly rejects NestedTensor inputs [1] instead of letting them fall through to compiler errors, fixing #177377 and adding a regression test on the compiled fullgraph path. In parallel, count_nonzero gained its missing batching rule [2], which removes the performance warning vmap emits on its fallback path and lets the op run vectorized under torch.vmap. On macOS, the build now suppresses the flood of -Wdeprecated-declarations warnings from Apple framework includes [3] by wrapping the Foundation, Metal, MPS, and MPSGraph headers, clearing noise introduced by recent SDK upgrades.

On the infrastructure side, inductor.yml migrated to OSDC (ARC) using the dial-up pattern [4], plumbing ARC inputs through the CUDA and CPU build/test pairs, while _FastCudaLauncher now silently handles oversized kernels [5] instead of raising an unexpected ValueError.

In Helion, the autotuner's LLM search now fails loudly on errors and works behind mTLS gateways [6], from_best_available / from_cache are wired to RemoteCacheBackend for warm-start enrichment [7], and H100 (sm90) pretuned heuristics landed together with perf gates [8]. The LLM search stack also gained Anthropic Opus 4.6/4.7 fast mode [9] and an effort_level knob spanning none/low/medium/high/max [10], with Anthropic adaptive thinking support and OpenAI xhigh.

TorchTitan declared workflow-level contents: read permissions for the GITHUB_TOKEN on two workflows [11] in response to CVE-2025-30066, and skipped its dense numerics tests [12] because of an upstream DTensor regression in mixed-dtype sharding propagation.
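Callers who want the same fail-fast behavior in their own code, for example on builds that predate [1], can check inputs before they reach flex_attention. The sketch below is hypothetical: reject_nested is not a PyTorch API and this is not the code from #183516. It relies only on Tensor.is_nested and the torch.nested constructors, and shows padding a jagged batch into a dense tensor as one possible workaround.

```python
import torch


def reject_nested(*tensors: torch.Tensor) -> None:
    """Hypothetical pre-check: raise a clear error on NestedTensor inputs
    instead of letting them fail later inside compilation."""
    for i, t in enumerate(tensors):
        if t.is_nested:
            raise ValueError(
                f"input {i} is a NestedTensor; pad it to a dense tensor first"
            )


# Dense (batch, heads, seq_len, head_dim) tensors pass the check.
q = k = v = torch.randn(2, 4, 128, 64)
reject_nested(q, k, v)

# A jagged batch of two sequences (lengths 3 and 5) is rejected.
nt = torch.nested.nested_tensor([torch.randn(3, 64), torch.randn(5, 64)])
try:
    reject_nested(nt, k, v)
    rejected = False
except ValueError as err:
    rejected = True
    print(f"rejected: {err}")

# One workaround: pad to the longest sequence, giving a (2, 5, 64) dense tensor.
dense = torch.nested.to_padded_tensor(nt, 0.0)
reject_nested(dense)
```

Padding alone is not a complete fix for attention: the padded positions still need to be masked out (for instance with a block mask), otherwise they take part in the softmax.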
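The TorchTitan change [11] adds a top-level permissions block so the GITHUB_TOKEN is read-only by default. A minimal audit sketch, assuming each workflow file has already been parsed into a Python dict (for example with a YAML loader, not shown); workflow_token_issues is a hypothetical helper, not part of GitHub or PyTorch tooling. It flags a missing top-level block (the token then inherits the repository or organization default), the write-all shorthand, and any scope set to write.

```python
from typing import Any


def workflow_token_issues(name: str, workflow: dict[str, Any]) -> list[str]:
    """Hypothetical check of a workflow's top-level GITHUB_TOKEN permissions."""
    perms = workflow.get("permissions")
    if perms is None:
        # No block: the token falls back to the repo/org default permissions.
        return [f"{name}: no workflow-level permissions block"]
    if perms == "write-all":
        return [f"{name}: grants write-all"]
    if isinstance(perms, str):
        # Remaining shorthand: only "read-all" is acceptable.
        return [] if perms == "read-all" else [f"{name}: permissions is {perms!r}"]
    # Mapping form, e.g. {"contents": "read"}: report every write scope.
    return [
        f"{name}: {scope} is {level}"
        for scope, level in perms.items()
        if level == "write"
    ]


hardened = {"permissions": {"contents": "read"}, "jobs": {}}
legacy = {"jobs": {}}
broad = {"permissions": {"contents": "write", "id-token": "write"}, "jobs": {}}

for name, wf in {"hardened": hardened, "legacy": legacy, "broad": broad}.items():
    print(name, workflow_token_issues(name, wf))
```

The sketch only inspects the workflow-level block; a job-level permissions key can still grant write access to that job and would need a similar pass over workflow["jobs"].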

Action items

→ Merge flex_attention validation fix to unblock compiled fullgraph callers pytorch/pytorch [plan]
→ Update count_nonzero vmap tests and remove xfail markers pytorch/pytorch [plan]
→ Apply macOS deprecated warnings suppression to your MPS builds pytorch/pytorch [monitor]
→ Review helion effort_level knob configuration for autotuner workflows pytorch/helion [monitor]
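Removing the xfail markers on the count_nonzero vmap tests comes down to checking that the batched result matches a direct per-row computation. A minimal sketch of such a check, not the test from #183860: it uses only torch.vmap and torch.count_nonzero, so it gives the same values on builds with or without the new batching rule; older builds just take the slower fallback path.

```python
import torch

# A batch of 4 examples; vmap maps count_nonzero over dim 0.
x = torch.tensor([
    [0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0],
    [3.0, 0.0, 4.0],
    [5.0, 6.0, 7.0],
])

# Each example is a 1-D row, so count_nonzero returns one count per row.
batched = torch.vmap(torch.count_nonzero)(x)

# Reference result: count the non-zero entries of each row directly.
expected = (x != 0).sum(dim=1)

assert torch.equal(batched, expected)
print(batched)  # tensor([2, 0, 2, 3])
```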

References

[1] Reject NestedTensor inputs in flex_attention (#183516) pytorch/pytorch
[2] Add batching rule for count_nonzero (#183860) pytorch/pytorch
[3] [BE][MacOS] Suppress deprecated declarations warnings (#183927) pytorch/pytorch
[4] [OSDC] Migrate inductor.yml to OSDC (ARC) via dial-up pattern (#183646) pytorch/pytorch
[5] [inductor] Silence _FastCudaLauncher ValueError on oversized kernels (#183967) pytorch/pytorch
[6] [Autotuner] LLM search: fail loudly + mTLS gateway compatibility (#2448) pytorch/helion
[7] [cache] Wire from_best_available / from_cache to RemoteCacheBackend pytorch/helion
[8] Add H100 (sm90) pretuned heuristics and perf gates pytorch/helion
[9] [Autotuner] LLM search: Anthropic Opus 4.6/4.7 fast mode pytorch/helion
[10] [Autotuner] LLM search: effort_level knob + Anthropic adaptive thinking + OpenAI xhigh pytorch/helion
[11] ci: declare workflow-level `contents: read` on 2 workflows (#3367) pytorch/torchtitan
[12] [graph_trainer] Skip dense numerics tests due to upstream DTensor regression pytorch/torchtitan

Quick answers

What shipped in PyTorch on May 17, 2026?: PyTorch tightened flex_attention validation and shipped the missing vmap batching rule for count_nonzero, while macOS builds finally silence the cascade of deprecated-declarations warnings. In total, 42 commits and 14 pull requests landed.
