FLEX_ATTENTION TIGHTENS, VMAP GAINS VECTORIZATION, MACOS CLEARS WARNINGS
By RepoJournal · About PyTorch
PyTorch tightened flex_attention input validation and added the missing vmap batching rule for count_nonzero, while macOS builds now suppress the flood of -Wdeprecated-declarations warnings from Apple framework headers.
The flex_attention operator now explicitly rejects NestedTensor inputs [1] instead of letting them fall through to compiler errors. The change fixes #177377 and adds a regression test on the compiled fullgraph path. In parallel, count_nonzero gained its missing batching rule [2], which eliminates the performance warning and enables vectorized execution under torch.vmap. On macOS, the build now suppresses the flood of -Wdeprecated-declarations warnings from Apple framework includes [3] by wrapping the Foundation, Metal, MPS, and MPSGraph headers, clearing the noise from recent SDK upgrades.
On the infrastructure side, inductor.yml migrated to OSDC with the dial-up pattern [4], plumbing ARC inputs through the CUDA and CPU build/test pairs. _FastCudaLauncher now silently handles oversized kernels [5] instead of raising an unexpected ValueError.
In Helion, the autotuner's LLM search now fails loudly on errors [6], the cache backends are wired to RemoteAutotuneCache for warm-start enrichment [7], and H100 (sm90) pretuned heuristics were added [8]. The LLM search stack also shipped Opus 4.6/4.7 fast mode [9] and an effort_level knob spanning none/low/medium/high/max [10], with support for Anthropic adaptive thinking.
TorchTitan pinned GITHUB_TOKEN to contents: read [11] in response to CVE-2025-30066, and skipped the dense numerics tests [12] because of an upstream DTensor regression in mixed-dtype sharding propagation.
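To make the count_nonzero change concrete, here is a minimal sketch of what calling it under torch.vmap looks like. The tensor values are made up for illustration, and the snippet only checks that the vmapped result matches a plain reduction over the last dimension. It does not measure whether a given PyTorch build uses the new batching rule or an older, slower path.

```python
import torch

# Illustrative data (an assumption, not from the PR): 4 rows with a known
# number of nonzero entries each.
batch = torch.tensor([
    [0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0],
    [3.0, 0.0, 4.0],
    [5.0, 6.0, 7.0],
])

# vmap maps count_nonzero over dim 0, so each row is counted independently.
per_row = torch.vmap(torch.count_nonzero)(batch)

# Reference result without vmap: reduce over the last dimension directly.
expected = torch.count_nonzero(batch, dim=-1)

assert torch.equal(per_row, expected)
print(per_row.tolist())
```

Both calls should return one count per row. Per the summary above, the practical difference after [2] is that the vmapped call no longer emits the performance warning.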
Action items
- Merge the flex_attention validation fix to unblock compiled fullgraph callers (pytorch/pytorch) [plan]
- Update the count_nonzero vmap tests and remove xfail markers (pytorch/pytorch) [plan]
- Apply the macOS deprecated-declarations warning suppression to your MPS builds (pytorch/pytorch) [monitor]
- Review the Helion effort_level knob configuration for autotuner workflows (pytorch/helion) [monitor]
References
- [1] Reject NestedTensor inputs in flex_attention (#183516) pytorch/pytorch
- [2] Add batching rule for count_nonzero (#183860) pytorch/pytorch
- [3] [BE][MacOS] Suppress deprecated declarations warnings (#183927) pytorch/pytorch
- [4] [OSDC] Migrate inductor.yml to OSDC (ARC) via dial-up pattern (#183646) pytorch/pytorch
- [5] [inductor] Silence _FastCudaLauncher ValueError on oversized kernels (#183967) pytorch/pytorch
- [6] [Autotuner] LLM search: fail loudly + mTLS gateway compatibility (#2448) pytorch/helion
- [7] [cache] Wire from_best_available / from_cache to RemoteCacheBackend pytorch/helion
- [8] Add H100 (sm90) pretuned heuristics and perf gates pytorch/helion
- [9] [Autotuner] LLM search: Anthropic Opus 4.6/4.7 fast mode pytorch/helion
- [10] [Autotuner] LLM search: effort_level knob + Anthropic adaptive thinking + OpenAI xhigh pytorch/helion
- [11] ci: declare workflow-level `contents: read` on 2 workflows (#3367) pytorch/torchtitan
- [12] [graph_trainer] Skip dense numerics tests due to upstream DTensor regression pytorch/torchtitan
FAQ
- What changed in PyTorch on May 17, 2026?
- PyTorch tightened flex_attention validation and shipped the missing vmap rule for count_nonzero, while macOS builds finally silence the deprecated declarations cascade.
- What should PyTorch teams do about it?
- Merge the flex_attention validation fix to unblock compiled fullgraph callers, update the count_nonzero vmap tests and remove xfail markers, and apply the macOS deprecated-declarations warning suppression to your MPS builds.
- Which PyTorch repositories shipped on May 17, 2026?
- pytorch/pytorch, pytorch/helion, pytorch/torchtitan