RepoJournal
PyTorch

@pytorch

PyTorch and the broader machine-learning ecosystem

The Wire · Showcase

PALLAS JAGGED KERNELS NOW PIPELINE, FSDP2 UNBLOCKS BACKWARD COMPUTE

By RepoJournal · Filed · About PyTorch

Helion's emit_pipeline now handles runtime-determined tile ranges in jagged attention kernels, while FSDP2 ships a new buffer control to stop reduce-scatter from stalling backward passes.

Helion shipped a Pallas fix for jagged tile ranges in emit_pipeline [1], so kernels that compute tile bounds from runtime-loaded offsets can now stage their work through the pipeline. Jagged attention patterns are common in modern LLM kernels, and the previous code could not address them correctly when tile boundaries were not block-aligned. In parallel, FSDP2 adds set_reduce_scatter_max_input_buffers [2] to mitigate a costly bottleneck: the compute stream stalls until a reduce-scatter finishes and its input buffer can be reused, which showed up as tens of milliseconds of idle time per step. Dynamo also shipped a fix that boxes resume frame values [3], so deleted intermediates no longer stay alive longer in compiled graphs than in eager execution.

On the ROCm side, the team removed the test_upsamplingNearest2d_launch_rocm test, which is no longer possible now that ROCm has reduced its max grid size [4]. DTensor sharding now uses explicit hints for unbacked dimensions [5] instead of relying on conservative fallback behavior. ExecuTorch landed Q6K Metal kernels for Gemma4 GGUF inference [6], the NXP backend's eIQ Neutron SDK moved to 3.1.2 [7], and shared device test utilities were extracted to cut redundancy [8].
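To see why the number of reduce-scatter input buffers matters, here is a minimal toy timeline in plain Python. It is not FSDP2's scheduler and it does not call set_reduce_scatter_max_input_buffers, whose signature is not given in the source. The function name backward_stall_ms, the per-layer timings, and the one-buffer-per-layer model are all assumptions made for illustration.

```python
def backward_stall_ms(num_layers, compute_ms, reduce_scatter_ms, max_input_buffers):
    """Total time the compute stream waits for a free reduce-scatter input buffer.

    Toy model (hypothetical, not FSDP2 internals):
      * the compute stream runs one layer's backward at a time;
      * each layer then hands its gradients to a reduce-scatter, which needs
        one input buffer until that reduce-scatter finishes;
      * reduce-scatters run one after another on a separate comm stream.
    """
    if max_input_buffers < 1:
        raise ValueError("need at least one input buffer")

    t = 0.0        # current time on the compute stream
    stall = 0.0    # accumulated idle time on the compute stream
    rs_end = []    # finish time of each layer's reduce-scatter

    for i in range(num_layers):
        t += compute_ms  # backward compute for layer i

        # With k buffers, layer i reuses the buffer of layer i - k, so it
        # must wait until that earlier reduce-scatter has finished.
        if i >= max_input_buffers:
            buffer_free_at = rs_end[i - max_input_buffers]
            if buffer_free_at > t:
                stall += buffer_free_at - t
                t = buffer_free_at

        # The comm stream is serial: start after the previous reduce-scatter.
        rs_start = max(t, rs_end[-1]) if rs_end else t
        rs_end.append(rs_start + reduce_scatter_ms)

    return stall


if __name__ == "__main__":
    for k in (1, 2):
        print(f"buffers={k} stall_ms={backward_stall_ms(4, 10.0, 15.0, k)}")
```

With these made-up timings (10 ms compute, 15 ms reduce-scatter, 4 layers), a single buffer leaves the compute stream idle for 15 ms while a second buffer removes the stall. The trade-off in this model is memory: each extra buffer is held until its reduce-scatter completes.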

Action items

References

  [1] [pallas] enable emit_pipeline BlockSpecs for jagged tile ranges · pytorch/helion
  [2] [FSDP2] Add set_reduce_scatter_max_input_buffers to mitigate reduce-scatter blocking backward compute (#186000) · pytorch/pytorch
  [3] [dynamo] Box resume frame values when del can run (#185561) · pytorch/pytorch
  [4] [ROCm] Remove test_upsamplingNearest2d_launch_rocm test as ROCm reduces max grid size (#186257) · pytorch/pytorch
  [5] [DTensor] Use explicit hints for unbacked sharding (#183545) · pytorch/pytorch
  [6] [MLX][Gemma4] Introduce Q6K kernels (#20004) · pytorch/executorch
  [7] NXP backend: Update eIQ Neutron SDK to 3.1.2 (#19938) · pytorch/executorch
  [8] Extract shared device test utilities to reduce redundancy (#20061) · pytorch/executorch

FAQ

What changed in PyTorch on June 8, 2026?
Helion's emit_pipeline now handles runtime-determined tile ranges in jagged attention kernels, while FSDP2 ships a new buffer control to stop reduce-scatter from stalling backward passes.
What should PyTorch teams do about it?
  • Review FSDP2 buffer tuning if you hit reduce-scatter blocking in backward
  • Test Dynamo resume functions in your graph-breaking workflows
  • Pull the Helion jagged kernel fix if you're using jagged attention patterns
Which PyTorch repositories shipped on June 8, 2026?
pytorch/helion, pytorch/pytorch, pytorch/executorch
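
For teams that want to test resume-function lifetimes, the sketch below shows the general idea behind boxing a value so that del can actually free it. It uses only the standard library and relies on CPython reference counting. It is not Dynamo's implementation: the Intermediate class and every function name here are hypothetical stand-ins, and the assumption is that a resume function behaves like an ordinary callee that receives a live value from its caller.

```python
import weakref


class Intermediate:
    """Hypothetical stand-in for a large intermediate tensor."""


def eager_del_frees():
    # Eager baseline: after `del`, no reference remains, so CPython
    # frees the object immediately and the weak reference goes dead.
    x = Intermediate()
    probe = weakref.ref(x)
    del x
    return probe() is None


def resume_unboxed(x, probe):
    # The caller still holds its own reference to the object, so this
    # `del` only drops the local name; the object stays alive.
    del x
    return probe() is None


def resume_boxed(box, probe):
    # The value arrives inside a one-element list. Popping it empties the
    # box, so the local name is the only reference and `del` frees it.
    x = box.pop()
    del x
    return probe() is None


def call_unboxed():
    x = Intermediate()
    probe = weakref.ref(x)
    return resume_unboxed(x, probe)


def call_boxed():
    box = [Intermediate()]
    probe = weakref.ref(box[0])
    return resume_boxed(box, probe)


if __name__ == "__main__":
    print("eager freed:", eager_del_frees())
    print("unboxed freed:", call_unboxed())
    print("boxed freed:", call_boxed())
```

The same weakref probe can be adapted into a regression check: create an object, keep only a weak reference, run the code path that deletes it, and assert that the weak reference is dead at the point where eager execution would have freed it.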
