RepoJournal
PyTorch

@pytorch

PyTorch and the broader machine-learning ecosystem

PYTORCH SHIPS METAL ACCELERATORS AND SPARSE DTYPE SUPPORT WHILE TORCHTITAN ADVANCES EXPERT PARALLELISM

By RepoJournal · Filed · About PyTorch

PyTorch landed critical MPS kernel migrations and sparse tensor improvements overnight while the Titan team built out the distributed training pipeline for expert-parallel models.

The core team completed Metal Performance Shaders (MPS) implementations for two ops. The GLU forward pass [1] now runs 2x faster via MPSGraph instead of TensorIterator, and CTC loss [2] shipped with forward-pass support optimized for batch parallelism. On the sparse front, `torch.sparse.sampled_addmm` now handles float16 and bfloat16 on CUDA [3], closing a backward-pass gap that broke half-precision sparse matrix multiplication.

Meanwhile, TorchTitan merged expert parallelism (EP) infrastructure for its graph trainer. The EP overlap scheduling pass [5] and the EP chunking pass [6] let Inductor optimize token dispatch across distributed model experts, and a new `DPRequestRouter` [4] centralizes data-parallel routing logic.

In test-infra, CI safeguards tightened: an AI-advisor outage guard [7] no longer bails on an entire PR when broad failures are expected, and the CRCR zombie-workflow cleaner [8] now purges stale cross-repo CI entries from Redis.

ExecuTorch fixed Windows CI by forcing CPU-only builds [10] to avoid CUDA toolkit conflicts, shipped Arm TOSA binary op support [11], and bumped Vela to 5.1.0 [9].
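For readers unfamiliar with the sparse op mentioned above: `torch.sparse.sampled_addmm(input, mat1, mat2, *, beta=1.0, alpha=1.0)` computes `beta * input + alpha * (mat1 @ mat2)`, but only at the positions stored in the sparse CSR `input`. The following is a minimal pure-Python sketch of those semantics, not the CUDA kernel from [3]. The helper name `sampled_addmm_csr` and the sample data are hypothetical, chosen for illustration only.

```python
def sampled_addmm_csr(crow, col, values, mat1, mat2, beta=1.0, alpha=1.0):
    """Reference semantics of a sampled addmm on a CSR sparsity pattern.

    Returns the new stored values: beta * input + alpha * (mat1 @ mat2),
    evaluated only where the CSR matrix (crow, col, values) stores an entry.
    Hypothetical helper for illustration; it does not call PyTorch.
    """
    inner = len(mat2)  # shared dimension of mat1 (rows x inner) and mat2 (inner x cols)
    out = []
    for i in range(len(crow) - 1):            # each sparse row
        for k in range(crow[i], crow[i + 1]):  # each stored entry in that row
            j = col[k]
            dot = sum(mat1[i][t] * mat2[t][j] for t in range(inner))
            out.append(beta * values[k] + alpha * dot)
    return out


# 2x3 sparse input with stored entries at (0,0), (0,2), (1,1)
crow = [0, 2, 3]
col = [0, 2, 1]
values = [1.0, 2.0, 3.0]
mat1 = [[1.0, 2.0], [3.0, 4.0]]            # dense 2x2
mat2 = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # dense 2x3

result = sampled_addmm_csr(crow, col, values, mat1, mat2, beta=0.5, alpha=2.0)
print(result)
```

Because the dense product is evaluated only at the stored positions, the cost scales with the number of nonzeros rather than with the full output shape, which is why the op is useful for masked or sparse attention-style workloads.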

References

  [1] [MPS] Migrate GLU to Metal (#187833), pytorch/pytorch
  [2] [MPS] Add `ctc_loss` forward pass (#187716), pytorch/pytorch
  [3] Add float16/bfloat16 support to sparse CSR sampled_addmm (#187681), pytorch/pytorch
  [4] Add DPRequestRouter and use it in generator, pytorch/torchtitan
  [5] [graph_trainer] Add EP overlap scheduling pass, pytorch/torchtitan
  [6] [graph_trainer] Add graph EP chunking pass, pytorch/torchtitan
  [7] [torchci] AI advisor: stable-hash sanity cap + ci-no-td outage-guard bypass, pytorch/test-infra
  [8] [CRCR] Implement zombie workflow entries cleaner, pytorch/test-infra
  [9] Arm backend: Bump vela to 5.1.0 (#20181), pytorch/executorch
  [10] Fix Windows unittest CI: force CPU-only build (CUDA 13.2 toolkit on runner breaks _portable_lib load) (#20527), pytorch/executorch
  [11] Arm backend: Add TOSA binary op visitors, pytorch/executorch

FAQ

What changed in PyTorch on June 27, 2026?
PyTorch landed critical MPS kernel migrations and sparse tensor improvements overnight while the Titan team built out the distributed training pipeline for expert-parallel models.
What should PyTorch teams do about it?
  • If you use sparse half-precision ops on CUDA, upgrade to pick up the `sampled_addmm` fixes [3].
  • Review the TorchTitan EP overlap scheduling and chunking PRs if you are building distributed expert-parallel models [5] [6].
  • ExecuTorch Windows CI is stable again: Windows unittest jobs should go green with the CPU-only fix [10].
Which PyTorch repositories shipped on June 27, 2026?
pytorch/pytorch, pytorch/torchtitan, pytorch/test-infra, pytorch/executorch
