
The Wire · Showcase

TORCHTITAN SHIPS TOKEN-IN-TOKEN-OUT GENERATOR FOR RL, FIXES CRITICAL CI BREAKS ACROSS PLATFORMS

By RepoJournal · Filed · About PyTorch

TorchTitan's RL training pipeline now encodes prompts once and passes tokenized inputs directly to the generator, eliminating retokenization bugs that plagued distributed training.
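To see why retokenization causes bugs, here is a minimal sketch built on a hypothetical greedy longest-match tokenizer. The vocabulary, `encode`, `decode`, and `fake_generate` are illustrative assumptions, not TorchTitan's tokenizer or generator API. The stand-in sampler emits "a" and "b" as two separate tokens. Decoding the sequence to text and re-encoding it merges them into the single token "ab", so the retokenized sequence no longer matches the tokens (and therefore the log-probabilities) the sampler actually produced. Passing token IDs end to end avoids that drift.

```python
# Minimal sketch: why token-in/token-out (TITO) avoids retokenization drift.
# Hypothetical toy tokenizer and sampler; NOT TorchTitan's actual API.

VOCAB = {"a": 0, "b": 1, "ab": 2, " ": 3}
ID_TO_STR = {i: s for s, i in VOCAB.items()}
MAX_PIECE = max(len(s) for s in VOCAB)


def encode(text: str) -> list[int]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids: list[int] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(MAX_PIECE, 0, -1):
            piece = text[i:i + length]
            if len(piece) == length and piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
        else:
            raise ValueError(f"untokenizable character {text[i]!r}")
    return ids


def decode(ids: list[int]) -> str:
    """Map token IDs back to text."""
    return "".join(ID_TO_STR[i] for i in ids)


def fake_generate(prompt_ids: list[int]) -> list[int]:
    """Stand-in sampler: emits 'a' and 'b' as two separate tokens."""
    return [VOCAB["a"], VOCAB["b"]]


# Encode the prompt exactly once and hand the IDs to the generator.
prompt_ids = encode("ab ")
completion_ids = fake_generate(prompt_ids)

# Token-in/token-out: the trainer reuses the exact IDs the sampler produced.
tito_sequence = prompt_ids + completion_ids

# Text round trip (what TITO avoids): decode, then re-encode everything.
retokenized = encode(decode(tito_sequence))

print(tito_sequence)  # [2, 3, 0, 1]
print(retokenized)    # [2, 3, 2]  (one token short: 'a' + 'b' merged into 'ab')
```

In a real pipeline, a mismatch like this would typically surface as a length or alignment error between sampler-side and trainer-side log-probabilities. With TITO, the trainer consumes `prompt_ids + completion_ids` directly and never round-trips through text.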

The TITO (token-in/token-out) generator change [1] follows the standard approach in production RL systems and cleanly separates prompt encoding from generation logic. It also logs generation metrics to wandb automatically, so you get visibility into sampling behavior without manual instrumentation.

TorchTitan also fixed two critical graph_trainer CI failures that were blocking H100 and graph-compiled workloads. The DeepEP ABI break against PyTorch nightly [2] is now non-fatal, so the remaining tests keep running. And cudagraph compatibility checks now run when the pass executes [3], instead of eagerly rejecting flex_attention kernels that regional_inductor will compile away.

Config-based Full DTensor mode for Llama3 landed [4]. Context parallelism (CP) is now handled declaratively through LocalMapSpec instead of hooks, giving a cleaner path to multi-dimensional SPMD meshes.

On the compiler side, PyTorch's Inductor test infrastructure is now roughly 4x faster at test collection and roughly 1.7x faster at test execution [5], thanks to ISA subprocess caching. TorchAO shipped multi-ISA portable X86 kernels with runtime dispatch [6], so a single build runs on AVX512, AVX10.2, and scalar targets without rebuilding per machine. PyTorch core also now preserves pin_memory in Inductor tensor constructors [7], fixing torch.tensor and torch.rand calls whose pinned allocation was being silently dropped during lowering.
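The reasoning behind deferring the cudagraph check [3] comes down to pass ordering. A check that inspects the graph before regional_inductor runs still sees flex_attention ops that will no longer exist by the time the cudagraph pass receives the graph. The toy pipeline below is a hypothetical sketch of that ordering, not graph_trainer's actual code. The list-of-op-names graph, the pass functions, and the `CUDAGRAPH_UNSUPPORTED` set are all illustrative assumptions.

```python
# Hypothetical sketch: eager vs. deferred compatibility checks in a pass pipeline.
# Graph representation, pass names, and the unsupported-op set are illustrative
# only; they are NOT TorchTitan graph_trainer APIs.
from typing import Callable

Graph = list[str]                  # a "graph" here is just an ordered list of op names
Pass = Callable[[Graph], Graph]

CUDAGRAPH_UNSUPPORTED = {"flex_attention"}  # assumed: op the cudagraph pass rejects as-is


def incompatible_ops(graph: Graph) -> list[str]:
    """Return the ops in `graph` that the cudagraph pass cannot handle."""
    return [op for op in graph if op in CUDAGRAPH_UNSUPPORTED]


def regional_inductor_pass(graph: Graph) -> Graph:
    """Stand-in for regional compilation: replaces flex_attention with a compiled kernel."""
    return ["compiled_region" if op == "flex_attention" else op for op in graph]


def cudagraph_pass(graph: Graph) -> Graph:
    """Deferred check: validate the graph this pass actually receives, at execution time."""
    bad = incompatible_ops(graph)
    if bad:
        raise RuntimeError(f"cudagraph-incompatible ops: {bad}")
    return graph + ["cudagraph_wrapper"]


def eager_check(graph: Graph) -> bool:
    """Old-style check: inspects the input graph before any pass has run."""
    return not incompatible_ops(graph)


def run_pipeline(graph: Graph, passes: list[Pass]) -> Graph:
    """Apply each pass in order, feeding each one the previous pass's output."""
    for p in passes:
        graph = p(graph)
    return graph


model_graph = ["embedding", "flex_attention", "mlp"]

# The eager check rejects the model up front...
eager_ok = eager_check(model_graph)

# ...but the deferred check passes, because regional_inductor runs first.
result = run_pipeline(model_graph, [regional_inductor_pass, cudagraph_pass])
```

Here `eager_ok` is `False`, while `result` ends in `"cudagraph_wrapper"` with no flex_attention left in the graph. Validating the graph a pass receives, rather than the original input, means the check reflects whatever earlier passes have already rewritten.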


References

  [1] [rl] Add TITO generator and gen metrics · pytorch/torchtitan
  [2] [graph_trainer] Fix H100 CI failure from DeepEP compilation break (#3390) · pytorch/torchtitan
  [3] [graph_trainer] Defer cudagraph compatibility check to pass execution… (#3355) · pytorch/torchtitan
  [4] [Full DTensor] Config-based Full DTensor for Llama3 · pytorch/torchtitan
  [5] Speed up inductor test infrastructure (~4x collection, ~1.7x execution) (#181617) · pytorch/pytorch
  [6] [X86] multi-ISA portable kernel compilation and runtime dispatch · pytorch/ao
  [7] [inductor] Preserve pin_memory for constructors (#183977) · pytorch/pytorch

FAQ

What changed in PyTorch on May 19, 2026?
TorchTitan's RL training pipeline now encodes prompts once and passes tokenized inputs directly to the generator, eliminating retokenization bugs that plagued distributed training.
What should PyTorch teams do about it?
  • Pull the TITO generator change into your RL training branch before your next experiment run.
  • If you run H100 CI, pick up the fix that makes the DeepEP compilation break non-fatal so the remaining tests can run.
  • Pull the Inductor ISA subprocess caching change to speed up local runs of the Inductor test suite.
Which PyTorch repositories shipped on May 19, 2026?
pytorch/torchtitan, pytorch/pytorch, pytorch/ao
