EXECUTORCH CLOSES CROSS-COMPILE GAP, RL SUITE SHIPS MAJOR REPLAY BUFFER OVERHAUL
By RepoJournal · Filed · About PyTorch
ExecuTorch fixes a critical cross-compilation bug that was pulling a host library into bare-metal targets, while TorchRL lands four linked changes (two features and two bug fixes) that reshape how distributed RL training manages memory and replay state.
The cross-compilation fix [1] stops CMake from linking the host's libdl into executorch_core when targeting bare-metal systems such as ARM Cortex-M or ESP32. Because the dependency was part of the library's exported interface, it propagated to every consumer and broke downstream builds. In parallel, ExecuTorch adds WebGPU support for the cat operator with ValueList graph integration [2], expanding the set of ops available for browser-based inference.
On the RL side, four interconnected PRs ship together: replay buffer consumption semantics [3], which let you recycle slots after sampling instead of only writing sequentially; a MCAdvantage shared-state fix [4], so multi-actor setups don't duplicate buffer metadata; collector device normalization [5] for CPU LazyTensorStorage writes; and an mjlab environment wrapper [6] that integrates MuJoCo Lab physics-engine environments.
PyTorch core continues hardening. The dataloader test suite becomes device-agnostic [7], so pin_memory and multiprocessing tests run on any accelerator backend rather than only CUDA, and torch.compile's partitioner gets a fix [8] for models that use explicit CUDA streams in training mode.
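The consume-after-sample idea described above for replay buffers can be illustrated with a small, library-free sketch. This is a toy model, not the TorchRL API: the class name `ConsumingReplayBuffer`, its methods, and the choice to raise when the buffer is full are all assumptions made for illustration. It only shows the general pattern of freeing a slot once its item has been sampled, so later writes can reuse it.

```python
import random


class ConsumingReplayBuffer:
    """Toy fixed-capacity buffer whose slots are freed once sampled.

    Hypothetical illustration only; this is not the TorchRL implementation.
    """

    def __init__(self, capacity, seed=None):
        self._slots = [None] * capacity
        self._free = list(range(capacity))  # slot indices available for writing
        self._filled = []                   # slot indices holding unsampled items
        self._rng = random.Random(seed)

    def __len__(self):
        # Number of items that have been written but not yet sampled.
        return len(self._filled)

    def add(self, item):
        """Write into any free slot; refuse if every slot is still unconsumed."""
        if not self._free:
            raise BufferError("buffer full: sample to free slots")
        idx = self._free.pop()
        self._slots[idx] = item
        self._filled.append(idx)
        return idx

    def sample(self, batch_size):
        """Draw without replacement, then recycle the sampled slots."""
        if batch_size > len(self._filled):
            raise ValueError("not enough unsampled items")
        chosen = self._rng.sample(self._filled, batch_size)
        chosen_set = set(chosen)
        batch = [self._slots[i] for i in chosen]
        self._filled = [i for i in self._filled if i not in chosen_set]
        for i in chosen:
            self._slots[i] = None  # drop the reference to the consumed item
            self._free.append(i)   # the slot can now be overwritten
        return batch
```

In this sketch a full buffer rejects new writes until a call to `sample` frees slots, which is one possible policy; a sequential ring buffer would instead overwrite the oldest entry regardless of whether it had been sampled.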
Action items
- → Upgrade ExecuTorch if cross-compiling for bare-metal targets - this was silently breaking downstream CMake builds pytorch/executorch [immediate]
- → Review RL PRs 3910-3913 if you ship distributed RL training - four linked changes that reshape replay buffer semantics pytorch/rl [plan]
- → Monitor torch.compile fixes for stream-dependent models heading to next release pytorch/pytorch [monitor]
References
- [1] Don't link libdl when cross-compiling (#20522) pytorch/executorch
- [2] [ExecuTorch][WebGPU] Add cat op + ValueList graph support (aten.cat.default) pytorch/executorch
- [3] [Feature] Replay buffer consume-after-sample support pytorch/rl
- [4] [BugFix] Share MCAdvantage replay-buffer state pytorch/rl
- [5] [BugFix] Normalize collector replay buffer device metadata pytorch/rl
- [6] [Feature] Add mjlab environment wrapper pytorch/rl
- [7] [Test] Make test_dataloader.py device-generic for pin_memory and multiprocessing tests (#185156) pytorch/pytorch
- [8] Fix control_deps handling in partitioner for forward/backward extraction (#187695) pytorch/pytorch
FAQ
- What changed in PyTorch on June 28, 2026?
- ExecuTorch fixes a critical cross-compilation bug that was pulling a host library into bare-metal targets, while TorchRL lands four linked changes (two features and two bug fixes) that reshape how distributed RL training manages memory and replay state.
- What should PyTorch teams do about it?
- Upgrade ExecuTorch if cross-compiling for bare-metal targets, since this was silently breaking downstream CMake builds; review RL PRs 3910-3913 if you ship distributed RL training, since these four linked changes reshape replay buffer semantics; and monitor torch.compile fixes for stream-dependent models heading to the next release.
- Which PyTorch repositories shipped on June 28, 2026?
- pytorch/executorch, pytorch/rl, pytorch/pytorch