The Wire · Showcase
TRL ADDS MOE LOAD BALANCING TO REINFORCEMENT LEARNING TRAINERS
By RepoJournal · Filed · About Hugging Face
The TRL library now balances expert utilization across GRPO, RLOO, and AsyncGRPO trainers, closing a critical gap for anyone post-training mixture-of-experts models at scale.
The auxiliary router loss that kept experts balanced in SFTTrainer [1] is now available across all three RL training paths [2]. This matters because MoE models trained without load balancing can collapse into a state where the router sends most tokens to a few experts while the rest sit idle, which hurts both efficiency and training stability. You enable it through `model_init_kwargs` by setting `output_router_logits=True` and a `router_aux_loss_coef`. In parallel, TRL cut a hotfix pinning DeepSpeed below 0.19.2 [3] to unblock CI after a recent compatibility break.
LeRobot patched a race condition in its camera read loops, where `stop_event` could flip to None mid-loop [4]. This is a classic time-of-check/time-of-use bug: the loop checks the attribute, another thread clears it, and the next access dereferences None, which could cause hard crashes under load. The fix snapshots `stop_event` inside the read loops. The huggingface.js hardware catalog now recognizes Blackwell B300 and datacenter variants [6], letting the ecosystem start mapping inference workloads to the latest silicon.
Documentation across the stack is catching up: LeRobot documented the new LeLab web interface [5] for teleoperation and training without the CLI, and the llama.cpp snippets switched to the unified `llama serve` and `llama cli` commands [7].
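To make the load-balancing idea concrete, here is a minimal pure-Python sketch of a Switch-Transformer-style auxiliary loss for a single MoE layer. It is not TRL's implementation: the function names, the per-top-k normalization, and the coefficient value `0.001` are illustrative assumptions. Only the two keys in `model_init_kwargs` come from the item above.

```python
import math

# Keys named in the TRL change; the coefficient value is an illustrative assumption.
model_init_kwargs = {"output_router_logits": True, "router_aux_loss_coef": 0.001}


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, top_k=1):
    """Simplified auxiliary loss: num_experts * sum_i(f_i * P_i).

    f_i is the fraction of routing slots assigned to expert i and
    P_i is the mean router probability for expert i over all tokens.
    router_logits is a list of per-token lists, one logit per expert.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    slot_fraction = [0.0] * num_experts  # f_i
    mean_prob = [0.0] * num_experts      # P_i
    for logits in router_logits:
        probs = softmax(logits)
        ranked = sorted(range(num_experts), key=lambda i: probs[i], reverse=True)
        for i in ranked[:top_k]:
            slot_fraction[i] += 1.0 / (num_tokens * top_k)
        for i, p in enumerate(probs):
            mean_prob[i] += p / num_tokens
    return num_experts * sum(f * p for f, p in zip(slot_fraction, mean_prob))


# Four tokens, four experts: each token prefers a different expert.
balanced = load_balancing_loss([[2.0 if e == t else 0.0 for e in range(4)] for t in range(4)])
# Four tokens that all prefer expert 0, the collapse the loss penalizes.
collapsed = load_balancing_loss([[2.0, 0.0, 0.0, 0.0] for _ in range(4)])

# The trainer would add the scaled term to its main objective (policy_loss is hypothetical).
policy_loss = 0.5
total_loss = policy_loss + model_init_kwargs["router_aux_loss_coef"] * collapsed
```

Under this normalization the loss reaches its minimum of 1.0 when routing is even and grows as tokens concentrate on fewer experts, so the scaled term pushes the router toward balanced utilization.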
Action items
- → Review and merge the TRL MoE auxiliary loss PR before launching large MoE training runs huggingface/trl [plan]
- → Upgrade TRL and pin DeepSpeed < 0.19.2 to unblock CI huggingface/trl [immediate]
- → Update LeRobot to the latest patch for camera stability in production recording huggingface/lerobot [plan]
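For context on the camera fix, the snapshot pattern named in the LeRobot patch can be sketched as follows. This is a minimal illustration under stated assumptions, not LeRobot's code: `CameraReader`, `frames_read`, and the sleep interval are hypothetical stand-ins. Only the idea of copying `stop_event` into a local before looping comes from the patch title.

```python
import threading
import time


class CameraReader:
    """Hypothetical stand-in for a camera with a background read thread."""

    def __init__(self):
        self.stop_event = None
        self.thread = None
        self.frames_read = 0

    def _read_loop(self):
        # Snapshot the attribute once. If another thread later sets
        # self.stop_event to None, this local reference stays valid.
        stop_event = self.stop_event
        if stop_event is None:
            return
        while not stop_event.is_set():
            self.frames_read += 1  # placeholder for reading a frame
            time.sleep(0.001)

    def start(self):
        self.stop_event = threading.Event()
        self.thread = threading.Thread(target=self._read_loop, daemon=True)
        self.thread.start()

    def stop(self):
        event, thread = self.stop_event, self.thread
        # The attribute becomes None while the loop may still be running.
        self.stop_event = None
        if event is not None:
            event.set()
        if thread is not None:
            thread.join(timeout=1.0)
        self.thread = None
```

A loop written as `while not self.stop_event.is_set()` would re-read the attribute on every iteration and could hit None after `stop()` clears it. Reading it once into a local removes that window.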
References
- [1] Add MoE auxiliary loss to GRPO, RLOO, and AsyncGRPO trainers huggingface/trl
- [2] Add MoE auxiliary loss to GRPO, RLOO, and AsyncGRPO trainers (#6083) huggingface/trl
- [3] Hotfix CI: Temporarily pin deepspeed < 0.19.2 huggingface/trl
- [4] fix(cameras): snapshot stop_event in read loops to avoid None deref (#3812) huggingface/lerobot
- [5] docs: add LeLab web interface to README huggingface/lerobot
- [6] feat(hardware): Add B300 + other GPUs to hardware-nvidia.ts huggingface/huggingface.js
- [7] replace llama-server and llama-cli huggingface/huggingface.js
FAQ
- What changed in Hugging Face on June 18, 2026?
- The TRL library now balances expert utilization across GRPO, RLOO, and AsyncGRPO trainers, closing a critical gap for anyone post-training mixture-of-experts models at scale.
- What should Hugging Face teams do about it?
- Review and merge the TRL MoE auxiliary loss PR before launching large MoE training runs • Upgrade TRL and pin DeepSpeed < 0.19.2 to unblock CI • Update LeRobot to the latest patch for camera stability in production recording
- Which Hugging Face repositories shipped on June 18, 2026?
- huggingface/trl, huggingface/lerobot, huggingface/huggingface.js