RepoJournal
Hugging Face

@huggingface

Transformers, Datasets, and the open AI-model layer

The Wire · Showcase

TRL ADDS MOE LOAD BALANCING TO REINFORCEMENT LEARNING TRAINERS

By RepoJournal · Filed · About Hugging Face

The TRL library now balances expert utilization across GRPO, RLOO, and AsyncGRPO trainers, closing a critical gap for anyone post-training mixture-of-experts models at scale.

The auxiliary router loss that kept experts balanced in SFTTrainer [1] is now available across all three RL training paths [2]. This matters because MoE models trained without load balancing can collapse into a state where the router sends most tokens to a few experts while the rest sit idle, which hurts both efficiency and training stability. You enable it through model_init_kwargs by setting output_router_logits=True and a router_aux_loss_coef. In parallel, TRL cut a hotfix pinning DeepSpeed below 0.19.2 [3] to unblock CI after a recent compatibility break.

LeRobot patched a race condition in camera read loops where stop_event could be set to None mid-loop [4]. This is a classic time-of-check/time-of-use bug that could cause hard crashes under load; the fix snapshots stop_event so the loop never dereferences None. The huggingface.js hardware catalog now recognizes Blackwell B300 and datacenter variants [6], letting the ecosystem start mapping inference workloads to the latest silicon.

Documentation across the stack is catching up. LeRobot documented the new LeLab web interface [5] for teleoperation and training without the CLI, and the llama.cpp snippets moved from llama-server and llama-cli to the unified `llama serve` and `llama cli` commands [7].
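To make the load-balancing idea concrete, here is a minimal pure-Python sketch. It assumes a Switch-Transformer-style auxiliary loss with top-1 routing, which may differ from what TRL and Transformers actually compute. The helper `load_balancing_loss`, the toy router probabilities, and the coefficient value 0.001 are all hypothetical; only the two keys in `model_init_kwargs` come from the article.

```python
from typing import Any, Dict, List

# The two keys are the ones named in the article; the coefficient value
# is an arbitrary placeholder, not a recommended setting.
model_init_kwargs: Dict[str, Any] = {
    "output_router_logits": True,
    "router_aux_loss_coef": 0.001,
}


def load_balancing_loss(router_probs: List[List[float]], num_experts: int) -> float:
    """Hypothetical helper: Switch-style auxiliary loss for top-1 routing.

    router_probs[t][e] is the router's softmax probability of expert e for
    token t. The loss is num_experts * sum_e(f_e * P_e), where f_e is the
    fraction of tokens whose top-1 expert is e and P_e is the mean router
    probability assigned to e.
    """
    num_tokens = len(router_probs)
    token_fraction = [0.0] * num_experts
    mean_prob = [0.0] * num_experts
    for probs in router_probs:
        # Top-1 routing: the token goes to its highest-probability expert.
        top_expert = max(range(num_experts), key=lambda e: probs[e])
        token_fraction[top_expert] += 1.0 / num_tokens
        for e in range(num_experts):
            mean_prob[e] += probs[e] / num_tokens
    return num_experts * sum(f * p for f, p in zip(token_fraction, mean_prob))


# Toy data: two experts, four tokens.
balanced = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
collapsed = [[0.9, 0.1]] * 4  # every token prefers expert 0

balanced_loss = load_balancing_loss(balanced, num_experts=2)
collapsed_loss = load_balancing_loss(collapsed, num_experts=2)

# The coefficient scales how much the auxiliary term adds to the main loss.
weighted_penalty = model_init_kwargs["router_aux_loss_coef"] * collapsed_loss
print(balanced_loss, collapsed_loss, weighted_penalty)
```

In this toy setup the collapsed routing scores higher than the balanced one, so adding the term to the training loss, scaled by the coefficient, nudges the router back toward even expert use.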

References

  [1] Add MoE auxiliary loss to GRPO, RLOO, and AsyncGRPO trainers · huggingface/trl
  [2] Add MoE auxiliary loss to GRPO, RLOO, and AsyncGRPO trainers (#6083) · huggingface/trl
  [3] Hotfix CI: Temporarily pin deepspeed < 0.19.2 · huggingface/trl
  [4] fix(cameras): snapshot stop_event in read loops to avoid None deref (#3812) · huggingface/lerobot
  [5] docs: add LeLab web interface to README · huggingface/lerobot
  [6] feat(hardware): Add B300 + other GPUs to hardware-nvidia.ts · huggingface/huggingface.js
  [7] replace llama-server and llama-cli · huggingface/huggingface.js

FAQ

What changed in Hugging Face on June 18, 2026?
The TRL library now balances expert utilization across GRPO, RLOO, and AsyncGRPO trainers, closing a critical gap for anyone post-training mixture-of-experts models at scale.
What should Hugging Face teams do about it?
  • Review and merge TRL MoE auxiliary loss PR before training large MoE runs
  • Upgrade TRL and pin DeepSpeed < 0.19.2 to unblock CI
  • Update LeRobot to latest patch for camera stability in production recording
Which Hugging Face repositories shipped on June 18, 2026?
huggingface/trl, huggingface/lerobot, huggingface/huggingface.js
