What should Hugging Face teams do about it?

• Update TRL if running GKDTrainer, GOLDTrainer, or DistillationTrainer with gradient accumulation
• Rebuild LeRobot dataset statistics if using image features with uint8 casting
• Review GLM-4-MoE fine-tuning chat templates if in production

Which Hugging Face repositories shipped on June 14, 2026?

huggingface/trl, huggingface/lerobot, huggingface/ml-intern, huggingface/chat-ui

DISTILLATION TRAINER GRADIENT ACCUMULATION BUG FIXED, LEROBOT IMAGE STATS OVERFLOW PATCHED

By RepoJournal · Filed 06:03 UTC on June 14, 2026 · About Hugging Face

TRL's distillation trainers were silently miscalculating loss under gradient accumulation, and LeRobot's image statistics were overflowing to zero for valid data.

GKDTrainer, GOLDTrainer, and DistillationTrainer accepted `num_items_in_batch` in their loss computation but never used it [1], so the JSD distillation loss was normalized by the local microbatch token count instead of the global count. Under gradient accumulation, this breaks the gradient scaling that the base Trainer in transformers expects and silently produces wrong gradients. The fix is critical for anyone training distilled models at scale [2].

In parallel, LeRobot's image statistics computation in RunningQuantileStats squared uint8 samples *before* promoting them to float. The squares overflowed in uint8, the computed variance went negative and was clamped to zero [3], and stats.json reported `std=0` for non-constant image data [4]. Both fixes are merged.

TRL also patched the GLM-4-MoE chat template so that assistant turns are properly terminated with role markers rather than left without end-of-turn tokens [5]. Routine dependency bumps landed in trl and ml-intern [6], [7]. Chat-UI raised MiniMax-M3's max_tokens to 65536 so the router no longer truncates reasoning-heavy outputs at 2048 tokens [8].
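The two bugs described above can be reproduced in miniature. The sketch below is illustrative only and is not taken from TRL or LeRobot: the per-token losses, the pixel values, and the helper `image_std` are hypothetical, and the variance formula (mean of squares minus squared mean) is an assumption about how a running statistic of this kind is typically computed. Part 1 compares local and global token-count normalization across microbatches; Part 2 shows how squaring uint8 data before casting to float can drive the variance negative.

```python
import numpy as np

# --- Part 1: loss normalization under gradient accumulation ---
# Hypothetical per-token losses for two microbatches of unequal length.
microbatches = [[2.0] * 6, [8.0] * 2]
num_items_in_batch = sum(len(mb) for mb in microbatches)  # global token count: 8

# Mean loss over all tokens in the accumulated batch.
true_mean = sum(sum(mb) for mb in microbatches) / num_items_in_batch

# Each microbatch normalized by its own token count (the behavior reported as buggy).
local_losses = [sum(mb) / len(mb) for mb in microbatches]
# Each microbatch normalized by the global token count.
global_losses = [sum(mb) / num_items_in_batch for mb in microbatches]

print(true_mean)           # 3.5
print(sum(global_losses))  # 3.5, the accumulated loss matches the true mean
print(sum(local_losses))   # 10.0, short microbatches are over-weighted


# --- Part 2: uint8 overflow when squaring before casting ---
def image_std(pixels: np.ndarray, cast_before_square: bool) -> float:
    """Standard deviation via E[x^2] - E[x]^2, clamped at zero (assumed formula)."""
    if cast_before_square:
        values = pixels.astype(np.float64)
        squares = values * values  # exact squares in float64
    else:
        # The product is computed in uint8 and wraps modulo 256 before the cast.
        squares = (pixels * pixels).astype(np.float64)
    mean = pixels.astype(np.float64).mean()
    variance = squares.mean() - mean ** 2
    return float(np.sqrt(max(variance, 0.0)))


pixels = np.array([100, 150, 200, 250], dtype=np.uint8)  # clearly non-constant
buggy_std = image_std(pixels, cast_before_square=False)
fixed_std = image_std(pixels, cast_before_square=True)
print(buggy_std)  # 0.0
print(fixed_std)  # about 55.9
```

In Part 1, summing the locally normalized losses gives 10.0 instead of 3.5, so both the scale and the relative weight of each microbatch are off. In Part 2, the wrapped squares average far below the squared mean, so the clamp hides a negative variance behind `std=0`.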

