TRL 1.5.0 SHIPS QWEN TEMPLATES AND FIXES OPENREWARD TOOL BINDING
By RepoJournal · Filed · About Hugging Face
TRL's latest release adds training-ready chat templates for three model families while fixing a critical bug where task-scoped tools were silently omitted during rollout binding.
TRL v1.5.0 [1] leads this cycle: Phi-3.5, Qwen3-VL, and Qwen3.5 Think/NoThink now have training chat templates with generation markers, so assistant_only_loss=True works out of the box across these model families [1]. The Qwen3.5 Think/NoThink templates [2] follow the approach already used for Qwen3: they wrap assistant output in generation markers and preserve thinking blocks. Separately, a fix to OpenRewardSpec [3] makes rollout binding discover and bind task-scoped tools. Previously only shared tools were wired up, and the omission raised no error.
On the stability front, diffusers has fixed a determinism problem in ZImageTransformer2DModel [5] by initializing pad tokens with torch.zeros() instead of torch.empty(). Because torch.empty() returns uninitialized memory, the pad tokens could contain NaNs that surfaced in layerwise casting tests. The diffusers team is also documenting torch.empty footguns [4] to prevent similar issues downstream.
A fourth model family, the vision-language Qwen2.5-VL [6], is in flight, with both original and training chat templates ready to land.
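To make the role of generation markers concrete, here is a minimal sketch of the masking idea behind assistant-only loss. It is not TRL's implementation. It assumes a per-token assistant mask (1 for tokens inside the generation markers, 0 elsewhere) has already been produced by the chat template; the token ids and the helper name mask_non_assistant are hypothetical.

```python
# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def mask_non_assistant(input_ids, assistant_mask):
    """Build training labels that keep assistant tokens and ignore the rest.

    Hypothetical helper for illustration only; not part of TRL.
    """
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if keep else IGNORE_INDEX
        for token, keep in zip(input_ids, assistant_mask)
    ]


# Made-up token ids: three prompt tokens followed by three assistant tokens.
input_ids = [101, 7592, 102, 2023, 2003, 103]
assistant_mask = [0, 0, 0, 1, 1, 1]

labels = mask_non_assistant(input_ids, assistant_mask)
print(labels)  # prompt positions become -100, assistant positions keep their ids
```

Without generation markers in the template there is no reliable way to build the mask, which is why a training-specific template is needed for each model family.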
Action items
- Upgrade TRL to v1.5.0 if you're training Phi-3.5, Qwen3-VL, or Qwen3.5 variants with assistant_only_loss. huggingface/trl [plan]
- Test OpenRewardSpec bindings if you're using the rollout integration, and verify that task-scoped tools are now discovered. huggingface/trl [monitor]
- Update diffusers if you're using ZImageTransformer2DModel, to pick up the deterministic pad-token initialization. huggingface/diffusers [plan]
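The diffusers item rests on the difference between torch.empty() and torch.zeros(). The sketch below illustrates only that difference. It is not the diffusers patch, and the tensor shape is a hypothetical choice.

```python
import torch

HIDDEN_DIM = 8  # hypothetical size, chosen only for illustration

# torch.empty allocates memory without writing to it, so the values are
# arbitrary and can include NaN or inf bit patterns left over in memory.
uninitialized_pad = torch.empty(1, HIDDEN_DIM)

# torch.zeros writes a known value, so every construction yields the same tensor.
pad_a = torch.zeros(1, HIDDEN_DIM)
pad_b = torch.zeros(1, HIDDEN_DIM)

print(torch.equal(pad_a, pad_b))           # zero-initialized tensors always match
print(bool(torch.isfinite(pad_a).all()))   # and never contain NaN or inf
```

A tensor created with torch.empty() is safe only if every element is overwritten before it is read, which is the kind of condition that is easy to break in a later refactor.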
References
- [1] v1.5.0 (huggingface/trl)
- [2] Add Qwen3.5 Think/NoThink training chat templates with generation markers (huggingface/trl)
- [3] Fix `OpenRewardSpec` omitting task-scoped tools during rollout binding (fixes #5727) (huggingface/trl)
- [4] note: torch.zeros -> torch.empty (huggingface/diffusers)
- [5] Initialize ZImage pad tokens deterministically (huggingface/diffusers)
- [6] Add Qwen2.5-VL original and training chat template with generation markers (huggingface/trl)
FAQ
- What changed in Hugging Face on May 26, 2026?
- TRL's latest release adds training-ready chat templates for three model families while fixing a critical bug where task-scoped tools were silently omitted during rollout binding.
- What should Hugging Face teams do about it?
- Upgrade TRL to v1.5.0 if you're training Phi-3.5, Qwen3-VL, or Qwen3.5 variants with assistant_only_loss; test OpenRewardSpec bindings if you're using the rollout integration, and verify that task-scoped tools are now discovered; update diffusers if you're using ZImage, to stabilize determinism.
- Which Hugging Face repositories shipped on May 26, 2026?
- huggingface/trl, huggingface/diffusers