RepoJournal
Hugging Face

@huggingface

Transformers, Datasets, and the open AI-model layer

TRL 1.5.0 SHIPS QWEN TEMPLATES AND FIXES OPENREWARD TOOL BINDING

By RepoJournal

TRL's latest release adds training-ready chat templates for three model families while fixing a critical bug where task-scoped tools were silently omitted during rollout binding.

TRL v1.5.0 [1] is the headline: Phi-3.5, Qwen3-VL, and Qwen3.5 Think/NoThink now have training chat templates with generation markers, so assistant_only_loss=True works out of the box across these model families. The Qwen3.5 Think/NoThink templates [2] follow the refined approach already proven in Qwen3, wrapping assistant output in generation markers and preserving thinking blocks. Separately, a critical fix to OpenRewardSpec [3] makes it discover and bind task-scoped tools during rollout binding; previously only shared tools were wired up, and the omission failed silently. On the stability front, diffusers has fixed a nondeterminism problem in ZImageTransformer2DModel [5] by initializing pad tokens with torch.zeros() instead of torch.empty(). Because torch.empty() returns uninitialized memory, the pad tokens could hold arbitrary values, including NaNs that could surface in layerwise casting tests. The diffusers team is also documenting torch.empty footguns [4] to prevent similar issues downstream. A fourth model family, Qwen2.5-VL [6], is in flight, with both original and training chat templates ready to land.
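For readers wondering why generation markers matter for assistant_only_loss, here is a minimal sketch of the underlying idea, not TRL's actual implementation. It assumes you already have a per-token assistant mask (in Transformers, one is typically obtained from apply_chat_template with return_assistant_tokens_mask=True, which relies on the template's generation markers). The function name mask_non_assistant and the toy token ids are hypothetical, chosen only for illustration.

```python
# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def mask_non_assistant(input_ids, assistant_mask):
    """Return labels in which only assistant tokens keep their ids.

    Hypothetical helper for illustration: every position where the
    assistant mask is 0 is replaced by IGNORE_INDEX, so it contributes
    nothing to the training loss.
    """
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if keep else IGNORE_INDEX
        for token, keep in zip(input_ids, assistant_mask)
    ]


# Toy sequence (made-up ids): five prompt tokens, then a four-token assistant reply.
input_ids = [101, 7, 8, 9, 102, 21, 22, 23, 103]
assistant_mask = [0, 0, 0, 0, 0, 1, 1, 1, 1]

labels = mask_non_assistant(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, -100, -100, 21, 22, 23, 103]
```

Without generation markers in the chat template there is no reliable way to build that mask, which is why a template lacking them cannot support assistant-only loss.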

References

  [1] v1.5.0 (huggingface/trl)
  [2] Add Qwen3.5 Think/NoThink training chat templates with generation markers (huggingface/trl)
  [3] Fix `OpenRewardSpec` omitting task-scoped tools during rollout binding (fixes #5727) (huggingface/trl)
  [4] note: torch.zeros -> torch.empty (huggingface/diffusers)
  [5] Initialize ZImage pad tokens deterministically (huggingface/diffusers)
  [6] Add Qwen2.5-VL original and training chat template with generation markers (huggingface/trl)

FAQ

What changed in Hugging Face on May 26, 2026?
TRL's latest release adds training-ready chat templates for three model families while fixing a critical bug where task-scoped tools were silently omitted during rollout binding.
What should Hugging Face teams do about it?
  • Upgrade TRL to v1.5.0 if you're training Phi-3.5, Qwen3-VL, or Qwen3.5 variants with assistant_only_loss.
  • If you use the OpenRewardSpec rollout integration, test your bindings and verify that task-scoped tools are now discovered.
  • Update diffusers if you're using ZImage, to pick up the deterministic pad-token initialization.
Which Hugging Face repositories shipped on May 26, 2026?
huggingface/trl, huggingface/diffusers
