RepoJournal
Hugging Face

@huggingface

Transformers, Datasets, and the open AI-model layer

TRL FIXES SILENT TOKENIZATION CACHE DISABLING; HUB GAINS SANDBOXES

By RepoJournal · Filed · About Hugging Face

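TRL's DPO and SFT trainers had been silently disabling tokenization caching whenever dataset fingerprinting failed; that is now fixed. Elsewhere, huggingface.js gains fal-ai audio tasks, the Hub docs cover an S3-compatible gateway for Storage Buckets, and huggingface_hub ships a Sandbox API.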

TRL's tokenization pipeline captured the entire trainer object in a nested function. That made dataset fingerprinting fail silently and fall back to random hashes, which disabled caching without any warning [1]. The same release fixes CPOTrainer and ORPOTrainer so that they auto-load `processing_class` when it is omitted, as the documentation already described it as optional, instead of raising an error [2]. It also adds packing-aware dynamic batching to AsyncGRPO, keeping micro-batches token-balanced across distributed ranks [3].

Over in huggingface.js, the inference client gained fal-ai support for both audio-to-audio and text-to-audio tasks [4][5], building out a queue-based task system that properly handles file polling. The Hub docs now document the new S3-compatible gateway for Storage Buckets at s3.hf.co, which lets users work with their buckets through the AWS CLI and boto3 [6].

Most significantly, huggingface_hub shipped a Sandbox API and an `hf sandbox` CLI built on top of the Jobs infrastructure, offering E2B/Modal-style sandboxes with 6-second cold starts and live output streaming [9], plus a new `sync_job_volume` helper for managing job artifacts [8]. It also speeds up `snapshot_download` by caching tree endpoint responses on disk and skipping unnecessary HEAD requests on Xet files [7].
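Why capturing the trainer breaks caching can be shown with a toy model. This is a stdlib-only sketch, not the `datasets` hashing code: `toy_fingerprint`, `FakeTrainer`, and both transforms are hypothetical names. The idea is that everything a transform closes over must be serializable; if it is not, the fingerprint falls back to a random value, changes on every run, and a cache keyed on it never hits.

```python
import hashlib
import pickle
import secrets
import threading


class FakeTrainer:
    """Hypothetical stand-in for a trainer holding unpicklable live state."""

    def __init__(self, max_length: int):
        self.max_length = max_length
        self._lock = threading.Lock()  # pickle cannot serialize locks


def toy_fingerprint(fn, fn_kwargs=None):
    """Hash a transform's name, bytecode, captured variables, and kwargs.

    Returns (fingerprint, is_deterministic). If anything cannot be pickled,
    fall back to a random fingerprint, which silently defeats caching.
    """
    try:
        captured = [cell.cell_contents for cell in (fn.__closure__ or ())]
        payload = pickle.dumps((fn.__qualname__, fn.__code__.co_code, captured, fn_kwargs))
        return hashlib.sha256(payload).hexdigest()[:16], True
    except Exception:
        return secrets.token_hex(8), False  # new value every call: no cache hits


def make_closure_transform(trainer):
    # Problematic pattern: the nested function closes over the whole trainer.
    def tokenize(example):
        return {"ids": example["text"][: trainer.max_length]}

    return tokenize


def tokenize_row(example, max_length):
    # Safer pattern: module-level function; only plain values are passed in.
    return {"ids": example["text"][:max_length]}
```

With `tokenize_row` and `fn_kwargs={"max_length": 512}`, the fingerprint is identical across runs; with the closure, it differs every time.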
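The CPO/ORPO fix follows a common "load a default when an optional argument is omitted" pattern. A minimal sketch, using a hypothetical `resolve_processing_class` helper with an injected `loader`. In a transformers setup that role could be played by something like `AutoTokenizer.from_pretrained`, but which loader TRL actually uses is an assumption here.

```python
from typing import Any, Callable, Optional


def resolve_processing_class(
    model_id: str,
    processing_class: Optional[Any] = None,
    loader: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Use the caller's processing class if given; otherwise load one for model_id.

    `loader` is a hypothetical hook, not a TRL parameter.
    """
    if processing_class is not None:
        return processing_class  # an explicit argument always wins
    if loader is None:
        raise ValueError("processing_class was omitted and no loader was provided")
    return loader(model_id)  # e.g. a tokenizer loaded from the same checkpoint
```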
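The AsyncGRPO change is described only as keeping micro-batches token-balanced across ranks. One generic way to get that balance is greedy longest-first assignment (the LPT scheduling heuristic), followed by packing each rank's share under a token budget. The sketch below shows that technique with hypothetical function names; it is not TRL's implementation.

```python
import heapq
from typing import List


def balance_across_ranks(seq_lens: List[int], world_size: int) -> List[List[int]]:
    """Assign sequence indices to ranks so total tokens per rank stay close.

    Greedy longest-processing-time heuristic: take sequences longest first and
    give each one to the rank that currently holds the fewest tokens.
    """
    heap = [(0, rank) for rank in range(world_size)]  # (tokens so far, rank)
    heapq.heapify(heap)
    buckets: List[List[int]] = [[] for _ in range(world_size)]
    for idx in sorted(range(len(seq_lens)), key=lambda i: -seq_lens[i]):
        load, rank = heapq.heappop(heap)
        buckets[rank].append(idx)
        heapq.heappush(heap, (load + seq_lens[idx], rank))
    return buckets


def pack_micro_batches(indices: List[int], seq_lens: List[int], max_tokens: int) -> List[List[int]]:
    """Split one rank's sequences into micro-batches of at most max_tokens packed tokens.

    A single sequence longer than max_tokens still gets its own micro-batch.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx in indices:
        n = seq_lens[idx]
        if current and used + n > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n
    if current:
        batches.append(current)
    return batches
```

A real distributed trainer would also need every rank to run the same number of micro-batches, since gradient all-reduces must line up across ranks; this sketch does not enforce that.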
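The fal-ai work in huggingface.js rests on a queue-based flow: submit a task, poll its status until it finishes, then fetch the output. A generic polling loop is sketched below in Python. The status strings (`IN_QUEUE`, `IN_PROGRESS`, `COMPLETED`, `FAILED`) and the `get_status` callable are assumptions for illustration, not the fal.ai or huggingface.js API.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    get_status: Callable[[], Dict[str, Any]],
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll a queued task until it completes, fails, or times out.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        state = status.get("status")
        if state == "COMPLETED":
            return status  # caller then downloads the produced audio file
        if state == "FAILED":
            raise RuntimeError(f"task failed: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"task still {state!r} after {timeout_s}s")
        sleep(interval_s)
```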
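For the Storage Buckets gateway, an S3 client should in principle only need its endpoint pointed at s3.hf.co. The sketch below assumes the gateway supports `ListObjectsV2` and that credentials come from the standard AWS configuration chain; the helper names are hypothetical, and the exact credential and bucket-naming scheme should be taken from the Hub docs [6].

```python
from typing import Any, Iterator


def make_hf_s3_client() -> Any:
    """Create a boto3 S3 client pointed at the Storage Buckets gateway.

    Credentials come from the usual AWS env/config chain; which values the
    gateway expects is documented in the Hub docs and is not assumed here.
    """
    import boto3  # imported lazily so iter_keys works without boto3 installed

    return boto3.client("s3", endpoint_url="https://s3.hf.co")


def iter_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Usage, with a placeholder bucket name: `for key in iter_keys(make_hf_s3_client(), "my-bucket"): print(key)`. The AWS CLI equivalent sets its global `--endpoint-url` option to the same host, e.g. `aws s3 ls s3://my-bucket --endpoint-url https://s3.hf.co`.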
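The `snapshot_download` change caches the repo tree listing on disk. A minimal sketch of that idea, assuming the cache is keyed by a resolved commit hash; the function name, file layout, and `fetch` callback are hypothetical and do not reflect huggingface_hub's actual cache format.

```python
import json
import os
from typing import Callable, List


def cached_tree_listing(
    cache_dir: str,
    repo_id: str,
    commit_sha: str,
    fetch: Callable[[str, str], List[str]],
) -> List[str]:
    """Return a repo's file listing, reading it from disk when already cached.

    Keyed by the resolved commit hash: a commit's tree never changes, so a
    cached listing for that commit cannot go stale.
    """
    safe_repo = repo_id.replace("/", "--")
    path = os.path.join(cache_dir, "trees", safe_repo, f"{commit_sha}.json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    listing = fetch(repo_id, commit_sha)  # one network call on a cache miss
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(listing, f)
    os.replace(tmp, path)  # atomic rename so readers never see a partial file
    return listing
```

Keying by commit hash rather than by branch name is the design choice that makes the cache safe: a branch can move, but a commit's tree is immutable.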

References

  [1] Fix dataset fingerprinting in DPO/SFT tokenization · huggingface/trl
  [2] Auto-load `processing_class` in CPO/ORPO trainers when omitted · huggingface/trl
  [3] Add packing-aware dynamic batching to AsyncGRPO · huggingface/trl
  [4] [Inference] fal-ai: audio-to-audio support · huggingface/huggingface.js
  [5] [Inference] Add fal-ai text-to-audio support (#2249) · huggingface/huggingface.js
  [6] feat: add docs for buckets s3 compatibility api · huggingface/hub-docs
  [7] [Download] Cache repo tree listing on disk in snapshot_download · huggingface/huggingface_hub
  [8] [Jobs] Add sync_job_volume helper and local paths in hf jobs -v (#4346) · huggingface/huggingface_hub
  [9] [Sandbox] Add Sandbox API and `hf sandbox` CLI on top of Jobs (#4350) · huggingface/huggingface_hub

FAQ

What changed in Hugging Face on June 30, 2026?
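In TRL, a fix stops the DPO and SFT trainers from silently disabling tokenization caching when dataset fingerprinting fails. Separately, huggingface.js adds fal-ai audio-to-audio and text-to-audio support, the Hub docs document the S3-compatible gateway for Storage Buckets, and huggingface_hub ships the Sandbox API with the `hf sandbox` CLI.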
What should Hugging Face teams do about it?
  • Update TRL to get the tokenization caching fix and the CPOTrainer/ORPOTrainer auto-loading of `processing_class`.
  • Test the S3-compatible gateway for Storage Buckets if you use external S3 tooling.
  • Try the new Sandbox API for remote code execution if you need low-latency task runners.
Which Hugging Face repositories shipped on June 30, 2026?
huggingface/trl, huggingface/huggingface.js, huggingface/hub-docs, huggingface/huggingface_hub
