The Wire · Showcase
TRL FIXES SILENT TOKENIZATION CACHE DISABLING, ADDS SANDBOXES TO HUB
By RepoJournal · Filed · About Hugging Face
TRL's DPO and SFT trainers had been silently losing tokenization caching whenever dataset fingerprinting failed. That is now fixed; elsewhere, Storage Buckets get documentation for their S3-compatible gateway and huggingface_hub ships a Sandbox API.
TRL's tokenization pipeline captured the entire trainer object in a nested function, which caused dataset fingerprinting to fail silently and fall back to random hashes, disabling caching without warning [1]. The same release makes CPOTrainer and ORPOTrainer auto-load `processing_class`, a parameter that was documented as optional but raised an error when omitted [2], and adds packing-aware dynamic batching to AsyncGRPO that keeps micro-batches token-balanced across distributed ranks [3].

Over in huggingface.js, the inference client gained fal-ai support for both audio-to-audio and text-to-audio tasks [4][5], building out a queue-based task system that handles file polling. Hub docs now document the new S3-compatible gateway for Storage Buckets at s3.hf.co, which lets users work with their buckets through the AWS CLI and boto3 [6].

Most significantly, huggingface_hub shipped the Sandbox API and `hf sandbox` CLI on top of Jobs infrastructure, offering E2B/Modal-style sandboxes with 6-second cold starts and live output streaming [9], plus a new `sync_job_volume` helper for managing job artifacts [8]. It also speeds up `snapshot_download` by caching tree endpoint responses and skipping unnecessary HEAD requests on Xet files [7].
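The fingerprinting failure behind the TRL fix [1] is easier to see in a toy model. The sketch below is not TRL or `datasets` code: `toy_fingerprint`, `ToyTrainer`, and both factory methods are hypothetical names, and plain `pickle` stands in for whatever serializer the real fingerprinting uses. It only illustrates the general pattern, assuming a hasher that serializes a function's captured variables and quietly falls back to a random value when that fails.

```python
import hashlib
import pickle
import threading
import uuid


def toy_fingerprint(fn):
    """Hash a function's bytecode and captured variables.

    On any serialization failure, fall back to a random value,
    mimicking a silent fallback that makes cache keys unrepeatable.
    """
    try:
        captured = [cell.cell_contents for cell in (fn.__closure__ or ())]
        payload = pickle.dumps((fn.__code__.co_code, captured))
        return hashlib.sha256(payload).hexdigest()
    except Exception:
        # No warning is raised here, so the caller never learns caching is off.
        return uuid.uuid4().hex


class ToyTrainer:
    """Hypothetical trainer holding state that cannot be serialized."""

    def __init__(self, max_length):
        self.max_length = max_length
        self._lock = threading.Lock()  # stand-in for unpicklable trainer state

    def make_tokenize_fn_capturing_self(self):
        def tokenize(text):
            # Referencing self.max_length closes over the whole trainer.
            return text.split()[: self.max_length]

        return tokenize

    def make_tokenize_fn_capturing_values(self):
        max_length = self.max_length  # copy out only what the function needs

        def tokenize(text):
            return text.split()[:max_length]

        return tokenize
```

In this model, fingerprinting the self-capturing closure twice yields two different values, because serializing the trainer fails on its lock and the random fallback kicks in each time. The closure that captures only `max_length` hashes to the same value on every call, which is what a cache lookup needs.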
Action items
- → Update TRL to get the tokenization caching fix and `processing_class` auto-loading in CPOTrainer and ORPOTrainer · huggingface/trl [plan]
- → Test the S3-compatible gateway for Storage Buckets if you use external S3 tooling · huggingface/hub-docs [monitor]
- → Try the new Sandbox API for remote code execution if you need low-latency task runners · huggingface/huggingface_hub [monitor]
References
- [1] Fix dataset fingerprinting in DPO/SFT tokenization · huggingface/trl
- [2] Auto-load `processing_class` in CPO/ORPO trainers when omitted · huggingface/trl
- [3] Add packing-aware dynamic batching to AsyncGRPO · huggingface/trl
- [4] [Inference] fal-ai: audio-to-audio support · huggingface/huggingface.js
- [5] [Inference] Add fal-ai text-to-audio support (#2249) · huggingface/huggingface.js
- [6] feat: add docs for buckets s3 compatibility api · huggingface/hub-docs
- [7] [Download] Cache repo tree listing on disk in snapshot_download · huggingface/huggingface_hub
- [8] [Jobs] Add sync_job_volume helper and local paths in hf jobs -v (#4346) · huggingface/huggingface_hub
- [9] [Sandbox] Add Sandbox API and `hf sandbox` CLI on top of Jobs (#4350) · huggingface/huggingface_hub
FAQ
- What changed in Hugging Face on June 30, 2026?
- TRL's DPO and SFT trainers had been silently losing tokenization caching whenever dataset fingerprinting failed. That is now fixed; elsewhere, Storage Buckets get documentation for their S3-compatible gateway and huggingface_hub ships a Sandbox API.
- What should Hugging Face teams do about it?
- Update TRL to get the tokenization caching fix and `processing_class` auto-loading in CPOTrainer and ORPOTrainer • Test the S3-compatible gateway for Storage Buckets if you use external S3 tooling • Try the new Sandbox API for remote code execution if you need low-latency task runners
- Which Hugging Face repositories shipped on June 30, 2026?
- huggingface/trl, huggingface/huggingface.js, huggingface/hub-docs, huggingface/huggingface_hub