The Wire · Showcase
TRANSFORMERS SHIPS ROPE FIXES AND FINE-GRAINED QUANTIZATION WHILE HUB PATCHES PYTHON 3.15 BREAKAGE
By RepoJournal · Filed · About Hugging Face
Transformers landed critical correctness fixes for GLM sparse attention and new Triton-backed fp8/fp4 quantization, while huggingface_hub addressed a hard dependency break on Python 3.15.
The transformers team shipped a fix for interleaved RoPE application in MLA, together with index-cache support for the DSA indexer on GLM5 [1]. The index cache speeds up sparse attention by reusing previous layers' top-k indices instead of recomputing them. Alongside that, fine-grained fp8/fp4 quantization via Triton landed [2], adding native support for sub-tensor quantization that is compatible with torch.compile. Image processing also got faster: native torchvision LANCZOS interpolation now replaces the PIL fallback [3], which matters for batch inference throughput.

On the hub side, huggingface_hub fixed a critical import error on Python 3.15, where the private _MISSING_TYPE name disappeared from dataclasses [4] and the import failed immediately at startup. The same release changes auth precedence in Colab environments [5], so user-provided tokens now take priority over Colab's vault token. CLI quiet mode now suppresses hints as well [6].

Over in xet-core, russh was bumped from 0.60 to 0.61 [7], and the team is working through an sdist release issue in which LICENSE was missing from the tarball root directory [8].
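To make "fine-grained" (sub-tensor) quantization concrete, here is a minimal pure-Python sketch of per-block scaling. It is not the Triton implementation from [2]: the function names, the block size, and the sample values are hypothetical, and integer rounding stands in for a real fp8 cast. The only borrowed fact is that 448 is the largest finite value in the float8 e4m3 format.

```python
# Illustrative sketch only: per-block scaling, NOT the transformers/Triton kernels.
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3


def quantize_blocks(values, block_size):
    """Split values into blocks and store one scale per block.

    Assumption: round() is a crude stand-in for casting to fp8.
    """
    scaled, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Map the largest magnitude in the block onto the fp8 maximum.
        scale = max_abs / FP8_E4M3_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        scaled.append([round(v / scale) for v in block])
    return scaled, scales


def dequantize_blocks(scaled, scales):
    """Multiply each block back by its own scale."""
    return [q * s for block, s in zip(scaled, scales) for q in block]


def max_abs_error(values, block_size):
    """Worst-case reconstruction error for a given block size."""
    scaled, scales = quantize_blocks(values, block_size)
    restored = dequantize_blocks(scaled, scales)
    return max((abs(a - b) for a, b in zip(values, restored)), default=0.0)


if __name__ == "__main__":
    # Hypothetical tensor: four small values followed by four large ones.
    sample = [0.01, 0.02, -0.015, 0.005, 100.0, -50.0, 25.0, 75.0]
    print("one scale for all values:", max_abs_error(sample, block_size=8))
    print("one scale per 4 values:  ", max_abs_error(sample, block_size=4))
```

With a single scale, the large values dominate and the small ones round away; with one scale per block, each block uses the full range. That is the general motivation for sub-tensor scales, independent of how any particular kernel implements them.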
Action items
- → If you support Python 3.15, update huggingface_hub immediately to avoid import failures on startup · huggingface/huggingface_hub [immediate]
- → Test the transformers upgrade if you use GLM5 or sparse attention workflows, to validate RoPE behavior · huggingface/transformers [plan]
- → Evaluate the new Triton fp8/fp4 quantization for inference performance gains in your pipelines · huggingface/transformers [monitor]
- → Watch the xet-core 1.5.1 release for completion of the sdist fix · huggingface/xet-core [monitor]
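Teams that support Python 3.15 may also want to check their own code for the same pattern that broke huggingface_hub. The sketch below shows one common way to avoid depending on private dataclasses names: compare against the public `dataclasses.MISSING` sentinel. This is not necessarily what the fix in [4] does, and the `DownloadOptions` class and `required_field_names` helper are hypothetical.

```python
from dataclasses import MISSING, dataclass, field, fields


@dataclass
class DownloadOptions:  # hypothetical example class
    repo_id: str
    revision: str = "main"
    tags: list = field(default_factory=list)


def required_field_names(cls):
    """Return names of fields with neither a default nor a default_factory.

    Uses the public MISSING sentinel, so no private dataclasses
    name needs to be imported.
    """
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


if __name__ == "__main__":
    print(required_field_names(DownloadOptions))
```

Identity checks against `MISSING` cover the usual "was a default provided?" question without naming the sentinel's type.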
References
- [1] Fix: interleaved RoPE application for MLA and Support Index Cache DSA indexer skip-topk sharing for GLM5 (#46372) huggingface/transformers
- [2] Triton finegrained fp8/fp4 (#46407) huggingface/transformers
- [3] Use torchvision's native LANCZOS interpolation instead of PIL fallback (#46496) huggingface/transformers
- [4] [Fix] Remove private _MISSING_TYPE import from dataclasses module (#4322) huggingface/huggingface_hub
- [5] [Auth] Take google colab token from env first huggingface/huggingface_hub
- [6] [CLI] Suppress hints in quiet output mode huggingface/huggingface_hub
- [7] chore: bump russh from 0.60 to 0.61 huggingface/xet-core
- [8] Try to fix sdist release due to LICENSE missing from tarball root directory (#867) huggingface/xet-core
FAQ
- What changed in Hugging Face on June 8, 2026?
- Transformers landed critical correctness fixes for GLM sparse attention and new Triton-backed fp8/fp4 quantization, while huggingface_hub addressed a hard dependency break on Python 3.15.
- What should Hugging Face teams do about it?
- If you support Python 3.15, update huggingface_hub immediately to avoid import failures on startup • Test transformers upgrade if you use GLM5 or sparse attention workflows to validate RoPE behavior • Evaluate new Triton fp8/fp4 quantization for inference performance gains in your pipelines
- Which Hugging Face repositories shipped on June 8, 2026?
- huggingface/transformers, huggingface/huggingface_hub, huggingface/xet-core