TRANSFORMERS V5.11 SHIPS WITH DIFFUSIONGEMMA, DEEPSEEK 3.2 QUANTIZATION LANDS
By RepoJournal · Filed · About Hugging Face
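Transformers v5.11.0 arrives with a major new model, DiffusionGemma, alongside three inference improvements across the Hugging Face stack: DeepSeek V3.2 FP8 quantization, data-parallel continuous batching benchmarks, and AutoRound quantization in diffusers.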
The headline win is DiffusionGemma [1], an encoder-decoder architecture designed to avoid the sequential bottleneck of standard causal models: it uses multi-canvas sampling during inference instead of generating strictly one token at a time. Paired with that is DeepSeek V3.2 quantization support [2], which adds fine-grained FP8 quantization with module-level exclusion, letting you run 8-bit weight inference while keeping sensitive modules out of quantization to protect model quality.
The throughput story gets better: continuous batching benchmarks now support data parallelism [3], so an 8-GPU node no longer funnels 16 benchmarks through a single GPU. On the diffusers side, AutoRound quantization integration [4] brings W4A16 weight-only quantization (4-bit weights, 16-bit activations), another efficiency play for deployment. Hub auth just went keyless [8] with OIDC token exchange through Trusted Publishers, eliminating the need to store HF_TOKEN secrets in CI.
The test infrastructure tightened up across the board: transformers fixed the multi-image span offsets in processing [5], a fix vLLM needed in order to bump its transformers version; diffusers refactored UNet tests to a modular pattern [6]; and huggingface_hub added explicit xet/no_xet pytest markers [7] so it is no longer guesswork which tests actually run.
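To make the W4A16 term concrete, here is a minimal sketch of 4-bit weight-only quantization for a single group of weights. It uses plain round-to-nearest, which is a stand-in and not AutoRound's tuned rounding, and every function name in it is hypothetical rather than part of the diffusers or AutoRound APIs.

```python
# Illustrative sketch only: plain round-to-nearest 4-bit quantization of one
# weight group. AutoRound tunes the rounding; that step is NOT reproduced here.
# All function names are hypothetical, not diffusers or AutoRound APIs.

def quantize_group_w4(weights):
    """Map a group of float weights to integer codes in 0..15 (4 bits)."""
    lo, hi = min(weights), max(weights)
    # 15 steps span the group's range; fall back to 1.0 for a constant group.
    scale = (hi - lo) / 15 or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Recover approximate float weights for use alongside 16-bit activations."""
    return [c * scale + lo for c in codes]


weights = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]
codes, scale, lo = quantize_group_w4(weights)
restored = dequantize_group(codes, scale, lo)

# Round-to-nearest keeps each weight within half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, round(max_error, 4))
```

Only the weights are stored as 4-bit codes plus a per-group scale and offset; activations are left untouched, which is what "weight-only" means here.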
Action items
- → Update to transformers v5.11.0 for DiffusionGemma and DeepSeek V3.2 quantization support if you run inference deployments (huggingface/transformers) [plan]
- → Integrate AutoRound quantization in diffusers if W4A16 deployment efficiency matters in your pipeline (huggingface/diffusers) [plan]
- → Adopt OIDC token exchange in CI workflows to remove stored HF_TOKEN secrets (huggingface/huggingface_hub) [plan]
- → Verify that continuous batching benchmarks use data parallelism on multi-GPU nodes (huggingface/transformers) [monitor]
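For the data-parallelism item, the idea of spreading benchmarks across a node can be sketched as a simple round-robin assignment. This is an assumption about the general technique, not the logic of the transformers benchmark script; `shard_benchmarks` and the benchmark names are hypothetical.

```python
# Hypothetical helper: names are illustrative and not taken from the
# transformers continuous batching benchmark script.

def shard_benchmarks(benchmarks, num_gpus):
    """Round-robin split: GPU `rank` gets items rank, rank + num_gpus, ..."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [benchmarks[rank::num_gpus] for rank in range(num_gpus)]


# 16 benchmarks on an 8-GPU node, matching the scenario described above.
benchmarks = [f"bench_{i}" for i in range(16)]
shards = shard_benchmarks(benchmarks, num_gpus=8)

for rank, shard in enumerate(shards):
    print(f"GPU {rank}: {shard}")
```

With 16 benchmarks and 8 GPUs, each GPU receives two benchmarks instead of one GPU running all 16. In practice each worker process would also need to be pinned to its device, for example through the `CUDA_VISIBLE_DEVICES` environment variable.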
References
- [1] Release v5.11.0 (huggingface/transformers)
- [2] Add deepseek 3.2 exp (huggingface/transformers)
- [3] [CB] [Minor] Add data-parallel to overall script (huggingface/transformers)
- [4] Integrate AutoRound into Diffusers (huggingface/diffusers)
- [5] Fix the offsets in processing (huggingface/transformers)
- [6] [tests] refactor UNet model tests to align with the new pattern (huggingface/diffusers)
- [7] [Tests] Add xet/no_xet pytest markers to filter Xet vs non-Xet tests (huggingface/huggingface_hub)
- [8] [Auth] Keyless CI/CD auth via OIDC token exchange (huggingface/huggingface_hub)
FAQ
- What changed in Hugging Face on June 11, 2026?
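- Transformers v5.11.0 arrived with a major new model, DiffusionGemma, alongside three inference improvements across the Hugging Face stack: DeepSeek V3.2 FP8 quantization, data-parallel continuous batching benchmarks, and AutoRound quantization in diffusers.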
- What should Hugging Face teams do about it?
- Update to transformers v5.11.0 for DiffusionGemma and DeepSeek V3.2 quantization support if you run inference deployments; integrate AutoRound quantization in diffusers if W4A16 deployment efficiency matters in your pipeline; and adopt OIDC token exchange in CI workflows to remove stored HF_TOKEN secrets.
- Which Hugging Face repositories shipped on June 11, 2026?
- huggingface/transformers, huggingface/diffusers, huggingface/huggingface_hub