RepoJournal
Hugging Face

@huggingface

Transformers, Datasets, and the open AI-model layer


The Wire · Showcase

GEMMA4 CUTS 19 GIB TRAINING BLOAT, TRANSFORMERS REVERTS FSDP CHAOS, CANDLE SPEEDS SCALAR OPS

By RepoJournal · Filed · About Hugging Face

Gemma4 just shed a massive training bottleneck by replacing one-hot tensor materialization with embedding lookups, while Transformers is cleaning house after an FSDP refactor went sideways.

The Gemma4 change replaces a one-hot encoding plus matmul pattern in the position embeddings with two F.embedding lookups, which avoids materializing a roughly 19 GiB intermediate tensor during large-batch training [1]. This is the kind of surgical fix that unblocks real production workloads. Meanwhile, Transformers reverted changes related to a significant FSDP+DTensor refactor after internal discussion flagged problems [2], a move that keeps the main branch stable while the team sorts out the architecture. On the performance front, Candle is improving kernel dispatch for binary broadcast operations with scalars: new Layout helpers identify scalars that masquerade as strided tensors, so those operations can skip unnecessary indexing overhead [3].

The infrastructure work is solid too. hf-mount now supports a JSON log format for environments that use log shippers [7], and Transformers enabled Metal Flash SDPA on Apple Silicon (MPS), with fixes for the generate and generate_batch paths [8]. Candle also bumped three dependencies: rubato from 1 to 2, hf-hub from 0.4.1 to 0.5.0, and Symphonia from 0.5.3 to 0.6.0 [4] [5] [6]. The test CI repository fixed a read-only cache permission for metrics and removed a token [9] [10]. Finally, TRL aligned KTO training with DPO by removing the null_ref_context indirection layer [11], which simplifies how the code handles a missing reference model.
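To make the Gemma4 item concrete, here is a minimal pure-Python sketch of why a one-hot matmul can be swapped for a row lookup. It is not the code from the PR: the function names, the tiny table, and the indices are all hypothetical, and plain lists stand in for tensors. In PyTorch, the lookup step is what torch.nn.functional.embedding performs.

```python
# Illustrative sketch only: plain lists stand in for tensors, and every name
# here is hypothetical rather than taken from the Transformers change.

def one_hot_matmul(indices, table):
    """Build a one-hot row per index, then multiply it by the table."""
    num_rows = len(table)
    dim = len(table[0])
    out = []
    for idx in indices:
        # This intermediate has one entry per table row for EVERY index.
        one_hot = [1.0 if r == idx else 0.0 for r in range(num_rows)]
        out.append(
            [sum(one_hot[r] * table[r][c] for r in range(num_rows))
             for c in range(dim)]
        )
    return out


def embedding_lookup(indices, table):
    """Select the rows directly; no one-hot intermediate is created."""
    return [list(table[idx]) for idx in indices]


def one_hot_elements(num_indices, num_rows):
    """Element count of the one-hot intermediate the lookup avoids."""
    return num_indices * num_rows


table = [[0.5, 1.0], [2.0, 3.0], [4.0, 5.0]]  # 3 positions, 2 features
indices = [2, 0, 2]

same = one_hot_matmul(indices, table) == embedding_lookup(indices, table)
print(same)
```

The one-hot intermediate grows with the number of indices times the number of table rows, while the lookup only ever touches the selected rows. That scaling is what makes the intermediate so large at big batch sizes.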


References

  [1] [Gemma4] Replace one-hot matmul with F.embedding in position embeddings (#46176), huggingface/transformers
  [2] [`Revert`] FSDP+Dtensor refactor related changes, huggingface/transformers
  [3] Binary broadcast scalar support, huggingface/candle
  [4] chore(deps): update rubato requirement from 1 to 2, huggingface/candle
  [5] chore(deps): update hf-hub requirement from 0.4.1 to 0.5.0, huggingface/candle
  [6] chore(deps): update symphonia requirement from 0.5.3 to 0.6.0, huggingface/candle
  [7] feat: add json log format, huggingface/hf-mount
  [8] Enable kernels-community/metal-flash-sdpa on MPS (#45974), huggingface/transformers
  [9] Fix cache read-only permission for metrics (#19), huggingface/transformers-test-ci
  [10] Remove token, huggingface/transformers-test-ci
  [11] Align KTO with DPO: Remove null_ref_context, huggingface/trl

FAQ

What changed in Hugging Face on May 29, 2026?
Gemma4 just shed a massive training bottleneck by replacing one-hot tensor materialization with embedding lookups, while Transformers is cleaning house after an FSDP refactor went sideways.
What should Hugging Face teams do about it?
  • Review the Transformers main branch carefully: the FSDP+DTensor refactor was reverted, so coordinate with your team before relying on recent distributed training changes.
  • Pull the Gemma4 optimization if you run large-batch training; it avoids materializing a roughly 19 GiB intermediate tensor.
  • Update the Candle dependencies (rubato, hf-hub, Symphonia) at the next minor version bump.
Which Hugging Face repositories shipped on May 29, 2026?
huggingface/transformers, huggingface/candle, huggingface/hf-mount, huggingface/transformers-test-ci, huggingface/trl
