Who contributed to Hugging Face on May 29, 2026?

6 developers shipped this update, including vasqu, ivarflakstad, dependabot[bot], jcudit, ydshieh, and albertvillanova.

What were the notable Hugging Face updates?

[Gemma4] Replace one-hot matmul with F.embedding in position embeddings (#46176), [`Revert`] FSDP+Dtensor refactor related changes, and Binary broadcast scalar support.

@huggingface

Transformers, Datasets, and the open AI-model layer


GEMMA4 CUTS 19 GIB TRAINING BLOAT, TRANSFORMERS REVERTS FSDP CHAOS, CANDLE SPEEDS SCALAR OPS

By RepoJournal · Filed 06:04 UTC on May 29, 2026

6 people shipped this

@dependabot[bot]: 3 cited
@vasqu: 1 cited
@ivarflakstad: 1 cited
@jcudit: 1 cited
@ydshieh: 1 cited
@albertvillanova: 1 cited

Gemma4 just shed a massive training bottleneck by replacing one-hot tensor materialization with embedding lookups, while Transformers is cleaning house after an FSDP refactor went sideways.

Gemma4's position embeddings in Transformers no longer build a one-hot tensor and multiply it against the embedding table. The change swaps that pattern for two F.embedding lookups, which avoids materializing a roughly 19 GiB intermediate tensor during large-batch training [1]. This is the kind of surgical fix that unblocks real production workloads. Meanwhile, Transformers reverted a significant FSDP+DTensor refactor after internal discussion flagged problems [2], a necessary move that keeps the main branch stable while the team sorts out the architecture.

On the performance front, Candle is improving kernel dispatch for binary broadcast operations that involve scalars: new Layout helpers identify scalars that masquerade as strided tensors, so the kernels can skip unnecessary indexing overhead [3]. Candle also bumped three dependencies: rubato to 2, hf-hub to 0.5.0, and Symphonia to 0.6.0 [4] [5] [6].

The infrastructure work is solid too. hf-mount now supports a JSON log format for environments that run log shippers [7], and Transformers added Metal Flash SDPA support on Apple Silicon, with fixes for the generate and generate_batch paths [8]. The test CI infrastructure was hardened with a cache permission fix and a token cleanup [9] [10]. Finally, TRL aligned KTO training with DPO by removing the null_ref_context indirection layer [11], which simplifies how the code handles a missing reference model.
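To make the Gemma4 item concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere) of why a one-hot matmul and a direct row lookup return the same rows, and how large the one-hot intermediate gets. The function names and the toy table are hypothetical and chosen for illustration; this is not the code from the pull request, and it does not attempt to reproduce the 19 GiB figure.

```python
def one_hot(index, num_classes):
    """Return a dense one-hot row: 1.0 at `index`, 0.0 elsewhere."""
    return [1.0 if j == index else 0.0 for j in range(num_classes)]


def lookup_via_one_hot(indices, table):
    """Select rows by building a one-hot matrix and multiplying it by `table`."""
    num_rows, dim = len(table), len(table[0])
    # The intermediate holds len(indices) * num_rows elements.
    hot = [one_hot(i, num_rows) for i in indices]
    return [
        [sum(row[r] * table[r][d] for r in range(num_rows)) for d in range(dim)]
        for row in hot
    ]


def lookup_via_indexing(indices, table):
    """Select the same rows directly, with no one-hot intermediate."""
    return [list(table[i]) for i in indices]


def one_hot_bytes(num_indices, num_rows, bytes_per_element):
    """Size in bytes of the dense one-hot intermediate."""
    return num_indices * num_rows * bytes_per_element


# Toy data (hypothetical): a 3-row table and four position indices.
table = [[0.5, 1.5], [2.0, -3.0], [4.25, 0.0]]
indices = [2, 0, 2, 1]
assert lookup_via_one_hot(indices, table) == lookup_via_indexing(indices, table)
```

In PyTorch, the direct lookup corresponds to `torch.nn.functional.embedding(indices, table)`. The one-hot intermediate grows with the number of indices times the number of table rows, while the lookup only allocates the output, which is why the saving matters most at large batch sizes.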

Action items

→ Review the Transformers main branch carefully: the FSDP refactor was reverted, so coordinate with your team before relying on distributed training changes. huggingface/transformers [immediate]
→ Pull the Gemma4 optimization if you run large-batch training: it avoids a roughly 19 GiB intermediate tensor. huggingface/transformers [plan]
→ Update Candle dependencies (rubato, hf-hub, Symphonia) at the next minor version bump. huggingface/candle [plan]
→ Test Metal Flash SDPA on Apple Silicon if you support MPS inference. huggingface/transformers [monitor]
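For readers following the Candle item, the scalar-broadcast change [3] rests on a general property of strided layouts: a tensor whose every dimension has size 1 or stride 0 always reads the same storage element, whatever shape it reports. The sketch below illustrates that check and the fast path it enables. It is written in Python for consistency with the rest of this page; Candle itself is a Rust library, and the helper names here are hypothetical, not Candle's API.

```python
def is_broadcast_scalar(shape, strides):
    """True if every index of the layout maps to the same storage offset.

    That holds when each dimension has size 1 or stride 0, which is how a
    scalar expanded to a larger shape is typically represented.
    """
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same rank")
    return all(size == 1 or stride == 0 for size, stride in zip(shape, strides))


def broadcast_add(lhs_values, rhs_storage, rhs_shape, rhs_strides, rhs_offset=0):
    """Add a strided rhs to a flat, row-major lhs of the same shape."""
    if is_broadcast_scalar(rhs_shape, rhs_strides):
        # Fast path: read the scalar once, no per-element index arithmetic.
        scalar = rhs_storage[rhs_offset]
        return [x + scalar for x in lhs_values]
    # General path: turn each flat index into a strided storage offset.
    out = []
    for flat, x in enumerate(lhs_values):
        offset, rem = rhs_offset, flat
        for size, stride in zip(reversed(rhs_shape), reversed(rhs_strides)):
            offset += (rem % size) * stride
            rem //= size
        out.append(x + rhs_storage[offset])
    return out
```

Both paths produce the same result for a broadcast scalar; the fast path simply skips the offset computation for every element, which is the overhead the digest describes.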

References

[1] [Gemma4] Replace one-hot matmul with F.embedding in position embeddings (#46176) huggingface/transformers
[2] [`Revert`] FSDP+Dtensor refactor related changes huggingface/transformers
[3] Binary broadcast scalar support huggingface/candle
[4] chore(deps): update rubato requirement from 1 to 2 huggingface/candle
[5] chore(deps): update hf-hub requirement from 0.4.1 to 0.5.0 huggingface/candle
[6] chore(deps): update symphonia requirement from 0.5.3 to 0.6.0 huggingface/candle
[7] feat: add json log format huggingface/hf-mount
[8] Enable kernels-community/metal-flash-sdpa on MPS (#45974) huggingface/transformers
[9] Fix cache read-only permission for metrics (#19) huggingface/transformers-test-ci
[10] Remove token huggingface/transformers-test-ci
[11] Align KTO with DPO: Remove null_ref_context huggingface/trl

Quick answers

What shipped in Hugging Face on May 29, 2026?: Gemma4 just shed a massive training bottleneck by replacing one-hot tensor materialization with embedding lookups, while Transformers is cleaning house after an FSDP refactor went sideways. In total, 24 commits and 27 pull requests landed.
