Who contributed to Hugging Face on May 17, 2026?

Four contributors shipped this update: apocryphx, tarekziade, coyotte508, and HuggingFaceInfra.

What were the notable Hugging Face updates?

Unigram lattice walks Unicode scalars (#352, Bug 3) (#356), BPE merge by Unicode scalar, not grapheme cluster (#352, Bug 4) (#355), and bugfix(ci): avoid E2BIG in pr_slow_ci_suggestion (#45983).

@huggingface

Transformers, Datasets, and the open AI-model layer

SWIFT TOKENIZERS FIXED FOR UNICODE, TRANSFORMERS CI PATCHED, MONGOKU SHIPS SCHEMA AUDIT

By RepoJournal · Filed 06:03 UTC on May 17, 2026 · About Hugging Face

4 people shipped this

@apocryphx · 3 cited

@tarekziade · 1 cited

@coyotte508 · 1 cited

@HuggingFaceInfra · 1 cited

Swift-transformers shipped three critical tokenizer fixes that resolve grapheme-cluster bugs breaking emoji and combining marks across Unigram, BPE, and BasicTokenizer [4] [5] [6], while transformers CI avoided an E2BIG ("Argument list too long") failure on large PRs [7].
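
The core mismatch is easy to reproduce with the Swift standard library and Foundation alone. The sketch below is illustrative and is not code from swift-transformers: iterating a String yields Characters (extended grapheme clusters), while `unicodeScalars` yields the scalar-level units a SentencePiece-style vocabulary is keyed on. `stripNonspacingMarks` is a hypothetical helper that assumes BasicTokenizer should mirror BERT-style accent stripping (NFD decomposition, then dropping general category Mn), which would also remove Japanese voiced-kana marks such as U+3099.

```swift
import Foundation

// Keycap "1️⃣" = U+0031 DIGIT ONE, U+FE0F VARIATION SELECTOR-16, U+20E3 COMBINING ENCLOSING KEYCAP.
let keycap = "1\u{FE0F}\u{20E3}"

// Iterating a String yields Characters (extended grapheme clusters): one unit here.
let byCharacter = keycap.map { String($0) }

// Iterating unicodeScalars yields scalar-level units, the granularity a SentencePiece vocab uses.
let byScalar = keycap.unicodeScalars.map { String($0) }

print(byCharacter.count) // 1
print(byScalar.count)    // 3

/// Hypothetical helper (not the swift-transformers API): BERT-style accent stripping
/// done at the scalar level. NFD-decompose, then drop every scalar whose general
/// category is Mn (nonspacing mark).
func stripNonspacingMarks(_ text: String) -> String {
    let decomposed = text.decomposedStringWithCanonicalMapping
    var kept = String.UnicodeScalarView()
    for scalar in decomposed.unicodeScalars
    where scalar.properties.generalCategory != .nonspacingMark {
        kept.append(scalar)
    }
    return String(kept)
}

// "ガ" (U+30AC) decomposes to "カ" (U+30AB) followed by U+3099, a nonspacing voiced-sound mark.
print(stripNonspacingMarks("\u{30AC}") == "\u{30AB}") // true
```

A Character-level filter would miss the mark: after NFD, "カ" plus U+3099 is still a single Character, so the combining mark only becomes visible to the filter when the string is walked scalar by scalar.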

The Swift fixes address a fundamental mismatch: SentencePiece vocabularies index by Unicode scalar, but Swift's Character type operates on extended grapheme clusters, so emoji such as the keycap '1️⃣' (three scalars, one Character) and combining marks in Thai, Devanagari, and Japanese failed to tokenize correctly [4] [5] [6]. These are not edge cases; they are holes in vocabulary coverage that silently produce wrong tokens in production.

Transformers avoided a CI failure by refetching PR files inside the script instead of passing them through environment variables. On PRs with large patches, those variables exceeded the kernel's MAX_ARG_STRLEN limit (32 pages, typically 128 KiB per argument or environment string), and exec failed with E2BIG [7].

Mongoku merged a schema auditing feature flagged as medium risk: the new collection-wide introspection endpoints could trigger expensive aggregations that time out on large datasets [8]. Hub-docs auto-bumped the inference provider packages and regenerated its docs without incident [9].
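
To illustrate what "walking Unicode scalars" means for a Unigram tokenizer, here is a minimal Viterbi sketch whose lattice nodes are scalar offsets rather than Character offsets. The toy vocabulary, its log-probabilities, `maxPieceScalars`, and the single-scalar `unknownPenalty` fallback are all hypothetical; this is not the swift-transformers implementation. Because the toy vocabulary has no entry for the whole keycap, a grapheme-level walk would have no boundary inside '1️⃣' and could only emit it as unknown, while the scalar-level walk recovers its three in-vocabulary pieces.

```swift
/// Hypothetical toy vocabulary: piece -> log-probability. It deliberately has no
/// entry for the whole keycap "1️⃣", only for its three scalars.
let toyVocab: [String: Double] = [
    "a": -2.5,
    "b": -2.5,
    "ab": -1.5,
    "1": -2.0,
    "\u{FE0F}": -3.0,
    "\u{20E3}": -3.0,
]

/// Builds a String from a slice of scalars without grouping them into Characters first.
func makePiece(_ scalars: ArraySlice<Unicode.Scalar>) -> String {
    var view = String.UnicodeScalarView()
    view.append(contentsOf: scalars)
    return String(view)
}

/// Viterbi segmentation over a lattice whose nodes are Unicode-scalar offsets.
/// `maxPieceScalars` and `unknownPenalty` are illustrative parameters.
func unigramSegment(_ text: String,
                    vocab: [String: Double],
                    maxPieceScalars: Int = 8,
                    unknownPenalty: Double = -10.0) -> [String] {
    let scalars = Array(text.unicodeScalars)
    let n = scalars.count
    guard n > 0 else { return [] }

    // best[i]: highest total log-probability of any segmentation of scalars[0..<i].
    var best = [Double](repeating: -Double.infinity, count: n + 1)
    // back[i]: start offset of the last piece in that best segmentation.
    var back = [Int](repeating: 0, count: n + 1)
    best[0] = 0

    for end in 1...n {
        for start in max(0, end - maxPieceScalars)..<end where best[start] > -Double.infinity {
            let piece = makePiece(scalars[start..<end])
            let score: Double
            if let logProb = vocab[piece] {
                score = logProb
            } else if end - start == 1 {
                // Out-of-vocabulary single scalar: a penalized fallback keeps every offset reachable.
                score = unknownPenalty
            } else {
                continue
            }
            if best[start] + score > best[end] {
                best[end] = best[start] + score
                back[end] = start
            }
        }
    }

    // Follow the back-pointers from the end of the text to recover the pieces.
    var pieces: [String] = []
    var end = n
    while end > 0 {
        let start = back[end]
        pieces.append(makePiece(scalars[start..<end]))
        end = start
    }
    return Array(pieces.reversed())
}

let segmented = unigramSegment("ab1\u{FE0F}\u{20E3}", vocab: toyVocab)
print(segmented.count) // 4
```

One design note: Swift compares String keys by canonical equivalence, so a tokenizer that needs exact scalar-for-scalar vocabulary matches might key its table by `[Unicode.Scalar]` instead of `String`.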

Action items

→ Merge the swift-transformers tokenizer fixes into your build pipeline before your next release huggingface/swift-transformers [immediate]
→ Test the Mongoku schema audit against your largest collections before enabling it in production huggingface/Mongoku [plan]
→ Monitor transformers CI on large PR submissions to confirm the E2BIG fix holds huggingface/transformers [monitor]

References

[1] Unigram lattice walks Unicode scalars (#352, Bug 3) (#356) · huggingface/swift-transformers
[2] BPE merge by Unicode scalar, not grapheme cluster (#352, Bug 4) (#355) · huggingface/swift-transformers
[3] bugfix(ci): avoid E2BIG in pr_slow_ci_suggestion (#45983) · huggingface/transformers
[4] Unigram lattice walks Unicode scalars (#352, Bug 3) · huggingface/swift-transformers
[5] BPE merge by Unicode scalar, not grapheme cluster (#352, Bug 4) · huggingface/swift-transformers
[6] Strip Japanese voiced-kana marks in BasicTokenizer (#352, Bug 2) · huggingface/swift-transformers
[7] bugfix(ci): avoid E2BIG in pr_slow_ci_suggestion · huggingface/transformers
[8] Add schema auditing endpoints and navigation tab · huggingface/Mongoku
[9] [Bot] Update Inference Providers documentation · huggingface/hub-docs

Quick answers

What shipped in Hugging Face on May 17, 2026?: Swift-transformers shipped three critical tokenizer fixes that resolve grapheme-cluster bugs breaking emoji and combining marks across Unigram, BPE, and BasicTokenizer [4] [5] [6], while transformers CI avoided an E2BIG ("Argument list too long") failure on large PRs [7]. In total, 6 commits, 7 pull requests, and 1 release landed.

