RepoJournal
Hugging Face

@huggingface

Transformers, Datasets, and the open AI-model layer

TRANSFORMERS CI FIXES OVERSIZED TRACES; TRL STREAMLINES QLORA TRAINING

By RepoJournal

The transformers test pipeline stopped silently dropping its largest jobs from dashboards. Three critical fixes ship today.

transformers-ci fixed a cascade of trace-observability failures that left engineers without visibility into large test runs. Traces from large shards had ballooned to 36-65 MB with 100k spans; Tempo ingested them but failed on retrieval, so those jobs silently vanished from the dashboards [1]. The root cause: pytest-opentelemetry emits phase and fixture spans on top of each per-test protocol span, bloating traces unnecessarily [2]. The fix keeps one protocol span per test and drops the rest, which cuts trace size to usable levels [2]. A second bug marked a trace as "settled" as soon as its span count held steady, so ERROR spans from failed tests that arrived late and out of order were frozen out; a reverify window now catches them [4]. Together, these fixes make the largest jobs visible and accurate again [3].
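The two trace fixes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Span` record, its `kind` labels, and the `TraceTracker` class are hypothetical and are not taken from transformers-ci or pytest-opentelemetry. The sketch only shows the general idea of keeping one span per test and of waiting through several stable polls before freezing a trace.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical span record; real OpenTelemetry spans carry far more fields."""
    name: str
    kind: str            # assumed label: "protocol", "phase", or "fixture"
    status: str = "OK"


def keep_protocol_spans(spans: List[Span]) -> List[Span]:
    """Keep the single per-test protocol span, drop phase and fixture spans."""
    return [span for span in spans if span.kind == "protocol"]


class TraceTracker:
    """Freeze a trace only after its span count stays unchanged for several polls.

    `reverify_polls` is a hypothetical knob: the number of consecutive polls
    with an unchanged span count required before the trace counts as settled.
    """

    def __init__(self, reverify_polls: int) -> None:
        self.reverify_polls = reverify_polls
        self.last_count: Optional[int] = None
        self.stable_polls = 0

    def observe(self, span_count: int) -> bool:
        """Record one poll; return True once the trace may be frozen."""
        if span_count == self.last_count:
            self.stable_polls += 1
        else:
            # A late span arrived: restart the reverify window.
            self.last_count = span_count
            self.stable_polls = 0
        return self.stable_polls >= self.reverify_polls
```

With `reverify_polls=2`, a trace polled at 10, 10, 11, 11, 11 spans is frozen only on the last poll, so a late ERROR span that shows up on the third poll is still picked up.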

On the training side, TRL shipped four quality-of-life improvements. A new `quantization_config` trainer argument lands across SFTTrainer, DPOTrainer, GRPOTrainer, RLOOTrainer, and RewardTrainer, so QLoRA setups no longer need to reach into `model_init_kwargs` or load the model manually [5]. Data collators across DPO, SFT, Reward, and KTO got a consistency pass that unifies docstrings, naming, and structure [6]. SFT now truncates sequences once during dataset preparation instead of at collation on every batch, which speeds up iteration and sets up future work on dropping untrained rows [7]. The KTO trainer now matches DPO in supporting PEFT models with the Liger fused loss, removing a blanket rejection that blocked a common pattern [8]. One breaking change: TRL drops support for vLLM 0.15 [9].
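The SFT truncation change can be pictured with a small pure-Python sketch. This is not TRL's code: `truncate_example`, `prepare_dataset`, and `collate` are hypothetical helpers, and examples are assumed to be plain dicts of token-id lists. It shows truncation happening once, up front, so the collator only has to pad.

```python
from typing import Dict, List

Example = Dict[str, List[int]]


def truncate_example(example: Example, max_length: int) -> Example:
    """Return a copy with every token-level field cut to max_length."""
    return {key: values[:max_length] for key, values in example.items()}


def prepare_dataset(examples: List[Example], max_length: int) -> List[Example]:
    """Truncate once, at preparation time, rather than on every batch."""
    return [truncate_example(example, max_length) for example in examples]


def collate(batch: List[Example], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Pad to the longest sequence in the batch; no truncation happens here."""
    width = max(len(example["input_ids"]) for example in batch)
    input_ids = []
    attention_mask = []
    for example in batch:
        ids = example["input_ids"]
        padding = width - len(ids)
        input_ids.append(ids + [pad_id] * padding)
        attention_mask.append([1] * len(ids) + [0] * padding)
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```

Because the prepared dataset already respects `max_length`, the per-batch work shrinks to padding, and the cost of truncation is paid once instead of on every epoch.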

Serge's pod-per-task security model is now durable and documented: the design doc has been folded into docs/security.md and the now-obsolete plan doc dropped [10]. Serge's CI now runs task tests on the in-VPC runner, with proper route acceptance so that internal ALB connections work [11]. Diffusers 0.39.0 shipped Cosmos 3, NVIDIA's unified world foundation model for Physical AI, which runs omni-generation and reasoning in a single transformer [12]. huggingface.js registered the hi terminal coding agent in its harness registry [13].

References

  [1] Fix: large shard traces (tests_torch) silently dropped from dashboards. huggingface/transformers-ci
  [2] Emit one span per test — drop phase + fixture spans (durable fix for oversized traces). huggingface/transformers-ci
  [3] Copied from `transformers-test-ci`. huggingface/transformers-ci
  [4] Fix: reverify window so out-of-order late spans aren't frozen out. huggingface/transformers-ci
  [5] Add `quantization_config` trainer argument (streamline QLoRA). huggingface/trl
  [6] Align data collators across DPO / SFT / Reward / KTO. huggingface/trl
  [7] SFT: Truncate during dataset preparation, not collation. huggingface/trl
  [8] Align KTO with DPO: Support PEFT with Liger. huggingface/trl
  [9] Drop vLLM 0.15 support (#6239). huggingface/trl
  [10] docs: fold per-task-pod security into docs/security.md; drop the plan doc (#43). huggingface/serge
  [11] CI: run serge-task-test on the in-VPC runner (aws-general-8-plus) (#41). huggingface/serge
  [12] Diffusers 0.39.0: New image and video pipelines, core library improvements, and more. huggingface/diffusers
  [13] Add hi agent harness (#2269). huggingface/huggingface.js

FAQ

What changed in Hugging Face on July 4, 2026?
The transformers test pipeline stopped silently dropping its largest jobs from dashboards. Three critical fixes ship today.
What should Hugging Face teams do about it?
  • Upgrade transformers-ci to pull in the trace fixes (dropped phase and fixture spans) before the next large test run.
  • If you use QLoRA, update TRL to a release that includes the `quantization_config` trainer argument; note that vLLM 0.15 is no longer supported.
  • Review the Serge security docs; pod-per-task is now the durable model on prod.
Which Hugging Face repositories shipped on July 4, 2026?
huggingface/transformers-ci, huggingface/trl, huggingface/serge, huggingface/diffusers, huggingface/huggingface.js