LEROBOT ADDS DEPTH CAMERA SUPPORT, TRL FIXES ZERO-3 DTYPE BUGS ACROSS FIVE TRAINERS
By RepoJournal · Filed · About Hugging Face
LeRobot can now capture depth frames from RealSense cameras and encode them as 12-bit MP4, while TRL extends its fix for a ZeRO-3 + PEFT mixed-dtype error to five trainers that were still failing: CPO, ORPO, BCO, Distillation, and TPO.
LeRobot's depth camera integration [1] adds parametric quantization for both uint16 and float32 depth data, with optional log normalization. This fills a gap for robotic vision pipelines that need 3D spatial awareness. The implementation carries depth metadata through the whole stack (DatasetWriter, DatasetReader, and StreamingVideoEncoder), so depth-augmented robot observations can be recorded, read back, and used for training [2].
Over in TRL, a dtype mismatch between ZeRO-3 and PEFT had already been fixed for the core trainers, including DPO, but five experimental trainers (CPO, ORPO, BCO, Distillation, TPO) were still broken. The new patch [3] applies the identical guard to all five, unblocking distributed training for anyone using those methods.
Transformers landed Nemotron 3.5 ASR Streaming support [4], adding an RNNT architecture with streaming encoder caching and generation logic. Two follow-up PRs then cleaned up the modular architecture [5] and fixed pipeline integration [6].
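To make the LeRobot depth quantization described above concrete, here is a minimal sketch of mapping a depth value to a 12-bit code, with an optional log normalization step. It is not LeRobot's implementation: the function names, the `d_min`/`d_max` range parameters, and the exact log formula are assumptions chosen for illustration.

```python
import math

MAX_CODE = (1 << 12) - 1  # 4095, the largest value a 12-bit sample can hold


def quantize_depth(depth, d_min, d_max, use_log=False):
    """Map a depth value to an integer code in [0, 4095].

    Hypothetical helper: d_min and d_max are assumed range parameters.
    Works for uint16-style values (e.g. millimetres) or floats (e.g. metres).
    """
    # Clamp so out-of-range readings saturate instead of wrapping around.
    d = min(max(depth, d_min), d_max)
    if use_log:
        # Log normalization spends more codes on near depths; needs d_min > 0.
        t = math.log(d / d_min) / math.log(d_max / d_min)
    else:
        t = (d - d_min) / (d_max - d_min)
    return round(t * MAX_CODE)


def dequantize_depth(code, d_min, d_max, use_log=False):
    """Invert quantize_depth, up to quantization error."""
    t = code / MAX_CODE
    if use_log:
        return d_min * (d_max / d_min) ** t
    return d_min + t * (d_max - d_min)
```

With a linear mapping the absolute error is uniform across the range. With the log mapping the relative error is roughly uniform, which can suit scenes where nearby objects matter more than distant ones.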
Action items
- If you train with CPO, ORPO, BCO, Distillation, or TPO using ZeRO-3 and PEFT, pull the latest TRL immediately. (huggingface/trl) [immediate]
- Review the LeRobot depth encoder configuration if you are building depth-aware robot datasets. (huggingface/lerobot) [plan]
- Test the Nemotron 3.5 ASR Streaming integration in dev if speech-to-text is on your roadmap. (huggingface/transformers) [monitor]
References
- [1] feat(depth maps): adding support for depth in LeRobot. huggingface/lerobot
- [2] feat(depth maps): adding support for depth in LeRobot (#3644). huggingface/lerobot
- [3] Align CPO/ORPO/BCO/Distillation/TPO with DPO: fix ZeRO-3 + PEFT mixed-dtype error. huggingface/trl
- [4] Add Nemotron 3.5 ASR Streaming (#46565). huggingface/transformers
- [5] [NemotronAsrStreaming] processor without modular. huggingface/transformers
- [6] [NemotronAsrStreaming] fix pipeline (#46870). huggingface/transformers
FAQ
- What changed in Hugging Face on June 28, 2026?
- LeRobot can now capture depth frames from RealSense cameras and encode them as 12-bit MP4, while TRL extends its fix for a ZeRO-3 + PEFT mixed-dtype error to five trainers that were still failing: CPO, ORPO, BCO, Distillation, and TPO.
- What should Hugging Face teams do about it?
- If training with CPO, ORPO, BCO, Distillation, or TPO using ZeRO-3 and PEFT, pull the latest TRL immediately • Review LeRobot depth encoder configuration if building depth-aware robot datasets • Test Nemotron 3.5 ASR streaming integration in dev if speech-to-text is on your roadmap
- Which Hugging Face repositories shipped on June 28, 2026?
- huggingface/lerobot, huggingface/trl, huggingface/transformers