PYTORCH RELEASE WHEELS SHEDDING 80MB OF BLOAT, QUANTIZED EXPORTS FIXED
By RepoJournal · About PyTorch
PyTorch 2.13 release builds are stripping +PTX from the CUDA architecture list to reverse 80-90 MB of binary bloat that crept into 2.13.0, while core quantization and inference benchmarking fixes land across the stack.
The size regression came from embedding compute_120 PTX in release wheels [1]. It is a regression relative to 2.12.1 and affects only release and RC builds; nightly builds keep PTX for forward compatibility. Landing alongside it is a fix for quantized ONNX gather export [2], which dequantizes tensor inputs before lowering so that the symbolic export path agrees with eager execution. On the inference front, vLLM benchmarking now mirrors the test-osdc offline cache pattern [3]: jobs read a shared HuggingFace cache offline at runtime, and the cache is refreshed nightly. FlashInfer's JIT workspace follows the same strategy. In Dynamo, the module-global random.random now routes through RandomVariable instead of causing a graph break [4], treating RNG values as symbolic. Meanwhile, ExecuTorch's Arm backend gains a pytest decorator for memory-hungry (xlarge) tests [5] and Torch 2.12 compatibility fixes for quantized decomposition [6], and NXP's Neutron flow now handles sum operations via the new MLIR path [7].
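To make the PTX change [1] concrete, here is a minimal sketch of what stripping +PTX from an architecture list could look like. It assumes the list uses the familiar TORCH_CUDA_ARCH_LIST format (entries separated by semicolons or spaces, with an optional +PTX suffix). The function names and the build-type labels are hypothetical and are not taken from build_env_setup.py.

```python
def strip_ptx(arch_list: str) -> str:
    """Drop the '+PTX' suffix from every entry of a CUDA arch list.

    Hypothetical helper: accepts ';' or space separated entries and
    returns them joined with ';'.
    """
    entries = arch_list.replace(" ", ";").split(";")
    return ";".join(e.removesuffix("+PTX") for e in entries if e)


def arch_list_for(build_type: str, arch_list: str) -> str:
    """Strip PTX only for release and RC builds; nightlies keep it.

    The build_type labels ('release', 'rc', 'nightly') are assumptions
    made for this illustration.
    """
    if build_type in {"release", "rc"}:
        return strip_ptx(arch_list)
    return arch_list


if __name__ == "__main__":
    example = "8.0;9.0;12.0+PTX"
    print(arch_list_for("release", example))
    print(arch_list_for("nightly", example))
```

Keeping the nightly path untouched reflects the trade-off described above: PTX lets a wheel run on GPUs newer than the ones it was compiled for, at the cost of a larger binary.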
Action items
- → Plan an upgrade to the next PyTorch release once it is available; the binary size fix matters for mobile and edge deployments [1] pytorch/pytorch [plan]
- → Review quantized ONNX exports in your pipelines; gather operations now export correctly [2] pytorch/pytorch [monitor]
- → Verify that vLLM benchmark runs read the cache offline; nightly refreshes keep models fresh without blocking CI [3] pytorch/pytorch [plan]
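For the offline cache item, one way to check the pattern locally is to build the environment for a benchmark process explicitly. The sketch below relies on two environment variables that huggingface_hub documents, HF_HOME and HF_HUB_OFFLINE; whether the vLLM benchmark workflow in [3] uses exactly these variables is an assumption, and the helper name and cache path are hypothetical.

```python
import os
from typing import Mapping, Optional


def hf_cache_env(cache_dir: str, refresh: bool = False,
                 base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return an environment for a run that reads a shared HF cache.

    Hypothetical helper. With refresh=False the Hub client is told to
    stay offline and serve only from cache_dir; with refresh=True (for
    example a nightly job) network access is allowed so the cache can
    be updated.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HF_HOME"] = cache_dir
    env["HF_HUB_OFFLINE"] = "0" if refresh else "1"
    return env


if __name__ == "__main__":
    # '/mnt/shared/hf-cache' is a placeholder path for illustration.
    env = hf_cache_env("/mnt/shared/hf-cache")
    print(env["HF_HOME"], env["HF_HUB_OFFLINE"])
```

Passing the resulting dictionary as the env argument of subprocess.run keeps the offline setting scoped to the benchmark process rather than the whole shell.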
References
- [1] Strip +PTX from CUDA arch list on release/RC builds (in build_env_setup.py) (#188914) pytorch/pytorch
- [2] Fix quantized ONNX gather export (#188272) pytorch/pytorch
- [3] vllm-benchmark: read shared HF cache offline, refresh on nightly (#188659) pytorch/pytorch
- [4] [dynamo] Route module-global random.random through RandomVariable (#188235) pytorch/pytorch
- [5] Arm backend: Add xlarge pytest decorator pytorch/executorch
- [6] Arm backend: fix Torch compatibility (#20671) pytorch/executorch
- [7] NXP backend: Enable sum with new Neutron flow pytorch/executorch
FAQ
- What changed in PyTorch on July 4, 2026?
- PyTorch 2.13 release builds are stripping +PTX from the CUDA architecture list to reverse 80-90 MB of binary bloat that crept into 2.13.0, while core quantization and inference benchmarking fixes land across the stack.
- What should PyTorch teams do about it?
- Plan an upgrade to the next PyTorch release once it is available; the binary size fix matters for mobile and edge deployments [1] • Review quantized ONNX exports in your pipelines; gather operations now export correctly [2] • Verify that vLLM benchmark runs read the cache offline; nightly refreshes keep models fresh without blocking CI [3]
- Which PyTorch repositories shipped on July 4, 2026?
- pytorch/pytorch, pytorch/executorch