PYTORCH RELEASE WHEELS SHEDDING 80MB OF BLOAT, QUANTIZED EXPORTS FIXED

By RepoJournal · Filed 06:03 UTC on July 4, 2026 · About PyTorch

PyTorch 2.13 release builds now strip PTX from the CUDA architecture list, reversing the 80-90 MB of binary bloat that crept into 2.13.0, while core quantization and inference-benchmarking fixes land across the stack.

The binary size jump came from embedding compute_120 PTX in release wheels [1], a regression from 2.12.1 that affects only release and RC builds; nightly builds keep PTX for forward compatibility. That fix lands alongside a repair to quantized ONNX gather export [2], which dequantizes tensor inputs before lowering, closing the gap between the eager and symbolic execution paths. On the inference front, vLLM benchmarking now mirrors the test-osdc offline cache pattern [3], reading the shared Hugging Face cache offline at run time and refreshing it nightly, with FlashInfer's JIT workspace following the same strategy. Dynamo's handling of Python randomness is tightening too: module-global random.random now routes through RandomVariable instead of causing a graph break [4], so RNG values are treated as symbolic. Meanwhile, ExecuTorch's Arm backend gains an xlarge pytest decorator for memory-hungry tests [5] and Torch 2.12 compatibility patches for quantized decomposition [6], while NXP's Neutron flow now handles sum operations via the new MLIR path [7].
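To make the PTX change in [1] concrete, here is a minimal sketch of what stripping `+PTX` from an architecture list can look like. It assumes the list uses the `TORCH_CUDA_ARCH_LIST` convention (entries such as `12.0+PTX`, separated by semicolons or spaces). The function name `strip_ptx` and the `is_release` flag are hypothetical and are not taken from `build_env_setup.py`.

```python
def strip_ptx(arch_list: str, is_release: bool) -> str:
    """Drop '+PTX' suffixes from a TORCH_CUDA_ARCH_LIST-style string.

    Hypothetical helper for illustration only; the real logic lives in
    the build scripts referenced in [1] and may differ.
    """
    if not is_release:
        # Nightly builds keep PTX so the wheel can JIT for newer GPUs.
        return arch_list

    # Accept both ';' and ' ' as separators, then normalise to ';'.
    entries = [a for a in arch_list.replace(" ", ";").split(";") if a]

    suffix = "+PTX"
    stripped = [a[: -len(suffix)] if a.endswith(suffix) else a for a in entries]
    return ";".join(stripped)


if __name__ == "__main__":
    print(strip_ptx("8.0;9.0;12.0+PTX", is_release=True))
    print(strip_ptx("8.0;9.0;12.0+PTX", is_release=False))
```

With PTX removed, a release wheel carries only the precompiled SASS for the listed architectures, which is where the size saving comes from; the trade-off is that the wheel cannot JIT-compile for architectures newer than those listed.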

Action items

→ Plan an upgrade to the next PyTorch release once it is available; the binary size fix matters for mobile and edge deployments pytorch/pytorch [plan]
→ Review quantized ONNX exports in your pipelines; gather operations now export correctly [2] pytorch/pytorch [monitor]
→ Verify that vLLM benchmark runs read the Hugging Face cache offline; nightly refreshes keep models fresh without blocking CI pytorch/pytorch [plan]
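For the last item, one way to check or reproduce the offline-cache pattern is to control the environment a benchmark process runs in. The sketch below assumes the standard Hugging Face Hub environment variables `HF_HOME` (cache location) and `HF_HUB_OFFLINE` (disable network access). The helper name `hf_cache_env`, the `refresh` flag, and the cache path are hypothetical, and this is not the workflow code from [3].

```python
import os


def hf_cache_env(cache_dir: str, refresh: bool) -> dict:
    """Build an environment for a benchmark run against a shared HF cache.

    refresh=False: regular CI run, read the cache only, never hit the network.
    refresh=True:  nightly run, allowed to download and update the cache.
    """
    env = dict(os.environ)
    env["HF_HOME"] = cache_dir  # point the Hub client at the shared cache
    if refresh:
        # Nightly job: make sure offline mode is not inherited from the caller.
        env.pop("HF_HUB_OFFLINE", None)
    else:
        # Regular job: fail fast on a cache miss instead of downloading.
        env["HF_HUB_OFFLINE"] = "1"
    return env


if __name__ == "__main__":
    ci_env = hf_cache_env("/mnt/shared/hf-cache", refresh=False)
    print(ci_env["HF_HOME"], ci_env.get("HF_HUB_OFFLINE"))
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when launching the benchmark, so a missing model surfaces as an immediate error in CI rather than a slow or flaky download.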

References

[1] Strip +PTX from CUDA arch list on release/RC builds (in build_env_setup.py) (#188914) pytorch/pytorch
[2] Fix quantized ONNX gather export (#188272) pytorch/pytorch
[3] vllm-benchmark: read shared HF cache offline, refresh on nightly (#188659) pytorch/pytorch
[4] [dynamo] Route module-global random.random through RandomVariable (#188235) pytorch/pytorch
[5] Arm backend: Add xlarge pytest decorator pytorch/executorch
[6] Arm backend: fix Torch compatibility (#20671) pytorch/executorch
[7] NXP backend: Enable sum with new Neutron flow pytorch/executorch

FAQ

What changed in PyTorch on July 4, 2026? PyTorch 2.13 release builds now strip PTX from the CUDA architecture list, reversing the 80-90 MB of binary bloat that crept into 2.13.0, while core quantization and inference-benchmarking fixes land across the stack.
What should PyTorch teams do about it? Plan an upgrade to the next PyTorch release once it is available; the binary size fix matters for mobile and edge deployments • Review quantized ONNX exports in your pipelines; gather operations now export correctly [2] • Verify that vLLM benchmark runs read the Hugging Face cache offline; nightly refreshes keep models fresh without blocking CI
Which PyTorch repositories shipped on July 4, 2026? pytorch/pytorch, pytorch/executorch
