RepoJournal
PyTorch

@pytorch

PyTorch and the broader machine-learning ecosystem

The Wire · Showcase

PYTORCH RELEASE WHEELS SHEDDING 80MB OF BLOAT, QUANTIZED EXPORTS FIXED

By RepoJournal · Filed · About PyTorch

PyTorch 2.13 release builds now strip PTX from the CUDA architecture list to reverse the 80-90 MB of binary bloat that crept into 2.13.0, while quantized-export and inference-benchmarking fixes land across the stack.

The size jump came from embedding compute_120 PTX in release wheels [1], a regression relative to 2.12.1 that affects only releases and release candidates (RCs); nightly builds keep PTX for forward compatibility.

That fix lands alongside a repair to quantized ONNX gather export [2], which dequantizes tensor inputs before lowering and closes a gap between the eager and symbolic execution paths. On the inference front, vLLM benchmarking now mirrors the test-osdc offline cache pattern [3]: runs read a shared Hugging Face cache offline at runtime, and nightly runs refresh it, with FlashInfer's JIT workspace following the same strategy. Dynamo is tightening too: calls to the module-global random.random now route through RandomVariable instead of causing a graph break [4], so RNG values are treated symbolically.

Meanwhile, ExecuTorch's Arm backend gains an xlarge pytest decorator for memory-hungry tests [5] and Torch 2.12 compatibility patches for quantized decomposition [6], and NXP's Neutron flow now handles sum operations via the new MLIR path [7].
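
To make the mechanism concrete: PyTorch builds read the TORCH_CUDA_ARCH_LIST environment variable, where a +PTX suffix on an entry (for example 12.0+PTX) asks the build to embed PTX for that architecture in addition to its compiled SASS. The sketch below shows one way a build script could drop those suffixes only for release and RC builds while leaving nightlies untouched. The helper strip_ptx_for_release and its build_channel values are hypothetical; this is not the code in build_env_setup.py.

```python
import re


def strip_ptx_for_release(arch_list: str, build_channel: str) -> str:
    """Return a TORCH_CUDA_ARCH_LIST value with '+PTX' suffixes removed
    for release/RC builds; other builds get the list back unchanged.

    Hypothetical helper: the build_channel values ('release', 'rc',
    'nightly') are illustrative assumptions, not PyTorch's real settings.
    """
    if build_channel not in ("release", "rc"):
        # Nightly (or unknown) builds keep PTX for forward compatibility.
        return arch_list
    # Entries may be separated by semicolons or whitespace, e.g. "9.0;12.0+PTX".
    entries = [e for e in re.split(r"[;\s]+", arch_list.strip()) if e]
    suffix = "+PTX"
    stripped = [e[: -len(suffix)] if e.endswith(suffix) else e for e in entries]
    return ";".join(stripped)


print(strip_ptx_for_release("8.0;9.0;10.0;12.0+PTX", "release"))  # 8.0;9.0;10.0;12.0
print(strip_ptx_for_release("8.0;9.0;10.0;12.0+PTX", "nightly"))  # 8.0;9.0;10.0;12.0+PTX
```

Keeping PTX lets the CUDA driver JIT-compile kernels for GPUs newer than any listed architecture, which is why nightlies retain it; release wheels trade that forward compatibility for a smaller download.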

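The ONNX gather fix follows a familiar pattern: instead of lowering a gather directly on quantized integer codes, the exporter first converts quantized inputs back to real values. For per-tensor affine quantization that conversion is real = (q - zero_point) * scale. The pure-Python sketch below is a simplified model under that assumption, with hypothetical helper names and no torch dependency; it is not the exporter's actual code.

```python
from typing import List, Sequence


def dequantize(q: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Per-tensor affine dequantization: real = (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]


def gather(data: Sequence[float], indices: Sequence[int]) -> List[float]:
    """1-D gather along axis 0, like ONNX Gather with axis=0.
    Negative indices count from the end."""
    n = len(data)
    return [data[i + n if i < 0 else i] for i in indices]


def export_gather(q: Sequence[int], scale: float, zero_point: int,
                  indices: Sequence[int]) -> List[float]:
    # Hypothetical lowering step: dequantize the tensor input first, so the
    # lowered gather operates on ordinary float values, as eager mode sees them.
    return gather(dequantize(q, scale, zero_point), indices)


q = [128, 130, 126, 140]  # uint8 codes
print(export_gather(q, scale=0.5, zero_point=128, indices=[3, 0, -2]))  # [6.0, 0.0, -1.0]
```

Because dequantization is elementwise, the point is not ordering but that the lowered op never sees raw integer codes without their scale and zero point.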
References

  [1] Strip +PTX from CUDA arch list on release/RC builds (in build_env_setup.py) (#188914) · pytorch/pytorch
  [2] Fix quantized ONNX gather export (#188272) · pytorch/pytorch
  [3] vllm-benchmark: read shared HF cache offline, refresh on nightly (#188659) · pytorch/pytorch
  [4] [dynamo] Route module-global random.random through RandomVariable (#188235) · pytorch/pytorch
  [5] Arm backend: Add xlarge pytest decorator · pytorch/executorch
  [6] Arm backend: fix Torch compatibility (#20671) · pytorch/executorch
  [7] NXP backend: Enable sum with new Neutron flow · pytorch/executorch

FAQ

What changed in PyTorch on July 4, 2026?
PyTorch 2.13 release builds now strip PTX from the CUDA architecture list to reverse the 80-90 MB of binary bloat that crept into 2.13.0, while quantized-export and inference-benchmarking fixes land across the stack.
What should PyTorch teams do about it?
  • Plan to upgrade to the next PyTorch release once it is available; the smaller CUDA wheels help wherever install size matters, such as container images and edge deployments.
  • Review quantized ONNX exports in your pipelines; gather operations now export correctly [2].
  • Verify that vLLM benchmark runs read the shared cache offline; nightly refreshes keep cached models current without blocking CI.
Which PyTorch repositories shipped on July 4, 2026?
pytorch/pytorch, pytorch/executorch
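
For the vLLM action item above, a minimal sketch of the offline-read, nightly-refresh split, assuming the benchmark is launched as a subprocess that honors the standard huggingface_hub variables HF_HOME (cache root) and HF_HUB_OFFLINE. The helper hf_cache_env, its is_nightly flag, and the cache path are hypothetical, not the PyTorch CI job's real configuration.

```python
import os
from typing import Dict, Mapping, Optional


def hf_cache_env(shared_cache: str, is_nightly: bool,
                 base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the environment for a benchmark run (hypothetical helper).

    Regular runs read the shared cache offline; nightly runs go online so
    they can refresh it.
    """
    env = dict(os.environ if base is None else base)
    env["HF_HOME"] = shared_cache  # huggingface_hub cache root
    if is_nightly:
        # Allow downloads so the nightly job can refresh the shared cache.
        env.pop("HF_HUB_OFFLINE", None)
    else:
        # Serve files only from the local cache; a missing file raises an
        # error instead of triggering a download.
        env["HF_HUB_OFFLINE"] = "1"
    return env


# Usage: pass the result as env= to subprocess.run for the benchmark command.
print(hf_cache_env("/mnt/hf-cache", is_nightly=False, base={}))
```

Making offline the default and online the exception means a cold or stale cache surfaces as an explicit failure in regular runs rather than as a slow, flaky download.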