Burn ONNX 0.21.0 Release

Flame digital art generated by Stable Diffusion.
Thu, May 14, 2026
Dilshod Tadjibaev

This is the first release of burn-onnx from its own repository. The crate that used to live inside tracel-ai/burn as burn-import now ships as tracel-ai/burn-onnx [1] and tracks Burn 0.21.0, with a dedicated CI gate, new test infrastructure, and 300 commits' worth of operator coverage, opset cleanup, and real-world model verification on top of Burn 0.20.0.

For users, the practical result is broader model coverage, smaller generated code, and a regression suite that now tracks upstream ONNX behavior instead of only local fixtures.

What is Burn ONNX?

burn-onnx imports ONNX models into the Burn [2] deep learning framework. Unlike runtime-based importers, it works at build time:

  1. You point burn-onnx at an .onnx file from your build.rs.
  2. It reads the model and emits a model.rs source file plus a model.bpk weight file.
  3. You include the generated Rust as a normal module and call Model::from_file(bpk_path, &device) from your code.
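
If you use the build-script path, the wiring is small. Here is a minimal sketch, assuming the ModelGen builder familiar from burn-import is exposed the same way by burn-onnx (check the crate docs for the exact path and method names):

// build.rs — minimal sketch; assumes burn-onnx exposes the ModelGen builder
// known from burn-import. Consult the crate documentation for the exact API.
use burn_onnx::ModelGen;

fn main() {
    ModelGen::new()
        .input("src/model/model.onnx") // the ONNX file to import
        .out_dir("model/")             // model.rs + model.bpk land under OUT_DIR
        .run_from_script();
}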

The output is plain Burn Rust. No graph runtime, no protobuf at runtime. The forward pass is something you can read, step through in a debugger, and modify by hand if you need to. Because the result is just Burn code, imported models can target Burn backends such as CPU via Flex or NdArray, GPU via WGPU or CUDA, and WebAssembly or no_std embedded builds when the generated ops and load strategy fit that target.

If you just want to inspect the generated code without wiring up build.rs, the onnx2burn CLI ships with the crate:

cargo run -p burn-onnx --bin onnx2burn -- model.onnx ./out
# writes ./out/model.rs and ./out/model.bpk

The project is split into two layers:

  • onnx-ir parses the ONNX protobuf wire format into a clean intermediate representation. It hides the protobuf complexity (versioned attributes, opset rules, external data, type inference, optional inputs) behind a typed Node enum and a NodeProcessor trait. It also performs graph-level normalization such as shape propagation, constant folding, common subexpression elimination, and dead-node elimination.
  • burn-onnx consumes that IR and generates Burn Rust. It owns the Burn-specific lowering: how to map IR nodes onto Burn tensor and module APIs, how to choose Burn layouts such as LinearLayout, and how to collect parameters into the Burnpack [3] weight file.
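
As a rough mental model of that split, the IR layer looks something like this (a hypothetical sketch; the real onnx-ir types carry more variants and richer payloads):

// Hypothetical sketch of the layering described above; actual onnx-ir
// types differ in names and detail.
struct RawOnnxNode; // protobuf-level node: string attributes, opset quirks

enum Node {
    // one typed variant per supported ONNX operator, defaults resolved
    MatMul,
    Conv2d { kernel: [usize; 2], stride: [usize; 2] },
}

trait NodeProcessor {
    // Lowers one raw protobuf node into a typed IR node, applying
    // opset-version rules and resolving optional inputs.
    fn process(&self, raw: &RawOnnxNode) -> Node;
}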

The aim is full ONNX compliance. We track our progress against the upstream ONNX backend test suite [4] (1,615 tests at v1.19.0) and a growing set of real-world exported models, and 0.21.0 is the release where that goal became a measurable target with a CI gate behind it.

Headline Numbers

  • 160 supported ONNX operators out of the 201 canonical operators catalogued in the support matrix, with dimension-specific Burn rows collapsed and Linear excluded.
  • 174 node-type processors registered in onnx-ir.
  • 1,615 upstream ONNX backend tests vendored at ONNX v1.19.0.
  • Of those, 896 are in the pass + fail-compare set, meaning the generated Rust gets past the skip buckets and is tracked by the compile/compare gate.
  • 717 are classified as pass (80% of that set, 44% of the full vendored suite).
  • 27 model-check crates covering more than two dozen real-world models and variants, including Stable Diffusion XL, Qwen, Kokoro TTS, Depth-Pro, ModernBERT, RF-DETR, YOLO 12x, ResNet-50, CLIP ViT-B/32, and Silero VAD.
  • Opset coverage gate: 472 of 472 generated operator/spec-version checks green across ONNX opsets 1 through 24.
  • 300 commits and 21 contributors since Burn 0.20.0.

The Big Change: a Dedicated Repository

ONNX import has a different release cadence, contributor base, and surface area than the rest of Burn. Inside the monorepo it was held to Burn's release schedule even when an ONNX-only fix was ready to ship. So we split it out: on January 27, 2026, tracel-ai/burn-onnx [1] went live with its own CI, its own issue tracker, and a daily workflow that follows Burn's main branch and opens automated bump-burn PRs.

The legacy burn-import crate is still published as a thin re-export shim around burn-onnx, so existing users get a one-line migration path:

# Before
[build-dependencies]
burn-import = "0.20"

# After
[build-dependencies]
burn-onnx = "0.21"

The New ONNX Official-Test Gate

The new crates/onnx-official-tests crate vendors the upstream ONNX backend node tests [4] at v1.19.0 (1,615 tests) and declares the current expected status for every one in expectations.toml. Pass-listed models go through burn-onnx codegen; harness-compatible passes run as Rust #[test] functions with output comparisons, while codegen-only passes still compile generated Rust.

Status         Count  Meaning
pass             717  Codegen and compile green; harnessed cases match output
fail-compare     179  Compiles; harnessed cases still diverge
skip-compile     215  Codegen ok, but the generated Rust does not compile
skip-codegen     504  onnx2burn refuses the model

896 of 1,615 tests are in the pass + fail-compare set, and 80% of that set is currently classified as pass. The gate runs in plain cargo test with no Docker, no Python, no network, and a drift check fails CI if expectations and reality diverge.

The official-test gate sits on top of three other layers that have been growing alongside it:

  • Hand-crafted integration tests in crates/onnx-tests/. 178 focused test directories with 558 .onnx fixtures, each generated from a uv-scripted Python source that uses onnx.reference.ReferenceEvaluator as ground truth. Bug fixes land here as a failing test first, so regressions stay caught.
  • Codegen snapshot tests in burn-onnx. NodeCodegen impls carry inline insta::assert_snapshot! cases pinning the exact Rust output for codegen branches. There are 790 snapshot assertions under crates/burn-onnx/src today; review changes with cargo insta review (see the sketch after this list).
  • onnx-ir opset compliance tests in crates/onnx-ir/tests/opset_compliance/. One fixture and one test file per ONNX opset from 1 through 24, generated from a single Python source. They lock in supported operator/spec-version coverage with executable evidence, not a spreadsheet.
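
The snapshot pattern itself is small. An illustration, where the codegen helper is a hypothetical stand-in and only the insta macro is the real crate API:

// Hypothetical stand-in for a NodeCodegen invocation.
fn codegen_relu() -> String {
    "let output = input.relu();".to_string()
}

#[test]
fn relu_codegen_snapshot() {
    // The first run records a snapshot file; later runs fail if the
    // generated Rust drifts. `cargo insta review` accepts intended changes.
    insta::assert_snapshot!(codegen_relu());
}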

Graph Simplification in onnx-ir

ONNX models are exported by frameworks (PyTorch, TensorFlow, scikit-learn, ONNX Runtime itself) that all have their own habits. The same Transformer attention head turns into a different graph depending on whether it was exported from torch.onnx.export, ONNX's _expanded form, or a model-zoo checkpoint that has already been through onnxsim. onnx-ir 0.21 introduces a graph simplification framework that runs a fixed-point loop of eight passes on every imported model (a sketch of the driver loop follows the list):

  1. Attention coalescing. Decomposed scaled-dot-product attention (MatMul -> Scale -> Mask -> Softmax -> MatMul) is recognized as a pattern and fused into a single Attention node, which burn-onnx then emits as a call to Burn's native attention primitive. Pre-scaled and Q-pre-scaled variants are detected too. Late in the cycle this work promoted 27 upstream attention_*_expanded tests to passing.
  2. Permute-reshape detection. The Shape -> Gather -> Unsqueeze -> Concat -> Reshape chain that exporters use to reorder dimensions is collapsed into a single Transpose.
  3. Constant shape propagation. Shape(x) -> Gather(i) and Shape(x) -> Slice(start, end) are folded when the relevant dimensions are known statically. (Bare Shape(x) is intentionally left alone, since exporters often hard-code dimensions that change at runtime.)
  4. Constant folding. Nodes whose inputs are all constants (including Cast, Sqrt, Slice, Concat, Unsqueeze) are evaluated at compile time and replaced by their result.
  5. Idempotent-op elimination. Relu(Relu(x)), Ceil(Ceil(x)), Floor(Floor(x)), and friends collapse to a single call.
  6. Identity-element elimination. x + 0, x * 1, x / 1, and x ** 1 are removed.
  7. Common subexpression elimination. Duplicate nodes with identical inputs and configuration are merged.
  8. Dead-node elimination. Anything that no longer reaches a graph output is removed, cascading through the graph.
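
Conceptually, the driver behind these passes is a plain fixed-point loop. A sketch with illustrative types (not the onnx-ir API):

struct Graph; // stand-in for the IR graph

trait Pass {
    // Rewrites the graph in place; returns true if anything changed.
    fn run(&self, graph: &mut Graph) -> bool;
}

fn simplify(graph: &mut Graph, passes: &[Box<dyn Pass>]) {
    // Re-run every pass until a full sweep makes no change, so a rewrite
    // from one pass (say, constant folding) can expose new opportunities
    // for another (say, dead-node elimination).
    loop {
        let mut changed = false;
        for pass in passes {
            changed |= pass.run(graph);
        }
        if !changed {
            break;
        }
    }
}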

The simplifier is also where 0.21 lands the constant-merge work. Constants that flow into operators are now merged into the parameter table where the op expects them. Weight-bearing operators (Conv, Gemm, MatMul, BatchNorm) recognise their constant inputs as parameters and lift them; transient constants used only for shape arithmetic disappear during constant folding; and a separate pass cleans up unused constants left behind when an op consumed only some of its constant inputs (Gather was the biggest beneficiary, but the pattern shows up across the board).

The result is smaller, faster generated code, and a graph that looks more like what a human would have written. The CLI accepts --no-simplify if you want to inspect the raw IR for debugging.
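
For example (flag placement relative to the positional arguments is an assumption; onnx2burn --help has the authoritative usage):

cargo run -p burn-onnx --bin onnx2burn -- --no-simplify model.onnx ./out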

Architecture Upgrades

ScalarTensor and ScalarNative: clean host vs. device scalars

ONNX uses rank-0 tensors freely, but Burn distinguishes between host-side Rust scalars and device-side rank-1 tensors. 0.21 introduces two new ArgType variants (ScalarTensor(DType) for on-device, ScalarNative(DType) for host) plus an ArgPreference::ScalarNative mechanism that lets nodes that need a real Rust integer (Range, If, Loop, Concat, Slice, Where) ask for it. The IR can keep scalars as device tensors when tensor code needs them, materialize constant scalar tensors as real data, and convert graph inputs and outputs at the boundary so user-facing forward signatures stay clean.
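
In simplified form the distinction looks like this (illustrative only; the real enum has more variants and payloads):

// Simplified sketch of the new scalar split; not the actual onnx-ir enum.
#[derive(Debug, Clone, Copy)]
enum DType { F32, I32, I64 }

#[derive(Debug)]
enum ArgType {
    ScalarTensor(DType), // rank-0 value kept on the device for tensor math
    ScalarNative(DType), // host-side Rust scalar, e.g. a Range start or Slice index
}

fn main() {
    // Range asks for a host integer via ArgPreference::ScalarNative;
    // tensor-only consumers keep the value on device.
    let range_start = ArgType::ScalarNative(DType::I64);
    let scale = ArgType::ScalarTensor(DType::F32);
    println!("{range_start:?} {scale:?}");
}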

Submodule partitioning for large models

Stable Diffusion XL is a graph with tens of thousands of nodes. Compiling that into a single forward method takes minutes and produces source files Rust can barely parse. 0.21 introduces an automatic partitioner: graphs past a 200-node threshold are split into Submodule0, Submodule1, ... structs, with the algorithm greedily picking low-width cut points based on live-tensor widths. Constants are reordered to land just before their first consumer so submodule interfaces stay narrow. Burnpack weight paths are auto-prefixed (submodule0.field.weight, etc.) and routed at load time. Small graphs are unaffected.

This is what unblocked the Depth-Pro and Stable Diffusion XL model checks.
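
A conceptual sketch of the greedy cut selection, assuming "width" means the number of tensors live across each node boundary (illustrative, not the actual partitioner):

// Given live-tensor counts after each node in topological order, place a
// cut roughly every `chunk` nodes, at the narrowest nearby boundary.
fn pick_cuts(live_widths: &[usize], chunk: usize) -> Vec<usize> {
    let mut cuts = Vec::new();
    let mut start = 0;
    while start + chunk < live_widths.len() {
        // Search the back half of the window for the lowest-width point.
        let window = &live_widths[start + chunk / 2..start + chunk];
        let (offset, _) = window
            .iter()
            .enumerate()
            .min_by_key(|&(_, w)| *w)
            .expect("window is non-empty for chunk >= 2");
        let cut = start + chunk / 2 + offset;
        cuts.push(cut);
        start = cut;
    }
    cuts
}

fn main() {
    let widths = [3, 5, 2, 7, 1, 4, 6, 2, 3, 1, 5];
    println!("{:?}", pick_cuts(&widths, 4)); // => [2, 4, 7]
}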

Memory-mapped loading and Burnpack weights

onnx-ir now memory-maps the input ONNX file by default (gated by the mmap feature), and generated weights are stored in Burnpack [3] format. The new LoadStrategy options make file, embedded, and caller-provided byte loading explicit, including paths that are usable from WebAssembly and no_std targets. The previous Record type machinery is gone.

External data for models > 2 GB

For models larger than protobuf's 2 GB message limit, ONNX stores weight tensors in separate sidecar files. onnx-ir now follows those references, so models like Stable Diffusion XL load correctly without manual preprocessing.

Backend swap: burn-ndarray to burn-flex

Examples, model checks, and the official-tests harness now run on burn-flex for CPU. Default int dtype on Flex is I32 (vs. NdArray's I64), which led to a sweep through burn-onnx codegen to make every dtype explicit (.cast(DType::...), from_data_dtype) so generated models behave identically across backends.

Loading models: the new LoadStrategy enum

Where weights live used to be implicit. 0.21 turns it into a codegen option:

  • File (the default) keeps weights in a separate .bpk file.
  • Embedded bakes weights into the binary for WebAssembly, embedded targets, and demos.
  • Bytes lets the caller provide weight data from its own data source.
  • None skips generated weight-loading constructors.

For normal applications, the recommended path is still explicit:

let model = Model::<Backend>::from_file("path/to/weights.bpk", &device);

Use Model::default() only when the generated path and default device are what you want, and avoid Model::new(&device) unless you plan to load weights manually with load_from.

New and Expanded Operator Coverage

Signal processing arrived as a full family this cycle, driven by Kokoro TTS and other audio models:

  • STFT, DFT, MelWeightMatrix.
  • Window functions: BlackmanWindow, HannWindow, HammingWindow.

Quantization:

  • QLinearMatMul, DequantizeLinear, QuantizeLinear.

ONNX ML (the classical-ML domain):

  • Scaler, SVMRegressor, Imputer.

Normalization:

  • LpNormalization, MeanVarianceNormalization, LRN (refactored to use Burn's native LocalResponseNorm).

Recurrent:

  • RNN, GRU, bidirectional GRU coverage, plus configurable activations on the existing LSTM.

Vision:

  • GridSample, Col2Im, LpPool (1D/2D), and asymmetric padding for 1D/2D Conv and Pool.

Math and general:

  • Det, constrained Einsum (with implicit output, ellipsis, and reductions), CastLike, Shrink, Hardmax, Swish, CumSum, ScatterND and GatherND (now mapping to Burn's native scatter_nd and gather_nd), ScatterElements, and DeformConv.

Activations:

  • Selu, Elu, Celu, ThresholdedRelu, Mish, Softplus, Softsign.

Control flow:

  • If, Loop, and Scan now generate native Rust control flow for supported cases, while preserving subgraphs and outer-scope variable references.
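
As an illustration of the shape of that output, here is a hand-written example of an ONNX If lowered to native Rust (the branch bodies are placeholders, not real generated code):

use burn::tensor::{backend::Backend, Tensor};

// Hand-written illustration: an ONNX `If` becomes a plain Rust `if`, with
// each subgraph lowered inline as the branch body.
fn if_node<B: Backend>(cond: bool, x: Tensor<B, 2>) -> Tensor<B, 2> {
    if cond {
        x * 2.0 // then-branch subgraph
    } else {
        x + 1.0 // else-branch subgraph
    }
}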

Native attention

The Attention op now codegens to Burn's attention primitive. Many exporters still emit scaled-dot-product attention as a decomposed graph of MatMul, scale, mask, Softmax, and another MatMul; the SDPA coalescer recognizes those patterns and rewrites them to Attention before codegen. That lets imported Transformer models use Burn's specialized attention path instead of preserving a slower chain of generic tensor operations.

Opset Coverage, 1 through 24

0.21 expands the supported opset range for existing operators instead of treating support as a single yes/no flag. Many operators now accept older ONNX forms as well as newer ones, including legacy attribute layouts and opset-dependent defaults.

The opset coverage gate now reports 472 of 472 generated operator/spec-version checks green across opsets 1 through 24. It focuses on the opset versions where a supported operator's ONNX spec changed, so the minimum-opset table is backed by executable fixtures rather than a spreadsheet.

Real-World Model Verification

crates/model-checks/ contains 27 model-check crates covering more than two dozen real-world models and variants. A check typically prepares or downloads the artifact, generates Rust with burn-onnx, runs a forward pass, and compares against saved ONNX Runtime or PyTorch reference data when the model has a runnable oracle.

Coverage includes:

  • LLMs and NLP: Qwen, SmolLM, SmolLM2, ALBERT, ModernBERT, all-MiniLM-L6-v2.
  • Vision classifiers: AlexNet, VGG19, ResNet-50, MobileNet-v2, SqueezeNet, ShuffleNet, ZFNet512, DenseNet121, Inception-v1/v2.
  • Detection and pose: YOLO 12x, RF-DETR, RTMW3D, MediaPipe Face Detector, ArcFace.
  • Generative, speech, depth: Stable Diffusion XL, Kokoro TTS, Depth-Pro, Depth-Anything-v2, Silero VAD, CLIP ViT-B/32 (text and vision).

The model-checks workflow runs on pull requests and main-branch pushes that touch import, IR, or model-check code. The Kokoro check uncovered an f32 precision issue in the matrix-DFT path of STFT; the fix was to compute the matmul in f64 internally and cast back, which collapsed audio output divergence from 24x to about 1.3x. The Depth-Pro check is what motivated submodule partitioning. RF-DETR is what motivated runtime-input slicing on Shape values.
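
The numeric pattern behind that STFT fix, shown on plain slices rather than the Burn tensors the real lowering uses: widen to f64 for the accumulation, cast back to f32 at the end.

// Illustrative only: f32 matmul with an f64 accumulator.
fn matmul_f32_via_f64(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f64; // widened accumulator
            for p in 0..k {
                acc += a[i * k + p] as f64 * b[p * n + j] as f64;
            }
            out[i * n + j] = acc as f32; // cast back at the boundary
        }
    }
    out
}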

Breaking Changes and Migration

  • burn-import is deprecated. New code should depend on burn-onnx directly. The shim still works.
  • Pick a LoadStrategy when generating code. The default (LoadStrategy::File) keeps the old separate-.bpk behavior, and Model::from_file(path, &device) is the recommended way to load a generated model with an explicit device. Model::default() remains convenient when the generated path and default device are right for your application. Avoid Model::new(&device) unless you intend to load weights manually with load_from afterward; on its own it does not load the ONNX weights into Param fields.
  • Default backend in examples is now Flex (was NdArray). Flex defaults to I32 for integer types where NdArray defaulted to I64.
  • Generated code emits explicit dtypes everywhere. Downstream code that relied on implicit dtype defaults may need tightening.
  • Forward signatures may change for graphs that fed constants into ops whose codegen now accepts runtime inputs (Slice, Pad, Tile, Squeeze, Clip, Dropout, OneHot, TopK, ArgMax, ArgMin, Mod).
  • Ignored<T> in generated structs is replaced by #[module(skip)] (Burn's modern equivalent).
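
A minimal sketch of the replacement attribute on a generated struct, assuming field-level usage as in the generated code (field names are illustrative):

use burn::nn::Linear;
use burn::prelude::*;

// Sketch of a generated struct: `fc` is a learnable module, while the
// skipped field is plain state excluded from the Module traversal.
#[derive(Module, Debug)]
struct Model<B: Backend> {
    fc: Linear<B>,
    #[module(skip)]
    cached_shape: [usize; 2],
}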

What's Next

  • Closing the 504 skip-codegen and 215 skip-compile cases in the official test gate. Each one is a named, tracked gap.
  • Expanding ONNX ML coverage (more classifiers, label encoders, tree ensembles).
  • More large-model coverage and a path to a steady all-green state on the model-checks workflow.
  • Continued progress toward full ONNX compliance, measured against the upstream gate.
  • Listing burn-onnx on the public ONNX Backend Scoreboard [5] alongside ONNX Runtime, tract, and the other backends, so our compliance numbers are visible next to everyone else's. The integration draft already lives under scoreboard/ in this repo.

Thanks to everyone who contributed to this release. If you are running ONNX models from PyTorch, TensorFlow, scikit-learn, or anywhere else and want to ship them as native Burn Rust, 0.21.0 is a good time to try burn-onnx. The ONNX Import Guide [6] in the Burn Book is the place to start, and the burn-onnx repository [1] is where issues and contributions go.

References

[1] tracel-ai/burn-onnx
[2] Burn deep learning framework
[3] Burnpack weight format (in tracel-ai/burn)
[4] ONNX backend node tests
[5] ONNX Backend Scoreboard
[6] ONNX Import Guide (Burn Book)
