This is the first release of burn-onnx from its own repository. The crate that
used to live inside tracel-ai/burn as burn-import now ships as tracel-ai/burn-onnx [1] and tracks Burn 0.21.0, with a dedicated CI
gate, a new test infrastructure, and 300 commits' worth of operator coverage, opset cleanup,
and real-world model verification on top of Burn 0.20.0.
For users, the practical result is broader model coverage, smaller generated code, and a regression suite that now tracks upstream ONNX behavior instead of only local fixtures.
Table Of Contents
- 1. What is Burn ONNX?
- 2. Headline Numbers
- 3. The Big Change: a Dedicated Repository
- 4. The New ONNX Official-Test Gate
- 5. Graph Simplification in onnx-ir
- 6. Architecture Upgrades
- 7. New and Expanded Operator Coverage
- 8. Opset Coverage, 1 through 24
- 9. Real-World Model Verification
- 10. Breaking Changes and Migration
- 11. What's Next
What is Burn ONNX?
`burn-onnx` imports ONNX models into the Burn [2] deep learning framework. Unlike runtime-based importers, it works at build time:
- You point `burn-onnx` at an `.onnx` file from your `build.rs`.
- It reads the model and emits a `model.rs` source file plus a `model.bpk` weight file.
- You include the generated Rust as a normal module and call `Model::from_file(bpk_path, &device)` from your code.
The output is plain Burn Rust. No graph runtime, no protobuf at runtime. The forward pass
is something you can read, step through in a debugger, and modify by hand if you need to.
Because the result is just Burn code, imported models can target Burn backends such as CPU
via Flex or NdArray, GPU via WGPU or CUDA, and WebAssembly or no_std embedded builds
when the generated ops and load strategy fit that target.
If you just want to inspect the generated code without wiring up `build.rs`,
the `onnx2burn` CLI ships with the crate:

```shell
cargo run -p burn-onnx --bin onnx2burn -- model.onnx ./out
# writes ./out/model.rs and ./out/model.bpk
```

The project is split into two layers:
- `onnx-ir` parses the ONNX protobuf wire format into a clean intermediate representation. It hides the protobuf complexity (versioned attributes, opset rules, external data, type inference, optional inputs) behind a typed `Node` enum and a `NodeProcessor` trait. It also performs graph-level normalization such as shape propagation, constant folding, common subexpression elimination, and dead-node elimination.
- `burn-onnx` consumes that IR and generates Burn Rust. It owns the Burn-specific lowering: how to map IR nodes onto Burn tensor and module APIs, how to choose Burn layouts such as `LinearLayout`, and how to collect parameters into the Burnpack [3] weight file.
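The two-layer split can be pictured with a toy pipeline, sketched below in plain Rust. The `Node` variants and the emitted strings are invented for illustration; the real `onnx-ir` enum and `burn-onnx` codegen are far richer.

```rust
/// Toy stand-in for the onnx-ir layer: a typed node, free of protobuf
/// details and free of Burn specifics.
enum Node {
    Relu { input: String, output: String },
    Add { lhs: String, rhs: String, output: String },
}

/// Toy stand-in for the burn-onnx layer: lower one IR node into a line
/// of (pretend) Burn Rust. Real codegen targets Burn's tensor API.
fn lower(node: &Node) -> String {
    match node {
        Node::Relu { input, output } => {
            format!("let {output} = burn::tensor::activation::relu({input});")
        }
        Node::Add { lhs, rhs, output } => format!("let {output} = {lhs}.add({rhs});"),
    }
}

fn main() {
    let graph = [
        Node::Add { lhs: "x".into(), rhs: "y".into(), output: "t0".into() },
        Node::Relu { input: "t0".into(), output: "out".into() },
    ];
    for node in &graph {
        println!("{}", lower(node));
    }
}
```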
The aim is full ONNX compliance. We track our progress against the upstream ONNX backend test suite[4] (1,615 tests at v1.19.0) and a growing set of real-world exported models, and 0.21.0 is the release where that goal became a measurable target with a CI gate behind it.
Headline Numbers
- 160 supported ONNX operators out of the 201 canonical operators catalogued in the support matrix, with dimension-specific Burn rows collapsed and `Linear` excluded.
- 174 node-type processors registered in `onnx-ir`.
- 1,615 upstream ONNX backend tests vendored at ONNX v1.19.0.
- 896 of those are in the `pass` + `fail-compare` set, meaning the generated Rust gets past the skip buckets and is tracked by the compile/compare gate.
- 717 are classified as `pass` (80% of that set, 44% of the full vendored suite).
- 27 model-check crates covering more than two dozen real-world models and variants, including Stable Diffusion XL, Qwen, Kokoro TTS, Depth-Pro, ModernBERT, RF-DETR, YOLO 12x, ResNet-50, CLIP ViT-B/32, and Silero VAD.
- Opset coverage gate: 472 of 472 generated operator/spec-version checks green across ONNX opsets 1 through 24.
- 300 commits and 21 contributors since Burn 0.20.0.
The Big Change: a Dedicated Repository
ONNX import has a different release cadence, contributor base, and surface area than the
rest of Burn. Inside the monorepo it was held to Burn's release schedule even when an
ONNX-only fix was ready to ship. So the crate was split out of the Burn repo: on
January 27, 2026, tracel-ai/burn-onnx [1] went live with its own CI, its own issue
tracker, and its own daily workflow that follows Burn's main branch and opens automated
bump-burn PRs.
The legacy burn-import crate is still published as a thin re-export shim around
burn-onnx, so existing users get a one-line migration path:
```toml
# Before
[build-dependencies]
burn-import = "0.20"

# After
[build-dependencies]
burn-onnx = "0.21"
```

The New ONNX Official-Test Gate
The new `crates/onnx-official-tests` crate vendors the upstream ONNX backend node
tests [4] at v1.19.0 (1,615 tests) and declares the current expected status for every
one of them in `expectations.toml`. Pass-listed models go through `burn-onnx` codegen;
harness-compatible passes run as Rust `#[test]` functions with output comparisons, while
codegen-only passes still compile the generated Rust.
| Status | Count | Meaning |
|---|---|---|
| `pass` | 717 | Codegen and compile green; harnessed cases match output |
| `fail-compare` | 179 | Compiles; harnessed cases still diverge |
| `skip-compile` | 215 | Codegen ok, generated Rust does not compile |
| `skip-codegen` | 504 | `onnx2burn` refuses the model |
896 of 1,615 tests are in the `pass` + `fail-compare` set, and 80% of
that set is currently classified as `pass`. The gate runs in plain `cargo test` with no
Docker, no Python, and no network, and a drift check fails CI if expectations and reality
diverge.
The official-test gate sits on top of three other layers that have been growing alongside it:
- Hand-crafted integration tests in `crates/onnx-tests/`: 178 focused test directories with 558 `.onnx` fixtures, each generated from a `uv`-scripted Python source that uses `onnx.reference.ReferenceEvaluator` as ground truth. Bug fixes land here as a failing test first, so regressions stay caught.
- Codegen snapshot tests in `burn-onnx`: `NodeCodegen` impls carry inline `insta::assert_snapshot!` cases pinning the exact Rust output for codegen branches. There are 790 snapshot assertions under `crates/burn-onnx/src` today; review changes with `cargo insta review`.
- `onnx-ir` opset compliance tests in `crates/onnx-ir/tests/opset_compliance/`: one fixture and one test file per ONNX opset from 1 through 24, generated from a single Python source. They lock in supported operator/spec-version coverage with executable evidence, not a spreadsheet.
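The drift check itself is conceptually simple. Here is a minimal sketch, assuming a status enum and maps from test name to status; the real harness reads `expectations.toml` and tracks more bookkeeping than this:

```rust
use std::collections::BTreeMap;

/// Status buckets, mirroring the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status { Pass, FailCompare, SkipCompile, SkipCodegen }

/// Return every test whose observed status differs from its expected one.
/// A non-empty result means the expectations file has drifted from reality
/// (in either direction: regressions AND silent upgrades both count).
fn drift(expected: &BTreeMap<&str, Status>, observed: &BTreeMap<&str, Status>) -> Vec<String> {
    let mut diverged = Vec::new();
    for (name, exp) in expected {
        match observed.get(name) {
            Some(obs) if obs == exp => {}
            Some(obs) => diverged.push(format!("{name}: expected {exp:?}, got {obs:?}")),
            None => diverged.push(format!("{name}: missing from run")),
        }
    }
    diverged
}

fn main() {
    let expected = BTreeMap::from([("test_relu", Status::Pass), ("test_loop", Status::SkipCodegen)]);
    let mut observed = expected.clone();
    observed.insert("test_loop", Status::Pass); // an upgrade is still drift
    println!("{:?}", drift(&expected, &observed));
}
```

Flagging upgrades as well as regressions is what keeps the expectations file honest: a newly passing test must be promoted explicitly.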
Graph Simplification in onnx-ir
ONNX models are exported by frameworks (PyTorch, TensorFlow, scikit-learn, ONNX Runtime
itself) that all have their own habits. The same Transformer attention head turns into a
different graph depending on whether it was exported from torch.onnx.export,
ONNX's _expanded form, or a model-zoo checkpoint that has already been through
onnxsim. onnx-ir 0.21 introduces a graph simplification framework
that runs a fixed-point loop of eight passes on every imported model:
- Attention coalescing. Decomposed scaled-dot-product attention (`MatMul -> Scale -> Mask -> Softmax -> MatMul`) is recognized as a pattern and fused into a single `Attention` node, which `burn-onnx` then emits as a call to Burn's native attention primitive. Pre-scaled and Q-pre-scaled variants are detected too. Late in the cycle this work promoted 27 upstream `attention_*_expanded` tests to passing.
- Permute-reshape detection. The `Shape -> Gather -> Unsqueeze -> Concat -> Reshape` chain that exporters use to reorder dimensions is collapsed into a single `Transpose`.
- Constant shape propagation. `Shape(x) -> Gather(i)` and `Shape(x) -> Slice(start, end)` are folded when the relevant dimensions are known statically. (Bare `Shape(x)` is intentionally left alone, since exporters often hard-code dimensions that change at runtime.)
- Constant folding. Nodes whose inputs are all constants (including `Cast`, `Sqrt`, `Slice`, `Concat`, `Unsqueeze`) are evaluated at compile time and replaced by their result.
- Idempotent-op elimination. `Relu(Relu(x))`, `Ceil(Ceil(x))`, `Floor(Floor(x))`, and friends collapse to a single call.
- Identity-element elimination. `x + 0`, `x * 1`, `x / 1`, and `x ** 1` are removed.
- Common subexpression elimination. Duplicate nodes with identical inputs and configuration are merged.
- Dead-node elimination. Anything that no longer reaches a graph output is removed, cascading through the graph.
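To make the fixed-point structure concrete, here is a toy sketch of two of those passes (idempotent-op and identity-element elimination) run to a fixed point over a miniature expression type. The real passes operate on the onnx-ir graph, not on an expression tree:

```rust
/// A toy expression IR, just enough to demonstrate two passes.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Const(f64),
    Relu(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// One top-down sweep of both rewrites.
fn simplify_once(e: Expr) -> Expr {
    use Expr::*;
    match e {
        // Idempotent-op elimination: Relu(Relu(x)) => Relu(x)
        Relu(inner) => match *inner {
            Relu(x) => Relu(x),
            other => Relu(Box::new(simplify_once(other))),
        },
        // Identity-element elimination: x + 0 => x
        Add(a, b) => match (*a, *b) {
            (x, Const(c)) if c == 0.0 => simplify_once(x),
            (Const(c), x) if c == 0.0 => simplify_once(x),
            (a, b) => Add(Box::new(simplify_once(a)), Box::new(simplify_once(b))),
        },
        // Identity-element elimination: x * 1 => x
        Mul(a, b) => match (*a, *b) {
            (x, Const(c)) if c == 1.0 => simplify_once(x),
            (Const(c), x) if c == 1.0 => simplify_once(x),
            (a, b) => Mul(Box::new(simplify_once(a)), Box::new(simplify_once(b))),
        },
        other => other,
    }
}

/// Fixed-point loop: keep applying passes until nothing changes.
fn simplify(mut e: Expr) -> Expr {
    loop {
        let next = simplify_once(e.clone());
        if next == e {
            return e;
        }
        e = next;
    }
}

fn main() {
    use Expr::*;
    // Relu(Relu(x * 1)) + 0  simplifies to  Relu(x)
    let e = Add(
        Box::new(Relu(Box::new(Relu(Box::new(Mul(
            Box::new(Var("x".into())),
            Box::new(Const(1.0)),
        )))))),
        Box::new(Const(0.0)),
    );
    println!("{:?}", simplify(e));
}
```

The loop matters because one pass can expose work for another: removing `+ 0` above is what brings the nested `Relu` pair together.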
The simplifier is also where 0.21 lands the constant-merge work. Constants that flow into operators are now merged into the parameter table where the op expects them. Weight-bearing operators (Conv, Gemm, MatMul, BatchNorm) recognize their constant inputs as parameters and lift them; transient constants used only for shape arithmetic disappear during constant folding; and a separate pass cleans up unused constants left behind when an op consumed only some of its constant inputs (Gather was the biggest beneficiary, but the pattern shows up across the board).
The result is smaller, faster generated code, and a graph that looks more like what a
human would have written. The CLI accepts `--no-simplify` if you want to inspect
the raw IR for debugging.
Architecture Upgrades
ScalarTensor and ScalarNative: clean host vs. device scalars
ONNX uses rank-0 tensors freely, but Burn distinguishes between host-side Rust scalars and
device-side rank-1 tensors. 0.21 introduces two new `ArgType` variants (`ScalarTensor(DType)`
for on-device, `ScalarNative(DType)` for host) plus an `ArgPreference::ScalarNative`
mechanism that lets nodes that need a real Rust integer (`Range`, `If`, `Loop`, `Concat`,
`Slice`, `Where`) ask for it. The IR can keep scalars as device tensors when tensor code
needs them, materialize constant scalar tensors as real data, and convert graph inputs and
outputs at the boundary so user-facing forward signatures stay clean.
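A stripped-down model of that mechanism, with invented names standing in for the real `ArgType`/`ArgPreference` machinery:

```rust
/// Toy dtypes; the real set is larger.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType { F32, I64 }

/// Toy model of the host/device scalar split described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgType {
    /// Rank-0 value kept on the device as a tensor.
    ScalarTensor(DType),
    /// Value materialized as a host-side Rust scalar.
    ScalarNative(DType),
}

/// A consumer such as Range or Slice can declare that it needs a real
/// host value; the importer then converts at the boundary. Consumers
/// without a preference receive the scalar as-is.
fn resolve(arg: ArgType, consumer_wants_native: bool) -> ArgType {
    match (arg, consumer_wants_native) {
        (ArgType::ScalarTensor(dt), true) => ArgType::ScalarNative(dt),
        (other, _) => other,
    }
}

fn main() {
    let start = ArgType::ScalarTensor(DType::I64);
    // Range needs a host integer, so the scalar is lowered to native.
    println!("{:?}", resolve(start, true));
}
```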
Submodule partitioning for large models
Stable Diffusion XL is a graph with tens of thousands of nodes. Compiling that into a
single forward method takes minutes and produces source files Rust can barely parse.
0.21 introduces an automatic partitioner: graphs above a 200-node threshold are split into
`Submodule0`, `Submodule1`, ... structs, with the algorithm greedily picking low-width cut
points based on live-tensor widths. Constants are reordered to land just before their first
consumer so submodule interfaces stay narrow. Burnpack weight paths are auto-prefixed
(`submodule0.field.weight`, etc.) and routed at load time. Small graphs are unaffected.
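The greedy cut-point idea can be sketched independently of Burn. Assume `widths[i]` is the number of live tensors crossing the boundary after node `i`; the function below and its tie-breaking are illustrative, not the actual algorithm:

```rust
/// Greedy cut-point selection over live-tensor widths. We walk forward
/// and, whenever the remaining graph exceeds `max_nodes`, cut at the
/// narrowest boundary inside the current window, so that each submodule
/// interface carries as few tensors as possible.
fn pick_cuts(widths: &[usize], max_nodes: usize) -> Vec<usize> {
    let mut cuts = Vec::new();
    let mut chunk_start = 0;
    while widths.len() - chunk_start > max_nodes {
        let window = &widths[chunk_start..chunk_start + max_nodes];
        // Narrowest boundary in the window (earliest wins on ties).
        let (best, _) = window
            .iter()
            .enumerate()
            .min_by_key(|&(i, w)| (*w, i))
            .unwrap();
        let cut = chunk_start + best + 1; // cut after that boundary's node
        cuts.push(cut);
        chunk_start = cut;
    }
    cuts
}

fn main() {
    // 10 boundaries; widths dip at indices 2 and 6.
    let widths = [5, 4, 1, 6, 7, 3, 1, 8, 2, 4];
    println!("{:?}", pick_cuts(&widths, 4)); // cuts land on the narrow boundaries
}
```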
This is what unblocked the Depth-Pro and Stable Diffusion XL model checks.
Memory-mapped loading and Burnpack weights
`onnx-ir` now memory-maps the input ONNX file by default (gated by the `mmap` feature),
and generated weights are stored in Burnpack [3] format. The new `LoadStrategy` options
make file, embedded, and caller-provided byte loading explicit, including paths that are
usable from WebAssembly and `no_std` targets. The previous `Record` type machinery
is gone.
External data for models > 2 GB
ONNX splits large weight tensors into separate sidecar files. onnx-ir now follows
those references, so models like Stable Diffusion XL load correctly without manual preprocessing.
Backend swap: burn-ndarray to burn-flex
Examples, model checks, and the official-tests harness now run on `burn-flex` for
CPU. The default int dtype on Flex is I32 (vs. NdArray's I64), which led to a sweep through
`burn-onnx` codegen to make every dtype explicit (`.cast(DType::...)`, `from_data_dtype`)
so generated models behave identically across backends.
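Why an implicit integer default matters can be shown with plain Rust integers; this is an illustration of the failure mode, not burn-onnx code:

```rust
/// An index constant that fits in i64 but not in i32. If a backend's
/// default int dtype silently changes from I64 to I32, a conversion
/// like this truncates instead of preserving the value. Spelling the
/// dtype out in generated code removes that dependence on the default.
fn narrow_default(value: i64) -> i32 {
    value as i32 // what an implicit I32 default would do: truncate
}

fn main() {
    let big: i64 = 4_294_967_296; // 2^32, representable in i64 only
    println!("i64 view: {big}, truncated i32 view: {}", narrow_default(big));
}
```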
Loading models: the new LoadStrategy enum
Where weights live used to be implicit. 0.21 turns it into a codegen option:
- `File` keeps weights in a separate `.bpk` file; this is the default.
- `Embedded` bakes weights into the binary for WebAssembly, embedded targets, and demos.
- `Bytes` lets the caller provide weight data from its own data source.
- `None` skips generated weight-loading constructors.
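A toy model of the four strategies, with invented payloads; the real `LoadStrategy` is a codegen option, and the generated constructors differ:

```rust
/// Toy model of the four weight-loading strategies listed above.
enum LoadStrategy {
    File(String),            // read weights from a .bpk path at runtime
    Embedded(&'static [u8]), // weights baked into the binary
    Bytes(Vec<u8>),          // caller supplies the weight bytes
    None,                    // no generated weight-loading constructor
}

/// Illustrative dispatch on the strategy.
fn describe(s: &LoadStrategy) -> String {
    match s {
        LoadStrategy::File(path) => format!("load from file {path}"),
        LoadStrategy::Embedded(bytes) => format!("load {} embedded bytes", bytes.len()),
        LoadStrategy::Bytes(bytes) => format!("load {} caller-provided bytes", bytes.len()),
        LoadStrategy::None => "caller must load weights manually".to_string(),
    }
}

fn main() {
    println!("{}", describe(&LoadStrategy::File("model.bpk".into())));
    println!("{}", describe(&LoadStrategy::Embedded(&[0u8; 4])));
}
```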
For normal applications, the recommended path is still explicit:
```rust
let model = Model::<Backend>::from_file("path/to/weights.bpk", &device);
```
Use `Model::default()` only when the generated path and default device are what you
want, and avoid `Model::new(&device)` unless you plan to load weights manually
with `load_from`.
New and Expanded Operator Coverage
Signal processing arrived as a full family this cycle, driven by Kokoro TTS and other audio models:
- `STFT`, `DFT`, `MelWeightMatrix`.
- Window functions: `BlackmanWindow`, `HannWindow`, `HammingWindow`.

Quantization:

- `QLinearMatMul`, `DequantizeLinear`, `QuantizeLinear`.

ONNX ML (the classical-ML domain):

- `Scaler`, `SVMRegressor`, `Imputer`.

Normalization:

- `LpNormalization`, `MeanVarianceNormalization`, `LRN` (refactored to use Burn's native `LocalResponseNorm`).

Recurrent:

- `RNN`, `GRU`, bidirectional GRU coverage, plus configurable activations on the existing `LSTM`.

Vision:

- `GridSample`, `Col2Im`, `LpPool` (1D/2D), and asymmetric padding for 1D/2D Conv and Pool.

Math and general:

- `Det`, constrained `Einsum` (with implicit output, ellipsis, and reductions), `CastLike`, `Shrink`, `Hardmax`, `Swish`, `CumSum`, `ScatterND` and `GatherND` (now mapping to Burn's native `scatter_nd` and `gather_nd`), `ScatterElements`, and `DeformConv`.

Activations:

- `Selu`, `Elu`, `Celu`, `ThresholdedRelu`, `Mish`, `Softplus`, `Softsign`.

Control flow:

- `If`, `Loop`, and `Scan` now generate native Rust control flow for supported cases, while preserving subgraphs and outer-scope variable references.
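As a rough picture of what "native Rust control flow" means here, an `If` with two scalar subgraphs and a counted `Loop` could lower to skeletons like the following. All names and bodies are hypothetical; real generated code operates on Burn tensors and module fields:

```rust
/// Hypothetical shape of codegen for an ONNX If node: the condition
/// becomes a plain Rust `if`, and each branch body is the lowered
/// subgraph. Only the control-flow skeleton is realistic here.
fn if_node(cond: bool, x: f32) -> f32 {
    if cond {
        // then-branch subgraph: e.g. Relu
        x.max(0.0)
    } else {
        // else-branch subgraph: e.g. Neg
        -x
    }
}

/// Hypothetical Loop lowering: the trip count becomes a bounded `for`,
/// with the carried value threaded through each iteration.
fn loop_node(trip_count: usize, init: f32) -> f32 {
    let mut carried = init;
    for _ in 0..trip_count {
        carried = carried * 0.5 + 1.0; // body subgraph
    }
    carried
}

fn main() {
    println!("{}", if_node(true, -3.0));
    println!("{}", loop_node(3, 0.0));
}
```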
Native attention
The `Attention` op now codegens to Burn's attention primitive. Many
exporters still emit scaled-dot-product attention as a decomposed graph of `MatMul`,
scale, mask, `Softmax`, and another `MatMul`; the SDPA coalescer
recognizes those patterns and rewrites them to `Attention` before codegen. That lets
imported Transformer models use Burn's specialized attention path instead of a slower
chain of generic tensor operations.
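In spirit, the coalescer is a pattern match plus a rewrite. Below is a deliberately simplified sketch over a straight-line list of op names; the real pass matches a dataflow graph and handles the pre-scaled and Q-pre-scaled variants:

```rust
/// Find the canonical decomposed-SDPA chain in a linear op sequence.
/// The "Scale"/"Mask" labels follow the chain as described in the text;
/// in real graphs these are ordinary Mul/Add/Where nodes.
fn find_sdpa(ops: &[&str]) -> Option<usize> {
    const PATTERN: [&str; 5] = ["MatMul", "Scale", "Mask", "Softmax", "MatMul"];
    ops.windows(PATTERN.len()).position(|w| w == PATTERN.as_slice())
}

/// Rewrite the matched window into a single fused Attention op.
fn coalesce(ops: &mut Vec<&str>) {
    if let Some(i) = find_sdpa(ops) {
        ops.splice(i..i + 5, ["Attention"]);
    }
}

fn main() {
    let mut ops = vec!["Cast", "MatMul", "Scale", "Mask", "Softmax", "MatMul", "Add"];
    coalesce(&mut ops);
    println!("{ops:?}"); // the five-node chain becomes one Attention node
}
```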
Opset Coverage, 1 through 24
0.21 expands the supported opset range for existing operators instead of treating support as a single yes/no flag. Many operators now accept older ONNX forms as well as newer ones, including legacy attribute layouts and opset-dependent defaults.
The opset coverage gate now reports 472 of 472 generated operator/spec-version checks green across opsets 1 through 24. It focuses on the opset versions where a supported operator's ONNX spec changed, so the minimum-opset table is backed by executable fixtures rather than a spreadsheet.
Real-World Model Verification
`crates/model-checks/` contains 27 model-check crates covering more than two dozen
real-world models and variants. A check typically prepares or downloads the artifact, generates
Rust with `burn-onnx`, runs a forward pass, and compares against saved ONNX
Runtime or PyTorch reference data when the model has a runnable oracle.
Coverage includes:
- LLMs and NLP: Qwen, SmolLM, SmolLM2, ALBERT, ModernBERT, all-MiniLM-L6-v2.
- Vision classifiers: AlexNet, VGG19, ResNet-50, MobileNet-v2, SqueezeNet, ShuffleNet, ZFNet512, DenseNet121, Inception-v1/v2.
- Detection and pose: YOLO 12x, RF-DETR, RTMW3D, MediaPipe Face Detector, ArcFace.
- Generative, speech, depth: Stable Diffusion XL, Kokoro TTS, Depth-Pro, Depth-Anything-v2, Silero VAD, CLIP ViT-B/32 (text and vision).
The model-checks workflow runs on pull requests and main-branch pushes that touch import,
IR, or model-check code. The Kokoro check uncovered an f32 precision issue in the
matrix-DFT path of `STFT`; the fix was to compute the matmul in f64 internally
and cast back, which collapsed the audio output divergence from 24x to about 1.3x. The
Depth-Pro check is what motivated submodule partitioning. RF-DETR is what motivated
runtime-input slicing on `Shape` values.
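The shape of the STFT fix is a standard precision trick: keep inputs and outputs in f32, but accumulate internally in f64. A self-contained illustration with made-up numbers:

```rust
/// Dot product accumulated in f32: every partial sum is rounded to f32.
fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Same f32 inputs, but accumulated in f64 and cast back at the end.
/// This is the pattern behind the STFT fix: only the internal math widens.
fn dot_f64_internal(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (*x as f64) * (*y as f64))
        .sum::<f64>() as f32
}

fn main() {
    // Long vectors of same-sign terms accumulate f32 rounding error.
    let a: Vec<f32> = (0..100_000).map(|i| 0.1 + (i as f32) * 1e-6).collect();
    let b = vec![1.0f32; a.len()];
    // Reference computed entirely in f64 from the same f32 inputs.
    let reference: f64 = a.iter().map(|x| *x as f64).sum();
    let err32 = (dot_f32(&a, &b) as f64 - reference).abs();
    let err64 = (dot_f64_internal(&a, &b) as f64 - reference).abs();
    println!("f32 accumulation error: {err32:e}");
    println!("f64-internal error:     {err64:e}");
}
```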
Breaking Changes and Migration
- `burn-import` is deprecated. New code should depend on `burn-onnx` directly. The shim still works.
- Pick a `LoadStrategy` when generating code. The default (`LoadStrategy::File`) keeps the old separate-`.bpk` behaviour, and `Model::from_file(path, &device)` is the recommended way to load a generated model with an explicit device. `Model::default()` remains convenient when the generated path and default device are right for your application. Avoid `Model::new(&device)` unless you intend to load weights manually with `load_from` afterward; on its own it does not load the ONNX weights into `Param` fields.
- Default backend in examples is now `Flex` (was `NdArray`). Flex defaults to I32 for integer types where NdArray defaulted to I64.
- Generated code emits explicit dtypes everywhere. Downstream code that relied on implicit dtype defaults may need tightening.
- Forward signatures may change for graphs that fed constants into ops whose codegen now accepts runtime inputs (`Slice`, `Pad`, `Tile`, `Squeeze`, `Clip`, `Dropout`, `OneHot`, `TopK`, `ArgMax`, `ArgMin`, `Mod`).
- `Ignored<T>` in generated structs is replaced by `#[module(skip)]` (Burn's modern equivalent).
What's Next
- Closing the 504 `skip-codegen` and 215 `skip-compile` cases in the official test gate. Each one is a named, tracked gap.
- Expanding ONNX ML coverage (more classifiers, label encoders, tree ensembles).
- More large-model coverage and a path to a steady all-green state on the model-checks workflow.
- Continued progress toward full ONNX compliance, measured against the upstream gate.
- Listing `burn-onnx` on the public ONNX Backend Scoreboard [5] alongside ONNX Runtime, tract, and the other backends, so our compliance numbers are visible next to everyone else's. The integration draft already lives under `scoreboard/` in this repo.
Thanks to everyone who contributed to this release. If you are running ONNX models from
PyTorch, TensorFlow, scikit-learn, or anywhere else and want to ship them as native Burn
Rust, 0.21.0 is a good time to try `burn-onnx`. The ONNX Import Guide [6] in the Burn Book is the place to start, and the burn-onnx repository [1] is where issues and contributions go.
