Overview
This release marks the debut of our CubeCL integration, which brings cross-platform GPU
programming capabilities directly to Rust. With CubeCL now supporting both CUDA and
WebGPU, Burn benefits from a new CUDA backend that can be enabled using the cuda-jit
feature. Please note that this backend is still considered experimental, and some
operations, particularly those related to vision, may experience issues.
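For those who want to try it, here is a minimal sketch of running a tensor operation on the new backend. The `CudaJit` re-export name and the default device are assumptions based on this release; check the backend documentation if they differ in your version.

```rust
// Minimal sketch: assumes `burn` is built with `features = ["cuda-jit"]`
// and that the experimental CUDA backend is re-exported as `CudaJit`.
use burn::backend::CudaJit;
use burn::tensor::Tensor;

fn main() {
    type B = CudaJit;
    let device = Default::default();

    // Allocate two small matrices on the GPU and multiply them.
    let a = Tensor::<B, 2>::ones([2, 3], &device);
    let b = Tensor::<B, 2>::ones([3, 2], &device);
    let c = a.matmul(b);

    println!("{c}");
}
```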
Additionally, this release features significant enhancements to ONNX support, including
bug fixes, new operators, and improvements in code generation.
As always, it also includes numerous bug fixes, performance enhancements, new tensor
operations, and improved documentation.
Burn 0.14.0 also introduces a new tensor data format that significantly improves
serialization and deserialization speeds, along with Quantization, a new Beta feature
included in this release. The new format is not compatible with previous versions of
Burn, but you can migrate your previously saved records using this guide.
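As a rough illustration, the following sketch re-saves a legacy record in the new format. It assumes the `record-backward-compat` feature is enabled on the `burn` dependency and uses placeholder file paths; see the migration guide for the authoritative steps.

```rust
// Hypothetical migration sketch: load a record saved with an older Burn version,
// then save it back so it uses the new tensor data format.
// Assumes `burn` is built with the `record-backward-compat` feature.
use burn::module::Module;
use burn::record::{FullPrecisionSettings, NamedMpkFileRecorder};
use burn::tensor::backend::Backend;

fn migrate<B: Backend, M: Module<B>>(model: M, device: &B::Device) {
    let recorder = NamedMpkFileRecorder::<FullPrecisionSettings>::new();

    // Load the legacy record into a freshly initialized module...
    let model = model
        .load_file("/path/to/model", &recorder, device)
        .expect("legacy record should load");

    // ...then save it back using the new tensor data format.
    model
        .save_file("/path/to/model_migrated", &recorder)
        .expect("record should save in the new format");
}
```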
Module & Tensor
• Add RoPE init_with_frequency_scaling
#2194@laggui
• Add 0-dim tensor checks for creation ops and validate TensorData shape w/ num values
#2137@laggui
• Add Hard sigmoid activation function
#2112@wingertge
• Add is_nan and contains_nan tensor ops
#2088@antimora
• Fix bug: Filling tensor containing f32::NEG_INFINITY will result in NaN for burn-ndarray
#2095@antimora
• Convert compatible prelu weights to rank 1
#2054@laggui
• Refactor tensor quantization for q_* ops
#2025@laggui
• Adding burn::nn::Sigmoid
#2031@RuelYasa
• Module weight quantization
#2000@laggui
• Cube: Matmul tiling
#1994@louisfd
• Enhance slice operation to support more range variation
#1989@antimora
• Add static tensor quantization
#1963@laggui
• Enable negative starts and ends for slice op
#1981@johnhuichen
• Implement 3D and transposed 3D convolutions.
#1945@booti386, antimora
• Print module - implement module display for remaining modules (part2)
#1933@antimora
• Print model structure like with PyTorch - Part 1
#1912@antimora
• Tanh nn wrapper
#1903@DieracDelta
• Implement `Element` for `bool`
#1878@laggui
• Feat: Add `movedim` tensor operator
#1876@LilDojd
• Make autodiff compile on wasm
#1889@ArthurBrussee
• Make Param.id public
#1859@ArthurBrussee
• Remainder operator
#1726@kantic
• Add seq start position when applying RoPE encoding
#1796@laggui
• Adding max import
#1769@JachymPutta
• Feat/squeeze dims
#1779@agelas
• Implement bidirectional LSTM
#1035@wcshds
• Feat/remainder
#1597@agelas
Bug Fixes
• Fix root-mean-square precision issue
#2193@laggui
• Fix indices dim check in gather_update_outputs
#2149@laggui
• Fix #2091 bug (in-place after expand)
#2114@antimora
• Fix aggregation results slice
#2110@laggui
• Fix: fusion auto bound checks
#2087@nathanielsimard
• Extend [min, max] range to ensure zero-point
#2055@laggui
• Bug/Remove Squeeze Panic for Multiple Dimensions
#2035@agelas
• Fix wgsl remainder definition
#1979@nathanielsimard
• Fix output tensor dtype
#1938@laggui
• feat: Make RetroForward public
#1905@femshima
• Fix conv2d_weight_grad_groups
#1891@laggui
• Fix select assign backward
#1739@nathanielsimard
• Fix repeat for dims greater than 1
#1713@louisfd
• Fix lstm batch size bug
#1695@nathanielsimard
• Reshape bug fix
#1684@antimora
ONNX Support
• Allow ONNX scalar greater/less with scalar
#2146@hexd0t
• Implement ONNX Gather for scalar indices
#2141@hexd0t
• feat: adding shape support for gather ONNX operation
#2128@mepatrick73
• ONNX Tile operation
#2092@mepatrick73
• Repeat operation
#2090@mepatrick73
• Add 1d and 2d modules for interpolate with scaling (also fix ONNX Resize op)
#2081@antimora
• Implement ONNX Pad Operator
#2007@johnhuichen
• Implement ONNX ConstantOfShape
#1815@hexd0t, antimora
• Add subtract tensor from scalar for ONNX sub op
#1964@johnhuichen
• Add ReduceProd ONNX Import
#1955@Dirleye
• Improve pickle (CandleTensor) conversions to NestedValue
#1944@antimora
• feat: added reduce min onnx import
#1894@JachymPutta
• feat: resize onnx import
#1863@mosure
• feat: added slice onnx import
#1856@JachymPutta
• Optimize argument handling and improve ONNX graph building
#1857@skewballfox
• feat: add sum onnx import
#1846@JachymPutta
• Feat/gather import
#1843@agelas
• feat: expand onnx import
#1813@JachymPutta
• feat: added range onnx import
#1834@JachymPutta
• Feature/onnx argmax
#1814@will-maclean
• Feat: Implement ONNX RandomUniform + RandomNormal in burn-import
#1806@hexd0t
• feat: Greater + GreaterOrEqual onnx import
#1801@JachymPutta
• feat: Less + LessOrEqual onnx import
#1800@JachymPutta
• feat: added min onnx import
#1778@JachymPutta
• Squeeze Onnx Import
#1753@agelas
• Added ONNX AvgPool1d
#1744@Arjun31415
• Add MaxPool1d ONNX Op
#1725@Arjun31415
• Add reduce sum onnx ops to burn imports
#1723@AntBlo
• PReLu ONNX import
#1721@Arjun31415
• Update SUPPORTED-ONNX-OPS.md
#1717@antimora
• ONNX debug improvements
#1712@antimora
• Skip updating shape for linear if not present
#1700@antimora
• Remove leaky relu ONNX file
#1697@laggui
• ONNX support for scalar unsqueeze
#1690@antimora
• Add layer norm onnx op support
#1680@laggui
• Fix reshape bug (support for opset version 1)
#1667@antimora
• Add sign ONNX op import support
#1663@wufniks
• Add where onnx op support
#1653@laggui
• Add matmul ONNX op support
#1638@laggui
• Add reduce max ONNX op support
#1636@laggui
• Add shape ONNX op support
#1639@laggui
• [ONNX] Add not op and extend cast support to tensors
#1634@laggui
• Add reduce mean ONNX op support
#1637@laggui
• Update SUPPORTED-ONNX-OPS.md
#1641@antimora
• Add sin onnx op support
#1633@laggui
Bug Fixes
• Tensor type indent fix
#2196@mepatrick73
• pad-input-fix: adding support for pads as attributes
#2195@mepatrick73
• Fix ONNX Gather codegen for Shape input
#2148@hexd0t
• bug fix: adding bounds checking to pad ONNX inputs
#2120@mepatrick73
• Fix checks_channels_div_groups condition and ONNX conv import with groups
#2051@laggui
• Fix ONNX and PyTorch import section links in burn book
#1681@laggui
• Fix bug 1645 (Unsqueeze OpSet 11)
#1661@antimora
• Fix transpose onnx op (permute)
#1657@laggui
Enhancements
• Add scientific notation formatting for small metric values
#2136@laggui
• Always derive Cube features from adapter
#1958@ArthurBrussee
• Dynamic memory management preset + updated wgpu buffer memory management
#1962@mepatrick73, nathanielsimard
• Feat/fixed chunk alloc by class
#1960@mepatrick73
• Consistent sync/async handling, allow more functions to be async for wasm.
#1936@ArthurBrussee
• Replaced `str` with `Path`
#1919@varonroy
• New autodiff graph memory management strategy
#1698@louisfd, nathanielsimard
• Move HandleContainer and Tensor Ops descriptions from burn-fusion to burn-tensor
#1654@syl20bnr
• WindowDataset/windows function
#1553@NicoZweifel
Refactoring
• Scatter kernel from cpa to cubecl
#2169@mepatrick73
• Refactor binary op
#2085@nathanielsimard
• Refactor/jit/unary
#1965@nathanielsimard
• Separating ONNX parsing from burn-import
#1921@skewballfox
• Refactor tensor data
#1916@laggui
• Remove GraphicsAPI generic for WgpuRuntime
#1888@ArthurBrussee
• add dependency management for python
#1887@skewballfox
• refactor reduce into separate traits
#1798@louisfd
• Refactor/jit fusion
#1750@nathanielsimard
• Refactor/burn compute
#1580@nathanielsimard
Documentation & Examples
• Enable cuda-jit in burn-core + in text classification example
#2160@nathanielsimard
• Add comments for matmul kernel
#2138@cBournhonesque
• Fix inner backend typo in book guide
#2135@laggui
• Improve ONNX import book section
#2059@antimora
• Update slice documentation
#2024@antimora
• Remove mention of example in backend section of the book
#2014@syl20bnr
• Fix image-classification-web + autotune flag usage
#2011@laggui
• Add models and examples reference
#1966@laggui, syl20bnr
• Print module part3 - Update book
#1940@antimora
• Book: Fix the link to burn-train in "Learner" page
#1920@towerpark
• Doc: Improve module to_device/fork docs
#1901@nathanielsimard
• Book: Fix typos in the name of MessagePack format
#1868@towerpark
• docs: update README.md
#1810@eltociear
• Contributor Book: Onnx to Burn Conversion
#1771@agelas
• update ARCHITECTURE.md links to project architecture section in contributor book
#1759@benbaarber
• Add hidden code snippets to guide example in Burn book [redo]
#1742@jwric
• Fixing various syntax errors in the Burn book
#1740@mepatrick73
• Add indentation to project architecture in contributing book
#1738@ThierryCantin-Demers
• Add info about enabling debugging for new contributors
#1719@AntBlo
• [guide] Remove ambiguity lib vs. executable
#1649@syl20bnr
• [burn-book] Fix broken URL to SUPPORTED-ONNX-OPS.md
#1651@syl20bnr
• [burn-book] Fix typos in getting started
#1650@syl20bnr
• Many superficial fixes to the contributor book
#1644@louisfd
• Fix guide project name in the book
#1631@laggui
• Improve grammar
#1619@Gadersd
• Docs/update contributor book
#1622@agelas
CubeCL
• Remove CubeCL GELU kernel example reference (moved to CubeCL repo)
#2150@laggui
• Convert `reduce_dim_naive` kernel to use the `#[cube]` derive macro
#2117@cBournhonesque
• Rename revision key to rev for cubecl dependencies in Cargo.toml
#2086@syl20bnr
• Fix cubecl version in Cargo.toml to correctly fetch the version tag
@syl20bnr
• Refactor/jit cube/mask
#2075@louisfd
• Chore/update/cubecl
#2067@nathanielsimard
• Feat: Dynamic cube count dispatch
#1975@ArthurBrussee
• Refactor cube launch + support inplace operation
#1961@nathanielsimard
• Feat/cube/cooperative matrix-multiply and accumulate.
#1943@nathanielsimard
• Refactor/cube/mutability
#1934@nathanielsimard
• Handle visibility in cube
#1929@nathanielsimard
• Feat/cube/array assign ops
#1914@nathanielsimard
• Feat/comptime expr
#1910@nathanielsimard
• Feat/cube/compile error
#1909@nathanielsimard
• feat cube support Array
#1907@nathanielsimard
• Cube: variable reusability + refactor in cube macros
#1885@louisfd
• Refactor the tuner to be used standalone
#1884@nathanielsimard
• Add option to flush queue instead of waiting for completion.
#1864@ArthurBrussee
• Cube: Vectorization + simple matmul implementation
#1866@louisfd
• Get resources from server
#1861@ArthurBrussee
• Speedup client.create for small allocations.
#1858@ArthurBrussee
• Add a feature to initialize from an existing wgpu adapter/device/queue
#1788@ArthurBrussee
• Fix cmma test
#1957@laggui
• Feat/dynamic small pool
#1931@mepatrick73
• Perf/dynamic mm slice addressing
#1917@mepatrick73
• Feat/dynamic mm basic implementation + small refactor
#1844@mepatrick73
• Cube: CubeType (no launch) and Comptime::map
#1853@louisfd
• [Refactor - Breaking] Refactor cube operations with better names & support subgroup operations
#1839@nathanielsimard
• Cube: cleaner use of topology values
#1835@louisfd
• Cube: support for shared memory
#1831@louisfd
• Cube: support method call + prettier tensor metadata
#1829@louisfd
• Add vectorization support into cube
#1830@nathanielsimard
• Cube: support for return + conv2d early return
#1828@louisfd
• Feat/cube/remaining ops
#1807@louisfd
• Cube: first ported kernel + comptime support + variable reuse + cleanup
#1797@louisfd
• Refactor/cube/vectorization
#1781@louisfd
• CubeCL first iteration
#1756@louisfd
• First draft CUDA runtime
#1685@nathanielsimard
Miscellaneous
• Make compatible with thumbv6m-none-eabi + add raspberry pi pico example
#2096@BjornTheProgrammer
• Precision option for tensor display
#2139@antimora
• remove lto linker option to make build successful
#2123@tiruka
• Add top-k accuracy
#2097@cBournhonesque
• Modify contributing md scripts to solve conflicts between doc and scripts
#2107@tiruka
• Add polars DataFrame support for Dataset
#2029@ragyabraham, antimora
• modify broken link src of ide image
#2079@tiruka
• Bump rust minimal version to 1.79
@syl20bnr
• Added parameter trust_remote_code to hf dataset call.
#2013@Haislich
• Enable optimized handling of bytes
#2003@laggui
• Feat: Support trait with CubeCL
#1980@nathanielsimard
• Set DEFAULT_MAX_TASKS to 1 when running tests
@syl20bnr
• remove manual option matching
#1948@loganbnielsen
• Remove closed 'future improvements'
#1935@jwhogg
• Fix: launch without generics
#1932@nathanielsimard
• Update candle-core to a released version
#1913@antimora
• Do not use default burn-compute features unless enabled.
#1908@ArthurBrussee
• clippy on rust update
#1886@louisfd
• LearnerBuilder "with_checkpointing_strategy" should use builder pattern
#1841@Icekey
• Fix bench load record benchmarks
#1826@nathanielsimard
• Add configurable application logger to learner builder
#1774@jwric
• Add Clone trait to the `OptimizerAdaptor` and Clone implementations to the optimizers
#1770@getumen
• Replace opaque return types in optim
#1767@benbaarber
• #1747 Upgrade Rust dependencies
#1748@ahmedyarub, syl20bnr
• Refactor: replace trait TemplateKernel by existing trait JitKernel
#1737@sebhtml
• Autodiff Memory Management: BFS
#1710@louisfd
• [Fusion] Support multi-precision fusion
#1718@nathanielsimard
• Refactor element type to be decoupled from runtime
#1693@laggui
• Arc EventStoreClient to Rc EventStoreClient
#1668@AlexErrant
• remove JIT subsequent RNG tests
#1652@louisfd
• Enable native sign operation for Candle backend
#1647@antimora
Bug Fixes
• Fix module derive with generics
#2127@laggui
• modified mnist image link in the Hugging face
#2134@tiruka
• Fix broken links in contributor book
#2061@NoahSchiro
• Bump gix-tempfile to fix security audit on gix-fs
#2022@syl20bnr
• Fix warnings when using `record-backward-compat`
#1977@laggui
• Fix: constant record loading
#1902@nathanielsimard
• Fix `DataSerialize` conversion for elements of the same type
#1832@laggui
• Fix burn-jit compile error
#1803@DieracDelta
• Fix record nested value de/serialization
#1751@laggui
• fix prng bug during autotune
#1791@louisfd
• Fix unstable tests when run concurrently
#1724@AntBlo
• Handle ndarray matmul broadcasting
#1679@lancelet
• Fix inverted epoch - iteration counts in valid progress
#1699@laggui
• fix: `window` → `pub window` in `dataset/mod.rs`
#1658@NicoZweifel