Burn 0.15.0 Release Notes

Flame digital art generated by Stable Diffusion.
Mon Oct 28 2024
Guillaume Lagrange

Overview

This release brings major performance improvements to tensor operations, particularly in matrix multiplication and convolution, along with experimental ROCm/HIP and SPIR-V support enabled by CubeCL runtimes. It also introduces foundational features for multi-backend compatibility and adds new quantization operations.

Support for ONNX models has been expanded, with additional operators and bug fixes for better operator coverage.

As with previous releases, this version includes various bug fixes, further performance optimizations, new tensor operations, and enhanced documentation.

Module & Tensor

• Remove copy restriction for const generic modules #2222 @laggui
• Add deform_conv2d as implemented in torchvision #2147 @wingertge
• Add dim checks on output rank for unsqueeze and stack #2331 @laggui
• Add Softmin #2358 @NoahSchiro
• Add `round`, `floor`, `ceil` for float tensor #2372 @med1844 (see the sketch after this list)
• Make tensor sync #2392 @kingwingfly
• Add `tensor.one_hot` int operation #2413 @tsanona
• [Breaking] Change LR schedulers to return the initial LR at first `.step()` #2337 @towerpark
• Move LrSchedule generic to make it easier to use #2309 @ArthurBrussee
• Add quantization ops default implementation #2125 #2275 #2301 @laggui
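
The new rounding operations behave like the other element-wise float ops. Below is a minimal sketch, assuming the `burn` crate's `ndarray` feature is enabled; the input values are arbitrary and chosen only for illustration.

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();

    // Element-wise rounding ops added in this release: round, floor, ceil.
    let x = Tensor::<NdArray, 1>::from_floats([1.4, 2.5, -0.5, -1.6], &device);
    println!("round: {}", x.clone().round());
    println!("floor: {}", x.clone().floor());
    println!("ceil:  {}", x.ceil());
}
```

The same calls are available on any backend, since they are part of the float tensor API rather than a backend-specific extension.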

Bug Fixes

• Avoid 0 denominator in interpolate frac #2224 @laggui
• Nonzero should return an empty vec for zero tensors #2212 @laggui
• Change ndarray mask_where implementation to correctly deal with NaNs #2272 @laggui
• Fix mask_where broadcasted input #2381 @laggui
• Make powf broadcastable #2398 @laggui

Backends

• Add candle `CudaDevice` and `MetalDevice` to avoid creating a new unique device each time #2290 @laggui
• Add fusion mixed precision #2247 @nathanielsimard
• Add SPIR-V compiler backend to `burn-wgpu` #2386 @wingertge (see the backend-selection sketch after this list)
• Add burn-hip #2399 @syl20bnr
• Add `BackendRouter` to handle multiple backends on the way to distributed #2353 #2419 @laggui
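
Backends in Burn are selected through a type alias, so the new experimental runtimes can be swapped into existing code without touching the model definition. The sketch below assumes the `wgpu` and `autodiff` features of the `burn` crate; the exact Cargo feature flags that enable the SPIR-V and ROCm/HIP runtimes are not shown here (see the respective backend crates for details).

```rust
use burn::backend::wgpu::WgpuDevice;
use burn::backend::{Autodiff, Wgpu};

// The same model code runs on any backend: swapping backends is a matter of
// changing these aliases (e.g. to the new HIP backend, or to a SPIR-V-enabled
// wgpu build, each gated behind Cargo features on the backend crates).
type Backend = Wgpu;
#[allow(dead_code)] // Wrap in Autodiff when gradients are needed (training).
type AutodiffBackend = Autodiff<Backend>;

fn main() {
    // The default wgpu device; it is passed to tensor and module constructors.
    let device = WgpuDevice::default();
    println!("Selected device: {device:?}");
}
```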

Bug Fixes

• Fix autodiff memory leak #2347 @nathanielsimard
• Fix autodiff abs NaN when output is 0 #2249 @AsherJingkongChen

Documentation & Examples

• Add documentation for custom `cubecl` kernels, update some outdated docs #2404 @wingertge
• Add comments to burn fusion #2130 @cBournhonesque
• Improve doc for burn-tch #2288 @kingwingfly
• Improve regression example #2405 @laggui
• Create CITATION.cff #2231 @antimora
• Enable doc_auto_cfg to show feature-req-hint in docs.rs #2271 @kingwingfly

Fixes

• Fix tensor data elem type conversion in book #2211 @laggui
• Fix target convert in batcher and align guide imports #2215 @laggui
• Fix huber loss documentation #2232 @kingwingfly
• Fix debugger settings doc in contributor book #2223 @tiruka
• Fix Raspberry Pi Pico example not compiling #2220 @BjornTheProgrammer
• Fix path in book #2262 @mehmetalianil
• Fix unresolved import `regression` #2285 @tiruka
• Fix burn book links #2303 #2327 @laggui @tiruka
• Contributor Book: Fix the link of primitive types in the "Serialization" page #2362 @towerpark
• Fix simple regression batch targets #2379 @wangjiawen2013
• Fix xtask args which are unmodified when upgrading xtask commands #2364 @tiruka

ONNX Support

• Add gather support for multi-dim indices (rank > 1) #2199 @alteredoxide
• Allow onnx-import expand op with non-const shapes #2189 @hexd0t
• Improve ONNX import tensor shape tracking #2213 @hexd0t
• Add missing output padding to conv transpose ONNX #2216 @laggui
• Fix ONNX where op for scalar inputs #2218 @hexd0t
• Simplify scope tracking in burn-import #2207 @skewballfox
• Add ONNX op trilu #2323 @tiruka
• Add ConvTranspose1d ONNX op #2349 @tiruka
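
As a reminder of how these operators are consumed, ONNX models are converted to Rust code from a build script with `burn-import`. Below is a minimal sketch with a hypothetical model path and output directory; `burn-import` must be listed as a build dependency.

```rust
// build.rs
use burn_import::onnx::ModelGen;

fn main() {
    // Generate Rust source for the ONNX graph at build time; the generated
    // module is then included from the crate's source tree.
    ModelGen::new()
        .input("src/model/my_model.onnx") // hypothetical path
        .out_dir("model/")
        .run_from_script();
}
```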

Enhancements

• Improve slice kernel performance #2252 @nathanielsimard
• Fix burn-jit conv2d excessive loop unrolling #2263 @AsherJingkongChen
• Introduce autotuning to `conv2d` and `conv_transpose2d` with a new `im2col`/`GEMM` algorithm #2287 @wingertge
• Further data locality optimizations for implicit GEMM #2300 @wingertge
• Add utility methods to split gradients to GradientParams #2311 @ArthurBrussee
• Add bounds checking to implicit GEMM to allow arbitrary input shapes #2354 @wingertge
• Initialize accumulator to bias for implicit GEMM to save an expensive `float_add` #2383 @wingertge

Refactoring

• Select kernel from CPA to CubeCL #2168 @mepatrick73
• Migrate cubecl macro #2266 @wingertge
• Remove primitives const D generic #2298 @laggui
• Refactor elemwise fusion #2344 @nathanielsimard
• Refactor Adaptive Avg Pool to CubeCL #2351 @nathanielsimard
• Refactor pooling kernels #2356 @nathanielsimard
• Refactor burn-tensor: Split conv backward ops to allow conditional gradient computation #2278 @AsherJingkongChen

Miscellaneous

• Fix panic messages being invisible in tui mode #2226 @PaulWagener
• Refactor xtask to use tracel-xtask and refactor CI workflow #2063 @syl20bnr
• Automatic minimum rust version in README #2227 @syl20bnr
• Set MSRV to 1.81 #2388 @nathanielsimard
• Don't panic when the progress is > 1.0 #2229 @PaulWagener
• Fix compile for dataset crate with vision feature #2228 @PaulWagener
• Update CI workflow for last version of setup-linux action #2248 @syl20bnr
• [CI] Fix llvmpipe, lavapipe install for valgrind and vulnerabilities #2264 @syl20bnr
• Use CliMetricsRenderer when not in a terminal #2307 @lancelet
• Update rusqlite and associated libraries #2328 @paulirotta
• Fix missing fusion feature flag @nathanielsimard
• Move conv autotune under feature flag (except key) #2330 @laggui
• Add should_run for convs instead of panicking #2403 @ArthurBrussee
• Make changes for latest ratatui version #2421 @laggui
• Add Windows/WindowsIterator/WindowsDataset #2338 @NicoZweifel

