
Becoming the Fastest

Why Quantization Matters
Modern deep learning models, such as large language models (LLMs), are heavily constrained by memory bandwidth. GPUs can execute floating-point operations (FLOPs) much faster than they can fetch weights from memory. For instance, an NVIDIA A10 has a peak computation throughput of 125 TFLOPS and a memory bandwidth of 600 GB/s.
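A rough back-of-the-envelope reading of those two numbers shows just how lopsided the balance is. To keep the compute units fully busy, every byte fetched from memory would need to feed about

$$
\frac{125\ \text{TFLOPS}}{600\ \text{GB/s}} \approx 208\ \text{FLOPs per byte}
$$

of work, yet token-by-token LLM decoding performs only on the order of two FLOPs per weight read. Inference speed is therefore governed almost entirely by how many bytes the weights occupy, which is exactly what quantization shrinks.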

Going Big and Small for 2025
2024 marked a significant evolution in Burn's architecture. Traditional deep learning frameworks often require developers to compromise between performance, portability, and flexibility; we aimed to transcend these trade-offs. Looking ahead to 2025, we are committed to applying this philosophy across the entire computing stack, encompassing everything from embedded devices to data centers.

Becoming the Fastest: Introduction
In the rapidly evolving landscape of artificial intelligence, one truth stands paramount: size matters. However, the future of AI shouldn't be constrained by hardware monopolies or software limitations, and this is where Burn and CubeCL come in.
Technical Posts

Improve Rust Compile Time by 108X
We started with a compilation time of 108 seconds for the matmul benchmarks, which was reduced to only 1 second after all the optimizations. The most effective optimization was the element-type generics swap, where we instantiated generic functions with predefined "faked" element types to reduce the amount of LLVM code generated. The second optimization also had a major impact, further reducing the compilation time by nearly 3×. This was achieved by using our comptime system instead of associated const generics to represent the matmul instruction sizes. Finally, the last optimization—also the simplest—was to reduce the LLVM optimization level to zero, which is particularly useful for debug builds, such as tests.
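The generics swap is easiest to picture with a toy example. The sketch below is purely illustrative (the names and the `FakeElement` placeholder are invented here, not taken from Burn or CubeCL): the public launch function keeps its generic signature, but it always instantiates the heavy inner code with a single predefined element type, so LLVM generates and optimizes that body only once while the real element type travels as a runtime value.

```rust
#[derive(Clone, Copy, Debug)]
#[allow(dead_code)]
enum Elem {
    F32,
    F16,
    BF16,
}

trait Element {
    const ELEM: Elem;
}

struct F32;
impl Element for F32 {
    const ELEM: Elem = Elem::F32;
}

// Predefined placeholder: every call site instantiates the heavy generic code
// with this one "faked" element type.
struct FakeElement;
impl Element for FakeElement {
    const ELEM: Elem = Elem::F32;
}

// Before: the whole kernel-building pipeline is monomorphized once per element
// type, multiplying the LLVM IR that must be generated and optimized.
fn launch_before<E: Element>(size: usize) {
    build_and_run_kernel::<E>(E::ELEM, size);
}

// After: the entry point stays generic for ergonomics, but always forwards to
// the same instantiation; the real element type is passed along as a value.
fn launch_after<E: Element>(size: usize) {
    build_and_run_kernel::<FakeElement>(E::ELEM, size);
}

// Stand-in for the large generic code that was being duplicated.
fn build_and_run_kernel<E: Element>(elem: Elem, size: usize) {
    println!("kernel for {elem:?}, size {size}");
}

fn main() {
    launch_before::<F32>(64);
    launch_after::<F32>(64);
}
```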

Optimal Performance without Static Graphs by Fusing Tensor Operation Streams
This post explores Burn's tensor operation stream strategy, optimizing models through an eager API by creating custom kernels with fused operations. Our custom GELU experiment reveals a remarkable speedup of up to 78× on our WGPU backend.
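The idea is that an activation like GELU can be written entirely out of ordinary tensor operations and left to the stream-fusion engine to compile into one kernel, instead of hand-writing that kernel. The sketch below shows what such an eager-style definition could look like; it assumes Burn's tensor methods (`div_scalar`, `erf`, `add_scalar`, `mul`) and is not copied from the post.

```rust
use burn::tensor::{backend::Backend, Tensor};

/// gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))), built from primitive tensor ops.
/// Each operation is recorded on the backend's operation stream, which can then
/// fuse the whole chain into a single generated kernel instead of launching one
/// kernel per op.
fn gelu_from_primitives<B: Backend, const D: usize>(x: Tensor<B, D>) -> Tensor<B, D> {
    let inner = x
        .clone()
        .div_scalar(core::f64::consts::SQRT_2)
        .erf()
        .add_scalar(1.0);
    x.mul(inner).div_scalar(2.0)
}
```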

Autotune for GPU Kernels: Ensuring Consistent Peak Performance
Crafting high-performance GPU kernels for common deep learning operations, such as matrix multiplication (matmul) and reduction, requires finesse. The speed of these kernels varies with the input shapes and the GPU device in use, so the fastest kernel may change from one context to another. In Burn, Autotune automates kernel selection dynamically, letting developers write many kernel variations with confidence that the best-performing one will be executed in every situation.
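Conceptually, the mechanism boils down to a benchmark-then-cache dispatcher. The sketch below only illustrates that idea (the types and names are hypothetical, not Burn's or CubeCL's autotune API): the first time a key describing the situation is seen, every candidate is timed once and the winner's index is cached; later calls with the same key dispatch the cached winner directly.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical kernel signature, for illustration only.
type KernelFn = fn(&[f32], &mut [f32]);

struct Autotuner {
    candidates: Vec<KernelFn>,
    // Key identifies the situation, e.g. "matmul-4096x4096-a10".
    cache: HashMap<String, usize>,
}

impl Autotuner {
    fn run(&mut self, key: &str, input: &[f32], output: &mut [f32]) {
        if !self.cache.contains_key(key) {
            // First time this key is seen: time every candidate once on a
            // scratch buffer and remember the index of the fastest one.
            let mut scratch = vec![0.0; output.len()];
            let mut best = (0, Duration::MAX);
            for (index, kernel) in self.candidates.iter().enumerate() {
                let start = Instant::now();
                kernel(input, &mut scratch);
                let elapsed = start.elapsed();
                if elapsed < best.1 {
                    best = (index, elapsed);
                }
            }
            self.cache.insert(key.to_string(), best.0);
        }
        // Every call with a known key dispatches the cached winner.
        let best = self.cache[key];
        self.candidates[best](input, output);
    }
}
```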

Creating High Performance Asynchronous Backends With Burn-Compute
Developing new high-performance deep learning backends in Burn has become remarkably easy, as they can be readily enhanced with advanced capabilities such as asynchronous computation, intelligent memory management, and autotuning. The innovative Burn-Compute crate lays the architectural foundation for in-house backends, effortlessly equipping them with these features to maximize efficiency.

Burn's New Cross-Platform GPU Backend
Introducing Burn's new cross-platform GPU backend built with WGPU. Burn now supports running deep learning models on a wide variety of hardware configurations, leveraging graphics APIs such as Vulkan, DirectX 11/12, Metal, OpenGL, and WebGPU. We discuss possible applications across various domains and offer a glimpse into the promising future of the framework.

Reduced Memory Usage: Burn's Rusty Approach to Tensor Handling
The latest release of Burn includes significant changes to its memory management strategy, allowing tensor-allocated memory to be reused far more often. Overall, these changes substantially reduce memory usage, especially on the CPU compared to PyTorch.
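The "Rusty" part is that tensor operations take their inputs by value, so a backend can detect when it holds the only handle to a buffer and mutate it in place instead of allocating. The sketch below illustrates that idea with a plain `Arc`-backed buffer; it is a simplified illustration, not Burn's actual tensor internals.

```rust
use std::sync::Arc;

// Minimal sketch of ownership-based buffer reuse: tensor data lives behind a
// reference-counted buffer, and ops consume their input tensor by value.
struct Tensor {
    data: Arc<Vec<f32>>,
}

impl Tensor {
    // Because `self` is consumed, the op can check whether it holds the only
    // reference to the buffer and, if so, mutate it in place.
    fn add_scalar(mut self, rhs: f32) -> Tensor {
        if let Some(buffer) = Arc::get_mut(&mut self.data) {
            // Sole owner of the buffer: reuse it, no new allocation.
            for x in buffer.iter_mut() {
                *x += rhs;
            }
            return self;
        }
        // The buffer is still aliased by another tensor: fall back to a copy.
        Tensor {
            data: Arc::new(self.data.iter().map(|x| *x + rhs).collect()),
        }
    }
}

fn main() {
    let t = Tensor { data: Arc::new(vec![1.0, 2.0, 3.0]) };
    // `t` is moved into the call, so the addition happens in place.
    let t = t.add_scalar(1.0);
    println!("{:?}", t.data);
}
```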
Tutorials

Building Blocks #1: Dataset & Data Loading
Burn provides key components that serve as the building blocks of the framework and your deep learning projects. The first entry in the Building Blocks series explores the dataset and batcher traits, and how they fit into Burn's data loading process.
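To make the two roles concrete, here is a minimal sketch of how a dataset and a batcher divide the work. The trait definitions below are simplified stand-ins written for this example; the real traits live in `burn::data` and their exact signatures differ between Burn versions (in Burn, the batcher also knows about the target device so it can build batch tensors directly on it).

```rust
// Simplified stand-ins for Burn's dataset and batcher abstractions.
trait Dataset<I> {
    fn get(&self, index: usize) -> Option<I>;
    fn len(&self) -> usize;
}

trait Batcher<I, O> {
    fn batch(&self, items: Vec<I>) -> O;
}

// A dataset of (input, label) samples kept in memory.
struct InMemoryDataset {
    items: Vec<(Vec<f32>, usize)>,
}

impl Dataset<(Vec<f32>, usize)> for InMemoryDataset {
    fn get(&self, index: usize) -> Option<(Vec<f32>, usize)> {
        self.items.get(index).cloned()
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

// The batcher turns a Vec of individual samples into one training batch; in
// Burn this is where samples are converted into tensors.
struct FlattenBatcher;

struct Batch {
    inputs: Vec<f32>, // flattened [batch_size * features]
    targets: Vec<usize>,
}

impl Batcher<(Vec<f32>, usize), Batch> for FlattenBatcher {
    fn batch(&self, items: Vec<(Vec<f32>, usize)>) -> Batch {
        let mut inputs = Vec::new();
        let mut targets = Vec::new();
        for (input, target) in items {
            inputs.extend(input);
            targets.push(target);
        }
        Batch { inputs, targets }
    }
}

fn main() {
    let dataset = InMemoryDataset {
        items: vec![(vec![0.0, 1.0], 0), (vec![1.0, 0.0], 1)],
    };
    let items: Vec<_> = (0..dataset.len()).filter_map(|i| dataset.get(i)).collect();
    let batch = FlattenBatcher.batch(items);
    println!("{} inputs, {} targets", batch.inputs.len(), batch.targets.len());
}
```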

Transitioning From PyTorch to Burn
In this updated tutorial, we'll implement the popular ResNet family of models and import ImageNet pre-trained weights available online.
Release Notes

Burn 0.16.0 Release Notes
This release brings major performance improvements to tensor operations, particularly in matrix multiplication and convolution, along with experimental ROCm/HIP and SPIR-V support enabled by CubeCL runtimes. It also introduces foundational features for multi-backend compatibility and adds new quantization operations.

Burn 0.15.0 Release Notes
This release brings major performance improvements to tensor operations, particularly in matrix multiplication and convolution, along with experimental ROCm/HIP and SPIR-V support enabled by CubeCL runtimes. It also introduces foundational features for multi-backend compatibility and adds new quantization operations.

Burn 0.14.0 Release Notes
This release marks the debut of our CubeCL integration, which brings cross-platform GPU programming capabilities directly to Rust. As always, it also includes numerous bug fixes, performance enhancements, new tensor operations, and improved documentation.

Burn 0.13.0 Release Notes
Burn 0.13 introduces major performance enhancements, new tensor operations, improved autodiff, Just-in-Time backend refactoring, and numerous feature additions across modules, optimizers, and backends.