Overview
This is a huge release with many improvements and new features across the autodiff system, the backend infrastructure, and the user-facing APIs.
Lots of work has been done in the autodiff system, which now supports gradient checkpointing. Checkpointing recomputes the forward pass of selected operations during the backward pass instead of saving their results. Not only can this save a lot of memory during training, it also composes gracefully with kernel fusion in the backward pass.

This release also introduces the new burn-jit project, which makes it possible to create new backends that compile to any GPU shader language while automatically supporting all of our optimizations. We ported the Wgpu backend to this new representation, and new targets should be coming soon. Stay tuned for the next releases.

We also put a lot of care into improving the user APIs. You no longer need to implement both init and init_with methods for optimized parameter initialization, since initialization is now lazy. In addition, it's now easier to switch between backends and precision types at runtime using the new backend bridge. These improvements were based on community feedback, and we are committed to continuously improving the APIs.
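As a quick illustration of the lazier initialization, the sketch below builds a layer from its config in a single step. The exact init signature (for instance, whether it takes a device argument) depends on your Burn version, so treat the details as illustrative rather than definitive.

```rust
use burn::nn::{Linear, LinearConfig};
use burn::tensor::backend::Backend;

// Illustrative sketch: a single `init` call now covers parameter creation,
// and parameters are initialized lazily, so a separate `init_with` is no
// longer needed to avoid wasted initialization work.
// The `init(&device)` signature is an assumption and may differ per version.
fn build_layer<B: Backend>(device: &B::Device) -> Linear<B> {
    LinearConfig::new(784, 256).init(device)
}
```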
Core User APIs
A major change in this release is that most Burn types, such as modules, optimizers, and tensors, no longer implement the Sync trait. This change should not impact users of the Learner struct for model training. However, it may affect those who implemented their own training loop or inference server.

While modules, optimizers, and tensors can be sent to other threads, they cannot be accessed concurrently by multiple threads. This aligns with Burn's workflow, where each tensor operation requires an owned version of the tensor. The change was made to safely reduce the number of locks needed when modifying the state of the autodiff graph, fusion state, allocation cache, and various other use cases. While not all locks have been removed, the type signature no longer poses a problem for follow-up optimizations.

Note that the same tensor can still be sent to multiple threads without copying the underlying data; it just needs to be cloned before being sent to each thread.
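In practice, for a hand-rolled training loop or inference server, this means cloning the tensor handle (a cheap operation, since the underlying data is shared) and moving the clone into the worker thread. A minimal sketch, using the NdArray backend and the device-argument creation API purely for illustration:

```rust
use std::thread;

use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    type B = NdArray; // any backend works; NdArray is only used for illustration

    let device = Default::default();
    let tensor = Tensor::<B, 1>::from_floats([1.0, 2.0, 3.0], &device);

    // Tensors are Send but not Sync: clone the handle (the underlying data is
    // shared, not copied) and move the clone into the worker thread.
    let for_worker = tensor.clone();
    let worker = thread::spawn(move || for_worker.sum().into_scalar());

    // The original handle remains usable on the current thread.
    let local_sum = tensor.sum().into_scalar();
    assert_eq!(local_sum, worker.join().unwrap());
}
```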
Tensor
Module
Optimizer
Train
Backend
This release also introduces the backend bridge, a new mechanism for switching between backends at runtime. While this is an improvement over the previous approach, it remains compatible with the existing ways of supporting mixed precision.
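As an example of the kind of code the bridge enables, the sketch below hops to the full-precision backend for a numerically sensitive step and then bridges the result back. The method names (into_full_precision, from_full_precision) are assumptions used for illustration and may not match the exact API.

```rust
use burn::tensor::{backend::Backend, Tensor};

// Sketch only: the method names below are assumptions and may differ from the
// actual API. The idea is that the bridge converts a tensor to another
// backend (here, the full-precision one) and back, at runtime.
fn stable_log_sum_exp<B: Backend, const D: usize>(x: Tensor<B, D>) -> Tensor<B, 1> {
    let full = x.into_full_precision();  // bridge to the full-precision backend
    let result = full.exp().sum().log(); // compute where precision matters most
    Tensor::from_full_precision(result)  // bridge back to the working backend
}
```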
JIT
Significant effort has been devoted over the past few months to refactoring the previous Wgpu backend into a shader-agnostic Just-in-Time backend. All lower-level dependencies have been abstracted into the Just-in-Time Runtime trait, which requires a compiler, a compute server, and a storage mechanism.
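As a rough mental model (not the actual burn-jit trait definition), the runtime abstraction ties those three pieces together roughly like this:

```rust
// Rough mental model only; the real burn-jit Runtime trait has more members
// and different bounds. It bundles the three pieces mentioned above.
trait RuntimeSketch {
    /// Compiles the intermediate representation into a target shader language
    /// (e.g. WGSL for the Wgpu backend).
    type Compiler;
    /// Executes compiled kernels and schedules work on the device.
    type Server;
    /// Allocates and pools the raw device memory backing tensor handles.
    type Storage;
}
```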
Wgpu
Autodiff
Extensive work has also gone into Burn's autodiff backend. The backend now supports gradient checkpointing to reduce memory usage and has been refactored into a client/server architecture. These updates result in significantly less blocking when tracking gradients, improving performance, particularly on smaller models. Furthermore, several bugs have been fixed where some graph nodes weren't used, which could truncate the autodiff graph. Overall, these changes make the autodiff process more reliable and efficient.
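None of this changes the user-facing autodiff flow; gradient tracking still looks like the minimal sketch below (backend, shapes, and the device-argument creation API are chosen purely for illustration), with checkpointing and the client/server architecture working behind the scenes.

```rust
use burn::backend::{Autodiff, NdArray};
use burn::tensor::{Distribution, Tensor};

fn main() {
    // Wrap any backend with Autodiff to enable gradient tracking.
    type B = Autodiff<NdArray>;
    let device = Default::default();

    let w = Tensor::<B, 2>::random([3, 3], Distribution::Default, &device).require_grad();
    let x = Tensor::<B, 2>::random([3, 3], Distribution::Default, &device);

    // Forward pass: with checkpointing, eligible intermediate results can be
    // recomputed during the backward pass instead of being stored.
    let loss = w.clone().matmul(x).sum();

    // Backward pass: gradients are collected for every tracked tensor.
    let grads = loss.backward();
    let w_grad = w.grad(&grads).expect("w is tracked");
    println!("{:?}", w_grad.into_data());
}
```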