Struct Muon
pub struct Muon<B>
where
    B: Backend,
{ /* private fields */ }
Muon optimizer.
Muon internally runs standard SGD-momentum, and then performs an orthogonalization post-processing step, in which each 2D parameter’s update is replaced with the nearest orthogonal matrix. For efficient orthogonalization we use a Newton-Schulz iteration, which has the advantage that it can be stably run in bfloat16 on the GPU.
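As a rough illustration of the orthogonalization idea, the sketch below runs the classical cubic Newton-Schulz iteration, X ← 1.5·X − 0.5·X·Xᵀ·X, on a plain `Vec<Vec<f64>>` matrix. This is a simplified stand-in, not the crate's implementation: the coefficients, iteration count, and precision that Muon actually uses are not stated here and may differ, and `Matrix`, `matmul`, `transpose`, and `newton_schulz_cubic` are hypothetical names introduced only for this example.

```rust
/// Row-major dense matrix; a stand-in for a 2D tensor in this sketch.
type Matrix = Vec<Vec<f64>>;

/// Plain triple-loop matrix product `a * b`.
fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for j in 0..m {
            for p in 0..k {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

/// Matrix transpose.
fn transpose(a: &Matrix) -> Matrix {
    let (n, m) = (a.len(), a[0].len());
    let mut out = vec![vec![0.0; n]; m];
    for i in 0..n {
        for j in 0..m {
            out[j][i] = a[i][j];
        }
    }
    out
}

/// Approximates the nearest (semi-)orthogonal matrix to `g` with the
/// classical cubic Newton-Schulz iteration (an assumption of this sketch).
fn newton_schulz_cubic(g: &Matrix, steps: usize) -> Matrix {
    // Scale by the Frobenius norm so every singular value is at most 1,
    // which keeps the iteration inside its convergence region.
    let norm = g.iter().flatten().map(|v| v * v).sum::<f64>().sqrt();
    let mut x: Matrix = g
        .iter()
        .map(|row| row.iter().map(|v| v / (norm + 1e-12)).collect())
        .collect();

    for _ in 0..steps {
        // X <- 1.5 * X - 0.5 * (X X^T) X pushes each singular value toward 1.
        let xxt = matmul(&x, &transpose(&x));
        let xxtx = matmul(&xxt, &x);
        for i in 0..x.len() {
            for j in 0..x[0].len() {
                x[i][j] = 1.5 * x[i][j] - 0.5 * xxtx[i][j];
            }
        }
    }
    x
}
```

For a full-rank input with at least as many columns as rows, the result multiplied by its own transpose should approach the identity as `steps` grows.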
§Important Notes
- Only for 2D+ parameters: Muon is designed for weight matrices. Use AdamW or SGD for biases, embeddings, and layer norms.
- Learning rate adjustment: Muon automatically adjusts the learning rate based on parameter shape. See AdjustLrFn for details.
- Weight decay timing: Unlike typical optimizers, Muon applies weight decay AFTER orthogonalization but uses the original (unadjusted) learning rate for it.
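One way to act on the first note is to decide per parameter which optimizer should handle it. The sketch below is only an illustration: `ParamKind`, `OptimizerChoice`, and `choose_optimizer` are hypothetical names, not part of the crate, and the rule shown (Muon only for rank-2 weight matrices, since `step` is documented to panic on non-2D tensors) is an assumption about how a training setup might route parameters.

```rust
/// Which optimizer a parameter is routed to (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum OptimizerChoice {
    Muon,
    /// AdamW or SGD, as recommended for non-matrix parameters.
    Fallback,
}

/// Coarse role of a parameter in the model (hypothetical, for illustration).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum ParamKind {
    Weight,
    Bias,
    Embedding,
    LayerNorm,
}

/// Routes rank-2 weight matrices to Muon and everything else to the fallback.
/// Embeddings are 2D but are still sent to the fallback, following the note above.
fn choose_optimizer(kind: ParamKind, shape: &[usize]) -> OptimizerChoice {
    match kind {
        ParamKind::Weight if shape.len() == 2 => OptimizerChoice::Muon,
        _ => OptimizerChoice::Fallback,
    }
}
```

Keying the decision on both role and rank avoids sending a 2D embedding table to Muon just because its shape looks like a weight matrix.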
Trait Implementations§
impl<B> SimpleOptimizer<B> for Muon<B>
where
    B: Backend,
fn step<const D: usize>(
    &self,
    lr: f64,
    tensor: Tensor<B, D>,
    grad: Tensor<B, D>,
    state: Option<<Muon<B> as SimpleOptimizer<B>>::State<D>>,
) -> (Tensor<B, D>, Option<<Muon<B> as SimpleOptimizer<B>>::State<D>>)
Perform a single Muon optimization step.
§Algorithm
- Apply momentum to gradient
- Orthogonalize update via Newton-Schulz
- Adjust learning rate based on parameter shape
- Apply weight decay (using original lr)
- Update parameter (using adjusted lr)
§Notes
Unlike typical optimizers, the weight decay and parameter update use different learning rates:
- Weight decay uses the original lr
- Parameter update uses the shape-adjusted lr
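The ordering of the algorithm steps and the two learning rates can be sketched on plain `Vec<Vec<f64>>` matrices as follows. Several details are assumptions made for illustration rather than facts about the crate: the momentum form `buf = momentum * buf + grad`, the decoupled multiplicative weight decay `param * (1 - lr * weight_decay)`, and the names `muon_step_sketch`, `orthogonalize`, and `adjust_lr`. The orthogonalization and the shape-based adjustment are passed in as closures because their exact definitions are not given here.

```rust
/// Row-major dense matrix; a stand-in for a 2D tensor in this sketch.
type Matrix = Vec<Vec<f64>>;

/// One Muon-style step on a single 2D parameter (illustrative only).
/// Returns the updated parameter and the updated momentum buffer.
fn muon_step_sketch(
    param: &Matrix,
    grad: &Matrix,
    momentum_buf: &Matrix,
    lr: f64,
    momentum: f64,
    weight_decay: f64,
    orthogonalize: impl Fn(&Matrix) -> Matrix,
    adjust_lr: impl Fn(f64, usize, usize) -> f64,
) -> (Matrix, Matrix) {
    let (rows, cols) = (param.len(), param[0].len());

    // 1. Apply momentum to the gradient (assumed form: buf = momentum * buf + grad).
    let buf: Matrix = (0..rows)
        .map(|i| {
            (0..cols)
                .map(|j| momentum * momentum_buf[i][j] + grad[i][j])
                .collect()
        })
        .collect();

    // 2. Orthogonalize the update (Newton-Schulz in the real optimizer).
    let update = orthogonalize(&buf);

    // 3. Adjust the learning rate based on the parameter shape.
    let adjusted_lr = adjust_lr(lr, rows, cols);

    // 4. Weight decay uses the ORIGINAL lr (assumed decoupled form).
    // 5. The parameter update uses the ADJUSTED lr.
    let new_param: Matrix = (0..rows)
        .map(|i| {
            (0..cols)
                .map(|j| {
                    let decayed = param[i][j] * (1.0 - lr * weight_decay);
                    decayed - adjusted_lr * update[i][j]
                })
                .collect()
        })
        .collect();

    (new_param, buf)
}
```

With `lr = 0.1`, `weight_decay = 0.1`, and an adjustment that doubles the learning rate, a weight of `1.0` is first scaled by `1 - 0.1 * 0.1 = 0.99` and then moved by `0.2` times its update entry, which makes the difference between the two rates visible.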
§Panics
This function will panic if the input tensors are not 2D.
Auto Trait Implementations§
impl<B> Freeze for Muon<B>
impl<B> RefUnwindSafe for Muon<B>
impl<B> Send for Muon<B>
impl<B> Sync for Muon<B>
impl<B> Unpin for Muon<B>
impl<B> UnwindSafe for Muon<B>
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.