Struct TransformerDecoderConfig
pub struct TransformerDecoderConfig {
    pub d_model: usize,
    pub d_ff: usize,
    pub n_heads: usize,
    pub n_layers: usize,
    pub dropout: f64,
    pub norm_first: bool,
    pub quiet_softmax: bool,
    pub initializer: Initializer,
    pub activation: ActivationConfig,
    pub layer_norm_eps: f64,
}
Configuration to create a Transformer Decoder layer using the init function.
Fields
d_model: usize
The size of the model.
d_ff: usize
The size of the position-wise feed-forward network.
n_heads: usize
The number of attention heads.
n_layers: usize
The number of layers.
dropout: f64
The dropout rate. Default: 0.1
norm_first: bool
Layer norm will be applied first instead of after the other modules.
quiet_softmax: bool
Use “quiet softmax” instead of regular softmax.
- Usage may improve performance by allowing attention heads to deposit no information (if the sequence contains no information relevant to that head).
- Usage may reduce the entropy of weights in the model, enhancing quantization and compression.
Reference: https://www.evanmiller.org/attention-is-off-by-one.html
initializer: Initializer
The type of function used to initialize neural network parameters.
activation: ActivationConfig
The activation function used in the position-wise feed-forward network. Default: Gelu
layer_norm_eps: f64
The epsilon value for layer normalization. Default: 1e-5
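To illustrate what layer_norm_eps controls, here is a minimal sketch of layer normalization over a single feature vector in plain Rust. It is not burn's implementation: the function name layer_norm is hypothetical, and the learnable scale and bias are omitted. The sketch assumes the usual definition, in which epsilon is added to the variance before the square root so that a constant input does not cause a division by zero.

```rust
/// Hypothetical helper (not part of burn): normalizes one feature vector to
/// zero mean and roughly unit variance. `eps` keeps the denominator positive.
fn layer_norm(x: &[f64], eps: f64) -> Vec<f64> {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    // Biased (population) variance over the feature dimension.
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let denom = (var + eps).sqrt();
    x.iter().map(|v| (v - mean) / denom).collect()
}
```

With eps set to 0.0, a constant input such as [2.0, 2.0, 2.0] yields 0.0 / 0.0, which is NaN; a small positive eps such as 1e-5 yields zeros instead.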
Implementations
impl TransformerDecoderConfig
pub fn new(
    d_model: usize,
    d_ff: usize,
    n_heads: usize,
    n_layers: usize,
) -> TransformerDecoderConfig
Create a new instance of the config.
Arguments
Required Arguments
d_model
The size of the model.
d_ff
The size of the position-wise feed-forward network.
n_heads
The number of attention heads.
n_layers
The number of layers.
Default Arguments
dropout
The dropout rate.
- Defaults to 0.1
norm_first
Layer norm will be applied first instead of after the other modules.
- Defaults to false
quiet_softmax
Use “quiet softmax” instead of regular softmax.
- Usage may improve performance by allowing attention heads to deposit no information (if the sequence contains no information relevant to that head).
- Usage may reduce the entropy of weights in the model, enhancing quantization and compression.
Reference: https://www.evanmiller.org/attention-is-off-by-one.html
- Defaults to false
initializer
The type of function used to initialize neural network parameters.
- Defaults to Initializer::KaimingUniform { gain: 1.0 / num_traits::Float::sqrt(3.0), fan_out_only: false }
activation
The activation function used in the position-wise feed-forward network.
- Defaults to ActivationConfig::Gelu
layer_norm_eps
The epsilon value for layer normalization.
- Defaults to 1e-5
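As a worked example of the default initializer listed above, the sketch below computes the sampling bound of a Kaiming uniform initializer under the common definition bound = gain * sqrt(3 / fan), with values drawn uniformly from [-bound, bound]. That formula and the helper name kaiming_uniform_bound are assumptions for illustration and are not taken from this page. With the default gain of 1/sqrt(3), and fan_out_only set to false so that the fan is the input dimension, the bound reduces to 1/sqrt(fan_in).

```rust
/// Hypothetical helper (not part of burn): half-width of the uniform range
/// under the assumed definition bound = gain * sqrt(3 / fan).
fn kaiming_uniform_bound(gain: f64, fan: usize) -> f64 {
    gain * (3.0 / fan as f64).sqrt()
}
```

For example, kaiming_uniform_bound(1.0 / 3.0_f64.sqrt(), 512) equals 1 / sqrt(512) up to rounding, so wider layers start with smaller weights.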
impl TransformerDecoderConfig
pub fn with_dropout(self, dropout: f64) -> TransformerDecoderConfig
Sets the value for the field dropout.
The dropout rate.
- Defaults to 0.1
pub fn with_norm_first(self, norm_first: bool) -> TransformerDecoderConfig
Sets the value for the field norm_first.
Layer norm will be applied first instead of after the other modules.
- Defaults to false
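The difference between the two settings can be sketched in plain Rust, assuming the usual pre-norm and post-norm conventions for residual blocks. This is not burn's code: both function names are hypothetical, and sublayer stands in for attention or the feed-forward network.

```rust
/// Post-norm (assumed behavior for norm_first = false):
/// normalize after the residual addition, norm(x + sublayer(x)).
fn post_norm_block(
    x: &[f64],
    sublayer: impl Fn(&[f64]) -> Vec<f64>,
    norm: impl Fn(&[f64]) -> Vec<f64>,
) -> Vec<f64> {
    let y = sublayer(x);
    let summed: Vec<f64> = x.iter().zip(y.iter()).map(|(a, b)| a + b).collect();
    norm(summed.as_slice())
}

/// Pre-norm (assumed behavior for norm_first = true):
/// normalize the sublayer input only, x + sublayer(norm(x)).
fn pre_norm_block(
    x: &[f64],
    sublayer: impl Fn(&[f64]) -> Vec<f64>,
    norm: impl Fn(&[f64]) -> Vec<f64>,
) -> Vec<f64> {
    let normed = norm(x);
    let y = sublayer(normed.as_slice());
    x.iter().zip(y.iter()).map(|(a, b)| a + b).collect()
}
```

In the pre-norm form the residual path carries x through unchanged, whereas the post-norm form normalizes the sum.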
pub fn with_quiet_softmax(self, quiet_softmax: bool) -> TransformerDecoderConfig
Sets the value for the field quiet_softmax.
Use “quiet softmax” instead of regular softmax.
- Usage may improve performance by allowing attention heads to deposit no information (if the sequence contains no information relevant to that head).
- Usage may reduce the entropy of weights in the model, enhancing quantization and compression.
Reference: https://www.evanmiller.org/attention-is-off-by-one.html
- Defaults to false
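To make "deposit no information" concrete, the following sketch compares regular softmax with the quiet variant from the referenced article, which adds 1 to the denominator: exp(x_i) / (1 + sum_j exp(x_j)). It is plain Rust over a slice of scores, not burn's tensor implementation, and both function names are chosen for illustration.

```rust
/// Regular softmax: the outputs always sum to 1.
fn softmax(scores: &[f64]) -> Vec<f64> {
    // Subtract the maximum for numerical stability.
    let max = scores.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Quiet softmax: exp(x_i) / (1 + sum_j exp(x_j)).
/// The outputs may sum to less than 1 when every score is very negative.
fn quiet_softmax(scores: &[f64]) -> Vec<f64> {
    // Shift by max(0, max score); the extra "1" behaves like a logit of 0,
    // so it becomes exp(-shift) after shifting.
    let shift = scores.iter().copied().fold(0.0_f64, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - shift).exp()).collect();
    let denom = (-shift).exp() + exps.iter().sum::<f64>();
    exps.iter().map(|e| e / denom).collect()
}
```

For scores such as [-20.0, -20.0, -20.0], softmax still assigns one third to each position, while the quiet_softmax weights are all close to zero, so the head contributes almost nothing.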
pub fn with_initializer(
    self,
    initializer: Initializer,
) -> TransformerDecoderConfig
Sets the value for the field initializer.
The type of function used to initialize neural network parameters.
- Defaults to Initializer::KaimingUniform { gain: 1.0 / num_traits::Float::sqrt(3.0), fan_out_only: false }
pub fn with_activation(
    self,
    activation: ActivationConfig,
) -> TransformerDecoderConfig
Sets the value for the field activation.
The activation function used in the position-wise feed-forward network.
- Defaults to ActivationConfig::Gelu
pub fn with_layer_norm_eps(
    self,
    layer_norm_eps: f64,
) -> TransformerDecoderConfig
Sets the value for the field layer_norm_eps.
The epsilon value for layer normalization.
- Defaults to 1e-5
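The new and with_* methods above follow a consuming builder pattern: new takes the required arguments and fills in the defaults, and each with_* method takes self by value and returns the updated config so calls can be chained. The sketch below mirrors that shape with a reduced stand-in type. DemoDecoderConfig is hypothetical and is not the burn struct; only the signatures and defaults are modeled on this page.

```rust
/// Hypothetical stand-in with a subset of the fields documented above.
#[derive(Debug, Clone, PartialEq)]
struct DemoDecoderConfig {
    d_model: usize,
    n_heads: usize,
    dropout: f64,
    norm_first: bool,
}

impl DemoDecoderConfig {
    /// Required arguments are passed here; the rest take their defaults.
    fn new(d_model: usize, n_heads: usize) -> Self {
        Self { d_model, n_heads, dropout: 0.1, norm_first: false }
    }

    /// Consumes the config and returns it with `dropout` replaced.
    fn with_dropout(mut self, dropout: f64) -> Self {
        self.dropout = dropout;
        self
    }

    /// Consumes the config and returns it with `norm_first` replaced.
    fn with_norm_first(mut self, norm_first: bool) -> Self {
        self.norm_first = norm_first;
        self
    }
}
```

A call such as DemoDecoderConfig::new(512, 8).with_norm_first(true) overrides one default and leaves dropout at 0.1.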
impl TransformerDecoderConfig
pub fn init<B>(
    &self,
    device: &<B as BackendTypes>::Device,
) -> TransformerDecoder<B>
where
    B: Backend,
Initialize a new Transformer Decoder module.
Trait Implementations
impl Clone for TransformerDecoderConfig
fn clone(&self) -> TransformerDecoderConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Config for TransformerDecoderConfig
impl Debug for TransformerDecoderConfig
impl<'de> Deserialize<'de> for TransformerDecoderConfig
fn deserialize<D>(
    deserializer: D,
) -> Result<TransformerDecoderConfig, <D as Deserializer<'de>>::Error>
where
    D: Deserializer<'de>,
impl Display for TransformerDecoderConfig
impl Serialize for TransformerDecoderConfig
fn serialize<S>(
    &self,
    serializer: S,
) -> Result<<S as Serializer>::Ok, <S as Serializer>::Error>
where
    S: Serializer,
Auto Trait Implementations
impl Freeze for TransformerDecoderConfig
impl RefUnwindSafe for TransformerDecoderConfig
impl Send for TransformerDecoderConfig
impl Sync for TransformerDecoderConfig
impl Unpin for TransformerDecoderConfig
impl UnwindSafe for TransformerDecoderConfig
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<C> CloneExpand for C
where
    C: Clone,
fn __expand_clone_method(&self, _scope: &mut Scope) -> C
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<T> ToCompactString for T
where
    T: Display,
fn try_to_compact_string(&self) -> Result<CompactString, ToCompactStringError>
Fallible version of ToCompactString::to_compact_string().
fn to_compact_string(&self) -> CompactString
Converts the given value to a CompactString.