attention_fallback

Function attention_fallback 

pub fn attention_fallback<B>(
    query: <B as BackendTypes>::FloatTensorPrimitive,
    key: <B as BackendTypes>::FloatTensorPrimitive,
    value: <B as BackendTypes>::FloatTensorPrimitive,
    mask: Option<<B as BackendTypes>::BoolTensorPrimitive>,
    attn_bias: Option<<B as BackendTypes>::FloatTensorPrimitive>,
    options: AttentionModuleOptions,
) -> <B as BackendTypes>::FloatTensorPrimitive
where B: Backend,

Computes softmax(QKᵀ · scale) · V using separate kernels for each step rather than a single fused kernel. Serves as a fallback when FlashAttention is not used.
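To make the formula concrete, here is a minimal sketch of the same computation on plain `Vec<Vec<f32>>` matrices, written as three distinct steps (scores, softmax, weighted sum). It does not use this crate's tensor primitives and is not the library's implementation: the name `naive_attention`, the mask convention (`true` means "masked out"), and passing `scale` as a plain `f32` are assumptions made for illustration, and `attn_bias` is omitted.

```rust
/// Hypothetical reference implementation of softmax(Q Kᵀ * scale) · V.
/// Shapes (row-major): q is [n, d], k is [m, d], v is [m, dv]; output is [n, dv].
/// Assumption: in `mask`, `true` marks a key position to exclude, and every
/// query row keeps at least one unmasked key.
fn naive_attention(
    q: &[Vec<f32>],
    k: &[Vec<f32>],
    v: &[Vec<f32>],
    mask: Option<&[Vec<bool>]>,
    scale: f32,
) -> Vec<Vec<f32>> {
    let dv = v.first().map_or(0, |row| row.len());

    q.iter()
        .enumerate()
        .map(|(i, q_row)| {
            // Step 1: scaled dot products of this query with every key.
            let mut scores: Vec<f32> = k
                .iter()
                .map(|k_row| q_row.iter().zip(k_row).map(|(a, b)| a * b).sum::<f32>() * scale)
                .collect();

            // Optional masking: excluded positions get -inf so softmax gives them 0.
            if let Some(m) = mask {
                for (j, s) in scores.iter_mut().enumerate() {
                    if m[i][j] {
                        *s = f32::NEG_INFINITY;
                    }
                }
            }

            // Step 2: numerically stable softmax over the key axis.
            let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
            let sum: f32 = exps.iter().sum();

            // Step 3: weighted sum of the value rows.
            let mut out = vec![0.0f32; dv];
            for (w, v_row) in exps.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(v_row) {
                    *o += (w / sum) * x;
                }
            }
            out
        })
        .collect()
}
```

A common choice for `scale` is `1.0 / (d as f32).sqrt()`, where `d` is the query/key feature dimension. Each of the three steps materializes an intermediate result, which is the cost a fused kernel such as FlashAttention avoids.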