Module attention

Attention operations.

Functions

naive_attention
Computes softmax(QKᵀ / √d) · V using separate kernels. Serves as a fallback when FlashAttention is not used.
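
For reference, a minimal pure-Rust sketch of the same computation, assuming row-major n × d matrices flattened into slices. The shapes and signature here are illustrative only; the actual kernel-based function in this module will differ.

```rust
// Sketch of softmax(QKᵀ / √d) · V with plain loops (hypothetical shapes:
// q, k, v are n × d matrices flattened row-major into slices).
fn naive_attention(q: &[f32], k: &[f32], v: &[f32], n: usize, d: usize) -> Vec<f32> {
    let scale = 1.0 / (d as f32).sqrt();
    let mut out = vec![0.0f32; n * d];
    let mut scores = vec![0.0f32; n]; // one row of QKᵀ at a time

    for i in 0..n {
        // scores[j] = (q_i · k_j) / √d
        for j in 0..n {
            let mut dot = 0.0f32;
            for t in 0..d {
                dot += q[i * d + t] * k[j * d + t];
            }
            scores[j] = dot * scale;
        }
        // Numerically stable softmax over the row.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0f32;
        for s in scores.iter_mut() {
            *s = (*s - max).exp();
            sum += *s;
        }
        for s in scores.iter_mut() {
            *s /= sum;
        }
        // out_i = Σ_j scores[j] · v_j
        for j in 0..n {
            for t in 0..d {
                out[i * d + t] += scores[j] * v[j * d + t];
            }
        }
    }
    out
}

fn main() {
    // 2 tokens, head dimension 2.
    let q = vec![1.0, 0.0, 0.0, 1.0];
    let k = q.clone();
    let v = vec![1.0, 2.0, 3.0, 4.0];
    println!("{:?}", naive_attention(&q, &k, &v, 2, 2));
}
```

Unlike FlashAttention, this materializes the full n × n score matrix row by row, which is why it is the slower fallback path.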