burn::nn::loss

Struct CTCLoss

pub struct CTCLoss { /* private fields */ }

Computes the Connectionist Temporal Classification (CTC) loss.

Calculates the loss between a continuous (unsegmented) time series and a target sequence. CTC sums over the probability of all possible alignments of the input to the target, producing a loss value that is differentiable with respect to each input node.
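To make "sums over all possible alignments" concrete, here is a minimal sketch of the standard CTC forward (alpha) recursion for a single sample, written in plain Rust with no burn dependency. The names `log_add` and `ctc_nll` are hypothetical helpers for illustration only; this is not the crate's implementation, and it assumes at least one time step.

```rust
/// Numerically stable log(exp(a) + exp(b)).
fn log_add(a: f64, b: f64) -> f64 {
    if a == f64::NEG_INFINITY {
        return b;
    }
    if b == f64::NEG_INFINITY {
        return a;
    }
    let m = a.max(b);
    m + ((a - m).exp() + (b - m).exp()).ln()
}

/// Negative log-likelihood of `target` under CTC for one sample.
/// `log_probs[t][c]` is the log-probability of class `c` at time step `t`.
/// Assumes `log_probs` holds at least one time step (illustrative sketch).
fn ctc_nll(log_probs: &[Vec<f64>], target: &[usize], blank: usize) -> f64 {
    // Extended target: blank, l1, blank, l2, ..., blank.
    let mut ext = vec![blank; 2 * target.len() + 1];
    for (i, &label) in target.iter().enumerate() {
        ext[2 * i + 1] = label;
    }
    let s_len = ext.len();

    // alpha[s] = log-probability of all partial alignments ending in ext[s].
    let mut alpha = vec![f64::NEG_INFINITY; s_len];
    alpha[0] = log_probs[0][ext[0]];
    if s_len > 1 {
        alpha[1] = log_probs[0][ext[1]];
    }

    for frame in &log_probs[1..] {
        let mut next = vec![f64::NEG_INFINITY; s_len];
        for s in 0..s_len {
            // Stay on the same symbol.
            let mut acc = alpha[s];
            // Advance from the previous symbol.
            if s >= 1 {
                acc = log_add(acc, alpha[s - 1]);
            }
            // Skip the blank between two distinct non-blank labels.
            if s >= 2 && ext[s] != blank && ext[s] != ext[s - 2] {
                acc = log_add(acc, alpha[s - 2]);
            }
            next[s] = acc + frame[ext[s]];
        }
        alpha = next;
    }

    // Valid alignments end on the last label or on the trailing blank.
    let mut total = alpha[s_len - 1];
    if s_len > 1 {
        total = log_add(total, alpha[s_len - 2]);
    }
    -total
}
```

With two classes (blank `0`, label `1`), two time steps, uniform probabilities, and target `[1]`, three alignments collapse to the target (`1 1`, `1 blank`, `blank 1`), so the sketch returns `-ln(0.75)`. A target that cannot fit into the available time steps yields an infinite loss.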

The input to this loss must be log-probabilities (e.g., the output of log_softmax), not raw logits.
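As a rough illustration of the difference between raw logits and log-probabilities, the following sketch applies a log-softmax to a single vector of logits in plain Rust. `log_softmax_1d` is a hypothetical helper, not burn's `log_softmax`, which operates on tensors along a chosen dimension.

```rust
/// Log-softmax over one vector of raw logits (illustrative helper).
fn log_softmax_1d(logits: &[f64]) -> Vec<f64> {
    // Subtract the maximum before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let log_sum_exp = max
        + logits
            .iter()
            .map(|&x| (x - max).exp())
            .sum::<f64>()
            .ln();
    logits.iter().map(|&x| x - log_sum_exp).collect()
}
```

Every output value is at most zero, and exponentiating the outputs gives probabilities that sum to one. Raw logits have neither property.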

§References

Graves, A., Fernández, S., Gomez, F., and Schmidhuber, J. (2006). Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML 2006.

§Example

use burn::tensor::{Tensor, Int};
use burn::tensor::activation::log_softmax;
use burn::nn::loss::{CTCLossConfig, CTCLoss};

let device = Default::default();

// Initialize CTC Loss with default configuration
let ctc_loss = CTCLossConfig::new().init();

// Initialize CTC Loss with custom configuration
let ctc_loss = CTCLossConfig::new()
    .with_blank(1)
    .with_zero_infinity(true)
    .init();

// Prepare inputs (logits shape: [Time, Batch, Class]).
// `B` is whichever `Backend` type your code is generic over.
// In real code, the logits would be the output of your model.
let logits = Tensor::<B, 3>::ones([10, 2, 5], &device);
let log_probs = log_softmax(logits, 2);

// Targets shape: [Batch, Max_Target_Len]
// Note: Targets should not contain the blank index (1).
let targets = Tensor::<B, 2, Int>::from_data([[0, 2], [3, 4]], &device);

// Lengths shape: [Batch]
let input_lengths = Tensor::<B, 1, Int>::from_data([10, 8], &device);
let target_lengths = Tensor::<B, 1, Int>::from_data([2, 2], &device);

// Compute loss
let loss = ctc_loss.forward(log_probs, targets, input_lengths, target_lengths);

Implementations§

impl CTCLoss

pub fn forward<B>(&self, log_probs: Tensor<B, 3>, targets: Tensor<B, 2, Int>, input_lengths: Tensor<B, 1, Int>, target_lengths: Tensor<B, 1, Int>) -> Tensor<B, 1>
where B: Backend,

Computes the CTC loss for the input log-probabilities and targets with no reduction applied.

§Arguments

log_probs: The log-probabilities of the outputs (e.g., from log_softmax).
targets: A 2D tensor containing the target class indices. These indices should not include the blank index used in CTC loss. The targets are padded to the length of the longest sequence.
input_lengths: A 1D tensor containing the actual length of the input sequence for each sample in the batch. This makes it possible to recover the valid portion of log_probs when the batch contains sequences of varying lengths.
target_lengths: A 1D tensor containing the actual (unpadded) length of each target sequence in targets.
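The relationship between padded targets and target_lengths can be sketched as follows. `pad_targets` is a hypothetical helper in plain Rust, and the padding value is an assumption: the documentation does not specify one, and this sketch relies on target_lengths marking where each real sequence ends.

```rust
/// Pads variable-length target sequences to a common length and
/// returns the padded rows together with the original lengths.
/// `pad_value` is an illustrative choice, not a value mandated by burn.
fn pad_targets(sequences: &[Vec<i64>], pad_value: i64) -> (Vec<Vec<i64>>, Vec<usize>) {
    let max_len = sequences.iter().map(Vec::len).max().unwrap_or(0);
    let lengths: Vec<usize> = sequences.iter().map(Vec::len).collect();
    let padded = sequences
        .iter()
        .map(|seq| {
            let mut row = seq.clone();
            // Extend shorter rows up to the longest sequence.
            row.resize(max_len, pad_value);
            row
        })
        .collect();
    (padded, lengths)
}
```

The padded rows correspond to the [batch_size, max_target_length] layout of targets, and the returned lengths correspond to target_lengths.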

§Returns

A 1D tensor of shape [batch_size] containing the loss for each sample.

§Shapes

log_probs: [time_steps, batch_size, num_classes] where num_classes includes blank.
targets: [batch_size, max_target_length]
input_lengths: [batch_size]
target_lengths: [batch_size]

pub fn forward_with_reduction<B>(&self, log_probs: Tensor<B, 3>, targets: Tensor<B, 2, Int>, input_lengths: Tensor<B, 1, Int>, target_lengths: Tensor<B, 1, Int>, reduction: Reduction) -> Tensor<B, 1>
where B: Backend,

Computes the CTC loss for the input log-probabilities and targets with reduction.

§Arguments

log_probs: The log-probabilities of the outputs (e.g., from log_softmax).
targets: A 2D tensor containing the target class indices. These indices should not include the blank index used in CTC loss. The targets are padded to the length of the longest sequence.
input_lengths: A 1D tensor containing the actual length of the input sequence for each sample in the batch. This makes it possible to recover the valid portion of log_probs when the batch contains sequences of varying lengths.
target_lengths: A 1D tensor containing the actual (unpadded) length of each target sequence in targets.
reduction: The reduction strategy to apply to the per-sample CTC loss values (e.g., mean, sum). With the mean strategy, each sample's loss is first divided by its target length, and the results are then averaged over the batch. This follows PyTorch’s behavior.
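The mean and sum strategies described above can be sketched on plain slices. `reduce_mean` and `reduce_sum` are hypothetical helpers, not burn functions. Clamping the target length to at least 1 is an assumption borrowed from PyTorch to avoid division by zero; the documentation here does not state how burn treats empty targets.

```rust
/// Mean reduction: divide each loss by its target length,
/// then average over the batch.
fn reduce_mean(losses: &[f64], target_lengths: &[usize]) -> f64 {
    assert_eq!(losses.len(), target_lengths.len());
    let batch_size = losses.len() as f64;
    let normalized_sum: f64 = losses
        .iter()
        .zip(target_lengths)
        // Assumption: clamp to 1 so an empty target does not divide by zero.
        .map(|(&loss, &len)| loss / len.max(1) as f64)
        .sum();
    normalized_sum / batch_size
}

/// Sum reduction: add up the per-sample losses unchanged.
fn reduce_sum(losses: &[f64]) -> f64 {
    losses.iter().sum()
}
```

For per-sample losses `[2.0, 6.0]` with target lengths `[2, 3]`, the mean reduction gives `(2.0 / 2 + 6.0 / 3) / 2 = 1.5`, while the sum reduction gives `8.0`.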

§Returns

A 1D tensor of shape [1] containing the reduced loss value.

§Shapes

log_probs: [time_steps, batch_size, num_classes] where num_classes includes blank.
targets: [batch_size, max_target_length]
input_lengths: [batch_size]
target_lengths: [batch_size]

§Panics

If reduction is not one of Reduction::Auto, Reduction::Mean, or Reduction::Sum.
If the blank index is greater than or equal to num_classes.
If the batch dimensions of log_probs, targets, input_lengths, and target_lengths do not match.
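The two shape-related conditions above can be checked ahead of time on plain shape values. The sketch below is a hypothetical pre-flight check, not part of burn: it returns an error instead of panicking, and it assumes the shapes have already been extracted from the tensors.

```rust
/// Validates the documented shape conditions before calling the loss.
/// All parameters are plain shape values; this helper is illustrative only.
fn check_ctc_inputs(
    log_probs_shape: [usize; 3],
    targets_shape: [usize; 2],
    input_lengths_len: usize,
    target_lengths_len: usize,
    blank: usize,
) -> Result<(), String> {
    let [_time_steps, batch_size, num_classes] = log_probs_shape;

    // The blank index must be a valid class index.
    if blank >= num_classes {
        return Err(format!(
            "blank index {blank} must be less than num_classes {num_classes}"
        ));
    }

    // Every input must agree on the batch dimension.
    let batch_dims = [targets_shape[0], input_lengths_len, target_lengths_len];
    if batch_dims.iter().any(|&b| b != batch_size) {
        return Err(format!(
            "batch dimensions {batch_dims:?} do not all match log_probs batch size {batch_size}"
        ));
    }
    Ok(())
}
```

With the shapes from the example at the top of this page (`[10, 2, 5]` log-probabilities, `[2, 2]` targets, two lengths each, blank `1`), the check passes. A blank index of `5` or a targets batch size of `3` would be rejected.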