What Dice Misses: Size Penalty Loss

Abstract

Traditional overlap metrics like the Dice Similarity Coefficient intrinsically bias optimization toward large lesion volumes, systematically neglecting micro-embolic lesions ($< 5 \text{ mL}$). We introduce the Size Penalty Loss (SPL), an auxiliary loss function that explicitly conditions volume regularization on ground-truth lesion size. By employing a continuous exponential size-decay function, SPL applies a massive corrective gradient specifically to missed small lesions while leaving large, well-segmented lesions unaffected.

The objective of this interactive document is to mathematically dissect and visually demonstrate the behaviour of the Size Penalty Loss function. The core premise is that a given relative volume error should not be penalized equally across all scales. A large relative error on a 50-voxel lesion (e.g., a $10\times$ over-prediction) is clinically critical, whereas a relative mismatch on a 2000-voxel lesion often stems from a slight boundary error and is clinically benign.

$$ \mathcal{L}_{\text{size}}(V_p, V_g) = \min\left(c, \exp\left(-\frac{V_g}{\tau_s}\right) \cdot \frac{|V_p - V_g|}{V_g + \varepsilon} \right) $$
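The formula above can be sketched in a few lines of Python. This is a minimal scalar illustration, not the training implementation: it assumes $V_p$ and $V_g$ are already available as voxel counts for a single lesion, and the defaults for `cap` ($c$) and `eps` ($\varepsilon$) are placeholders, since the text does not state them. The cap of 10.0 is chosen only so that the slider example in Section 2 (loss 8.28) is not clipped.

```python
import math


def size_penalty_loss(v_pred: float, v_gt: float, tau: float = 600.0,
                      cap: float = 10.0, eps: float = 1e-6) -> float:
    """Size Penalty Loss for one lesion, following the formula above.

    tau matches the slider value used in Section 2; cap and eps are
    assumed placeholder values, not taken from the source.
    """
    weight = math.exp(-v_gt / tau)               # size weight W(V_g)
    rel_error = abs(v_pred - v_gt) / (v_gt + eps)  # relative volume error E_rel
    return min(cap, weight * rel_error)          # clamp at c


if __name__ == "__main__":
    # Same 10x over-prediction at two scales.
    print(size_penalty_loss(500, 50))       # small lesion
    print(size_penalty_loss(20000, 2000))   # large lesion
```

In a real training loop the volumes would typically be sums of soft predictions so that the loss stays differentiable; that detail is outside what the formula itself specifies.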

1. The Dice Blindspot: A Synchronized Simulation

The visualization below (Figure 1) compares the training dynamics of two identical networks over 150 epochs. Network A (Model A in the figure) is trained exclusively with the Dice Loss. Network B (Model B) is trained with Dice + SPL.

Observe how the predicted volume boundaries (filled shapes) actively grow toward the ground-truth boundaries (green outlines). The numbers attached to the boundaries dynamically track the predicted voxel counts $V_p$. Dice successfully captures the large lesion (800 voxels) in both networks. However, because the small lesion (50 voxels) contributes negligibly to the aggregate Dice score, Network A's gradient for it collapses, leaving it missed. Network B, guided by SPL, successfully forces the predicted boundary of the small lesion to converge.

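[Interactive simulation: two synchronized panels, Model A (Dice Only) and Model B (Dice + SPL), each with an epoch counter (0 to 150) and a live Dice readout; Model B also shows an SPL gradient status indicator.]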

Figure 1: Synchronized training simulation. The green static outlines represent the ground truth volumes ($V_g$). The filled areas and accompanying floating text represent the model's current predicted volumes ($V_p$) at the given epoch. Notice how the predicted boundary for the small lesion in Model A stagnates, while Model B's SPL gradient actively expands the boundary to match the ground truth.
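A quick count-based calculation shows how little the aggregate Dice score changes when the small lesion is missed. The sketch below is an idealization: it assumes the 800-voxel lesion is segmented perfectly with no false positives, and uses the hard (count-based) Dice coefficient rather than whatever soft variant the networks are trained with.

```python
def dice(intersection: float, pred: float, gt: float) -> float:
    """Dice coefficient from voxel counts: 2|P ∩ G| / (|P| + |G|)."""
    return 2.0 * intersection / (pred + gt)


LARGE, SMALL = 800, 50  # ground-truth lesion sizes in voxels, from the text

# Large lesion segmented perfectly, small lesion missed entirely.
dice_missed = dice(intersection=LARGE, pred=LARGE, gt=LARGE + SMALL)

# Both lesions segmented perfectly.
dice_both = dice(intersection=LARGE + SMALL, pred=LARGE + SMALL, gt=LARGE + SMALL)

print(f"small lesion missed: {dice_missed:.4f}")
print(f"both lesions found:  {dice_both:.4f}")
print(f"Dice gained by finding the small lesion: {dice_both - dice_missed:.4f}")
```

Under these assumptions the score is already about 0.97 with the small lesion missed, so recovering it is worth only about 0.03 Dice.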

2. Interactive Parameter Space Analysis

To understand why SPL successfully corrects the small lesion, we must examine its formulation. The loss is composed of a continuous size weight $W(V_g) = \exp\left(-\frac{V_g}{\tau_s}\right)$ and a relative volume error $E_{rel} = \frac{|V_p - V_g|}{V_g + \varepsilon}$. Use the sliders below to observe how the gradient magnitude scales dramatically for small lesions compared to large lesions, even when the relative error ratio is identical.

Slider state shown: $V_g = 50$, $V_p = 500$, $\tau_s = 600$.

Resulting readouts: ratio $V_p/V_g = 10.00$, weight $W(V_g) = 0.9200$, error $E_{rel} = 9.0000$, loss $\mathcal{L}_{\text{size}} = 8.2800$.
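
As a worked check of the slider state shown above, taking $\varepsilon \to 0$ and assuming the cap $c$ is not active:

$$ W(50) = \exp\left(-\frac{50}{600}\right) \approx 0.9200, \quad E_{rel} = \frac{|500 - 50|}{50} = 9, \quad \mathcal{L}_{\text{size}} \approx 0.9200 \times 9 = 8.28 $$

Keeping the same $10\times$ ratio on a large lesion ($V_g = 2000$, $V_p = 20000$, values chosen here purely for illustration) gives

$$ W(2000) = \exp\left(-\frac{2000}{600}\right) \approx 0.0357, \quad E_{rel} = \frac{|20000 - 2000|}{2000} = 9, \quad \mathcal{L}_{\text{size}} \approx 0.0357 \times 9 \approx 0.321 $$

so the identical relative error is penalized roughly 26 times less on the large lesion.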

Figure 2: Component decomposition. The chart visualizes the exponential weight $W(V_g)$ (blue line). The red dot indicates the current $V_g$ position. As $V_g$ increases towards the right, the penalty weight decays rapidly, completely separating the scale of optimization pressure applied to micro-lesions versus macro-lesions.

3. Optimization Surface and Convergence

We can visualize this size-conditioned optimization as gradient descent on a 2D loss landscape. The convergence speed is directly proportional to $W(V_g)$. Click anywhere on the plot below to spawn a new particle and observe its descent trajectory in real-time. Notice how particles dropped on the left ($V_g < 500$) plummet to the optimal manifold instantly, whereas particles on the right ($V_g > 2000$) experience severe gradient decay. The faint red curves represent the contour lines of the loss surface, visually demonstrating the steep "valley" for small lesions that widens into a flat plain for large lesions.

Figure 3: Interactive Loss Landscape $\mathcal{L}_{\text{size}}(V_g, V_p/V_g)$. Particles simulate gradient descent from mispredicted ratios toward the optimal manifold (Ratio = 1.0). The vertical position represents the error ratio, while the horizontal position represents ground-truth size. By clicking to spawn particles, you can empirically verify that the gradient magnitude (descent speed) is strictly governed by the exponential size weight $W(V_g)$.
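
The relationship between descent speed and $W(V_g)$ can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not the code behind the figure: the particle moves only along the ratio axis $r = V_p/V_g$, the loss is taken as $W(V_g)\,|r - 1|$ (cap $c$ and $\varepsilon$ ignored), and the learning rate and starting ratio are hypothetical values.

```python
import math


def steps_to_converge(v_gt: float, ratio0: float, tau: float = 600.0,
                      lr: float = 0.05, max_steps: int = 100_000) -> int:
    """Count gradient-descent steps until the ratio reaches 1.0.

    The gradient of W * |r - 1| with respect to r is W * sign(r - 1),
    so each step moves the particle by lr * W toward r = 1.
    """
    weight = math.exp(-v_gt / tau)  # size weight W(V_g)
    ratio = ratio0
    for step in range(max_steps):
        err = ratio - 1.0
        if abs(err) <= lr * weight:  # next step would reach or overshoot r = 1
            return step + 1
        ratio -= lr * weight * math.copysign(1.0, err)
    return max_steps


if __name__ == "__main__":
    for v_gt in (50, 500, 2000):
        n = steps_to_converge(v_gt, ratio0=3.0)
        print(f"V_g = {v_gt:>4}: {n} steps")
```

Because the step size is `lr * W(V_g)`, the step count scales as $1 / W(V_g)$: a particle at $V_g = 2000$ needs roughly $W(50)/W(2000) \approx 26$ times as many steps as one at $V_g = 50$ starting from the same ratio.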

4. The "Aha!" Moment: Complementary Optimization Spectra

The true elegance of SPL lies in its complementary relationship with traditional overlap metrics. If we compute the theoretical per-voxel gradient magnitude for a missed lesion, we observe a stark divergence. Dice gradients are intrinsically holistic: if a patient has one massive background lesion ($V_{\text{bg}} = 2000$ voxels), the Dice gradient for a newly emerging micro-lesion is heavily suppressed by the global intersection sum. SPL, however, is computed per lesion. It injects a massive, localized optimization pressure precisely where the Dice gradient collapses.

Figure 4: Gradient Magnitude Complementarity. The y-axis (log scale) shows the per-voxel gradient magnitude for a missed lesion of size $V_g$. The Dice gradient (blue) remains relatively flat and suppressed due to the presence of a 2000-voxel background lesion. The SPL gradient (red) acts as an explicit "micro-lesion amplifier," crossing over to dominate the optimization dynamics exactly in the $< 5 \text{ mL}$ regime.
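
The crossover can be approximated analytically. The sketch below is one possible reading of "per-voxel gradient magnitude", with assumptions that are not stated in the text: soft Dice is taken as $2I / (P + G)$, the 2000-voxel background lesion is segmented perfectly, the missed lesion has zero predicted probability everywhere, $V_p$ is the sum of predicted probabilities over the lesion, the SPL cap is inactive, and the two losses are compared unweighted.

```python
import math


def dice_grad_per_voxel(v_gt: float, v_bg: float = 2000.0) -> float:
    """d(soft Dice)/dp_i for one ground-truth voxel of a fully missed lesion.

    With D = 2I / (P + G), dI/dp_i = 1 and dP/dp_i = 1 for that voxel, so
    dD/dp_i = (2(P + G) - 2I) / (P + G)^2, with I = P = v_bg, G = v_bg + v_gt.
    """
    inter, pred, gt = v_bg, v_bg, v_bg + v_gt
    return (2.0 * (pred + gt) - 2.0 * inter) / (pred + gt) ** 2


def spl_grad_per_voxel(v_gt: float, tau: float = 600.0, eps: float = 1e-6) -> float:
    """|d L_size / dp_i| below the cap: W(V_g) / (V_g + eps), since dV_p/dp_i = 1."""
    return math.exp(-v_gt / tau) / (v_gt + eps)


if __name__ == "__main__":
    print(f"{'V_g':>6} {'Dice grad':>12} {'SPL grad':>12}")
    for v_gt in (10, 50, 200, 500, 1000, 2000):
        print(f"{v_gt:>6} {dice_grad_per_voxel(v_gt):>12.2e} "
              f"{spl_grad_per_voxel(v_gt):>12.2e}")
```

Under these assumptions the Dice term stays nearly flat across lesion sizes, while the SPL term is much larger for small lesions and falls below the Dice term somewhere between 500 and 1000 voxels. The exact crossover point depends on $\tau_s$ and on any loss weighting used in practice.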

Cite this work

If you found this visualization or research helpful, please consider citing our work:

@article{joshi2026spl,
    title={What Dice Misses: Size-Stratified Volume Regularization for Ischemic Stroke Lesion Segmentation},
    author={Joshi, Mohit},
    year={2026},
    institution={Koita Centre for Digital Health, IIT Bombay}
}