"What Dice Misses": Size-Stratified Volume Regularization for Ischemic Stroke Lesion Segmentation
This interactive document dissects the Size Penalty Loss (SPL) mathematically and demonstrates its behaviour visually. The core premise is that a given relative volume error should not be penalized equally at every lesion scale. A $10\times$ over-prediction on a 50-voxel lesion is clinically critical, whereas a relative mismatch on a 2000-voxel lesion can stem from boundary errors alone and may be clinically benign.
1. The Dice Blindspot: A Synchronized Simulation
The visualization below (Figure 1) compares the training dynamics of two identical networks over 150 epochs. Network A is trained exclusively with the Dice Loss. Network B is trained with Dice + SPL.
Observe how the predicted volume boundaries (filled shapes) actively grow toward the ground-truth boundaries (green outlines). The numbers attached to the boundaries dynamically track the predicted voxel counts $V_p$. Dice successfully captures the large lesion (800 voxels) in both networks. However, because the small lesion (50 voxels) contributes negligibly to the aggregate Dice score, Network A's gradient for it collapses, leaving it missed. Network B, guided by SPL, successfully forces the predicted boundary of the small lesion to converge.
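[Figure 1: synchronized training simulation. Panel A: Network A (Dice only), with a live Dice readout. Panel B: Network B (Dice + SPL), with a live Dice readout and an SPL gradient status indicator.]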
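The blind spot described in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the hard (binary) Dice score and assumes the 800-voxel lesion is segmented perfectly while the 50-voxel lesion is missed entirely. The helper name `dice` is hypothetical and not taken from the original work.

```python
def dice(intersection, pred_volume, gt_volume, eps=1e-8):
    """Hard Dice score from voxel counts: 2|P ∩ G| / (|P| + |G|)."""
    return 2.0 * intersection / (pred_volume + gt_volume + eps)

LARGE, SMALL = 800, 50          # lesion sizes used in the simulation text
gt_total = LARGE + SMALL        # total ground-truth voxels for the patient

# Assumed case A: large lesion found perfectly, small lesion missed entirely.
dice_missed = dice(intersection=LARGE, pred_volume=LARGE, gt_volume=gt_total)

# Assumed case B: both lesions found perfectly.
dice_full = dice(intersection=gt_total, pred_volume=gt_total, gt_volume=gt_total)

print(f"Dice with small lesion missed: {dice_missed:.4f}")
print(f"Dice with both lesions found:  {dice_full:.4f}")
print(f"Dice lost by missing the small lesion: {dice_full - dice_missed:.4f}")
```

Under these assumptions the aggregate score stays close to 1 even though an entire lesion is absent from the prediction, which is why an overlap-only objective has little incentive to recover it.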
2. Interactive Parameter Space Analysis
To understand why SPL successfully corrects the small lesion, we must examine its formulation. The loss is composed of a continuous size weight $W(V_g) = \exp\left(-\frac{V_g}{\tau_s}\right)$ and a relative volume error $E_{rel} = \frac{|V_p - V_g|}{V_g + \varepsilon}$, where $V_g$ is the ground-truth lesion volume in voxels, $V_p$ is the predicted volume, $\tau_s$ sets the size scale over which the weight decays, and $\varepsilon$ is a small constant that keeps the denominator away from zero. Use the sliders below to observe how the gradient magnitude is far larger for small lesions than for large ones, even when the relative error is identical.
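A minimal sketch of the two components follows. The text says only that the loss is "composed of" $W(V_g)$ and $E_{rel}$; combining them as a product is an assumption made here for illustration, as are the values $\tau_s = 500$ and $\varepsilon = 10^{-6}$ and all function names.

```python
import math

def size_weight(v_g, tau_s):
    """W(V_g) = exp(-V_g / tau_s): close to 1 for tiny lesions, decays with size."""
    return math.exp(-v_g / tau_s)

def relative_error(v_p, v_g, eps=1e-6):
    """E_rel = |V_p - V_g| / (V_g + eps)."""
    return abs(v_p - v_g) / (v_g + eps)

def spl(v_p, v_g, tau_s=500.0, eps=1e-6):
    """Assumed combination: size weight times relative volume error."""
    return size_weight(v_g, tau_s) * relative_error(v_p, v_g, eps)

def spl_grad_magnitude(v_g, tau_s=500.0, eps=1e-6):
    """|d SPL / d V_p| = W(V_g) / (V_g + eps), valid for V_p != V_g."""
    return size_weight(v_g, tau_s) / (v_g + eps)

# Same relative error (2x over-prediction) at two very different scales.
for v_g in (50, 2000):
    v_p = 2 * v_g
    print(f"V_g={v_g:5d}  E_rel={relative_error(v_p, v_g):.3f}  "
          f"SPL={spl(v_p, v_g):.4f}  |grad|={spl_grad_magnitude(v_g):.2e}")

grad_ratio = spl_grad_magnitude(50) / spl_grad_magnitude(2000)
```

With these assumed settings the relative error is essentially identical for both lesions, yet both the loss value and the gradient with respect to $V_p$ are much larger for the 50-voxel lesion, because the size weight and the $1/(V_g + \varepsilon)$ factor both favour small volumes.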
3. Optimization Surface and Convergence
We can visualize this size-conditioned optimization as gradient descent on a 2D loss landscape. The convergence speed is directly proportional to $W(V_g)$. Click anywhere on the plot below to spawn a new particle and observe its descent trajectory in real time. Notice how particles dropped on the left ($V_g < 500$) reach the optimal manifold almost immediately, whereas particles on the right ($V_g > 2000$) experience severe gradient decay and barely move. The faint red curves are the contour lines of the loss surface; they show a steep "valley" for small lesions that widens into a flat plain for large lesions.
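The particle behaviour can be imitated with a one-dimensional descent on $V_p$ for a fixed $V_g$. This is a rough sketch, not the simulation behind the plot: it assumes the loss is $W(V_g) \cdot E_{rel}$, and the learning rate, $\tau_s = 500$, the step cap, and the $2\times$ over-predicted starting point are all illustrative choices.

```python
import math

def descend(v_g, tau_s=500.0, lr=100.0, max_steps=10_000, eps=1e-6):
    """Gradient descent on V_p for the assumed loss W(V_g) * |V_p - V_g| / (V_g + eps).

    Returns (steps_taken, remaining_relative_error).
    """
    v_p = 2.0 * v_g  # assumed start: 2x over-prediction
    # The gradient magnitude is constant away from the optimum, so the step is too.
    step = lr * math.exp(-v_g / tau_s) / (v_g + eps)
    for n in range(max_steps):
        gap = v_p - v_g
        if abs(gap) <= step:          # the next step would reach the optimum
            return n, 0.0
        v_p -= math.copysign(step, gap)
    return max_steps, abs(v_p - v_g) / v_g

steps_small, err_small = descend(v_g=50)
steps_large, err_large = descend(v_g=2000)

print(f"V_g=50:   converged after {steps_small} steps (remaining error {err_small:.3f})")
print(f"V_g=2000: stopped after {steps_large} steps (remaining error {err_large:.3f})")
```

In this toy setting the small-lesion particle reaches the optimum in a few dozen steps, while the large-lesion particle exhausts the step budget with most of its relative error still in place, mirroring the steep valley and flat plain described above.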
4. The "Aha!" Moment: Complementary Optimization Spectra
The true elegance of SPL lies in its complementary relationship with traditional overlap metrics. If we compute the theoretical per-voxel gradient magnitude for a missed lesion, we observe a stark divergence. Dice gradients are intrinsically holistic: if a patient has one massive lesion ($V_{\text{bg}} = 2000$ voxels), the Dice gradient for a newly emerging micro-lesion is heavily suppressed by the global intersection sum. SPL, however, is computed per lesion and is therefore isolated from the rest of the volume. It injects a strong, localized optimization pressure precisely where the Dice gradient collapses.
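The suppression effect can be made concrete with the gradient of the soft Dice score $D = 2I / (P + G)$, where $I = \sum_i p_i g_i$, $P = \sum_i p_i$, and $G = \sum_i g_i$. For a voxel with $g_i = 1$, $\partial D / \partial p_i = \left(2(P + G) - 2I\right) / (P + G)^2$. The sketch below evaluates this for a completely missed 50-voxel micro-lesion, with and without a perfectly segmented 2000-voxel lesion in the same volume. The micro-lesion size, the unsmoothed Dice form, $\tau_s = 500$, and the assumption that a lesion's $V_p$ is the sum of its voxel probabilities are illustrative choices, not details from the original work.

```python
import math

def dice_grad_missed_voxel(intersection, pred_sum, gt_sum):
    """dD/dp_i for a ground-truth voxel (g_i = 1), with D = 2I / (P + G)."""
    denom = pred_sum + gt_sum
    return (2.0 * denom - 2.0 * intersection) / denom ** 2

def spl_grad_per_voxel(v_g, tau_s=500.0, eps=1e-6):
    """|d SPL / d p_i| assuming SPL = W(V_g) * E_rel and V_p = sum of p_i in the lesion."""
    return math.exp(-v_g / tau_s) / (v_g + eps)

V_MICRO, V_BG = 50, 2000

# Micro-lesion alone in the volume, nothing predicted yet.
dice_isolated = dice_grad_missed_voxel(intersection=0, pred_sum=0, gt_sum=V_MICRO)

# Same micro-lesion, but a 2000-voxel lesion is already segmented perfectly.
dice_suppressed = dice_grad_missed_voxel(
    intersection=V_BG, pred_sum=V_BG, gt_sum=V_BG + V_MICRO
)

spl_grad = spl_grad_per_voxel(V_MICRO)

print(f"Dice gradient, micro-lesion alone:       {dice_isolated:.2e}")
print(f"Dice gradient, with 2000-voxel lesion:   {dice_suppressed:.2e}")
print(f"SPL gradient for the micro-lesion:       {spl_grad:.2e}")
```

Under these assumptions the per-voxel Dice gradient for the micro-lesion shrinks by more than two orders of magnitude once the large lesion dominates the sums, while the SPL gradient depends only on the micro-lesion's own volume and is unaffected.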
Cite this work
If you found this visualization or research helpful, please consider citing our work:
@article{joshi2026spl,
title={What Dice Misses: Size-Stratified Volume Regularization for Ischemic Stroke Lesion Segmentation},
author={Joshi, Mohit},
year={2026},
institution={Koita Centre for Digital Health, IIT Bombay}
}