For binary labels: you take a slice of labeled data. The mean of the ML model prediction on this data is different from the mean of the label. In practice, often a synonym for "loss is worse / could be better".
Not sure if that's what the GP meant, I only worked with binary labels stuff.