Why do you think that defining accuracy in relative terms works in favor of this model? This pairwise relative measure should give you less confidence that the model generalizes, because now we don't even know the model's actual confidence levels on these pairs, only that the pairs were ordered correctly. That further supports my claim that the way they're measuring results is designed to make them appear more significant than is justified.
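To make that concrete, here's a toy sketch (with made-up scores; nothing here is from the paper) of why a pairwise ordering metric says nothing about confidence: a well-separated model and a barely-above-chance model get identical scores.

```python
import itertools

# Hypothetical scores for two very different models on the same items
pos_scores_a = [0.91, 0.88, 0.95]   # confident, well-separated model
pos_scores_b = [0.52, 0.51, 0.53]   # barely distinguishes the classes
neg_scores   = [0.50, 0.49, 0.48]

def pairwise_accuracy(pos, neg):
    """Fraction of (positive, negative) pairs ranked in the right order."""
    pairs = list(itertools.product(pos, neg))
    return sum(p > n for p, n in pairs) / len(pairs)

print(pairwise_accuracy(pos_scores_a, neg_scores))  # 1.0
print(pairwise_accuracy(pos_scores_b, neg_scores))  # also 1.0
```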
Explaining the model becoming more "accurate" by this measure is pretty easy. The model is working with an extremely small and skewed dataset for this sort of thing, and has overfit to tendencies in that dataset. Given the kinds of numbers we're working with and that measure, a jump from 81% to 91% "accuracy" does not seem particularly significant. Especially since, again, the classifier fails to meet even the accuracy baseline needed to beat a null hypothesis under a more realistic measurement, and that baseline would probably need to be even higher to reflect the lower statistical power of this pairwise standard.
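For a rough sense of why I don't find that jump significant, here's a back-of-the-envelope check assuming (hypothetically; the actual n isn't stated here) a test set of about 100 pairs. Even the crude normal-approximation confidence intervals overlap, and since pairs share items they aren't independent, so the effective sample size is smaller still.

```python
import math

def binom_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a binomial proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return (p - z * se, p + z * se)

print(binom_ci(0.81, 100))  # ~(0.73, 0.89)
print(binom_ci(0.91, 100))  # ~(0.85, 0.97) -- the intervals overlap
```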
In any real-world application, this classifier would need to make a judgement in situ based on some confidence threshold (sketched below). From that perspective, this metric is worse than useless: it doesn't actually demonstrate that the result is even as significant as the (again, below the base rate) thresholds described in the summary, and yet this methodological smoke and mirrors has seemingly convinced you after a more thorough reading. I imagine this is similar to the process by which these systems are sold to investors.
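Here's what I mean by the in-situ problem, with entirely made-up numbers: under a skewed base rate, a model with perfect pairwise ordering can still lose to the null classifier ("always predict the majority class") once you force it to commit to a threshold.

```python
# Hypothetical base rate: 5% of cases are positive
base_rate = 0.05
null_accuracy = 1 - base_rate   # 0.95 from always predicting "negative"

# Suppose thresholding the scores gives a 90% true positive rate and a
# 10% false positive rate -- a respectable-looking operating point, but:
tpr, fpr = 0.90, 0.10
model_accuracy = base_rate * tpr + (1 - base_rate) * (1 - fpr)
print(model_accuracy)  # 0.9 -- still below the 0.95 null baseline
```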