The paper reports "insufficient data" on helpfulness for most of the positive categories (the results lean more positive than negative but don't reach 95% confidence), and likewise insufficient data for most of the negative categories. It finds 5 conditions where it helps and 3 where it hurts.