- Are you providing reasoning traces, responses or both?
- Are you evaluating reasoning traces, responses or both?
- Has your work shifted towards multi-turn or long-horizon tasks?
- If you also work with chat logs from actual users, do you think they are properly anonymized? Or do you believe you could de-anonymize them without much effort?
- Do you have contact with other evaluators?
- How do you (and your colleagues) feel about the work (e.g., moral qualms about "training your replacement", pride in furthering civilization, or is it just about the money...)?