1. Show HN: Open Operator Evals – real-world benchmarks for LLM web agents (github.com)
3 points by monoid73 | 9mo ago | 1 comment