> This was a really concrete case to discuss, because it happened in the open and the agent's actions have been quite transparent so far. It's not hard to imagine a different agent doing the same level of research, but then taking retaliatory actions in private: emailing the maintainer, emailing coworkers, peers, bosses, employers, etc. That pretty quickly extends to anything else the autonomous agent is capable of doing.
This is really scary. Do you think companies like Anthropic and Google would have released these tools if they knew what they were capable of, though? I feel like we're all finding this out together. They're probably adding guard rails as we speak.