Oh man, good question. I'm always up for swapping these stories. A lot of these came from a paper on weird AI tricks, and resulting best-of list on a blog collecting these stories.[1][2] Suffice to say, the people who think the orthogonality thesis is a weird hypothetical aren't keeping up with the state of things.
- The aforementioned Tetris story: an undirected learner was set to maximize score at Tetris learned normal play techniques, but also learned to pause the game immediately before losing so that the score wouldn't "decline" at game over.
- In the same vein as interns quitting, proxy detection of all sorts. Identify "field with sheep" by finding green fields with grey skies, or letting heuristics like "humans pick up dogs and cats" override correct identifications. (It's a goat until you pick it up, then it's a dog!)
- An agent playing Q*bert found a known bug for infinite lives, then escalated to an unknown bug which disabled the game while overflowing the score counter.
- Agents in a physics sim tasked with jumping as high as possible instead learned to 'fly' by abusing collision detection bugs, hitting themselves in ways that created upward momentum.
- Another "maximize jump height" task demonstrated that "highest" is an extremely fuzzy term. Initially measured by highest point, they became incredible tall. Measured by lowest point, they stayed tall and grew topheavy to 'kick' their base upwards.
- Number-handling bugs of all kinds. In one case, small twitches led to floating-point errors that created energy. In another, a "minimize force" task got solved by maximizing force and triggering integer wraparound.
My personal favorite is an adversarial bug. An agent playing tic-tac-toe on an infinite grid with a time limit submitted extremely remote moves which caused timeouts/crashes in any agent that tried to model the full board.
[1] https://arxiv.org/pdf/1803.03453.pdf
[2] https://aiweirdness.com/post/172894792687/when-algorithms-su...