Okay, this has been a ride.
Reverted to Ollama 0.8.0, switched the model to gemma3:12b-it-qat, set the seed to 42 and the temperature to 0, and used my old prompt. This way I was able to get consistent results everywhere, and I could confirm from old screenshots that everything still matches.
Prompt and output here: https://pastebin.com/xUi3bbGh
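For anyone trying to reproduce this, a minimal sketch of pinning the same settings through Ollama's REST API might look like the following. It assumes a local server on the default port (`http://localhost:11434`) and uses the `options` field of `/api/generate` to set `seed` and `temperature`. The helper names `build_payload` and `generate` are hypothetical, and this hasn't been checked against 0.8.0 specifically, so treat it as a starting point rather than a known-deterministic setup.

```python
import json
import urllib.request

# Assumed default local Ollama endpoint; adjust host/port for your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "gemma3:12b-it-qat",
                  seed: int = 42, temperature: float = 0.0) -> dict:
    """Build a non-streaming /api/generate request with seed and temperature pinned."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"seed": seed, "temperature": temperature},
    }


def generate(prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the request and return the generated text (requires a running server)."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the generated text in the "response" field.
        return json.loads(resp.read())["response"]
```

Pinning both values in the request keeps them out of any client-side defaults, which makes it easier to compare runs across machines.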
However, when using the prompt I used previously in this thread, I'm getting different responses across machines, even with the temperature and seed pinned. On a single machine, the output initially looked reliably identical, but after a good few more runs I was eventually able to reproduce the flip-flopping behavior you describe.
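To put a number on the flip-flopping rather than eyeballing it, one option is to run the same prompt many times and count the distinct outputs. A sketch, assuming `generate` is any function that takes a prompt and returns the model's text (for example, a thin wrapper around the Ollama API); the name `tally_outputs` is hypothetical.

```python
from collections import Counter
from typing import Callable


def tally_outputs(generate: Callable[[str], str], prompt: str, runs: int = 20) -> Counter:
    """Call `generate` repeatedly with the same prompt and count each distinct output."""
    counts: Counter = Counter()
    for _ in range(runs):
        # Strip surrounding whitespace so cosmetic differences don't count as a flip.
        counts[generate(prompt).strip()] += 1
    return counts


if __name__ == "__main__":
    # Stand-in generator that alternates between two answers, mimicking flip-flopping.
    answers = iter(["A", "B"] * 10)
    result = tally_outputs(lambda p: next(answers), "same prompt", runs=20)
    print(result)  # Counter({'A': 10, 'B': 10})
```

A result with a single key means every run matched; more than one key means the output isn't stable for that prompt on that machine, and the counts show how often each variant appears.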
API-wise, I just straight up wasn't able to get consistent results at all, so that was a complete bust.
Ultimately, it seems like I successfully fooled myself in the past and accidentally cherry-picked an example? Or at least the behavior is far more brittle than I thought. At this point I'd need significantly more insight into how the inference engine (Ollama) works to determine definitively whether this is a trait of the model or of the engine, and whether it is essential for the model to work (although I'm still convinced it isn't). Not sure how much that helps you in practice, though.
I wouldn't make a good scientist, apparently :)