After iterating on that for a while, I did a bunch manually (90) and then gave the LLM a list of pull requests as examples, and asked _it_ to write the prompt. It still failed.
Finally, I broke the problem up and started to ask it to generate tools to perform each step. It started to make progress - each execution gave me a new checkpoint so it wouldn't make new mistakes.