My anecdotal experience is that GPT 5.2 Pro is decently ahead of Claude Opus 4.5 in this category when it gets to the tricky stuff, both in presentation and accuracy. The long reasoning seems to help a lot. But apparently the benchmarks do not agree.
Edit - noticed OpenAI specifically focuses on finance use cases in their gpt-5.3-codex blog as well https://openai.com/index/introducing-gpt-5-3-codex/
The non-deterministic part is turning human instructions ("calculate the NPV over 10 years for X given Y") into Excel.
This is already a non-deterministic process (humans are non-deterministic!). The question is whether an AI model can be more reliable than humans, and I can't see any reason why it wouldn't be.
The correct path is pretty clear, so the logits for following that path are going to be a long way from off-path.
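A rough illustration of that logit gap, using made-up numbers rather than anything measured from a real model: when one option's logit is far above the rest, softmax puts nearly all the probability on it, so sampling almost always picks the same path.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: one clearly "correct" next step and two off-path alternatives.
clear_case = softmax([10.0, 2.0, 1.0])
# Hypothetical logits for an ambiguous step, where the options are close together.
ambiguous_case = softmax([2.0, 1.8, 1.5])

print([round(p, 4) for p in clear_case])
print([round(p, 4) for p in ambiguous_case])
```

In the first case the top option gets well over 99% of the probability mass; in the second, the choice is close to a coin flip between three options, which is where sampling noise would actually matter.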
For something like this the real problem is training the model to use Excel (which will show up as it being confused about which sheet it is on, trying to use the wrong window, and things like that), not the non-determinism.
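To make the split concrete: once the instruction is pinned down, the calculation itself is deterministic. A minimal sketch of the NPV example above, assuming an upfront outlay at year 0, end-of-year cash flows, and a flat discount rate (the function name and all figures are made up for illustration):

```python
def npv(rate, initial_outlay, cash_flows):
    """Net present value: outlay paid at t=0, cash flows received at the end of years 1..n."""
    discounted = sum(cf / (1 + rate) ** year
                     for year, cf in enumerate(cash_flows, start=1))
    return discounted - initial_outlay

# Hypothetical project: pay 1000 now, receive 150 per year for 10 years, discount at 8%.
value = npv(0.08, 1000.0, [150.0] * 10)
print(round(value, 2))
```

The ambiguity is all in the wording, e.g. whether the first cash flow lands at year 0 or year 1. Excel's own NPV() discounts the first value by one period, which is why the initial outlay is usually added outside the function.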
It's not like these models calculate.
The Journal of Accountancy - "Bugged by Excel’s calculation errors" - https://www.journalofaccountancy.com/issues/2014/mar/excel-c...
ICAEW (Institute of Chartered Accountants in England and Wales) - "Rounding Errors Revisited" (Excel Tips #447) - https://www.icaew.com/technical/technology/excel-community/e...
Just a reminder that an accountant who might say "I use Currency format" is still working in binary floating point as the format is just used as a display mask. And using VBA macros with the Currency type will hit problems at the boundary, when values move between the worksheet and the macro. The tool is broken in a way that proper accounting software is not...
"Floating-point arithmetic may give inaccurate results in Excel" - https://learn.microsoft.com/en-us/troubleshoot/microsoft-365...
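A small sketch of the underlying issue, in Python rather than Excel. Python floats are the same IEEE 754 doubles, though Excel's own display rounding means the exact symptoms can differ. The amounts are made up; the decimal module stands in for what a fixed-point accounting system would do.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact binary representation.
float_sum = 0.1 + 0.2
float_matches = (float_sum == 0.3)

# Ten 10-cent entries do not add up to exactly 1.00 in binary floating point.
float_total = sum([0.1] * 10)

# Decimal arithmetic keeps the cents exact.
decimal_sum = Decimal("0.10") + Decimal("0.20")
decimal_matches = (decimal_sum == Decimal("0.30"))
decimal_total = sum([Decimal("0.10")] * 10)

print(float_matches, float_total == 1.0)
print(decimal_matches, decimal_total == Decimal("1.00"))
```

A display format that shows two decimal places hides the difference on screen, but an equality check or a reconciliation against another system still sees the underlying binary value.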
I remember overhearing some normal people on the bus. One of them was talking about essentially orchestrating some agent scraper to pull and summarise news from 40 different sites he had identified as important, which put him quite ahead of his peers. These were non-technical people orchestrating an agent workflow to make them better at work.
Though there’s not much that tickles my software brain here. But the agents are coming for us all.
It does look really promising as a skeleton starting point though. Like generate it, delete numbers and populate by hand.
Not unlike the boilerplate start we saw in AI coding a couple of years back.
This is called 'Month End Close' in accountant speak.
Disclaimer: I use AI to code (and I code for finance) and I love Anthropic.
But: for f-ck's sake, I cannot click on the picture and have it show up in full. It stays at its tiny size, impossible to read the numbers. I had to right-click and "open in a new tab".
AI is, somehow, definitely still not fully there yet.
Since demand is so insanely high, we will just get into an equilibrium.
Not like developers, where like 25 percent of people want to do something with IT...