You could set up a collection in memory or in SQL with Status and UpdatedUtc columns, then poll it each loop for incomplete items until everything reaches the desired state.
Your state machine could be as simple as: New, Processing, Failed, Succeeded. The outer loop queries the collection every second or so for items that are New or Failed and retries them. Items stuck in Processing for more than X seconds are forced to Failed on each pass (they'll get retried on the next one). Each state transition is written to a log with a timestamp for downstream reporting. Failures are set exclusively by the HTTP processing machinery, with timeouts detected as noted above.
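A minimal in-memory sketch of that loop, assuming a synchronous `process` stand-in for the real HTTP call (names, the 30-second stuck threshold, and the `Item` shape are all illustrative, not from the post):

```python
import time
from dataclasses import dataclass, field

NEW, PROCESSING, FAILED, SUCCEEDED = "New", "Processing", "Failed", "Succeeded"
STUCK_AFTER_SECONDS = 30  # the "X seconds" above; pick a value for your workload

@dataclass
class Item:
    payload: str
    status: str = NEW
    updated_utc: float = field(default_factory=time.time)

transitions = []  # timestamped transition log for downstream reporting

def set_status(item: Item, status: str) -> None:
    item.status = status
    item.updated_utc = time.time()
    transitions.append((item.updated_utc, item.payload, status))

def process(item: Item) -> None:
    """Stand-in for the real HTTP work; only this may mark items Failed."""
    set_status(item, PROCESSING)
    try:
        # response = http_post(item.payload)  # hypothetical real call
        set_status(item, SUCCEEDED)
    except Exception:
        set_status(item, FAILED)

def run(items: list[Item], poll_interval: float = 1.0) -> None:
    while True:
        now = time.time()
        # Force items stuck in Processing back to Failed so they get retried.
        for it in items:
            if it.status == PROCESSING and now - it.updated_utc > STUCK_AFTER_SECONDS:
                set_status(it, FAILED)
        pending = [it for it in items if it.status in (NEW, FAILED)]
        if not pending and all(it.status == SUCCEEDED for it in items):
            break
        for it in pending:
            process(it)
        time.sleep(poll_interval)
```

In a real system `process` would dispatch requests concurrently and return immediately, leaving items in Processing until the response (or timeout) arrives.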
Using SQL would make iterating on your batch-processing policies substantially easier. Determining each batch with a SELECT statement lets you add constraints over aggregates: for example, you could cap the number of simultaneous in-flight requests, or abandon all hope and throw if the statistics look bad (e.g., an OpenAI outage).
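A sketch of those two policies using the stdlib `sqlite3` module; the table name, column names, cap, and failure threshold are all assumptions for illustration:

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'New',
        updated_utc REAL NOT NULL
    )
""")
now = time.time()
db.executemany(
    "INSERT INTO items (status, updated_utc) VALUES (?, ?)",
    [("New", now), ("Processing", now), ("Failed", now), ("Succeeded", now)],
)

MAX_IN_FLIGHT = 8       # illustrative cap on simultaneous requests
MAX_FAILURE_RATE = 0.5  # illustrative "abandon all hope" threshold

def next_batch() -> list[int]:
    """Pick retryable items, keeping in-flight + batch <= MAX_IN_FLIGHT."""
    (in_flight,) = db.execute(
        "SELECT COUNT(*) FROM items WHERE status = 'Processing'"
    ).fetchone()
    budget = MAX_IN_FLIGHT - in_flight
    if budget <= 0:
        return []
    rows = db.execute(
        "SELECT id FROM items WHERE status IN ('New', 'Failed') LIMIT ?",
        (budget,),
    ).fetchall()
    return [r[0] for r in rows]

def failure_rate() -> float:
    """Aggregate statistic for the bail-out policy."""
    (failed,) = db.execute(
        "SELECT COUNT(*) FROM items WHERE status = 'Failed'"
    ).fetchone()
    (total,) = db.execute("SELECT COUNT(*) FROM items").fetchone()
    return failed / total if total else 0.0

def check_health() -> None:
    if failure_rate() > MAX_FAILURE_RATE:
        raise RuntimeError("failure rate too high; likely provider outage")
```

Because each policy is just another clause or query, you can tweak batch size, retry ordering, or the bail-out condition without touching the processing loop itself.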