One thing I'd be curious to generate from this data set, given the historical period it spans, is the correlation between expected battle outcome and actual battle outcome.
As someone who took a computational military modeling course in college, I can say that modeling ancient warfare is a vastly different task from modeling modern battles.
My gut says outcomes in modern warfare decorrelate more strongly from numerical advantage, due to the increased speed and lethality of available force types.
Also, for the author: if you want to be more accurate, start calculating actual expected outcomes from the forces. Lanchester's Laws are as good a place as any to start.
https://en.m.wikipedia.org/wiki/Lanchester%27s_laws
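
To make the Lanchester suggestion concrete, here is a rough sketch of how an expected outcome could be computed from the two force sizes. The function names and the effectiveness coefficients `alpha` and `beta` are my own placeholders, not anything from the data set, and setting both to 1.0 assumes the two sides are equally effective per soldier, which is a big assumption. The linear law is usually associated with ancient, one-on-one combat and the square law with modern aimed fire.

```python
import math


def lanchester_square(a0, b0, alpha=1.0, beta=1.0):
    """Expected winner and survivors under Lanchester's square law.

    Model: dA/dt = -beta * B, dB/dt = -alpha * A, which conserves
    alpha * A^2 - beta * B^2. alpha and beta are hypothetical per-soldier
    effectiveness values for sides A and B.
    """
    strength_a = alpha * a0 ** 2
    strength_b = beta * b0 ** 2
    if strength_a > strength_b:
        return "A", math.sqrt((strength_a - strength_b) / alpha)
    if strength_b > strength_a:
        return "B", math.sqrt((strength_b - strength_a) / beta)
    return "draw", 0.0


def lanchester_linear(a0, b0, alpha=1.0, beta=1.0):
    """Expected winner and survivors under Lanchester's linear law.

    Model: dA/dt = -beta * A * B, dB/dt = -alpha * A * B, which conserves
    alpha * A - beta * B.
    """
    strength_a = alpha * a0
    strength_b = beta * b0
    if strength_a > strength_b:
        return "A", (strength_a - strength_b) / alpha
    if strength_b > strength_a:
        return "B", (strength_b - strength_a) / beta
    return "draw", 0.0


# Made-up force sizes, purely for illustration.
print(lanchester_square(5000, 3000))  # ('A', 4000.0)
print(lanchester_linear(5000, 3000))  # ('A', 2000.0)
```

Running one of these over every battle and comparing the predicted winner with the recorded one would give the expected-versus-actual comparison I mentioned, and doing it per era would show whether numeric advantage predicts less in modern battles.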