Assuming they have say, 5000 front end instances, thats 5000 file descriptors being used just for this, before you are even talking about whatever threads the application needs.
It’s not surprising that they bumped into ulimits, though as part of OS provisioning, you typically have those tuned for workload.
More concerning is the 5000 x 5000 amount of open tcp sessions across their network to support this architecture. This has to be a lot of fun on any stateful firewall it might cross.