What makes you think they've had a tough time scaling with early demand? I'd think that this scales horizontally pretty well, given that each request is largely stateless and there's no interaction between users.
But according to this article, the server seems to only translate speech to text, with all of the natural language processing and AI happening on the device.
I'm talking about the Siri outages at peak times, which seem to have subsided for now but indicate that they weren't ready for even the demand they've had.