A few percent of a second is a few tens of milliseconds, so no worries there: that's at the very edge of "audio visual desync" perception. For huge bundles, of course, a few percent of a few seconds can hit 100ms or more, but even that's barely noticeable compared to how long we're already waiting for the bundle to download.
The bulk of the argument in favour of ESM in a "bundle vs ESM" comparison is the cost of downloading updates: redownloading individual ESM files (even several of them) is going to be appreciably faster than redownloading an entire bundle (even if the dependencies are split out into their own chunks and don't change).
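To put rough numbers on that, here's a back-of-the-envelope sketch. Every file name and size in it is made up for illustration (a 400 KB vendor chunk, 300 KB of app code, an update touching 3 of 30 modules), and it assumes unchanged files are served from the browser cache, so treat it as a toy model rather than a measurement.

```typescript
// All names and sizes are hypothetical, in kilobytes.
interface Asset {
  name: string;
  sizeKB: number;
  changed: boolean; // did this file's content change in the update?
}

// A returning visitor only refetches assets whose content changed;
// everything else is assumed to come from the browser cache.
function redownloadKB(assets: Asset[]): number {
  return assets
    .filter((a) => a.changed)
    .reduce((sum, a) => sum + a.sizeKB, 0);
}

// Bundled: dependencies split into their own chunk, app code in one chunk.
// Any edit to app code invalidates the whole app chunk.
const bundled: Asset[] = [
  { name: "vendor.js", sizeKB: 400, changed: false },
  { name: "app.js", sizeKB: 300, changed: true },
];

// Unbundled ESM: the same app code as 30 modules of 10 KB each,
// with the update touching only the first 3.
const esm: Asset[] = [
  { name: "vendor.js", sizeKB: 400, changed: false },
  ...Array.from({ length: 30 }, (_, i) => ({
    name: `module-${i}.js`,
    sizeKB: 10,
    changed: i < 3,
  })),
];

console.log(redownloadKB(bundled)); // 300
console.log(redownloadKB(esm)); // 30
```

This counts transferred bytes only. It ignores per-request overhead and the fact that small files compress less well than one large one, both of which narrow the gap somewhat in practice.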