This, I think, sufficiently explains the slowness of terminals on Windows.
I started with computers more than 30 years ago, and have been a "power user" and coder for essentially all of that. In all that time and experience, I've never found any terminal to be slow, or found one to be noticeably "faster" than another.
That's pretty much where most of the terminal performance difference is on most platforms. Latency can be an issue on some, but really this was a mostly solved problem as far back as the Amiga.
I have a toy terminal written in Ruby plus a tiny C extension to interface with raw Xlib. It currently draws individual characters and redraws the entire screen on scrolling, and even that is fast enough for most normal usage (though it is indeed slow at printing megabytes of text).
But getting a terminal fast enough, including on the "print megabytes of text" test, boils down to a few simple principles (and you'll be "fast enough" without doing all of them, even on pretty slow hardware):
* Render your text to a buffer, and only render to screen at intervals.
* Use non-blocking IO and read as much as you can into suitably sized buffers (aka: reduce pointless context switches).
* Scroll the bitmaps using whatever functionality the OS/toolkit provides, rather than re-rendering the text like my stupid Ruby term.
* If you have multiple lines in your buffer, scroll once and render all of the new lines at once.
* If you use vector fonts, prefer to pre-render glyphs to a buffer rather than re-rendering them every time.
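
To make the first few points concrete, here's a rough sketch in Python (not my actual terminal code). It assumes a Unix-like system and a file descriptor already set to non-blocking. `Screen`, `drain` and `pump` are made-up names, the "screen" is just a list of strings, and `READ_SIZE` is an arbitrary guess. It also ignores partial lines and UTF-8 sequences split across reads, which a real terminal has to handle.

```python
import os

READ_SIZE = 65536  # assumed buffer size; tune for your workload


def drain(fd):
    """Read everything currently available from a non-blocking fd."""
    chunks = []
    while True:
        try:
            data = os.read(fd, READ_SIZE)
        except BlockingIOError:
            break  # nothing more to read right now
        if not data:
            break  # EOF
        chunks.append(data)
    return b"".join(chunks)


class Screen:
    """Hypothetical text-only screen model: a fixed-height list of rows."""

    def __init__(self, rows):
        self.rows = [""] * rows
        self.scrolls = 0  # number of scroll operations performed
        self.flushes = 0  # number of times we "drew" to the display

    def append_lines(self, lines):
        # Lines that would scroll straight off the top are never drawn.
        lines = lines[-len(self.rows):]
        if lines:
            # Scroll once by the whole batch instead of once per line.
            self.rows = self.rows[len(lines):] + lines
            self.scrolls += 1

    def flush(self):
        # A real terminal would blit the off-screen buffer here.
        self.flushes += 1


def pump(fd, screen):
    """One tick of the render loop: drain input, update buffer, draw once."""
    data = drain(fd)
    if data:
        screen.append_lines(data.decode("utf-8", "replace").splitlines())
        screen.flush()
    return len(data)
```

Call `pump` from a timer or whenever the fd becomes readable. The point is that a thousand lines of output cost one scroll and one draw, not a thousand of each.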
That's about what I remember from rewriting parts of the AROS (AmigaOS replacement) terminal handling code a decade ago (does not do all of the above, but is still more than fast enough).