So yeah, I think the Good Display screen and the Waveshare ones are the exact same hardware, though I'm not 100% sure on that. I wasn't really using an API; I just used the SPI interface directly and implemented my own API on top. (In fact I made a simple TLV protocol over USB to drive the RP2040 Zero, which then performs the right SPI magic to drive the chip in the display, which then... the number of processors in this setup seems excessive.)
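For anyone wondering what a TLV (type-length-value) protocol looks like in practice, here is a rough host-side sketch. This is not my actual wire format: the 1-byte type, the 2-byte little-endian length, and the `MSG_SPI_COMMAND` / `MSG_SPI_DATA` type codes are all assumptions made up for illustration, and the payload bytes are placeholders.

```python
import struct

# Hypothetical message types; the real protocol's codes are not given here.
MSG_SPI_COMMAND = 0x01  # payload: one command byte for the display driver
MSG_SPI_DATA = 0x02     # payload: data bytes following a command

HEADER = struct.Struct("<BH")  # assumed layout: 1-byte type, 2-byte LE length


def encode_tlv(msg_type: int, value: bytes) -> bytes:
    """Wrap a payload in a type-length-value frame."""
    if not 0 <= msg_type <= 0xFF:
        raise ValueError("type must fit in one byte")
    if len(value) > 0xFFFF:
        raise ValueError("value too long for a 2-byte length field")
    return HEADER.pack(msg_type, len(value)) + value


def decode_tlv(buffer: bytes):
    """Split a byte stream into complete frames.

    Returns (messages, remaining), where remaining holds any trailing
    bytes that do not yet form a complete frame.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # frame is incomplete; wait for more bytes
        messages.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]


if __name__ == "__main__":
    # Placeholder payloads, not real driver commands.
    stream = encode_tlv(MSG_SPI_COMMAND, b"\x12") + encode_tlv(MSG_SPI_DATA, b"ab")
    print(decode_tlv(stream))
```

The nice thing about TLV is that the receiver (the RP2040 in my case) can skip any message type it doesn't understand, because the length tells it where the next frame starts.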
Effectively I had to piece together some clues. There was the manual for the screen, which gave the definition of the SPI protocol in use, the commands you could send, what they did, the screen startup sequence, etc. It mentioned waveforms and modes a bit, but it was frustratingly incomplete and it looked like there were gaps in the commands described. Then there was a YouTube video of a guy showing how he'd implemented fast refresh on some other model of screen by uploading custom waveforms, who mentioned looking for waveforms in sample code and for chip specifications. This led me to the original docs for the driver chip on my display (an UltraChip UC8179), which had the missing commands specified. Using these, along with the waveform (LUT) definitions the vendor supplied when I asked for some, I was able to make the fast mode.
I presume some commands were left out of the vendor document because a lot of them were non-essential and could possibly cause harm(?). Certainly the wrong waveforms could lead to poor performance or no performance. Also, the chip spec document didn't have some of the higher-level "here is the startup sequence for your screen" flow-chart niceties of the vendor doc.
(The waveforms are literally describing how the screen should apply voltages to transition a pixel from one colour to another. The default ones that the screen ships with are good for a 'full' refresh of the pixel, but you can get away with a lot fewer, faster voltage switches for B/W, at the cost of quality declining over time, which you reset with a periodic 'full' refresh)
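The "fast most of the time, full every so often" idea can be sketched as a tiny scheduler. This is only an illustration: the `RefreshScheduler` name and the default of 10 fast refreshes between full ones are made-up assumptions, and the right number depends on the panel and the waveforms in use.

```python
class RefreshScheduler:
    """Decide whether the next update should use the fast or the full waveform."""

    def __init__(self, max_fast_refreshes: int = 10):
        # Assumed threshold; tune it by watching how quickly ghosting builds up.
        if max_fast_refreshes < 0:
            raise ValueError("max_fast_refreshes must be non-negative")
        self.max_fast_refreshes = max_fast_refreshes
        self.fast_since_full = 0

    def next_mode(self) -> str:
        """Return "fast" or "full" for the upcoming refresh."""
        if self.fast_since_full >= self.max_fast_refreshes:
            # Enough fast refreshes have accumulated: clean up with a full one.
            self.fast_since_full = 0
            return "full"
        self.fast_since_full += 1
        return "fast"


if __name__ == "__main__":
    scheduler = RefreshScheduler(max_fast_refreshes=3)
    print([scheduler.next_mode() for _ in range(8)])
```

The caller would then load the matching LUT (fast or default) before triggering the refresh; that part is hardware-specific, so it is left out here.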