Thanks :)
Agreed, the limiting factor has been diarization (generating the "who speaks when" data) speed. But the diarization backend of this app that I developed can now process 1 hour of audio in ~8 seconds on a M3 Mac. So that's more or less a solved problem now (at least on Mac), just UI work remains.