Yeah, it can help a bit with looping, but it introduces other problems. I recalled from earlier that tweaking a combination of the no_speech_threshold and logprob_threshold settings helped somewhat, though when I tried again on a random video it didn't do much. It still hallucinates a stream of captions (non-repetitive, though one run produced several Touhou-related lines) for what should be 4 minutes of looping background music before the first spoken sentence. If all one needs Whisper for is transcribing English, though, I still think it's pretty decent. On my test video, it now 'correctly' transcribes the music as ♪ when I tell it to transcribe as English only.
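For anyone who wants to try the same knobs, here's a rough sketch. It assumes the openai-whisper Python package, where `model.transcribe()` accepts `no_speech_threshold` and `logprob_threshold` and forwards `language`/`task` to the decoder. As far as I know, a segment only gets skipped as silence when its no-speech probability is above `no_speech_threshold` *and* its average log-probability is below `logprob_threshold`. So lowering the first or raising the second (toward 0) makes skipping more aggressive. Raising `logprob_threshold` also triggers more temperature-fallback retries. The helper names and the 0.4 / -0.5 values are just hypothetical examples, not the settings I actually used.

```python
"""Hypothetical helpers: English-only openai-whisper run with adjusted
silence / low-confidence thresholds."""


def build_transcribe_options(
    no_speech_threshold: float = 0.6,
    logprob_threshold: float = -1.0,
) -> dict:
    # language="en" skips Whisper's automatic language detection.
    # 0.6 and -1.0 are the library defaults, not tuned values.
    return {
        "language": "en",
        "task": "transcribe",
        "no_speech_threshold": no_speech_threshold,
        "logprob_threshold": logprob_threshold,
    }


def transcribe_english(path: str, model_name: str = "base", **threshold_overrides) -> str:
    # Imported lazily so build_transcribe_options() works without whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **build_transcribe_options(**threshold_overrides))
    return result["text"]


if __name__ == "__main__":
    # Hypothetical "stricter" values: treat more low-confidence segments as silence.
    print(build_transcribe_options(no_speech_threshold=0.4, logprob_threshold=-0.5))
```

Something like `transcribe_english("video.mp4", no_speech_threshold=0.4)` would then be the thing to experiment with. Going by my results above, don't expect it to fully stop hallucinations over long stretches of music.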