VHS is a composite signal on the tape itself. Composite, for the sake of this thread, means black-and-white detail plus color information. S-VHS has higher bandwidth in the luma (detail) but the same limited color bandwidth. And there are two audio standards. But there is no "RGB out."
At a minimum you want a device with S-Video out (it keeps the two signals on separate wires). You also need a time base corrector (TBC). These come in two forms. One is line-based, sometimes built into DVD players. This is how Jason Scott at the Internet Archive does it and it's wrong.
The other form of corrector requires a separate box and corrects each frame in full. Many boxes claim to be time base correctors but are not. They are "synchronizers" or amps. Don't buy until you understand the differences.
There are two time sources (not really clocks) in a VCR. The first is the physical tape wobbling and stretching over a head that's spinning far faster than seems possible. A line TBC is a tiny buffer that reconstructs the sync of the luma on each line.
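To make the "tiny buffer" idea concrete, here's a toy sketch in Python. It is not how any real TBC chip works; it only shows the principle of re-anchoring each line to its own sync pulse. The sample values, SYNC_LEVEL, and LINE_LENGTH are made up for illustration.

```python
# Toy model of a line TBC. Tape jitter makes the sync pulse land at a slightly
# different position on every scan line; we re-anchor each line to its pulse.
# SYNC_LEVEL and LINE_LENGTH are hypothetical values, not real video levels.
SYNC_LEVEL = 0    # sample value that marks the sync pulse in this toy model
LINE_LENGTH = 8   # number of picture samples to keep per line


def realign_line(samples, sync_level=SYNC_LEVEL, line_length=LINE_LENGTH):
    """Return the picture samples that follow the sync pulse, at a fixed length."""
    if sync_level not in samples:
        # No sync found on this line: emit a blank line rather than guess.
        return [sync_level] * line_length
    start = samples.index(sync_level)
    # Skip past the whole sync pulse, however many samples wide it is.
    while start < len(samples) and samples[start] == sync_level:
        start += 1
    picture = samples[start:start + line_length]
    # Pad short lines by repeating the last sample so every line has equal length.
    pad_value = picture[-1] if picture else sync_level
    picture = picture + [pad_value] * (line_length - len(picture))
    return picture


def line_tbc(lines):
    """Realign every line independently, the way a line-based corrector would."""
    return [realign_line(line) for line in lines]


# The same picture content (1..8) arriving with three different amounts of jitter.
jittered = [
    [5, 5, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 5],
    [5, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 5, 5],
    [5, 5, 5, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8],
]
stable = line_tbc(jittered)
```

After this, every line in `stable` starts at the same place, which is the whole job of a line TBC. Note that it says nothing about whether the frames themselves arrive on time.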
The other timing source is the overall signal sync. A proper TBC reconstructs this overall sync on a frame-by-frame basis and presents something sane to the capture card. Without it you'll get silently dropped frames, audio drifting out of sync, and all the other crap that happens when you try to watch video older than an iPhone. Consumer video capture is total crap and you won't see it until you try to encode, edit, or watch it on a different device. And then you'll be very confused working back to the original problem.
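For a sense of scale, here's a back-of-the-envelope sketch of how fast dropped frames turn into audio drift. It assumes NTSC timing (30000/1001 frames per second) and assumes the capture silently drops video frames while the audio keeps running; your hardware may fail differently. The function name is just for illustration.

```python
from fractions import Fraction

# Assumption: NTSC frame rate. For PAL, pass Fraction(25) instead.
NTSC_FPS = Fraction(30000, 1001)


def drift_ms(dropped_frames, fps=NTSC_FPS):
    """Audio/video offset in milliseconds after this many silently dropped frames."""
    return float(dropped_frames * 1000 / fps)


one_frame = drift_ms(1)       # roughly 33.4 ms per dropped frame
thirty_frames = drift_ms(30)  # about one full second of drift
```

So a capture that loses only a frame every few minutes is visibly out of lip sync by the end of a two-hour tape.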
But follow this careful path, actually capture a clean, proper signal, feed it into even the cheapest Blackmagic box, and you're good.
ChatGPT will walk you through this and seems to know more about proper ffmpeg settings than the developers themselves or 30,000 conflicting Stack Overflow messages on the topic.