I develop kaldi-active-grammar [0]. The Kaldi engine itself is state of the art and open source, but is focused on research rather than usability. My project has a simple interface and comes with a pretty good open source speech model.
However, kaldi-active-grammar specializes in real-time command and control, with advanced features that don't really apply to your use case. Vosk [1] is probably a simpler, better fit for you. It likewise uses Kaldi, can use my models, and offers some of its own as well.
Neither is particularly focused on transcription per se, but both are open.
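If you want to try Vosk on a recorded file, something like the following rough sketch should be close. It assumes the vosk Python package is installed, that you have downloaded and unpacked a model directory, and that the audio is a 16-bit mono PCM WAV file. The names transcribe_wav and join_text are made up for illustration and are not part of Vosk.

```python
import json
import wave


def join_text(result_jsons):
    """Join the "text" fields of Vosk-style JSON result strings (hypothetical helper)."""
    parts = (json.loads(r).get("text", "") for r in result_jsons)
    return " ".join(p for p in parts if p)


def transcribe_wav(wav_path, model_path):
    """Transcribe a WAV file with Vosk (assumes 16-bit mono PCM input)."""
    # Imported lazily so join_text can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    results = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_path), wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete;
            # Result() then returns that utterance as a JSON string.
            if rec.AcceptWaveform(data):
                results.append(rec.Result())
        # Flush whatever is still buffered at the end of the file.
        results.append(rec.FinalResult())
    return join_text(results)
```

Calling transcribe_wav("recording.wav", "path/to/model") would then return the recognized text as one string. Both paths here are placeholders for your own file and model directory.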
[0] https://github.com/daanzu/kaldi-active-grammar
[1] https://github.com/alphacep/vosk-api