Grammar and context. It'd be closer to dictation than current speech-to-text, with GPT serving as a "brain" that interprets what you mean in the current context instead of taking the raw input literally. You could also tie in GPT-3's "natural language to [SQL, bash, log parsing, regex]" capabilities, and so on.
Obviously it wouldn't be as good as a real person, but it'd be a nice leap to 95%+ accuracy, up from the roughly 80% that high-performing commercial STT systems manage.
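
To make the idea a bit more concrete, here's a rough sketch of the "brain" step: take the raw STT output plus some context and ask a language model what was most likely meant. Everything here is made up for illustration: `build_prompt`, `interpret`, and `fake_complete` are hypothetical names, the prompt wording is a guess, and `fake_complete` is a canned stand-in where a real model call would go.

```python
from typing import Callable


def build_prompt(raw_transcript: str, context: str, target: str = "plain text") -> str:
    """Assemble a prompt asking a language model to interpret a noisy transcript.

    The prompt layout is an assumption, not a recommended or tested format.
    """
    return (
        f"Context: {context}\n"
        f"Raw speech-to-text output: {raw_transcript}\n"
        f"Rewrite what the speaker most likely meant as {target}:"
    )


def interpret(
    raw_transcript: str,
    context: str,
    complete: Callable[[str], str],
    target: str = "plain text",
) -> str:
    """Pass the transcript and context through a caller-supplied completion function."""
    return complete(build_prompt(raw_transcript, context, target)).strip()


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same canned answer."""
    return " SELECT * FROM users WHERE signup_date >= '2020-01-01'; "


result = interpret(
    "select all from users wear sign up date after twenty twenty",
    "user is dictating into a SQL console",
    fake_complete,
    target="SQL",
)
print(result)
```

Taking the completion function as a parameter means you can swap in whatever model you actually have access to without touching the rest.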