I would still prefer the features in text form, in the chat GUI. Right now ChatGPT doesn't seem to have options to lengthen parts of the text response, to change it, etc. Perplexity and Gemini do seem to get the GUI right. Voice chat is fun for demos but won't catch on much, just like all its predecessors. Perhaps an advanced version of this could be used as a student tutor, however.
I am guessing text chat will be improved in all multimodal models because they have a broader base of data for pre-training. Benchmarks seem to show GPT-4o slightly exceeding GPT-4 (despite being a smaller model, or at least more parallelizable).