One cheap trick to get past this uncanny valley may be to use two separate LLMs, or two separate contexts / channels, to generate the conversation: each side takes turns producing the follow-up responses, and even interruptions where warranted.
That might mimic a human conversation more closely.
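A minimal sketch of the idea, with `call_llm` as a hypothetical placeholder for a real chat-completion API (the function name, roles, and history layout are assumptions, not any specific vendor's interface):

```python
# Two-agent turn-taking: each speaker keeps its OWN message history
# (a separate context), so neither sees the other's system prompt.
# From each agent's point of view, its own lines are "assistant"
# messages and the other speaker's lines arrive as "user" messages.

def call_llm(history):
    # Placeholder for an actual LLM call (e.g. a chat endpoint).
    # Here it just returns a numbered dummy reply.
    n = sum(1 for m in history if m["role"] == "assistant") + 1
    return f"reply #{n}"

def run_dialogue(turns=4):
    ctx_a = [{"role": "system", "content": "You are speaker A."}]
    ctx_b = [{"role": "system", "content": "You are speaker B."}]
    transcript = []
    speaker, ctx_self, ctx_other = "A", ctx_a, ctx_b
    for _ in range(turns):
        reply = call_llm(ctx_self)
        ctx_self.append({"role": "assistant", "content": reply})
        ctx_other.append({"role": "user", "content": reply})
        transcript.append((speaker, reply))
        # Hand the floor to the other agent.
        speaker, ctx_self, ctx_other = (
            ("B", ctx_b, ctx_a) if speaker == "A" else ("A", ctx_a, ctx_b)
        )
    return transcript

if __name__ == "__main__":
    for who, line in run_dialogue():
        print(f"{who}: {line}")
```

Interruptions could be layered on top, e.g. by letting the listening agent decide after each partial chunk whether to cut in, but the core trick is just the two isolated contexts alternating.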