On write amplification: 3x is not actually that unusual. RocksDB's default size amplification is 2x, and I've seen performant LSM trees with about 3x write amplification.
On the single-threaded bottleneck: this is an inherent issue once you put your database behind a network connection. LMDB can do 10k to 100k+ reads/sec on a single thread because each read is a local memory-mapped lookup rather than a network round trip. As soon as you need to distribute your database across more than one machine, you need to parallelize your work to get high throughput.
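To put rough numbers on the single-thread point, here is a back-of-the-envelope sketch using Little's law (throughput is at most requests in flight divided by latency). The latencies are illustrative assumptions, not measurements of LMDB or any particular network, and `max_ops_per_sec` is a hypothetical helper name.

```python
def max_ops_per_sec(latency_s: float, in_flight: int = 1) -> float:
    """Upper bound on throughput when each request takes `latency_s` seconds
    and at most `in_flight` requests are outstanding at once (Little's law)."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    if in_flight < 1:
        raise ValueError("in_flight must be at least 1")
    return in_flight / latency_s


# Assumed, illustrative latencies (not benchmarks).
LOCAL_READ_S = 10e-6    # ~10 microseconds for a local in-process read
NETWORK_RTT_S = 500e-6  # ~0.5 ms round trip to a remote database

local = max_ops_per_sec(LOCAL_READ_S)
remote_serial = max_ops_per_sec(NETWORK_RTT_S)
remote_parallel = max_ops_per_sec(NETWORK_RTT_S, in_flight=50)

print(f"local, one thread:      {local:,.0f} ops/sec")
print(f"remote, one at a time:  {remote_serial:,.0f} ops/sec")
print(f"remote, 50 in flight:   {remote_parallel:,.0f} ops/sec")
```

Under those assumptions, a single caller that waits for each network reply is capped far below the local case, and it only catches up by keeping many requests in flight at once, whether through threads, async I/O, or pipelining.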