One of our bottlenecks was very large keys (>1 MB) being written and read too often, which effectively consumed all the CPU time. We fixed that in the app by reducing the number of operations and compressing the values. Threaded I/O will give us a little more headroom.
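For illustration only, here is a minimal sketch of the app-side compression idea. The choice of `zlib`, the compression level, and the helper names `pack`/`unpack` are assumptions; the actual codec and client code may well differ.

```python
import zlib


def pack(value: bytes, level: int = 6) -> bytes:
    """Compress a value before it is written to the store (hypothetical helper)."""
    return zlib.compress(value, level)


def unpack(blob: bytes) -> bytes:
    """Reverse of pack(): decompress a value after it is read back."""
    return zlib.decompress(blob)


# A repetitive payload of a bit over 1 MB, standing in for one of the big values.
payload = b'{"user": 1, "items": [1, 2, 3]}' * 50_000
blob = pack(payload)
```

The server then only has to move `len(blob)` bytes per read or write instead of `len(payload)`, at the cost of some CPU in the app. How much this helps depends entirely on how compressible the real values are.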
Another use case involves Lua scripts that operate on 5 different keys, so a cluster or proxy is out of the question.
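To make the multi-key problem concrete: Redis Cluster expects all keys touched by one script to hash to the same slot, and keys only share a slot by design when they carry the same `{hash tag}`. The sketch below reimplements the documented slot calculation (CRC16/XMODEM modulo 16384) in plain Python; the key names are made up, and whether hash tags fit our key naming is a separate question.

```python
import binascii

SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Compute the cluster hash slot for a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with initial value 0 is CRC16/XMODEM (polynomial 0x1021).
    return binascii.crc_hqx(key, 0) % SLOTS


# Five hypothetical keys a single script might touch.
keys = [b"orders:1", b"stock:1", b"user:1", b"cart:1", b"audit:1"]
slots = {k: key_slot(k) for k in keys}

# The same keys with a shared hash tag all map to one slot.
tagged = {k: key_slot(b"{tenant:1}:" + k) for k in keys}
```

If `slots` contains more than one distinct value, a script touching all five keys cannot run as-is on a cluster.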