Thanks for the details on how to use your benchmark script and for taking the time to investigate this. I hadn’t heard of mc-crusher before, and it seems to work somewhat differently from memtier_benchmark.
First, a few significant differences:
1) Your value size is 10B, which completely changes the results. Let’s keep the value size at 100B, which is more realistic.
2) The ratio of gets to sets significantly affects the requests per second. We assumed a 1:1 ratio when we did our measurements. Increasing the percentage of gets really speeds up req/sec. We didn’t observe this effect on ElastiCache. Is this a recent improvement in the GitHub version of memcached?
3) Your benchmark puts multiple keys in the same get command, whereas memtier pipelines multiple get commands, each with one key. The latter seems more realistic to us.
4) We pipelined 16 get commands per packet, while your configuration used 50 keys per get command.
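To make point 3 concrete, here is a small Python sketch (key names are illustrative) of what the two request styles look like on the wire in the memcached ASCII protocol: one multi-key get command versus several pipelined single-key get commands.

```python
def multikey_get(keys):
    # One ASCII get command naming many keys: "get k1 k2 ... kN\r\n".
    # The server answers with one response containing all the VALUE blocks.
    return ("get " + " ".join(keys) + "\r\n").encode()

def pipelined_gets(keys):
    # N back-to-back single-key get commands sent without waiting for
    # replies; the server answers each command separately.
    return b"".join(("get %s\r\n" % k).encode() for k in keys)

keys = ["foobar1", "foobar2", "foobar3"]
print(multikey_get(keys))    # b'get foobar1 foobar2 foobar3\r\n'
print(pipelined_gets(keys))  # b'get foobar1\r\nget foobar2\r\nget foobar3\r\n'
```

Both styles can batch many keys into one packet, but the pipelined form is closer to how real clients issue independent lookups, which is why memtier’s behavior seems more representative.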
I was able to reproduce the same setup as ours (~1.2M req/sec) with your mc-crusher benchmark using the following config. It has a 1:1 get-to-set ratio, pipeline depth 16, and a 100B value size.
send=ascii_set,recv=blind_read,conns=50,key_prefix=foobar,key_prealloc=0,pipelines=16,value_size=100
send=ascii_set,recv=blind_read,conns=50,key_prefix=foobar,key_prealloc=0,pipelines=16,value_size=100,thread=1
send=ascii_set,recv=blind_read,conns=50,key_prefix=foobar,key_prealloc=0,pipelines=16,value_size=100,thread=1
send=ascii_set,recv=blind_read,conns=50,key_prefix=foobar,key_prealloc=0,pipelines=16,value_size=100,thread=1
send=ascii_get,recv=blind_read,conns=50,pipelines=16,key_prefix=foobar,key_prealloc=1
send=ascii_get,recv=blind_read,conns=50,pipelines=16,key_prefix=foobar,key_prealloc=1,thread=1
send=ascii_get,recv=blind_read,conns=50,pipelines=16,key_prefix=foobar,key_prealloc=1,thread=1
send=ascii_get,recv=blind_read,conns=50,pipelines=16,key_prefix=foobar,key_prealloc=1,thread=1
send=ascii_get,recv=blind_read,conns=50,pipelines=16,key_prefix=foobar,key_prealloc=1,thread=1
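For anyone following along, each mc-crusher config line above is just a comma-separated list of key=value options describing one group of connections (so the block above defines four set-side groups and five get-side groups, 50 connections each). A minimal Python sketch of that structure, purely for illustration (this is not mc-crusher’s actual parser):

```python
def parse_crusher_line(line):
    # Split one mc-crusher config line into its key=value options.
    # Each line defines one independent group of client connections.
    return dict(pair.split("=", 1) for pair in line.strip().split(","))

line = ("send=ascii_set,recv=blind_read,conns=50,key_prefix=foobar,"
        "key_prealloc=0,pipelines=16,value_size=100")
opts = parse_crusher_line(line)
print(opts["send"], opts["conns"], opts["pipelines"], opts["value_size"])
```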
I used the GitHub memcached on an r4.4xlarge. I ran memcache-top on the server instance to measure requests per second; it showed about 750k gets/sec and 600k sets/sec.
With a 10:1 ratio of gets to sets I’m seeing about 3.5M req/sec, which looks better than ElastiCache.