
dormando 7y ago

I characterize it as slow because I know software memcached can saturate the packet rate AWS gives that instance. If the packet rate were much higher then you might win out.

The only reason you can claim 9x latency is that you've saturated the worker threads. You should still win on latency even if it were properly bottlenecked on the network, but 9x throughput and 9x latency is completely false as a capacity limit in this test.
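
A rough sketch of why saturated workers inflate latency (an illustration, not a measurement from this test): in a simple M/M/1 queueing model the mean latency is 1 / (service rate - arrival rate), so it blows up as utilization approaches 100% no matter how fast the backend is when idle. The rates below are made-up numbers, and `mm1_latency` is a hypothetical helper.

```python
def mm1_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (seconds) for an M/M/1 queue; rates are in requests/sec."""
    if arrival_rate >= service_rate:
        return float("inf")  # saturated: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 1_000_000  # hypothetical capacity in req/sec, not taken from the benchmark

for utilization in (0.5, 0.9, 0.99):
    latency_us = mm1_latency(utilization * SERVICE_RATE, SERVICE_RATE) * 1e6
    print(f"utilization {utilization:.0%}: mean latency {latency_us:.0f} us")
```

Under these assumptions the same server looks many times slower near saturation than at half load, which is why a latency ratio measured against a saturated baseline says little about capacity.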

The other issue is that 100 bytes isn't typical. It's common, but almost every user has a varied workload. Deploying FPGAs for the larger cache values ends up being a waste. I even designed a new storage system based on offloading larger cold keys to flash.


andrewcanis 7y ago

Do you have a source showing ElastiCache running faster than this? For example, Redis Labs was only able to achieve 10M req/sec by using 6 m4.16xlarge instances, which are double the price of the CPU instance we used: https://dzone.com/articles/10m-opssec-1msec-latency-with-onl...

100-500 byte values are the majority of requests at companies like Facebook and Lyft for their key value clusters. For large value sizes the network interface becomes the bottleneck so FPGAs won’t be able to help.
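
As a back-of-the-envelope sketch of that network ceiling: responses per second cannot exceed link bandwidth divided by bytes per response. The 10 Gbit/s link and the 64-byte per-response overhead are assumptions for illustration, not figures from the benchmark, and `max_requests_per_sec` is a hypothetical helper.

```python
def max_requests_per_sec(link_gbps: float, value_bytes: int, overhead_bytes: int = 64) -> float:
    """Upper bound on responses/sec that a link of the given bandwidth can carry."""
    bits_per_response = (value_bytes + overhead_bytes) * 8
    return link_gbps * 1e9 / bits_per_response


LINK_GBPS = 10  # assumed NIC bandwidth in Gbit/s

for value_bytes in (100, 500, 10_000, 100_000):
    ceiling = max_requests_per_sec(LINK_GBPS, value_bytes)
    print(f"{value_bytes:>7} byte values: at most {ceiling:,.0f} req/sec")
```

With small values the bandwidth ceiling is high enough that packet rate and CPU decide the result; with large values the link itself caps throughput.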

dormando (OP) 7y ago

I wrote the current version of OSS memcached. I don't know how ElastiCache is configured, but as I said, memcached itself can definitely saturate the network from that instance. Either the version they run is too old or it's misconfigured. If I were to compare a "custom FPGA caching service" vs something memcached-like, I would take the same 4xlarge instance and just run memcached on it.

On a large enough machine I've gotten it up past 55 million read ops/sec. It's quite good at read throughput.

I'm also familiar with the cache clusters at major companies.

