Comparing Serverless Performance for CPU Bound Tasks (opens in new tab)

(blog.cloudflare.com)

127 pointsbosdev7y ago62 comments

62 comments

Each and every one of these posts from Cloudflare is direct targeting and completely biased. Should be noted that their response times are indicative of them being far fewer hops away. Unless they can run in the same DC, or even make the RTT fair, their 'Webpage response' metric is utterly useless.

Notice how they admit that they don't know how lambda really works. They switch between lambda@edge and Region-based lambdas, and don't seem to be able to be consistent with it.

Java Lambdas have horrible cold start times, and I'm not seeing any of this reflected anywhere in their report.

> Our Lambda is deployed to with the default 128MB of memory behind an API Gateway in us-east-1

Well duh the lambda is slower; it's going through API Gateway, and that does authentication processing as well.

All in all, these blog posts from Cloudflare are turning me off from them entirely, because they aren't even saying 'yeah, AWS got us beat in this case here.'

zackbloom7y ago

Hi, I'm the author of the post, thanks for sharing your thoughts!

You're absolutely right we don't know how Lambda works. We have read what we could find that's publicly available, and done a bunch of testing, but Amazon doesn't share all that much about their architecture.

I agree that the cold-start times of Lambda are slow, particularly with languages like Java and with VPCs. My plan at the moment is to write a blog post focused on cold-start times specifically, when I can figure out how to accurately test that around the world.

I'm not entirely sure why API Gateway would add hundreds of ms of latency. We also do authentication processing with our Access product, for example, and it certainly doesn't add anywhere near that. I also don't have any of those API Gateway features enabled to begin with. If you would very much like it, I'm happy to test a Lambda by hitting the Invoke API directly, but I doubt you'll see much of a difference. As the post says, Lambda is granting us a much smaller quantity of CPU time, there's not much you can do to get around that.

I apologize if the transitions between global and region-specific tests are unclear. The majority of the tests are being done from DC, specifically to focus the comparison around execution time, not global latency. I did my best in the post to specify where I was running the test from. If you have an idea of how that can be better expressed please share and I'll do my best to incorporate it in the future.

cremp7y ago

Thanks for being forthcoming; and I do appreciate the post. I'm partially venting, and I've been using AWS for a while. Vendor lock-in can be a pain in the rear sometimes.

A lot of the gripe I've got with these posts is that they seem somewhat incomplete. Hard to say things if you've only been messing with it for a week, and don't have a lot of the ins-and-outs of the services. That being said, I've spent at least 2 years diving into lambda and weird issues with it.

I'm not an employee of Amazon, and so my understanding can be off-base as well.

Lambdas are just managed EC2 instances. Each lambda code is stored in an AWS controlled S3 bucket; and on initial execution (cold start) pulled down, and run in their own chroot jail. I've dealt with java lambdas the most, and I can say that they take your zip, and run the jail inside the zip. They keep your java process open, and just call the handler on each call (warm start.) Each concurrent call can start another jail on another managed instance; getting the cold start time again.

You can get cold starts by uploading a new zip, or changing any of the lambda compute parameters.

The Golang works in a similar way, jails the zip, and keeps the program running, calling the handler as invokes come in.

I haven't done the python, node, or .NET enough to know if those are the same principles; I'd assume they are.

Interestingly, API Gateway really is just Cloudfront. Cloudfront is just the AWS managed API Gateway.

1 more reply

sreque7y ago

Assuming the author's tests are single-thread, I'm pretty sure 1024 MB doesn't give you a full CPU core on Lambda. I could be wrong though; I haven't payed attention to Lambda in a long time. Last I remember it was 1.5 GB that gave you a full core. This alone makes the comparison between a mid-range server and Lambda unfair, not to mention the differences between language runtimes.

That said, if you are using Lambda and expecting to not pay extra you have somehow been mislead. Lambda is definitely more expensive per cycle than managing your own instances, and I doubt that will change any time soon.

hobls7y ago

Anyone using Lambda should _absolutely_ do load testing with different memory configurations. You will get different results, and should analyze what is best for your application.

When calculating the overall cost of managing your own instances you should also include time spent by your engineering team. There are particular tipping points in terms of overall requests per second at which point you'd save money by moving from Lambda to something like Fargate, and then even farther above that, you're better off using EC2. And then even above that, you should be running your own instances in a colo space. (And then at some point you should probably be building your own datacenters, and then at some point you should start colonizing the moon, and then... you get the idea.)

chx7y ago

> And then even above that, you should be running your own instances in a colo space

Why are people jumping from EC2 to colo and skipping dedicated servers? Mystery of my life. We were running the 75th largest site in the US some years ago (as measured by Quantcast), ran the numbers and colo was ridiculously expensive and way more troublesome.

2 more replies

philipodonnell7y ago

Are there any tools that allow you to load test Lambda services with different memory configurations?

2 more replies

zlynx7y ago

Has anyone done research on cooling datacenters on the Moon?

NathanKP7y ago

It's not quite that straightforward. Lambda is more expensive per cycle if you are capable of keeping an instance fully utilized and getting every cycle out of that machine. After all if your Node.js or Go code takes say 20ms per request and can handle many concurrent requests you can squeeze a LOT of requests per second from a single instance.

But many workloads don't have that high of a request volume and can't actually make full use of an instance. If you have a small API or service that gets one or two requests every few seconds then paying for a 100ms chunk of Lambda execution time every couple seconds is going to be much cheaper than reserving an entire instance and then not being able to get good utilization out of it.

The tipping point is whether or not you have enough workload volume to keep an instance busy at all times. So for example password hashing in the article above. Because password hashing is deliberately CPU intensive it is very easy to keep an EC2 instance busy with even a low request volume. For a good hashing algorithm with lots of rounds its not uncommon to only get 10 authentications per second per core, because the algorithm is deliberately designed to be CPU heavy. So if you process more than 10 auths/sec then its probably cheaper to put the workload in a container that runs on an instance because you can keep that instance busy.

But if the same service is only handling one or two password hashes every minute, then you can save money by only paying for 100ms increments when an auth request arrives, and stop paying when there is nothing to do.

sfeng7y ago

How does that compare to the cost of cloudflare workers?

1 more reply

eximius7y ago

Is it just me, or does anyone else find the documentation of AWS and related services nearly incomprehensible? Maybe it's just too 'enterprise-y' and I haven't spent enough time in that environment, but it feels like all the information is squirreled away in 10,000 different pages and that I'd have to read all of it to just get the basics.

Also, does anyone know if there is an API for AWS to dynamically create, load, and launch EC2 and/or Lambda instances (i.e., boto - though I'm open to suggestions for something else) AND, preferably, have separate billing for each thing? Do I need multiple accounts to do separate billing? Something about IAM roles...?

deklerk7y ago

YES. Absolutely yes. I _hated_ this aspect of working with AWS. Also, documentation always came off as "generic" and "wordy" without being "useful" - a huge, huge article that might have the tiny sliver you need in it, or perhaps one of its dozens of sister articles would have the answer, who knows.

lainga7y ago

You're only supposed to grab the tiny sliver so you can mention it to $consultant in desperation.

PretzelFisch7y ago

I find the documentation good and some frustratingly vague depending on the service. In AWS everything can be done via code including starting and launching EC2/lambda. You'll need to look in their sdk documentation and experiment a little to figure out what everything means and in what order you need to make the api requests.

djhworld7y ago

I remember a few years ago we tried to implement a scheduled Lambda that needed to download a bunch of files from an S3 prefix, perform some aggregation on the data and then write the result to a database.

Our EC2 prototype of this on one of the m3 class instances could do the work in about 2 minutes which seemed a perfect opportunity to port to Lambda.

Even on the top memory instance at the time (1536mb), the job just couldn't finish, timing out after 5 minutes. The code was multi threaded, to parallelise the downloads, but not matter how much we tweaked this the Lambda would just never complete in time.

As you don't have visibility of the internal we didn't know whether this was due to CPU constraints (decompressing lots of GZIP streams), network saturation (downloading files from S3) or what.

In the end we gave up. Didn't have the time or resource to keep digging, and just pinned the problem on the use case we were trying to fit was against what Lamba is designed for

Not saying this is an indictment of Lambda, we use it in lots of places, with a lot of critical path code (ETL Pipelines).

alanning7y ago

We’ve found lambda’s x-ray feature to be very helpful wrt finding the source of slowdowns. I know it wasn’t available during the project you were writing about but wanted to mention it for others.

gleenn7y ago

I thought the use case for things like Lambda were more along the lines of rarely used web requests that you'd save money on by not running a full box. I do remember them being slow too.

djhworld7y ago

Nah, I think the scope is wider then that.

In my case we use lambda to perform ETL based on S3 events, so when a file drops into S3, Lambda is invoked to process it.

That works very well for us and is cheaper than running a box 24x7, as the file drops arrive sprodically throughout the day and Lambda can scale to meet the demand.

RhodesianHunter7y ago

If your job is easily parallelizable then you can run multiple lambdas in parallel. For the above use case they probably should have kicked off one lambda per prefix or similar.

1 more reply

tetha7y ago

This is a fight I currently have with our dev-team atm. Situational awareness is a real concern, especially if the code is misbehaving in productive loads. If a solution doesn't give us situational awareness if things go wrong, I object to that solution.

wolf550e7y ago

I'll copy from Twitter[1]:

@zackbloom @jgrahamc I can't find it in the docs on AWS site, but I've read that AWS Lambda scales CPU linearly until 1.5GB, then gives you 2nd thread/core and again scales linearly until 3GB. If your PBKDF2 was single threaded, Lambda bigger than 1.5GB is wasted.

11:12 AM - 9 Jul 2018

reply by blog post author[2]:

Replying to @ZTarantov @Cloudflare @jgrahamc I can't think of a way to test that within the Node code. The only option seems to be to update the C++ version (or some other language) to use multiple threads.

5:16 PM - 9 Jul 2018

1 - https://twitter.com/ZTarantov/status/1016384547364229120

2 - https://twitter.com/zackbloom/status/1016476314864312321

Dunedan7y ago

Yes, Lambda functions use a second core above 1536MB of memory. Back in the past they had it in their documentation, but removed it at a certain point. Also see: https://stackoverflow.com/questions/34135359/whats-the-maxim...

manigandham7y ago

It might not be true any longer if they removed it from the documentation.

handruin7y ago

I've recently been exploring AWS Lambda in a stack which contained API Gateway + Python Flask under Lambda for a task I was working on. I deployed it using Zappa and its purpose was to be a simple REST frontend for transferring files to S3.

After experimenting with uploads from Lambda to S3 I was noticing that the time to upload a tiny 4MB file changed dramatically when I reconfigured the Lambda function's memory size. At 500MB it took 16 seconds to upload the file which is pretty slow. Once I got past roughly 1500MB of memory, the performance no longer improved and the best I could get was about 8 seconds for the same payload.

None of my tests were controlled or rigorous in any way so take them with a grain of salt...they were just surprising to me that the speed changed dramatically with memory size allocation. I'm new to Lambda so I wasn't ware that memory size is tied to other resource performance. I'm curious if this goes beyond CPU and also changes network bandwidth/performance? The Lambda I deployed did not write data to the temp location that is provided, it streamed directly to S3.

I've since moved on from this implementation and now my Lambda function performs a much simpler task of generating pre-signed S3 URLs. I have noticed something else about Lambda that bothers me a little. If my function remains idle for some period of time and then I invoke it, the amount of time it takes to execute is around 800ms-1000ms. If I perform numerous calls right after, I get billed the minimum of 100ms because the execution time is under that. The part that bothers me is I'm being charged a one-time cost that's about 8x-10x the normal amount because my function has gone idle and cold. I'll have to continue reading to see if this is expected. It's not a huge amount in terms of cost but surprising that I'm paying for AWS to wake up from whatever state it is in.

thinkmassive7y ago

This is why people using Lambda at scale are concerned with keeping the containers “warm” https://aws.amazon.com/blogs/compute/container-reuse-in-lamb...

alanning7y ago

Keep in mind that if you have any kind of fanout at scale, keeping a few lambda instances “warm” probably won’t improve your throughout much.

Update: found a nice article with metrics re: lambda-backed api gateway but the premise applies to any fan-out.

https://hackernoon.com/im-afraid-you-re-thinking-about-aws-l...

zackbloom7y ago

It's also worth pointing out that if your Lambda is in a VPC its cold start time can be over 10s.

lucb1e7y ago

This headline is weird. I thought it was going to be about doing computations client side since it says "serverless", but what they mean is "without a dedicated instance running all the time" (about halfway through the article, I figured out what "lambdas" are in this context).

So if there goes so much effort into calculating costs for PBKDF2 on servers (ahem, "serverless"), why not move it to the client side? I like client side hashing a lot because it transparently shows what security you apply, and any passive or after-the-fact attacks (think 1024 bit encryption decryption which will slowly move from 'impossible for small governments' to 'just very slow' soon) are instantly mitigated. The server should still apply a single round of their favorite hash function (like SHA-2) with a secret value, so an attacker will not be able to log in with stolen database credentials.

But that's probably too cheap and transparent when you can also do it with a Lambda™.

kentonv7y ago

"Serverless" is a recent industry buzzword which roughly means: "Server hosting environment where you upload code representing some sort of event handler and let the host decide where and when to run it. You are billed per event rather than per server instance."

This article is comparing the raw CPU power provided by two different serverless products. PBKDF2 is used only as an example of a computation requiring a lot of CPU.

lucb1e7y ago

> PBKDF2 is used only as an example of a computation requiring a lot of CPU.

Oh wow, I completely missed the point here. Having worked on strong client-side hashing in browsers and being into crypto generally, I saw this problem being presented and completely mistook it. Thanks!

com2kid7y ago

I'd love to see an honest comparison across other providers, throwing in Google's Firebase Functions and Azure Cloud Functions.

chrisco2557y ago

Here's a good comparison of Lambda vs Azure Functions performance and scaling up to 400 concurrent requests: https://www.azurefromthetrenches.com/azure-functions-vs-aws-...

doczoidberg7y ago

"Since I published this piece Microsoft have made significant improvements to HTTP scaling on Azure Functions and the below is out of date. Please see this post for a revised comparison. https://www.azurefromthetrenches.com/azure-functions-signifi...

1 more reply

sudhirj7y ago

zackbloom you’ve made your point already, but remember that these posts represent a moving target. AWS could crush CF performance on pretty much all these numbers with a few configuration changes, which they might well do. And you’re not acknowledging the rest of the Lambda moat, like SQS integration, free S3 bandwidth etc.

Workers has a clear advantage over Lambda@Edge, but not because of the current resource configuration differences across the two products - the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices. Harp on that, most of what you’re talking about now will likely be invalidated by the next reinvent, and they’ll make it a point to tell the world.

kentonv7y ago

> AWS could crush CF performance on pretty much all these numbers with a few configuration change

Eh? I can see how they could match Workers' raw CPU throughput by simply turning off throttling. But how would they "crush" it? And how can they easily improve other performance measures like network latency, cold start time, or deploy time? Honestly curious what you're getting at here.

> the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices.

Thanks for the kind words.

sudhirj7y ago

Because the pure geographic latency numbers are quite comparable. Workers latency when compared to Lambda@Edge/CloudFront isn't that different, and is also a moving target because both of you are adding locations all over the world continuously and buying more and more local racks. There's no clear winner here.

Cold start time is fixable, there's already been large improvements, and this is more a VPC problem. Easiest way would be to overprovision aggressively / make sure Lambdas tied to APIGW always have lambda running, or ever use some variation of ML predictions to keep things warm. But again, this isn't comparable - Lambda is a truck compared to Workers/Lambda@Edge being bikes. Parallel scalability is more important there than speed. There are enough ways to keep a few warm ones ready.

Deploy time is really download time from S3, AWS could cache more aggressively on the local cloudfront caches. I'm not seeing deployment time as being a big factor, though.

By "crush" I mean make claims about performance irrelevant. To claim that AWS cannot equalize performance between Lambda@Edge and Workers doesn't make sense, they can. And they can improve Lambda price-performance as well, and are already doing so. I'm saying this cannot and is not the Workers USP - no one in the AWS ecosystem is going to jump to Workers based on this because it lacks the rest of the AWS ecosystem.

> > the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices.

That's really the big differentiator for Workers. I think you should blow that trumpet a lot more. If you only publicize performance numbers, what happens to the Workers story when that advantage is lost?

> Thanks for the kind words.

You made a good decision and built something great, you're welcome.

penagwin7y ago

Hello, I know you're the tech lead for the web workers at cloudflare so pardon my ignorance if I'm wrong.

At least for pure computer speed, I think he means that if you (cloudflare) and AWS got into an arms race in terms of allocated CPU/Memory to the webworkers/lambda, they have more raw resources to do so. They also have a global presence, not necessarily to the degree that you do obviously.

I highly doubt they would do this, and I think you have the superior product. I'm just a student/hobbyist so I admittedly don't have a ton of experience. I'm very biased towards CF, you guys are great! :D

1 more reply

CupOfJava7y ago

The x-axis is percentile of requests with that latency or lower. You need to read the article to figure that out. Label all your axes!

dsl7y ago

The question is, what does Amazon know now, that Cloudflare will figure out in a year or so?

ryanworl7y ago

Hopefully Amazon will learn from Cloudflare that your JavaScript runtime can use V8 directly instead of running Node in a container per instance, and then Lambda@Edge can be cheaper!

microcolonel7y ago

Good work at CloudFlare. I personally figure that Amazon ought to be doing more interesting things with Lambda, like maybe starting the workers from a memory snapshot.

cremp7y ago

They do. After the cold start, things are just run in memory.

Java for example will keep your static variables and such in memory, and keep /tmp until you haven't called the lambda for a while.

With GoLang, they call start your program, and just call the handler method as required.

j / k navigate · click thread line to collapse

62 comments

cremp7y ago

Notice how they admit that they don't know how lambda really works. They switch between lambda@edge and Region-based lambdas, and don't seem to be able to be consistent with it.

Java Lambdas have horrible cold start times, and I'm not seeing any of this reflected anywhere in their report.

> Our Lambda is deployed to with the default 128MB of memory behind an API Gateway in us-east-1

Well duh the lambda is slower; it's going through API Gateway, and that does authentication processing as well.

All in all, these blog posts from Cloudflare are turning me off from them entirely, because they aren't even saying 'yeah, AWS got us beat in this case here.'

zackbloom7y ago

Hi, I'm the author of the post, thanks for sharing your thoughts!

cremp7y ago

Thanks for being forthcoming; and I do appreciate the post. I'm partially venting, and I've been using AWS for a while. Vendor lock-in can be a pain in the rear sometimes.

I'm not an employee of Amazon, and so my understanding can be off-base as well.

You can get cold starts by uploading a new zip, or changing any of the lambda compute parameters.

The Golang works in a similar way, jails the zip, and keeps the program running, calling the handler as invokes come in.

I haven't done the python, node, or .NET enough to know if those are the same principles; I'd assume they are.

Interestingly, API Gateway really is just Cloudfront. Cloudfront is just the AWS managed API Gateway.

1 more reply

sreque7y ago

hobls7y ago

Anyone using Lambda should _absolutely_ do load testing with different memory configurations. You will get different results, and should analyze what is best for your application.

chx7y ago

> And then even above that, you should be running your own instances in a colo space

2 more replies

philipodonnell7y ago

Are there any tools that allow you to load test Lambda services with different memory configurations?

2 more replies

zlynx7y ago

Has anyone done research on cooling datacenters on the Moon?

NathanKP7y ago

sfeng7y ago

How does that compare to the cost of cloudflare workers?

1 more reply

eximius7y ago

deklerk7y ago

lainga7y ago

You're only supposed to grab the tiny sliver so you can mention it to $consultant in desperation.

PretzelFisch7y ago

djhworld7y ago

Our EC2 prototype of this on one of the m3 class instances could do the work in about 2 minutes which seemed a perfect opportunity to port to Lambda.

As you don't have visibility of the internal we didn't know whether this was due to CPU constraints (decompressing lots of GZIP streams), network saturation (downloading files from S3) or what.

In the end we gave up. Didn't have the time or resource to keep digging, and just pinned the problem on the use case we were trying to fit was against what Lamba is designed for

Not saying this is an indictment of Lambda, we use it in lots of places, with a lot of critical path code (ETL Pipelines).

alanning7y ago

gleenn7y ago

I thought the use case for things like Lambda were more along the lines of rarely used web requests that you'd save money on by not running a full box. I do remember them being slow too.

djhworld7y ago

Nah, I think the scope is wider then that.

In my case we use lambda to perform ETL based on S3 events, so when a file drops into S3, Lambda is invoked to process it.

That works very well for us and is cheaper than running a box 24x7, as the file drops arrive sprodically throughout the day and Lambda can scale to meet the demand.

RhodesianHunter7y ago

If your job is easily parallelizable then you can run multiple lambdas in parallel. For the above use case they probably should have kicked off one lambda per prefix or similar.

1 more reply

tetha7y ago

wolf550e7y ago

I'll copy from Twitter[1]:

11:12 AM - 9 Jul 2018

reply by blog post author[2]:

5:16 PM - 9 Jul 2018

1 - https://twitter.com/ZTarantov/status/1016384547364229120

2 - https://twitter.com/zackbloom/status/1016476314864312321

Dunedan7y ago

manigandham7y ago

It might not be true any longer if they removed it from the documentation.

handruin7y ago

thinkmassive7y ago

This is why people using Lambda at scale are concerned with keeping the containers “warm” https://aws.amazon.com/blogs/compute/container-reuse-in-lamb...

alanning7y ago

Keep in mind that if you have any kind of fanout at scale, keeping a few lambda instances “warm” probably won’t improve your throughout much.

Update: found a nice article with metrics re: lambda-backed api gateway but the premise applies to any fan-out.

https://hackernoon.com/im-afraid-you-re-thinking-about-aws-l...

zackbloom7y ago

It's also worth pointing out that if your Lambda is in a VPC its cold start time can be over 10s.

lucb1e7y ago

But that's probably too cheap and transparent when you can also do it with a Lambda™.

kentonv7y ago

This article is comparing the raw CPU power provided by two different serverless products. PBKDF2 is used only as an example of a computation requiring a lot of CPU.

lucb1e7y ago

> PBKDF2 is used only as an example of a computation requiring a lot of CPU.

com2kid7y ago

I'd love to see an honest comparison across other providers, throwing in Google's Firebase Functions and Azure Cloud Functions.

chrisco2557y ago

Here's a good comparison of Lambda vs Azure Functions performance and scaling up to 400 concurrent requests: https://www.azurefromthetrenches.com/azure-functions-vs-aws-...

doczoidberg7y ago

1 more reply

sudhirj7y ago

kentonv7y ago

> AWS could crush CF performance on pretty much all these numbers with a few configuration change

> the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices.

Thanks for the kind words.

sudhirj7y ago

Deploy time is really download time from S3, AWS could cache more aggressively on the local cloudfront caches. I'm not seeing deployment time as being a big factor, though.

> > the advantage is your choice of V8 and adoption of the Service Worker API standard, which brilliantly outshines the L@Edge API choices.

> Thanks for the kind words.

You made a good decision and built something great, you're welcome.

penagwin7y ago

Hello, I know you're the tech lead for the web workers at cloudflare so pardon my ignorance if I'm wrong.

1 more reply

CupOfJava7y ago

The x-axis is percentile of requests with that latency or lower. You need to read the article to figure that out. Label all your axes!

dsl7y ago

The question is, what does Amazon know now, that Cloudflare will figure out in a year or so?

ryanworl7y ago

Hopefully Amazon will learn from Cloudflare that your JavaScript runtime can use V8 directly instead of running Node in a container per instance, and then Lambda@Edge can be cheaper!

microcolonel7y ago

Good work at CloudFlare. I personally figure that Amazon ought to be doing more interesting things with Lambda, like maybe starting the workers from a memory snapshot.

cremp7y ago

They do. After the cold start, things are just run in memory.

Java for example will keep your static variables and such in memory, and keep /tmp until you haven't called the lambda for a while.

With GoLang, they call start your program, and just call the handler method as required.

j / k navigate · click thread line to collapse