> The project returns with significant restructuring of the toolset and Travis CI integration. Fierce battles raged between the Compiled Empire and the Dynamic Rebellion and many requests died to bring us this data. Yes, there is some comic relief, but do not fear—the only jar-jars here are Java archives.
Brilliant!
Also, by its nature EC2 should be terrible for serious benchmarking, since you have no control whatsoever over the infrastructure.
That's why the bare metal results are the only relevant ones. The playing field is fair and stable. Let the battle begin ;-)
I expect the cause is that too much code is being loaded for every request. PHP tears down and rebuilds the world for every request, and the popular frameworks load a lot of code and instantiate a lot of objects for every request.
Very few do. Just look at Symfony's complexity: the devs managed to shove proxies into the IoC container to "lazy load services", so you can inject them into controller class constructors without actually instantiating them until a route handler's controller method is called... Using complex class hierarchies in PHP has a big cost, while "raw PHP" barely does more than execute the underlying C code. To be fair, people using heavy frameworks come for the batteries included first, not really for the speed of the router. And since devs want more batteries, it's unlikely it will get faster.
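For readers unfamiliar with the lazy-service-proxy trick mentioned above, here's a minimal sketch of the idea in Python (the class names are hypothetical; this is not Symfony's actual implementation, just the general technique):

```python
class LazyProxy:
    """Defers constructing an expensive service until first attribute access."""

    def __init__(self, factory):
        self._factory = factory
        self._instance = None

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself,
        # i.e. anything belonging to the real service.
        if self._instance is None:
            self._instance = self._factory()
        return getattr(self._instance, name)


class MailerService:
    """Stand-in for a service with expensive setup (connections, config)."""

    def __init__(self):
        self.ready = True

    def send(self, to):
        return f"sent to {to}"


# The container injects the proxy into a controller's constructor;
# MailerService is only built when a handler actually calls a method.
mailer = LazyProxy(MailerService)
result = mailer.send("user@example.com")  # construction happens here
```

The payoff is that a controller can declare a dozen service dependencies while a given route only pays the construction cost for the ones it actually touches.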
Why this is the status quo is something I also question.
And because of the shared-nothing architecture, it scales like a mofo before we have to start jumping through hoops (and 99% of us never reach that scale).
The lack of raw speed is a minor inconvenience that's easily fixed in practice.
Also, I doubt big PHP frameworks are the only ones with a disadvantage that only shows up in these kinds of benchmarks. It's like testing cars solely for straight-line speed.
[1] https://github.com/TechEmpower/FrameworkBenchmarks/pull/1510
* Impressive Dart
* JRuby > MRI (I'd like to see JRuby 9k)
* Padrino, which offers basically everything that Rails does, performs impressively well. [Shameless Plug]
That said, it's still a huge step up from the 1.7 series, and once the team starts knocking out performance problems it should be pretty magnificent.
Does anyone have experience with the Nim web stack? Is it ready for prime time? How much effort is required to create a simple CRUD JSON API?
On a side note, I am really looking forward to comparing Rust and Elixir results in the next round of benchmarks.
I recommend reading the Nim code used for the benchmark [1] [2] [3] [4], and if you like what you see, starting with the Nim tutorial [5].
[0]: https://github.com/transfuturist/outlet
[1]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
[2]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
[3]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
[4]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
https://github.com/TechEmpower/TFB-Round-10/blob/master/peak...
What am I missing?
The test passes in the preview runs, in the TechEmpower continuous integration tests, and in the EC2 tests, so it's probably some transient error that only occurred in the final bare metal test. Maybe there's a race condition in the Play 2 test scripts that only shows up sometimes.
I've spent a fair bit of time maintaining the Play 2 benchmark tests so it's very frustrating to get no result on the final test. Oh well!
Though I didn't think to check the classpath when I was poking around the TechEmpower github repo. I wonder if a logback.xml slipped in somewhere that's siphoning off stderr to some unknown destination?
https://github.com/TechEmpower/TFB-Round-10/blob/master/peak...
https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
Here's the corresponding error from the log in Play[2]. Not sure what else it could be...
[1] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
[2] https://github.com/playframework/playframework/blob/2.2.x/fr...
Consistently orders of magnitude faster than everything else out there in the Python landscape.
That's why benchmarks like these have to be scrutinized. I like looking them over, but in reality they're not apples to apples.
Mono is an implementation of the .NET runtime and framework.
1) Statistics
2) Running Overhead
3) Travis-CI
4) Memory/Bandwidth/Other info
5) Windows
6) IRC
7) Ease of Contributing
1) Currently, the TFB results are not statistically sound in any sense: for each round you're looking at a single data point. EC2 has higher variability in performance, so that one data point is worth less than the bare metal data point. If we re-ran this round, I would expect to see at least a 5 to 10% difference for each framework permutation. See point (2) to understand why we're not yet running 30 iterations and averaging (or something similar).

2) Running a benchmark "round" takes >24 hours, and still (sadly) a nontrivial amount of manpower. It's currently really tough to do lots of previews before an official round, and therefore tough to let framework contributors "optimize" their frameworks iteratively. I'm working on continuous benchmarking over at https://github.com/hamiltont/webjuice - it's a bit early for PRs, but open an issue if you want to chat
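To make the statistics point concrete, the "30 iterations and averaging" idea amounts to reporting a mean and spread instead of a single number. A minimal Python sketch, with made-up RPS figures for one hypothetical framework test:

```python
import statistics

# Hypothetical requests/sec results from repeated runs of one test.
# A real round would collect these automatically across iterations.
runs = [41200, 39800, 42100, 40500, 41700]

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)

# Report mean +/- sample standard deviation instead of one data point,
# so a 5-10% run-to-run swing is visible rather than hidden.
print(f"{mean:.0f} req/s +/- {stdev:.0f}")
```

Even this simple summary would make it obvious when two frameworks' results are within noise of each other.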
3) As you can imagine, our resource usage on Travis-CI is much higher than other open source projects. They have been nothing but amazing, and even reached out to chat about mutual solutions to potentially reduce our usage. Really great team
4) We do record a lot of this using the dstat tool. dstat outputs a huge amount of data, and no one has sent in a PR to help us aggregate that data into something easy to visualize. If you want this info, it's available in raw form in the results GitHub repo.
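For anyone considering that aggregation PR, the core task is just summarizing columns of dstat's CSV output. A toy sketch in Python (the sample data and column names are illustrative; real dstat files have many more columns and a multi-line header that needs skipping):

```python
import csv
import io
import statistics

# A tiny, hypothetical slice of dstat CSV output: CPU user/system/idle.
raw = """usr,sys,idl
12,3,85
45,10,45
80,15,5
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Aggregate per-sample CPU busy time (user + system) into one average.
avg_cpu_busy = statistics.mean(
    float(r["usr"]) + float(r["sys"]) for r in rows
)
print(f"average CPU busy: {avg_cpu_busy:.1f}%")
```

The same pattern (parse, select columns, reduce to mean/max) would apply to the memory, disk, and network columns dstat records during a run.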
5) Sadly, Windows support is struggling at the moment. We need something set up like Travis-CI but for our Windows system. Currently, Windows PRs have to be tested manually, and few of the contributors have either a) time to do it manually in a responsive manner or b) Windows setups (a few do, but many of us don't). Any takers to help set something up? FYI, we have put a ton of work into keeping Mono support just so we can at least verify that changes to the C# tests run and pass verification, but naturally that isn't as nice as having native Windows support.
6) Join us on Freenode at #techempower-fwbm - it's really fun meeting the brilliant people behind the frameworks
7) If I had to pick one big thing that happened between R9 and R10, it would be the drastically reduced barrier to entry. Running these benchmarks requires configuring three computers, which is much harder than something like pip install. Adding Vagrant support that can set up a development environment in one command, or deploy to a benchmarking-ready AWS EC2 environment, has really reduced the barrier to getting involved. Adding Travis-CI made it better - it automatically verifies that your changes check out! Adding documentation at https://frameworkbenchmarks.readthedocs.org/en/latest/Projec... made it even easier. Having a stable IRC community is even better! All of these changes add up to make it easier than ever for someone to get involved.
EDIT: Actually, let me just link everyone to github:
Here are the Windows compatibility issues - https://github.com/TechEmpower/FrameworkBenchmarks/issues?q=...
Here is the specific issue asking for advice on which CI we should use to support Windows: https://github.com/TechEmpower/FrameworkBenchmarks/issues/10...
They are also single-node, which is great if your entire system is only ever going to need one machine's worth of capacity (e.g., vertical scaling).
I've noticed a weird trend: Amazon created its various slices of instance types a long time ago, and people's mental models have grown into larger instances far more slowly than Moore's law adds cores. So people will refer to something with 2 cores as "middle-of-the-road" and 32 as "extremely high-end", when in my brain those are "a cell phone" and "a two-year-old server".
This project gives you RPS/latency metrics for many frameworks, on a few hardware setups. This enables a rough comparison of "how does my framework perform relative to all these other well-known or established frameworks". Naturally, the comparison is not perfect - there are a ton of reasons that measuring just requests/sec and latency doesn't allow a complete comparison between two frameworks. However, once you accept that it is basically impossible to fully compare any two frameworks using just quantitative methods, and that these numbers should inform your choice of framework (rather than totally control it), we can talk about why it's valuable.
Want to run a low-cost server in a language X that you happen to love? This project can provide guidance about which frameworks written in language X are performing the best. Want to ensure your service can support 50k requests per second without latency suffering? This project provides latency numbers you can examine to see which frameworks appear to maintain acceptable latency even under high load.
If you wanted to, you could re-create this project by running ab against 100+ frameworks - that's the cornerstone of what is happening here. Granted, we currently use https://github.com/wg/wrk instead of ab, but the principle is the same: start up a framework, run load generation, capture result data. Most of the codebase is dedicated to ensuring that these 100+ frameworks don't interfere with each other, setting up pseudo-production environments with separate application, database, and load-generation servers, and other concerns that have to be addressed.
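That start-framework / run-load / capture-results loop can be sketched in a few lines of Python. The commands below are placeholders, not the project's real scripts, and a real harness would also handle warmup, timeouts, and result parsing:

```python
import subprocess

# Hypothetical per-framework commands; the real project drives these
# through its own setup scripts and wrk configurations.
frameworks = {
    "examplefw": {
        "start": ["./start_server.sh"],
        "bench": ["wrk", "-t8", "-c256", "-d30s", "http://server:8080/json"],
    },
}


def run_round(name):
    """Start one framework, run the load generator, return raw output."""
    cfg = frameworks[name]
    server = subprocess.Popen(cfg["start"])  # start the framework's server
    try:
        result = subprocess.run(
            cfg["bench"], capture_output=True, text=True
        )  # generate load and capture wrk's stdout
        return result.stdout
    finally:
        server.terminate()  # tear down so the next framework starts clean
```

The "don't interfere with each other" work the comment describes is essentially everything that `finally` block hand-waves over: killing stray processes, resetting databases, and isolating ports between tests.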
Over time, this project has started to collect more statistics than just requests/second and latency, which makes it more valuable than simply running ab. As more metrics and more frameworks are added, this becomes a really valuable project for understanding how frameworks perform relative to one another.