"Google probably has the best networking technology on the planet."
How do we quantify this?
"This is important for several reasons. On EC2, if a node has a hardware problem, it will likely mean that you'll need to restart your virtual machine."
I would much rather create a service that can tolerate single node outages than rely on "live migrations". I am not sure what he meant by the SSD comparison; Amazon EBS can be SSD-backed, but it is still network-mounted storage.
"Most of GCP's technology was developed internally and has high standards of reliability and performance."
Guess what AWS was developed for.
I like hand-wavy articles as much as the next guy, but it seems to me they picked GCP first and wrote an article to justify it, cooking up some numbers with single-dimension comparisons to make it look scientific. I wish I worked on single-dimension problems in real life, but it is always more complex than that. I am more interested in worst-case scenarios and SLAs than micro-benchmark results when comparing cloud vendors. Discarding Azure was purely arbitrary; in fact, Azure is more than happy to run Linux or other non-Windows operating systems. I am not sure where he got the idea of a "Linux-second cloud".
https://azure.microsoft.com/en-us/blog/running-freebsd-in-az...
> "Google probably has the best networking technology on the planet." How do we quantify this?
In the article they did a bunch of tests. Quote: "GCP does roughly 7x better for the comparison of 4-core machines, but for the largest machine sizes networking performance is roughly equivalent."
There is also https://github.com/GoogleCloudPlatform/PerfKitBenchmarker if you want to benchmark things yourself.
Seriously, try it yourself. I think you will be pleasantly surprised.
> I would much rather create a service that can tolerate single node outages than relying on "live migrations".
Services should tolerate node failure even on GCP; live migration does not really help with that. It's more about reducing ops. With AWS, you have to manually reboot your machines when an infra upgrade happens; with GCP it is automatic.
> I am not sure what he meant by the SSD comparison, Amazon EBS that can be SSD but still it is a network mounted storage.
I'm not too sure what your question is?
> Discarding Azure was purely arbitrary
Agreed, would love to know more about why they didn't consider Azure
1. AWS does explicitly tell you up front that smaller instance sizes come with lower network throughput. This is well known and well communicated even when you browse the instance offerings. Doing 7x better on a 4-core instance is hardly relevant (depending on the actual CPU type, though): saturating your pipe would probably consume much of your CPU time, and you could hardly do anything else on the box. You can prove me wrong on this one. Synthetic benchmarks are not really relevant for production use cases.
A good read on the subject: http://www.brendangregg.com/activebenchmarking.html
2. On reducing ops. You are implying that these ops tasks are not automated. You should ask your SRE co-workers about this one. For running a website at this scale, you absolutely need to automate the case where a server is rebooted: on shutdown it needs to remove itself from the load balancer or resource pool, and when it comes back it has to put itself back. Worst case, you can just terminate the instance and let auto-scaling do its job. All of these are completely hands-off operations in most cases, though I do understand that some smaller customers are not so advanced with automation, and GCP might be optimizing for those clients.
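The drain/rejoin automation described above can be sketched as a toy model. The `LoadBalancer` class here is a stand-in for a real ELB or backend-service API, not an actual cloud call:

```python
# Toy model of a node that drains itself from a load balancer on shutdown
# and re-registers on boot. In practice these hooks would live in a systemd
# unit or init script and call a real cloud API.

class LoadBalancer:
    def __init__(self):
        self.pool = set()  # instance IDs currently receiving traffic

    def register(self, node_id):
        self.pool.add(node_id)

    def deregister(self, node_id):
        self.pool.discard(node_id)


class Node:
    def __init__(self, node_id, lb):
        self.node_id = node_id
        self.lb = lb

    def on_boot(self):
        # called after local health checks pass
        self.lb.register(self.node_id)

    def on_shutdown(self):
        # called from a shutdown hook before the instance reboots
        self.lb.deregister(self.node_id)


lb = LoadBalancer()
node = Node("i-abc123", lb)
node.on_boot()
assert "i-abc123" in lb.pool      # serving traffic
node.on_shutdown()                # infra reboot begins
assert "i-abc123" not in lb.pool  # no traffic hits a rebooting box
```

With this in place, an instance reboot is invisible to users whether the platform migrates you automatically or not.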
3. I did not have a question; I pointed out that the article's author is talking about EBS, while it might appear to the reader that he is talking about some sort of local SSD.
4. Great! I would like to know it too! We should petition together. :)
Every once in a while you'll hear some random spec from one of these companies, and it's always pretty surprising. But Amin's team has achieved 5 Petabit/s of bisection bandwidth; that's more than surprising.
[1] https://www.youtube.com/watch?v=FaAZAII2x0w [2] https://www.youtube.com/watch?v=n4gOZrUwWmc [3] https://www.youtube.com/watch?v=RffHFIhg5Sc
EC2 being developed for internal use is more myth than fact. The original idea was for internal use, but it didn't exist much beyond a "short paper"[1] until it was green-lighted (by Bezos) as an external, sellable service.
[1] https://docs.google.com/presentation/d/1B1jvWWh0ACaDv4ryEzLl...
I know, there are lots of dimensions for that comparison, but picking a few dimensions, or letting users select a few, could probably give a reasonable ranking of services and prices.
This is a huge advantage. For instance, some of our jobs are computationally intensive but relatively light on memory. In GCE I can run a 32-core machine with 28 GB RAM and it will cost me $887.68/month (without any sustained use discounts).
In AWS, the closest option I have is c4.8xlarge (36 cores / 60 GB RAM) which will cost $1,226.10/mo.
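The per-core arithmetic behind those two quotes (list prices as given above, no discounts):

```python
# Per-core monthly cost, using the two list prices quoted above.
gce_monthly, gce_cores = 887.68, 32    # GCE custom 32-core / 28 GB
aws_monthly, aws_cores = 1226.10, 36   # AWS c4.8xlarge, 36 cores / 60 GB

gce_per_core = gce_monthly / gce_cores  # ~ $27.74 per core per month
aws_per_core = aws_monthly / aws_cores  # ~ $34.06 per core per month

print(round(gce_per_core, 2), round(aws_per_core, 2))
```

Part of the AWS premium here buys memory the workload doesn't need, which is exactly what custom machine types avoid.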
And if I need local (ephemeral) storage in AWS, I'm severely limited in instance types I can choose from, while in GCE you can attach local SSD to any instance type, including custom.
If you factor in per-minute billing in GCE and automatic sustained use discounts, we are talking about serious savings without any advance planning (required for using reserved instances).
EC2 still has some advantages - it supports GPU-equipped instances, for example, but for our computational pipelines GCE is a clear winner for now (and Cloud Dataproc is so much nicer than EMR!).
I have a feeling this person never really dove into Azure, and just wrote it off because it has Microsoft services built in; of course, various sysadmins still have a strong bias against Microsoft, especially if they are open source advocates. It seems like the entire article is mostly just comparing AWS to GCP instead of giving an actual overview of the cloud landscape; it brushes off every other provider (that's not AWS or GCP) without giving an actual reason why.
Yes, I know Azure runs Linux; let me unpack that point: we had previously run on a cloud that didn't treat Linux hosting as its flagship OS. The effect we observed was that Linux was a second-class citizen in terms of features and performance. Perhaps it's unfair to project that onto Azure, but I think it's true that AWS and GCP think about Linux first, and Azure doesn't. Running a company on the cloud means relying on the compute product (GCE/EC2) as the foundation of your infrastructure, so we think this makes a difference.
It would be valuable for a lot of people to see more comprehensive stats across all clouds - I would love to see this personally and I think it would help people make better decisions about cloud infrastructure.
- data center power
- data center networking
- hardware provisioning
I don't think they are better than GCE or AWS in terms of Linux support (maybe in minor things), but they are not significantly worse either. What comes after that, pricing, machine types, etc., is a different question. I see lots of companies using Azure because they got free credit for it.
I like Azure Web Apps, Sql, and Storage PaaS offerings, as well as hosted Mongo and similar 3rd party services. In general, my experience is that they are cheaper and better managed than most stuff my customers roll themselves.
I would suggest that any "100% anything" shop look at the PaaS offerings of the various clouds and see if the benefits outweigh the risks.
Bandwidth on GCP (and AWS and most of the other providers) is really, really, really expensive: $0.12 per gigabyte, upwards of $0.19 per gigabyte for Asia. Paying $0.12 every time you send an Ubuntu ISO is crazy. A bored script kiddie could run up your bandwidth costs to thousands of dollars just for the hell of it. A DDoS could make you declare bankruptcy.
I have a server with OVH that I can theoretically push 100+ TB per month through and only pay $100, with DDoS protection included. It may not be perfect DDoS protection, but it's not the $6000/mo I'd need to pay Cloudflare to get the same thing with GCP (I need wildcards), plus the $0.12 per GB for anything not cached by them.
I know from people in the industry that they pay less than a cent per GB. Google, if you want to differentiate your cloud services, start charging better prices for bandwidth and do something about DDoS (project shield should be baked into your offerings). $0.02 would be reasonable and you'll still make a profit. That goes for all the other "great value" cloud services that are actually very expensive for anybody doing work that actually needs bandwidth on the internet.
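The arithmetic behind that worry, with an assumed 1.4 GB ISO and an illustrative 100 TB/month of traffic (both numbers are examples, not from the original comment):

```python
# Egress cost at the quoted $0.12/GB vs. a flat-rate server.
egress_per_gb = 0.12
iso_gb = 1.4                                 # assumed Ubuntu ISO size
per_download = iso_gb * egress_per_gb        # ~ $0.17 every time it's served

monthly_tb = 100                             # illustrative traffic volume
cloud_monthly = monthly_tb * 1000 * egress_per_gb  # $12,000/month on cloud egress
flat_rate_monthly = 100                            # e.g. the OVH box above

print(per_download, cloud_monthly, flat_rate_monthly)
```

At this volume the metered price is two orders of magnitude above the flat-rate box, which is the whole complaint.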
Up to 10 TB / month - $30/Mbps
Next 350 TB / month - $16.50/Mbps
Traffic within the same Region - $3.50/Mbps
Traffic to another region - $6.50/Mbps
The outbound traffic starts at $45/Mbps in AsiaPac and $85/Mbps in Latin America.
In the US and most of the EU, at >1 Gbps (~350 TB/mo) volume, transit pricing is well under $1/Mbps. Most of Asia should be under $10/Mbps, and South America is quite a bit higher, but not $70/Mbps.
See: https://www.telegeography.com/press/press-releases/2015/09/0...
http://blog.telegeography.com/bandwidth-and-ip-pricing-trend...
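As a sanity check on that ~350 TB figure, a 1 Gbps commit fully saturated for a 30-day month moves roughly:

```python
# Converting a committed transit rate into monthly volume (decimal units).
gbps = 1
seconds_per_month = 30 * 24 * 3600                   # 2,592,000 s
tb_per_month = gbps / 8 * seconds_per_month / 1000   # Gb/s -> GB/s -> TB/month

print(tb_per_month)  # 324.0
```

That's about 324 TB at 100% utilization, so "~350 TB/mo per Gbps" is the right ballpark for a link that isn't perfectly flat.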
If I had a startup, I'd have a few of these servers as the baseline, then scale up with AWS. I assume I'd be using Kubernetes or something similar for this. Basically, I'd be using the cloud for what it's supposed to be: taking the extra load off.
Does anybody do this?
This is really interesting and I wonder if it's true? Do you know of this happening? I don't. Is that just because no-one thought about it or is it maybe not as easy as it seems? Or is there another reason?
The bandwidth costs under normal circumstances should be trivial to calculate, right? I guess many services do not serve that much outgoing data, especially after caching. But, of course, use the right tool for the job, etc. :) If the job is serving ISOs, then maybe PaaS is not the right tool.
The idea that datacenter egress bandwidth can continue to be this expensive is ridiculous. A company using AWS or GCP is missing out on opportunities that are about to be created by very fast internet connections. It's an entire "disruptive tech" innovation that these cloud services will be ineligible to compete with (16-30x markups!) I've run the numbers on switching to AWS and GCP numerous times, and the numbers never add up to something I could sustain for Neocities.
I might consider AWS if I'm just making internal apps for a giant company that thinks it's a great deal because their previous vendor was charging 10x more, but as a small startup doing something internet-facing, there's no way I could ever operate safely with that infrastructure risk. I would need success insurance or something. Short term I'd be fine, but long term AWS would be eating my profit margin and possibly even my company.
To say nothing of malicious bandwidth leeching attacks. It's just dangerous all around. I'm not even sure this has a name yet - Economic Service Attack? I remember reading a story of how GreatFire got DDoSed by China and got a $10-30k+ bill from Amazon because of it.
The rest of their offerings are more or less reasonable (their EC2 instances are a bit overpriced IMHO, but reasonable). But the bandwidth prices are just simply not. GCP could get massive switchover from AWS if they simply lowered their bandwidth egress prices.
Lastly, it's fairly telling that AWS/GCP/etc. charge nothing for incoming bandwidth and then charge a LOT for outgoing. Just making a backup of the sites on Neocities from S3 to another service would cost over $20 each time I did it. I can do it based on timestamps if I track all the files stored there in a database (double databases == yuck), but I'd much rather have access to something like integrated rsync support to make this process simpler and much more efficient.
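The timestamp-based approach mentioned above reduces to a simple selection rule. This sketch uses plain dicts in place of real S3 listings (no actual API calls; the keys and times are made up):

```python
# Sketch of rsync-style incremental backup selection: copy only objects
# that are new, or newer on the source side than in the last backup.
def files_to_copy(source, destination):
    """Return keys that are missing from, or newer than, the destination.

    Both arguments map object key -> last-modified time (epoch seconds).
    """
    return [key for key, mtime in source.items()
            if key not in destination or mtime > destination[key]]


s3_side = {"site1/index.html": 1700, "site2/cat.png": 1500}
backup_side = {"site1/index.html": 1600}  # stale copy, and cat.png is missing

print(files_to_copy(s3_side, backup_side))
```

The egress bill then scales with what actually changed rather than with the full archive, which is the point of wanting rsync-like support built in.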
It's cheaper if you use the CDNs they provide for this purpose.
I see GB not TB?
It totally wrote off Azure (2nd in market size) because it's a "Linux second" cloud (what does that even mean in a virtualized world?).
Also, you forgot to analyze support and SLAs around functionality. Good luck with GCP when something goes wrong or they decide to sunset a feature.
One of the reasons why Spotify went with Google Cloud is because of their superior support.
Even if not, they are such a large well known name that they probably got special treatment. The real proof in the pudding is the support that the 99% get, not the special case 1%.
> This should really be titled "A comparison of AWS and GCP."

Also, nice to see someone finally identify DigitalOcean as a B2C provider.
https://quizlet.com/blog/287-million-events-per-day-and-1-en...
TL;DR: One engineer leveraged Google BigQuery's Streaming API to build a pipeline to analyze ~300 million events per day in realtime.
We had a gold-level support ticket open about this for months, and they recently responded that they are making it a "feature request". Yes, proper UDP packet reassembly is a "feature request".
While I'm surprised by your "almost once a day" (seems high), we have also made a lot of improvements in the last year to make them even less impactful.
Disclosure: I work on Compute Engine.
Also, why dismiss DigitalOcean as a niche provider for hobbyists? The simple pricing, with lots of data transfer included, should appeal to a lot of businesses too.
Sadly (and despite repeated pleading), Quizlet didn't bother to do anything -- at all -- with LX-branded zones. This was a bit dispiriting because they were part of the motivation for the work (namely, a customer of ours that was upfront with the "impossible" demand of the performance they saw in a SmartOS container but with their Linux stack). I think that even by the time the LX-branded zone work was clearly on a production trajectory (i.e., late 2014), they had already implicitly decided to move away from Joyent to a more established brand. That's fine, and I don't fault them for it (and I definitely appreciate their kind words for Joyent in general and our support and engineering teams in particular) -- but I do wish they'd been more upfront about their rationale.
http://www.cpcstrategy.com/blog/2015/08/amazon-product-ads-d...
http://www.cpcstrategy.com/blog/2015/10/amazon-text-ads-disc...
http://www.cnbc.com/2015/09/09/amazon-discontinues-disappoin...
http://uk.businessinsider.com/amazon-discontinues-amazon-ele... :)
It seems that if Google believes in GCP being almost as big as AdWords, the likelihood of GCP Compute being shut down is as likely as GMail for Business being shut down. Not saying that it couldn't happen, but with Spotify and Quizlet using GCP instances, I find it highly unlikely the compute platform would go away, especially with paying users. A free product on the other hand could die on a whim.
Disclosure: I work on Compute Engine (and launched Preemptible VMs).
The UI is also weird (at least to my taste); for example, it is not possible to search instances by their addresses, nor to spin up more than one instance at once, and so on. AWS has an ugly console, but it feels more productive.
https://cloud.google.com/bigquery/sla
which has a 99.9% monthly uptime target. Are you seeing errors more often than that?
Additionally (and this isn't required), do you have a support package? I'm curious where you've been asking questions without response. If it's StackOverlow, that is best effort, but we do really try.
Disclosure: I work on Compute Engine.
https://labs.spotify.com/2016/03/10/spotifys-event-delivery-...
> Quizlet is now the ~50th biggest website in the U.S.

Not to say 150th isn't impressive, or likely a lot of traffic, but if you're going to post a number and a claim like that, it should be accurate.
Ballpark, I'd say that's between 700 and 1,000 hits/second on the main frontend. I sort of doubt either AWS or GCP is so much faster than the other for this kind of load.
Quantcast is usually more accurate and Quizlet US Rank on Quantcast is 37.
It has a great UI (material design), and the UX makes sense (the dashboard shows you a summary of your resources, resources are organized by project, notification/status icon animates when resources are changing, etc). Going back to the AWS dashboard feels clunky.
There aren't a million different image types for each region and zone - simple, autoupdated base images are available for Ubuntu, CoreOS, etc.
It has easy to understand base machine types and custom machine types with tailored specs can be created if needed. Product/service naming is clear (ex. Compute Engine vs EC2, Cloud Storage vs S3).
Addons like one-click secure web SSH sessions and Cloud Shell are amazing, no more key pairs to worry about.
Google Container Engine, with a hosted Kubernetes master, is a great concept and more transparent than closed source AWS ECS.
Their on-demand per minute pricing with sustained usage discounts is almost always significantly cheaper than AWS on demand instances, and your discounts are given automatically. Try the two calculators for yourself: Google (https://cloud.google.com/products/calculator/), AWS (https://calculator.s3.amazonaws.com/index.html).
Also, I have seen Google engineers all over HN (look at the comments on this post!) and other sites responding, commenting, and blogging - they seem actively engaged while I have seen very little from AWS.
That is not to say GCP is without problems. AWS IAM is still superior - it is easier to grant access to specific services for specific users, or have an account for a web server to upload to S3. Part of that is due to the fact that there is more plug-and-play tooling available for AWS today - boto comes to mind (boto GCP integration isn't as seamless as with AWS), as well as WAL-E. AWS's new certificate manager with free, auto-renewed SSL certs and installation on EC2 is awesome. S3 is cheaper than Google Cloud Storage. AWS has a longer free tier.
Luckily, tools like Terraform allow us to mix and match services from each cloud.
I disagree here. I find Google's UI to be the clunkier one. Sure, AWS is positively antique, but it's clean, readable, understandable and predictable in a homely Web 2.0 (or even 1.0) way.
Google's UI seems haphazardly put together by comparison, from the super tiny font to how common tasks are too often hidden away — the hamburger menu and the project selector being two examples. The progress of a task is also often hidden away and fairly inscrutable, such as when creating a container cluster.
When I started looking into the container support, I found that there's basically no web console for it. You can create clusters and see some summary of status about the cluster, but you can't see pods, replication controller settings, etc. — it turns out that the "Container Engine" is little more than a prebuilt Linux image with a startup shell script that starts up Kubernetes. AWS's ECS is the same way, but at least it has screens for creating jobs, adjusting resource settings and so on.
Google Cloud seems pretty great, but the web console definitely has a long way to go.
Thanks for your great review! We really appreciate it.
I hate to nitpick one point in what you said, but are you sure that S3 is cheaper than Google Cloud Storage? Glacier is definitely cheaper than we are, but they offer ~4 hour object retrieval latency. If you consider only services with real-time retrieval, I think we stack up quite well in both performance and price :).
Here are the calculators that show S3 is cheaper than GCS - hopefully I didn't type anything in wrong. I used 1 TB of storage, 10 million get operations and 10 million post operations, and 200 GB egress.
S3 monthly: $102.63
GCS monthly: $160.62 (about 56.5% more expensive)
GCS: https://cloud.google.com/products/calculator/#id=cddb4e9a-f2...
S3: https://calculator.s3.amazonaws.com/index.html#r=IAD&s=S3&ke...
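For reference, those two totals can be reconstructed from 2016-era list prices. Every per-unit price below is an assumption recovered from that era's pricing pages, not an official quote:

```python
# Reconstructing the two calculator totals (assumed 2016-era prices):
#   S3:  $0.03/GB storage, $0.005 per 1k PUTs, $0.004 per 10k GETs,
#        ~$0.09/GB egress with the first GB free
#   GCS: $0.026/GB storage, $0.01 per 1k class-A ops (writes),
#        $0.001 per 1k class-B ops (reads), $0.12/GB egress
storage_gb, gets, puts, egress_gb = 1024, 10_000_000, 10_000_000, 200

s3 = (storage_gb * 0.03
      + puts / 1_000 * 0.005
      + gets / 10_000 * 0.004
      + (egress_gb - 1) * 0.09)   # first GB of egress assumed free

gcs = (storage_gb * 0.026
       + puts / 1_000 * 0.01
       + gets / 1_000 * 0.001
       + egress_gb * 0.12)

print(round(s3, 2), round(gcs, 2))  # 102.63 160.62
```

Notably, most of the gap comes from the operation charges (GCS class-A writes cost twice S3's PUTs per thousand here) and the higher egress rate, not from storage itself, where GCS is actually cheaper per GB.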
https://cloud.google.com/preemptible-vms/
It would be interesting to see Quizlet thoughts on GCP preemptible VMs vs AWS spot instances and why they think they are better (or not?), but that could be the subject of a whole different post.
(Disclaimer: I work at Google - https://twitter.com/felipehoffa)
Buy X capacity reserved
Buy Y capacity spot instance
Buy Z capacity on demand to fill in the peaks
And it becomes a function of putting in more time = cheaper total bill.
GCE workflow is:
Buy the instances you need, bill will work out at the end of the month.
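The difference between the two workflows can be sketched as a blended-cost function. The hourly prices here are hypothetical placeholders, not real instance rates:

```python
# The AWS planning workflow above, as a blended-cost function:
# X reserved capacity + Y spot capacity + Z on-demand for peaks.
def blended_hourly_cost(reserved, spot, on_demand,
                        r_price=0.06, s_price=0.03, od_price=0.10):
    """Hourly cost of a capacity mix (all prices hypothetical)."""
    return reserved * r_price + spot * s_price + on_demand * od_price


# 10 baseline reserved, 5 interruptible spot, 2 on-demand for peaks:
planned = blended_hourly_cost(10, 5, 2)  # ≈ $0.95/hour
naive = blended_hourly_cost(0, 0, 17)    # ≈ $1.70/hour, all on-demand

print(planned, naive)
```

The planning effort nearly halves the bill in this toy example, which is exactly the "more time = cheaper total bill" trade-off; GCE's pitch is that the sustained-use discount gets you part of the way there with zero planning.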
I would like to know why they made that choice.
This is very recent though, and not considered Generally Available yet across the platform (meaning fully hardened, supported, and backed by an SLA).
Take another look!
Disclosure: I work on Compute Engine.
https://cloud.google.com/compute/docs/networks-and-firewalls
Anyone else have to read "Madame Bovary" in high school? Maybe this focus on developers is a form of provincialism.
Come on: someone smarter than me must have coined this already.
There are companies that help you run "seamlessly" on any cloud provider you want, so in theory you can use them to balance your services between cloud providers for cost or performance.