Kubernetes on Hetzner: cutting my infra bill by 75%

(bilbof.com)

375 points | BillFranklin | 1y ago | 220 comments

We've [1] been using Hetzner's dedicated servers to provide Kubernetes clusters to our clients for a few years now. The performance is certainly excellent; we typically see request times halve. And because the hardware is cheaper we can provide dedicated DevOps engineering time to each client. There are some caveats though:

1) A staging cluster for testing updates is really a must. YOLO-ing prod updates on a Sunday is no one's idea of fun.

2) Application level replication is king, followed by block-level replication (we use OpenEBS/Mayastor). After going through all the Postgres operators we found StackGres to (currently) be the best.

3) The Ansible playbooks are your assets. Once you have them down and well-commented for a given service then re-deploying that service in other cases (or again in the future) becomes straightforward.

4) If you can, I'd recommend a dedicated 10G network to connect your servers. 1G just isn't quite enough when it comes to the combined load of prod traffic, plus image pulls, plus inter-service traffic. This also gives a 10x latency improvement over AWS intra-AZ.

5) If you want network redundancy you can create a 1G vSwitch (VLAN) on the 1G ports for internal use. Give each server a loopback IP, then use BGP to distribute routes (bird).

6) MinIO clusters (via the operator) are not that tricky to operate as long as you follow the well trodden path. This provides you with local high-bandwidth, low-latency object storage.

7) The initial investment to do this does take time. I'd put it at 2-4 months of undistracted skilled engineering time.

8) You can still push ancillary/annoying tasks off onto cloud providers (personally I'm a fan of CloudFlare for HTTP load balancing).

[1]: https://lithus.eu

bigbones 1y ago

> dedicated 10G network to connect your servers

Do you have to ask Hetzner nicely for this? They have a publicly documented 10G uplink option, but that is for external networking and IMHO heavily limited (20TB limit). For internal cluster IO, 20TB could easily become a problem.

adamcharnock 1y ago

It is under their costing for 'additional hardware'[1]. You need to factor in the switch, uplink for each server, and the NIC for each server.

[1]: https://docs.hetzner.com/robot/general/pricing/price-list-fo...

nh2 1y ago

Hetzner does not charge for internal bandwidth.

bambambazooka 1y ago

> 5) If you want network redundancy you can create a 1G vSwitch (VLAN) on the 1G ports for internal use. Give each server a loopback IP, then use BGP to distribute routes (bird).

Are you willing to share example config for that part?

adamcharnock 1y ago

I don't have one I can share publicly, but if you send me an email I'll see what I can do :-) Email is in my profile.

You'll need a bit of baseline networking knowledge.

bc569a80a344f9c 1y ago

Should note that if you don't have enough networking knowledge, this is an excellent way to build a gun to shoot yourself in the foot with. If you misconfigure BGP or don't take basic precautions such as sanity filters on in- and outbound routes, you can easily do something silly like overwrite each server's default route, taking down all your services.

It's not rocket science, but it is complex, and building something complex you don't fully understand for production services can be a very bad idea.

lucasrattz 1y ago

> The initial investment to do this does take time. I'd put it at 2-4 months of undistracted skilled engineering time.

Perhaps you could take a look at https://syself.com (Disclaimer: I'm an employee there). We built a platform that gives you production-ready clusters in a few minutes.

sureIy 1y ago

> I'd put it at 2-4 months of undistracted skilled engineering time.

How much is that worth to your company/customer vs a higher monthly bill for the next 5 years?

As a consultancy company, you want to sell that. As a customer, I don't see how that's worth it at all, unless I expect a 10k/month AWS bill.

xkcd comes to mind: https://xkcd.com/1319/

adamcharnock 1y ago

> As a consultancy company, you want to sell that. As a customer, I don't see how that's worth it at all.

Well I do rather agree, but as a consultancy I'm biased.

But let's do some math. Say it's 4 months (because who has uninterrupted time), a senior rate of $1000/day. 20 days a month, so 80 days, is an $80k outlay. That's assuming you can get the skills (because AWS et al like to hire these kinds of engineers).

Say one wants a 3 year payback, that is $2,200/month savings you need. Which seems highly achievable given some of the cloud spends I've seen, and that I think an 80-90% reduction in cloud spend is a good ballpark.
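
For anyone who wants to plug in their own numbers, here is a minimal sketch of that payback arithmetic. The function name and parameters are made up for illustration; the inputs are just the figures quoted above (4 months, $1000/day, 20 days a month, 3 year payback).

```python
def payback_monthly_savings(months_of_work, day_rate, days_per_month, payback_years):
    """Return (upfront_outlay, monthly_savings_needed) for a one-off build-out."""
    # Total engineering days multiplied by the day rate gives the up-front cost.
    outlay = months_of_work * days_per_month * day_rate
    # Spread that cost evenly over the payback period, in months.
    monthly_savings_needed = outlay / (payback_years * 12)
    return outlay, monthly_savings_needed


outlay, needed = payback_monthly_savings(
    months_of_work=4, day_rate=1000, days_per_month=20, payback_years=3
)
print(f"outlay: ${outlay:,.0f}, savings needed: ${needed:,.0f}/month")
```

With these inputs the outlay is $80,000 and the required saving comes to roughly $2,222/month, in line with the ~$2,200 figure above.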

The appeal of a consultancy is that we'll remove the up-front investment, provide the skills, de-risk the whole endeavour, even put engineers within your team, but you'll _only_ save 50%.

The latter option is much more appealing in terms of hiring, risk, and cash-flow. But if your company has the skills, the cash, and the risk tolerance then maybe the former approach is best.

EDIT: I actually think the(/our) consultancy option is a really good idea for startups. Their infrastructure ends up being slightly over-built to start with, but very quickly they end up saving a lot of money, and they also get DevOps staffing without having to hire for it. Moreover, the DevOps resource available to them scales with their compute needs. (also we offer 2x the amount of DevOps days for startups for the first year to help them get up and running).

nkmnz 1y ago

This assumes there is no devops/consulting cost to set up something with AWS. My experience is that "the AWS way of doing XYZ" is almost as complicated as doing it the non-AWS way. On top of that, the non-AWS way is much more portable across hosting providers, so you decrease your business risks considerably.

tutfbhuf 1y ago

I have experience running Kubernetes clusters on Hetzner dedicated servers, as well as working with a range of fully or highly managed services like Aurora, S3, and ECS Fargate.

From my experience, the cloud bill on Hetzner can sometimes be as low as 20% of an equivalent AWS bill. However, this cost advantage comes with significant trade-offs.

On Kubernetes with Hetzner, we managed a Ceph cluster using NVMe storage, MariaDB operators, Cilium for networking, and ArgoCD for deploying Helm charts. We had to handle Kubernetes cluster updates ourselves, which included facing a complete cluster failure at one point. We also encountered various bugs in both Kubernetes and Ceph, many of which were documented in GitHub issues and Ceph trackers. The list of tasks to manage and monitor was endless. Depending on the number of workloads and the overall complexity of the environment, maintaining such a setup can quickly become a full-time job for a DevOps team.

In contrast, using AWS or other major cloud providers allows for a more hands-off setup. With managed services, maintenance often requires significantly less effort, reducing the operational burden on your team.

In essence, with AWS, your DevOps workload is reduced by a significant factor, while on Hetzner, your cloud bill is significantly lower.

Determining which option is more cost-effective requires a thorough TCO (Total Cost of Ownership) analysis. While Hetzner may seem cheaper upfront, the additional hours required for DevOps work can offset those savings.

supriyo-biswas 1y ago

This is definitely some ChatGPT output being posted here and your post history also has a lot of this "While X, Y also does Z. Y already overlaps with X" output.

I'd like to see your breakdowns as well, given that a similar configuration on AWS is priced much higher than a 2 vCPU, 4GB configuration (as an example) on Hetzner.

There's also https://github.com/kube-hetzner/terraform-hcloud-kube-hetzne... to reduce the operational burden that you speak of.

tutfbhuf 1y ago

It is my output, but I use ChatGPT to fix my spelling and grammar. Maybe my prompt for that should be refined in order to not alter the wording too much.

redbell 1y ago

While using ChatGPT to enhance your writing is not wrong by any means, reviewing the generated output and re-editing when necessary is essential to avoid a robotic writing style that may smell inhuman. For instance, these successive paragraphs: "In contrast, using AWS.." and "In essence, with AWS.." leave a bad taste in your brain when read consecutively.

lproven 1y ago

> I use ChatGPT to fix my spelling and grammar

I have a better suggestion, which will save time, energy, money, and human work.

Don't.

Write it yourself. If you can't, don't post.

0xFF0123 1y ago

While I agree that your characterisation is true for a lot of chatgpt output, it can also be true for a human explaining their nuanced point of view.

ratg13 1y ago

Most humans don't say a couple sentences and then re-summarize them 3 more times unless they are speaking to someone with a learning disability.

MathMonkeyMan 1y ago

I've never operated a kubernetes cluster except for a toy dev cluster for reproducing support issues.

One day it broke because of something to do with certificates (not that it was easy to determine the underlying problem). There was plenty of information online about which incantations were necessary to get it working again, but instead I nuked it from orbit and rebuilt the cluster. From then on I did this every few weeks.

A real kubernetes operator would have tooling in place to automatically upgrade certs and who knows what else. I imagine a company would have to pay such an operator.

_bare_metal 1y ago

This.

I run BareMetalSavings.com[0], a toy for ballpark-estimating bare-metal/cloud savings, and the companies that find it hardest to move away from the cloud are those who are highly dependent on Kubernetes.

It's great for the devs but I wouldn't want to operate a cluster.

[0]: https://www.BareMetalSavings.com

declan_roberts 1y ago

That's just not how it works on any scale other than "toy"

MathMonkeyMan 1y ago

Right, but certs get out of date unless somebody does something about it, that was my point.

KaiserPro 1y ago

Ceph is a bastard to run. It's expensive, slow and just not really ready. Yes I know people use it, but compared to a fully grown-up system (i.e. Lustre [don't, it's RAID 0 in prod] or GPFS [great but expensive]) it's just a massive time sink.

You are much better off having a bunch of smaller file systems exported over NFS; just make sure that you have block-level replication. Single address space filesystems are OK and convenient, but most of the time are not worth the cost of admin to get reliable at scale. Like a DB, shard your filesystems, especially as you can easily add mapping logic to Kubernetes to make sure you get the right storage to the right image.

mfld 1y ago

I saw that Hetzner is beta testing ceph-based object storage. This could make the setup much easier. Anyone tested this already?

sgarland 1y ago

I agree that it is hideously complicated (to anyone saying “just use Rook,” I’ll counter that if you haven’t read through Ceph’s docs in full, you’re deluding yourself that you know how to run it), but given that CERN uses it at massive scale, I think it’s definitely prod-ready.

KaiserPro 1y ago

Oh it probably is prod ready, I just wouldn't use it unless I had to (i.e. I had the staff to look after it and no money to buy something better).

Whether it is a good fit for general-purpose storage of stuff at a small scale is a harder question. It's not easy to get good performance at small scale, and getting good performance requires a larger number of storage nodes than you'd like.

Yes it has inline FEC (https://www.ibm.com/docs/en/storage-ceph/7?topic=components-...), but it's lots of layers to get to a file system.

Personally I'd have a redundant array of storage nodes and be done with it. It's easier to debug a single server than 3 layers of Ceph weirdness.

freedomben 1y ago

I mostly agree, but it surprises me that people don't often consider a solution right in the center, such as OpenShift. You have much, much less burden for devops and still have all the power and flexibility of running on bare metal. It's a great hybrid between a fully managed and expensive service and a complete build-your-own. It's expensive enough, though, that for startups it is not likely a good option, but if you have a cluster with at least 72 GB of RAM or 36 CPUs going (about 9 mid-size nodes), you should definitely consider something like OpenShift.

mountainriver 1y ago

Manually updating k8s clusters is a huge tradeoff. I can’t imagine doing that to save a couple bucks unless I was desperate

TheDong 1y ago

I dunno, I've had to spend like two or three hours each month on updating mine for its entire lifetime (of over 5 years now), and that includes losing entire nodes to hardware failure and spinning up new ones.

Originally it was ansible, and so spinning up a new node or updating all nodes was editing one file (k8s version and ssh node list), and then running one ansible command.

Now I'm using nixos, so updating is just bumping the version number, a hash, and typing "colmena apply".

Even migrating the k8s cluster from ansible to nixos was quite easy, I just swapped one node at a time and it all worked.

People are so afraid of just like learning basic linux sysadmin operations, and yet it also makes it way easier to understand and debug the system too, so it pays off.

I had to help someone else with their EKS cluster, and in the end debugging the weird EKS AMI was a nightmare and required spending more time than all the time I've had to spend on my own cluster over the last year combined.

From my perspective, using EKS costs more money, gives you a worse K8s (you can't use beta features, their AMI sucks), and also pushes you to have a worse understanding of the system so that you can't understand bugs as easily and when it breaks it's worse.

dijit 1y ago

If the "couple of bucks" ends up being the cost of an entire team, then hire a small team to do it.

Then get mad at them because they don't "produce value", and fold it into a developer's job with an even higher level of abstraction again. This is what we always do.

lucasrattz 1y ago

We at https://syself.com have made a platform with "one-click updates". 100% vanilla Kubernetes on Hetzner.

p_l 1y ago

The "couple bucks" in my experience were the difference between a viable business and a bankrupt one - including time spent on maintaining k8s!

spwa4 1y ago

> Determining which option is more cost-effective requires a thorough TCO (Total Cost of Ownership) analysis. While Hetzner may seem cheaper upfront, the additional hours required for DevOps work can offset those savings.

Sure, but the TLDR is going to be that if you employ n or more sysadmins, the cost savings will dominate. With 2 < n < 7. So for a given company size, Hetzner will start being cheaper at some point, and it will become more extreme the bigger you go.

Second, if you have a "big" cost, whatever it is, bandwidth, disk space (essentially anything but compute), cost savings will dominate faster.

stackskipton 1y ago

Not always. Employing sysadmins doesn't mean Hetzner is cheaper because those "Sysadmin/Ops type people" are being hired to manage the Kubernetes cluster. And Ops type people who truly know Kubernetes are not cheap.

Sure, you can get away with legoing some K3S stuff together for a while but one major outage later, and that cost saving might have entirely disappeared.

srockets 1y ago

More than that: the more you use, the more discounts you can get from a major CSP, which would also reduce the TCO for using a managed service.

UltraSane 1y ago

Even a short outage can completely wipe out any savings.

kshri24 1y ago

Is it just me or do the last 3 paragraphs feel like ChatGPT output?

tutfbhuf 1y ago

I used GPT4o to fix all my spelling and grammar mistakes, maybe it went a little too far, but this is 100% my comment

lproven 1y ago

> this is 100% my comment

No, it is not.

runeks 1y ago

Isn't the point of ChatGPT to mimic sentences written by humans?

perching_aix 1y ago

Kind of. But which humans? It's a bit like how the average person doesn't exist, except in the LLM world, now it does.

murderfs 1y ago

GPT-4 is, but ChatGPT is fine-tuned to emit sentences that get rated well (by humans, and by raters trained to mimic human evaluation) in a conversational agent context.

andai 1y ago

Yeah, I was wondering the same thing.

dvfjsdhgfv 1y ago

> Hetzner volumes are, in my experience, too slow for a production database. While you may in the past have had a good experience running customer-facing databases on AWS EBS, with Hetzner's volumes we were seeing >50ms of IOWAIT with very low IOPS.

There is a surprisingly easy way to address this issue: use (ridiculously cheap) Hetzner metal machines as nodes. The ones with nvme storage offer excellent performance for dbs and often have generous amounts of RAM. I'd go as far as to say you'd be better off to invest in two or more beefy bare metal machines for a master-replica(s) setup rather than run the db on k8s.

If you don't want to be bothered with the setup, you can use one of many modern packages such as Pigsty: https://pigsty.cc/ (not affiliated but a huge fan).

threeseed 1y ago

There are plenty of options for running a database on Kubernetes whilst using local NVMe storage.

There's simply pinning the database pods to specific nodes and using a LocalPathProvisioner, or distributed solutions like JuiceFS, OpenEBS, etc.

BillFranklin OP 1y ago

Thanks, hadn’t heard of pigsty. As you say, I had to use nvme ssds for the dbs, the performance is pretty good so I didn’t look to get metal nodes.

lucasrattz 1y ago

I've had great experiences with using the bare metal server's local storage.

This is the guide I wrote for our customers: https://syself.com/docs/hetzner/apalla/how-to-guides/storage...

gourneau 1y ago

Thanks for the Pigsty link. I have been a big fan of running Postgres on metal machines.

mythz 1y ago

Been a happy Hetzner customer for over a decade, previously using their dedicated servers in their German DCs before migrating to their Cloud US VMs for better latency with the US. Slightly disappointed with their recent cuts of their generous 20TB free traffic down to 3TB (€1.19 per additional TB), but they still look to be a lot better value than all other US cloud providers we've evaluated.

Whilst I wouldn't run Kubernetes by choice, we've had success moving our custom SSH / Docker compose deployments over to use GitHub Actions with kamal-deploy.org, which is easy to set up and has nice UX tools for monitoring remote deployed apps [1]

[1] https://servicestack.net/posts/kamal-deployments

Voultapher 1y ago

Seems to be a US thing, maybe their peering partners are forcing them to raise prices. The German DC still sells the 20TB bandwidth https://www.hetzner.com/cloud/, but US is an order of magnitude less for the same price :/

inemesitaffia 1y ago

I don't see how traffic in Ashburn is more expensive than Frankfurt and Amsterdam.

It's the sort of place where people say Transit is cheaper than paid peering. (For eyeball networks at least).

I think carrying traffic from Europe for some images and videos might make sense financially. But there are always bulk CDNs.

kuschku 1y ago

> I don't see how traffic in Ashburn is more expensive than Frankfurt and Amsterdam.

The vast majority of Hetzner's traffic in Europe (and tbh, anyone's traffic) is free peering. Telekom is the one major exception.

0xbadcafebee 1y ago

I used to do my own car maintenance, because I wanted to save money, and it was fun. It turned out it was more complex than I thought, things slowly fell apart or broke. I spent a good deal of time "re-fixing" things. Spent probably thousands on tools over the years, partly replacing the cheap stuff that broke or rusted quickly. My cars were often up on blocks. But I learned a lot of great lessons. The biggest one? Some things are not worth DIYing; pay a mechanic or lease your car, especially if you depend on it for your livelihood.

Even something as simple as an oil change really isn't worth doing yourself. First you buy the tools (oil drip pan, filter wrench, funnel, creeper). Then you set aside the time to use them, find your dingy work clothes. You go to the store and buy new oil and a filter. You go home and change the oil. Then that day or another day you go to a store that will take your used oil. Versus 20 minutes at an auto mechanic, for about $15 more than the cost of the oil and filter.

Kubernetes is an entire car (and a complex one). It's really not worth doing the maintenance yourself, I promise you. Unless you're just doing it for fun.

theappsecguy 1y ago

I don’t know. My shop wanted 1800 to change my brakes. I bought the parts for 300 and got it done in a day (first time). Seems like a pretty good payback and good skill to have. My neighbour has a car lift which certainly helped.

0xbadcafebee 1y ago

Did you make sure to use the right grease after cleaning the caliper slide pins and boots? If not, those puppies can wear out quick and you'll be replacing not just one caliper, but a pair, as they need to be at about the same wear level/make/model, costing you more money (and time). Don't ask me how I know...

(This is what I think about when someone says "hey, my monthly bill is cheaper!" and later ends up with unhappy customers when their cluster goes kaput and they can't get it working again for days. Don't ask me how I know...)

p_l 1y ago

Unless you find out that calling AA for everything and going to a big-name mechanic's garage every time costs you a significant portion of your budget.

A lot of it is finding balance between what to do yourself, what to outsource, and it's not as easy or clean as some people here like to claim.

infecto 1y ago

+1 to this. I agree with the takeaway that paying for managed Kubernetes is the way to go 9 times out of 10 but I think in life it's quite powerful to attempt to do things yourself sometimes. No, you should not go out and buy thousands of dollars worth of tools and do everything yourself but it's pretty powerful to be able to do simple things in your life...change oil, replace light switches/sockets. The reason I personally find it powerful is that it helps me quantify what is worth outsourcing and what is not and it also builds mental models for how things work which is important when outsourcing work.

wvh 1y ago

Depends on one: how interested/motivated are you, now and down the line; and two: how likely is your dependency on a third party going to screw you over in the long run.

My opinion, from the viewpoint of a consultant often involved in Kubernetes, is to get initial help and a persistent help line, but get somebody internally interested enough to ride along and learn.

Consultants and experts in general can save you from a lot of bad up-front decisions and banging your head against the wall for months. It's not trivial to learn your way around technologies or ecosystems, including common dark corners and pitfalls, in a reasonable amount of time while also having to focus on your core business. Accept help but learn to fish and to make a fire.

jonas21 1y ago

This is an interesting writeup, but I feel like it's missing a description of the cluster and the workload that's running on it.

How many nodes are there, how much traffic does it receive, what are the uptime and latency requirements?

And what's the absolute cost savings? Saving 75% of $100K/mo is very different from saving 75% of $100/mo.

jpgvm 1y ago

In my experience no one bothers unless they are using GPUs or they are already at 100k/mo.

I do think 100k/mo is the tipping point actually, that is $1.2M/yr.

It costs around $400k/yr in engineering salaries to reasonably support a sophisticated bare metal deployment (though such people can generally do that AND provide a lot of value elsewhere in the business, so really its actual cost is lower than this) and roughly $100k/yr in DC commitments, HW amortisation, and BW. So you save around $700k a year, which is great, but the benefit becomes much greater when your equiv cloud spend is even bigger than that.
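
A rough sketch of that back-of-the-envelope calculation, assuming the $400k/yr engineering and $100k/yr DC/HW/BW figures stay flat regardless of cloud spend (a simplification; the function name and defaults are illustrative, not anyone's real cost model):

```python
def bare_metal_net_savings(cloud_spend_per_year,
                           engineering_per_year=400_000,
                           infra_per_year=100_000):
    """Yearly cloud bill minus the assumed fixed yearly cost of running bare metal."""
    return cloud_spend_per_year - (engineering_per_year + infra_per_year)


# Try a few equivalent cloud bills (per month) around the 100k/mo mark.
for monthly_bill in (50_000, 100_000, 250_000):
    yearly_bill = monthly_bill * 12
    print(f"${monthly_bill:,}/mo cloud -> ${bare_metal_net_savings(yearly_bill):,}/yr saved")
```

At 100k/mo ($1.2M/yr) this gives the $700k/yr quoted above. Under the same flat-cost assumption the saving grows linearly with the cloud bill and shrinks to zero at $500k/yr of cloud spend.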

wrd83 1y ago

If you want to do HA kubernetes, you need oncalls and at least 10 engineers to get a stable rotation.

If you do that in Europe you have to pay them during standby hours.

400k/year seems very low to me.

jpgvm 1y ago

You really don't need all 10 people on-call to know k8s to that level. They just need to know enough as to when to wake someone else up.

Everywhere I have worked where we have run clusters in the 100s to 1000s of nodes we have rarely had a team larger than 4-5 of true k8s folks and even then it's been a split between folks that are very hardware provisioning/network/etc focused and more higher level k8s folk who also take on a large portion of CI/CD work.

At smaller scale (in the $1M/yr ballpark) I have done all the k8s bare metal ops myself along with all CI/CD and been responsible for a ton of the backend programming too. This is feasible because with distros like Talos etc it doesn't take a lot of manpower once it's set up and upgrades aren't too painful at small scale if you aren't running stateful services.

So tbh no, you just need ideally 2 folks at around ~200k/yr each that are competent and have done it before. The rest of the folks on the on-call rotation are just the rest of your engineers (and if you are at $1m/yr cloud spend you have more than 10 of those).

slillibri 1y ago

When I worked in web hosting (more than 10 years ago), we would constantly be blackholing Hetzner IPs due to bad behavior. Same with every other budget/cheap VM provider. For us, it had nothing to do with geo databases, just behavior.

You get what you pay for, and all that.

SoftTalker 1y ago

Yep I had the same problem years ago when I tried to use Mailgun's free tier. Not picking on them, I loved the features of their product but the free tier IPs had a horrible reputation and mail just would not get accepted, especially by Hotmail or Yahoo.

Any free hosting service will be overwhelmed by spammers and fraudsters. Cheap services the same but less so, and the more expensive they are the less they will be used for scams and spams.

thwarted 1y ago

Tragedy of the Commons Ruins Everything Around Me.

haroldp 1y ago

It's always evolving, but these days the most common platforms attacking sites that I host are the big cloud providers, especially Azure. But AWS, Google, Digital Ocean, Linode, Contabo, etc all host a lot of attacks trying to brute-force logins and search for common exploits.

UltraSane 1y ago

AWS tries hard to keep its public IPs from getting on banlists.

oblio 1y ago

They could put the backend on Hetzner, if it makes sense (for example queues or batch processors).

mzhaase 1y ago

I had to try multiple floating IPs on hcloud before I got one that wasn't blacklisted on the k8s repos.

Keyframe 1y ago

Depending on the prices, maybe a valid strategy would be to have servers at Hetzner and then tunnel ingress/egress somewhere more prominent. Maybe adding the network traffic to the calculation still makes financial sense?

srockets 1y ago

At $0.02/GB, it rarely does.
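
To put a number on that, a quick sketch assuming a flat $0.02/GB for the tunnelled traffic and decimal units (1TB = 1000GB); the function name is illustrative:

```python
def tunnel_cost_usd(traffic_tb, usd_per_gb=0.02):
    """Monthly cost of pushing traffic_tb terabytes through a per-GB billed tunnel."""
    # Decimal units are assumed here: 1 TB = 1000 GB.
    return traffic_tb * 1000 * usd_per_gb


for tb in (1, 20, 100):
    print(f"{tb} TB/month -> ${tunnel_cost_usd(tb):,.2f}")
```

So every TB sent through the tunnel adds about $20 to the bill, or about $400 for 20TB a month.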

wvh 1y ago

I work for a consultancy company that helps companies build and secure infrastructure. We have a lot of customers running Kubernetes at low-cost providers (like Hetzner), more local middle-tier providers, and the top three (AWS, GCP, Azure). We also have some governmental, financial and medical companies that cannot or will not run in public clouds, so they usually host on-prem.

If Hetzner has an issue or glitch once a month, the middle-tier providers have one every 2-3 months, and a place like AWS maybe every 5-6 months. However, prices also follow that observation, so you have to carefully consider on a case-by-case basis whether adding some extra machines and backup and failure scenarios is a better deal.

The major benefit of using basic hosting services is that their pricing is a lot more predictable; you pay for machines and scale as you go. Once you get hooked into all the extra services a provider like AWS provides, you might get some unexpectedly high bills and moving away might be a lot harder. For smaller companies, don't make short-sighted decisions that threaten your ability to survive long-term by choosing the easy solution or "free credits" scheme early on.

There is no right answer here, just trade-offs.

Volundr 1y ago

I haven't used it personally, but https://github.com/kube-hetzner/terraform-hcloud-kube-hetzne... looks amazing as a way to set up and manage Kubernetes on Hetzner. At the moment I'm on Oracle free tier, but I keep thinking about switching to it to get off... Well Oracle.

mkreis 1y ago

I'm running two clusters on it, one for production and one for dev. Works pretty well, with a schedule to reboot machines every Sunday for automatic security updates (SuSE Micro OS). Also expanded machines for increased workloads. You have to make sure to inspect every change terraform wants to do, but then you're pretty safe. The only downside is that every node needs a public IP, even though they are behind a firewall. But that is being worked on.

not_elodin 1y ago

I've used this to set up a cluster to host a dogfooded journalling site.

In one evening I had a cluster working.

It works pretty well. I had one small problem when the auto-update wouldn't run on arm nodes which stopped the single node I had running at that point (with the control plane taint blocking the update pod running on them).

preisschild 1y ago

I've also been using Cluster-API + Cluster-API-Provider-Hetzner

https://github.com/syself/cluster-api-provider-hetzner

works rock solid

maestrae 1y ago

i recently read an article about running k8s on the oracle free tier and was looking to try it. i'm curious, are there any specific pain points that are making you think of switching?

Volundr 1y ago

Nope, just Oracle being a corp with a nasty reputation. Honestly it was easy to set up and has been super stable, and if you go ARM the amount of resources you get for free is crazy. I actually do recommend it for personal projects and the like. I'd just be hesitant about building a business based on any Oracle offering.

maestrae 1y ago

Got it, thanks for the clarification! I’ll be using it for a personal project so that sounds great.

davidgl 1y ago

I've got a couple of free ARM machines set up as a cluster for learning k8s + a few LBs in front of it. I use k3s, with pg rather than etcd. Been a great learning experience.

no_carrier 1y ago

> While DigitalOcean, like other providers, offers a free managed control plane, there is typically a 100% markup on the nodes that belong to these managed clusters.

I don't think this is true. With Digital Ocean, the worker nodes are the same cost as regular droplets, there's no additional costs involved. This makes Digital Ocean's offering very attractive - free control plane you don't have to worry about, free upgrades, and some extra integrations to things like the load balancer, storage, etc. I can't think of a reason to not go with that over self-managed.

czhu12 1y ago

The actual nodes are still way more expensive on digital ocean than they are in Hetzner. That’s probably the main reason.

8GB RAM, shared cpu on hetzner is ~$10

Equivalent on digital ocean is $48

lucasrattz 1y ago

Besides what czhu12 mentioned, DOKS charging extra for HA control planes makes me feel as if the platform is not production-grade.

If you want a managed experience on Hetzner, you could take a look at https://syself.com

Disclaimer: I'm an employee there

chipdart 1y ago

I loved the article. Insightful, and packed with real world applications. What a gem.

I have a side-question pertaining to cost-cutting with Kubernetes. I've been musing over the idea of setting up Kubernetes clusters similar to these ones but mixing on-premises nodes with nodes from the cloud provider. The setup would be something like:

- vCPUs for bursty workloads,

- bare metal nodes for the performance-oriented workloads required as base-loads,

- on-premises nodes for spiky performance-oriented workloads, and dirt-cheap on-demand scaling.

What I believe will be the primary unknown is egress costs.

Has anyone ever toyed around with the idea?

mhuffman1y ago

For dedicated they say this:

>All root servers have a dedicated 1 GBit uplink by default and with it unlimited traffic.

>Inclusive monthly traffic for servers with 10G uplink is 20TB. There is no bandwidth limitation. We will charge € 1/TB for overusage.

So it sounds like it depends. I have used them for (I'm guessing) 20 years and have never had a network problem with them or a surprise charge. Of course I mostly worked in the low double digit terabytes. But have had servers with them that handled millions of requests per day with zero problems.

lyu072821y ago

20TB egress on AWS runs you almost $2,000 btw. one of the biggest benefits of Hetzner

pdpi1y ago

1 / 8 * 3600 * 24 * 30 = 324000 (GB), so that 1GBit/s server could conceivably get 324TB of traffic per month "for free". It obviously won't, but even a tenth of that is more than the 20TB included with the 10G link.
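The estimate above can be reproduced in a few lines. This is only a sketch of the same back-of-the-envelope arithmetic: it assumes a 30-day month, a link saturated the whole time, and decimal units (1 TB = 1000 GB). The function name is made up for illustration.

```python
# Theoretical ceiling for a fully saturated uplink over a 30-day month.
SECONDS_PER_MONTH = 3600 * 24 * 30


def max_monthly_traffic_tb(link_gbit_per_s: float) -> float:
    """Return the maximum traffic in TB for a link speed given in Gbit/s."""
    gigabytes_per_second = link_gbit_per_s / 8  # 8 bits per byte
    return gigabytes_per_second * SECONDS_PER_MONTH / 1000  # GB -> TB


ceiling_tb = max_monthly_traffic_tb(1)
print(ceiling_tb)  # 324.0

# Even a tenth of the ceiling exceeds the 20 TB included with the 10G uplink.
print(ceiling_tb / 10 > 20)  # True
```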

jorams1y ago

They do have a fair use policy on the 1GBit uplink. I know of one report[1] of someone using over 250TB per month getting an email telling them to reduce their traffic usage.

The 10GBit uplink is something you need to explicitly request, and presumably it is more limited because if you go through the trouble of requesting it, you likely intend to saturate it fairly consistently, and that server's traffic usage is much more likely to be an outlier.

[1]: https://lowendtalk.com/discussion/180504/hetzner-traffic-use...

chipdart1y ago

> We will charge € 1/TB for overusage.

It sounds like a good tradeoff. The monthly cost of a small vCPU is equivalent to a few TB of bandwidth.
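To put numbers on that tradeoff, here is a rough sketch of the overage charge using only the figures quoted above (20TB included, € 1/TB beyond that). The function name and the sample usage values are illustrative assumptions, not anything Hetzner publishes.

```python
def overage_cost_eur(used_tb: float,
                     included_tb: float = 20.0,
                     rate_eur_per_tb: float = 1.0) -> float:
    """Charge for traffic above the included allowance, in EUR."""
    billable_tb = max(0.0, used_tb - included_tb)
    return billable_tb * rate_eur_per_tb


print(overage_cost_eur(15))  # 0.0  (within the allowance)
print(overage_cost_eur(50))  # 30.0 (30 TB over the allowance)
```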

adamcharnock1y ago

We've toyed around with this idea for clients that do some data-heavy data-science work. Certainly I could see that running an on-premise Minio cluster could be very useful for providing fast access to data within the office.

Of course you could always move the data-science compute workloads to the cluster, but my gut says that bringing the data closer to the people that need it would be the ideal.

threeseed1y ago

> Has anyone ever toyed around with the idea?

Sidero Omni have done this: https://omni.siderolabs.com

They run a Wireguard network between the nodes so you can have a mix of on-premise and cloud within one cluster. Works really well but unfortunately is a commercial product with a pricing model that is a little inflexible.

But at least it shows it's technically possible so maybe open source options exist.

SOLAR_FIELDS1y ago

You could make a mesh with something like Netmaker to achieve similar using FOSS. Note I haven’t used Netmaker in years but I was able to achieve this in some of their earlier releases. I found it to be a bit buggy and unstable at the time due to it being such young software but it may have matured enough now that it could work in an enterprise grade setup.

The sibling comment's recommendation, Nebula, does something similar with a slightly different approach.

chipdart1y ago

> They run a Wireguard network between the nodes so you can have a mix of on-premise and cloud within one cluster.

Interesting.

A quick search shows that some people already toyed with the idea of rolling out something similar.

https://github.com/ivanmorenoj/k8s-wireguard

nullify881y ago

I believe the Cilium CNI has this functionality built in. Other CNIs may do also.

sneak1y ago

Slack’s Nebula does something similar, and it is open source.

oblio1y ago

I'm a bit sad the aggressive comment by the new account was deleted :-(

The comment was making fun of the wishful thinking and the realities of networking.

It was a funny comment :-(

bdcravens1y ago

Enable "showdead" on your profile and you can see it.

rad_gruchalski1y ago

It wasn’t funny. I can still see it. The answer was vpn. If you want to go fancy you can do istio with vms.

ffsm81y ago

And if you wanna be lazy, there is a tailscale integration to run the cluster communication over it.

https://tailscale.com/kb/1236/kubernetes-operator

They've even improved it, so you can now actually resolve the services etc via the tailnet dns

https://tailscale.com/learn/managing-access-to-kubernetes-wi...

I haven't tried that second part though, only read about it.

jillesvangurp1y ago

The key take home point here is not how amazingly cheap Hetzner is, which it is. But how much of an extortion game Google, Amazon, MS, etc. are playing with their cloud services. These are trillion dollar companies because they are raking in cash with extreme margins.

Yes, there is some added value in the level of convenience provided. But maybe with a bit more competition, pricing could be more competitive. A lot more competitive.

surrTurr1y ago

I set up rook ceph on a talos k8s cluster (with vm volumes) and experienced similar low performance; however, I always thought that was because of the 1Gi vSwitch (i.e. networking problem)?! The SSD volumes were quite fast.

tehlike1y ago

SSD volumes are physically on the same node, and afaik not redundant. The cloud vms are ceph clusters behind the scenes, and writes need to commit on 3+ machines. It's both network latency and inherent process latency.

Additionally, hetzner has an IOPS limit of 5000 and write limit of some amount that does not scale with the size of database.

50G has the same limits as 5TB.

For this reason, people are sometimes using different table spaces in postgres for example.

Ceph puts another burden on top of already-ceph-based cloud volumes, btw, so don't do that.

p_l1y ago

RAID10 on local SSDs is a pretty performant option, but yeah, it's per node.

merpkz1y ago

In my limited experience, rook-ceph is strictly a bare-metal technology. On virtualization it will basically replicate your data to VM disks which usually are already replicated, so quite a bit of replication amplification will happen and tank your performance.

hipadev231y ago

Be careful with Hetzner, they null routed my game server on launch day due to false positives from their abuse system, and then took 3 days for their support team to re-enable traffic.

By that point I had already moved to a different provider of course.

danpalmer1y ago

Digital Ocean did this to my previous company. They said we’d been the target of a DOS attack (no evidence we could see). They re-enabled the traffic, then did it again the next day, and then again. When we asked them to stop doing that they said we should use Cloudflare to prevent DOS attacks… all the box did was store backups that we transferred over SSH. Nothing that could go behind Cloudflare, no web server running, literally only one port open.

teitoklien1y ago

where did you move, asking to keep a list of options for my game servers, i’m using ovh game servers atm

hipadev231y ago

I went back to AWS. Expensive but reliable and support I can get ahold of. I’d still like to explore OVH someday though.

teitoklien1y ago

Nothing beats AWS tbh: the level of extra detail AWS adds, like emailing and alerting a gazillion times before making any changes to underlying hardware, even if non-disruptive; robust <24 hour support from detailed, experienced and technical support staff; a very visible customer-obsession-laced experience all around. OVH has issues with taking down VPS/bare-metal instances at random, with their support staff having no clue or only late, non-real-time data on instance state. They lost a ton of customer data in their huge datacenter fire 2 yrs ago, didn't even replicate the backups across multiple datacenters like they were supposed to, and got sued a ton too.

I use OVH because the cost reduction really adds up for my workloads (remote video editing / custom rendering farm at scale, with much cheaper OVH S3 suited to my workload of temporary but numerous assets with high egress requirements), but otherwise I miss AWS and now get just how superior their support and attention to detail are.

ronsor1y ago

Reading comments from the past few days makes it seem like dealing with Hetzner is a pain (and as far as I can tell, they aren't really that cheaper than the competitors).

gurchik1y ago

> (and as far as I can tell, they aren't really that cheaper than the competitors)

Can you say more? Their Cloud instances, for example, are less than half the cost of OVH's, and less than a fifth of the cost of a comparable AWS EC2 instance.

lurking_swe1y ago

even free servers are of no use if they're not usable during a product launch. :) You get what you pay for i guess.

But i do agree, it is much cheaper.

victorbjorklund1y ago

I don't think so. We see the outliers. Those happen at Linode, Digital Ocean, etc. as well. And yes, even at Google Cloud and AWS you sometimes get either unlucky or unfairly treated.

jgalt2121y ago

> they aren't really that cheaper than the competitors

This is demonstrably false.

jjeaff1y ago

What competitors are similar to Hetzner in pricing? Last I checked, they seemed quite a bit cheaper than most.

Frotag1y ago

Forum for cheap hosts:

https://lowendtalk.com/

Wouldn't recommend any of these outside of personal use though.

riku_iki1y ago

OVH is a larger provider, and its servers are usually not significantly more expensive than Hetzner's.

jacooper1y ago

Honestly, Hetzner support has been outstanding in my experience. They are always there and very responsive over email.

jpgvm1y ago

If you prefer no bullshit communications they are great. They are to the point, terse and very German. I find this both refreshing and exactly what I want/need out of support. The few times I have needed to contact them it's been HW related. One was a SSD that was clearly having issues even though SMART reported nothing wrong, I sent them blktrace output and they said yup that checks out, scheduled disk replacement right away. The other time was a network related problem with their transit, I had some ASNs that I was trying to talk to suddenly getting some pretty damn cursed paths and a big increase in latency as a result, they sorted out the path weights super fast and everything has been great since.

The only other time I have received better support was from Aussie ISPs. Back in the day when you called Internode the guy who answered the phone was a bona-fide network engineer and would go as far as getting a shell on the DSLAM to check out what is going on. To me that is peak support, live debugging of the problem!

Similarly I called into Aussie Broadband to do my first NBN setup, explained I did "BYO" modem because I was going to initiate the PPPoE session with my Linux router and they said no problem. She even offered to send me a cookie cutter pppd config along with the info to set it up myself. Easily some of the most knowledgeable first-line support, with the best "can do" attitude, that I have encountered.

Needless to say when I encounter damn good support I stay even when it costs more.

esher1y ago

As far as I see, no one is mentioning sustainability AKA environmental impact or 'green hosting' here. Don't you care about that?

I believe that Hetzner data centers in Europe (Germany, Finland) are powered by green energy, but not the locations in US.

preisschild1y ago

They probably just use the local power grid. You can use ElectricityMaps to look up the average carbon intensity per kWh of those grids

https://app.electricitymaps.com/

kuschku1y ago

You can choose which electricity company provides the amount of power to the grid that you're using. While you don't get "your" electricity, overall you can still affect the carbon balance of the electricity that's produced in your name.

Hetzner is using 100% green hydro and wind power for that, which is as sustainable as any grid-connected company can be.

sofixa1y ago

> They probably just use the local power grid

A lot of EU datacenter providers specifically pick green electricity providers/sources, and pride themselves on it, and use it in advertising their sustainability.

Scaleway in particular are 100% no-CO2 (they have it easy, most of their DCs are in France where it's easy to be fully nuclear+renewable). Hetzner are the same.

huijzer1y ago

According to the IEA, data centers used 460 TWh in 2022, or about 2% of total worldwide electricity use.

In comparison, 30% of total energy (energy! Not electricity) goes to transport!

As another point of comparison, transport in Sweden in 2022 used 137 TWh [1]. So the same order of magnitude as total datacenter energy use.

And datacenters are powered by electricity which increases the chance that it comes from renewable energy. Conversely, the chance that diesel comes from a renewable source is zero.

So can we please stop talking about data center energy use? It’s a narrative that the media is currently pushing but, like so many things, it makes no sense. It’s not the thing we should be focusing on if we want to decrease fossil fuel use.

[1]: https://www.energimyndigheten.se/en/energysystem/energy-cons...

davedx1y ago

2% of total worldwide electricity use in 2022 is a shit load of electricity and emissions. Your argument is the same as those who argue "our country shouldn't care about emissions when China is the biggest emitter".

If you dive into a detailed breakdown of emissions you'll find that it's a complex hierarchy of categories. You can't just fix "all of transport" or treat it like a "low hanging fruit", just look at how much time it's taken for EV penetration to be in any way significant; look at how much of transport emissions are from aviation or shipping or other components.

Any energy use that's measurable in whole percentage points of global emissions needs addressing. That includes data centers.

alt2271y ago

> our country shouldn't care about emissions when China is the biggest emitter

To be fair, until China does something about their emissions, the rest of us are just pissing in the ocean.

huijzer1y ago

> Your argument is the same as those who argue "our country shouldn't care about emissions when China is the biggest emitter".

China and the US are in the same order of magnitude in emissions. So NO that's absolutely not the argument I am making.

> Any energy use that's measurable in whole percentage points of global emissions needs addressing

But it isn't! That's my point. Electricity use is about 20% of total energy use. So if we talk about global emissions, data centers are only about 20% * 2% = 0.4% of total energy use.

And then if we talk about total emissions, it's even lower because 40% of electricity is generated from nuclear and renewables.
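Spelled out, the arithmetic in this comment looks roughly like the sketch below. All three percentages are the commenter's own figures. Scaling by the non-nuclear, non-renewable share of electricity is only a crude proxy for emissions, and the variable names are made up for illustration.

```python
# Figures taken from the comment above, in percent.
electricity_share_of_energy = 20      # electricity as a share of total energy use
datacenter_share_of_electricity = 2   # data centers as a share of electricity use
low_carbon_share_of_electricity = 40  # nuclear + renewables

# Data centers as a share of total energy use.
datacenter_share_of_energy = (
    electricity_share_of_energy * datacenter_share_of_electricity / 100
)

# Crude proxy: count only the part of that electricity not from nuclear/renewables.
fossil_backed_share = (
    datacenter_share_of_energy * (100 - low_carbon_share_of_electricity) / 100
)

print(f"{datacenter_share_of_energy:.2f}%")  # 0.40%
print(f"{fossil_backed_share:.2f}%")         # 0.24%
```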

> just look at how much time it's taken for EV penetration to be in any way significant

Yes so let's focus on that instead of data centers. Data centers are not the problem!

EDIT: Also CPUs and GPUs are still becoming more energy efficient. So I'm a bit skeptical of extrapolations which say that data centers will consume a large percentage of US energy. If the number of CPUs and GPUs doubles each 2 years, but energy efficiency doubles too, then overall energy usage doesn't grow so fast. Especially if old CPUs and GPUs are taken out of the system over time because they become too expensive to operate.

esher1y ago

Thanks for sharing. I care about it. I run a small hosting company. Sure, there is a lot of low-hanging fruit for fighting CO2 emissions that should be tackled first. I am also hopeful that energy from directly available renewables will be the most economic choice for building data centers anyhow, so that this is not a matter of belief any more.

But on the other side, to bring down CO2 levels, fast change everywhere is required. As far as I see data center energy consumption continues to grow, specifically with AI.

If I am not mistaken, data centers produce more CO2 than aviation.

And sure, most 'green hosting' is probably 'green washing', yet I would still support and link initiatives such as: https://www.thegreenwebfoundation.org/

postepowanieadm1y ago

> I believe that Hetzner data centers in Europe (Germany, Finland) are powered by green energy, but not the locations in US.

Green lignite.

kuschku1y ago

While fans of nuclear energy like to meme about the German power grid, Hetzner is — in so far as anyone with a grid connection can be — powered by 100% green wind and hydro energy.

You can see the paperwork here:

- https://cdn.hetzner.com/assets/Uploads/oekostrom-zertifikat-...

- https://cdn.hetzner.com/assets/Oomi-sertifikaatti-tuuli+vesi...

esher1y ago

This does not apply to Hetzner US data centers, as far as I know. That's just for Germany and Finland.

Hetzner_OL1y ago

Hi Bill, Wow! Thanks for the amazing write-up and for sharing it on your blog and here! I am so happy that we've helped you save so much money and that you're happy with our support team! It's a great way to start off the week! --Katie

ArtTimeInvestor1y ago

Can anybody speak to the pros and cons of Hetzner vs OVH?

There ain't many large European cloud companies, and I would like to understand how they differentiate.

Ionos is another European one. Currently, it looks like their cloud business is stagnating, though.

Aachen1y ago

My main complaint with OVH is that their checkout process is broken in various ways (missing translations so you get French bits, broken translations so placeholders like ACCEPT_BUTTON leak through, legally binding terms with typos and weird formatting because someone copied them from a PDF into a textarea, UIs from the 90s plastered in between modern ones, missing option to renew a domain for longer than a year, confusing automatic renewal setup, and so on). The control panel in general is quite confusing. They also don't allow hosting an email server (port 25 blocked), iirc the docs tell you to go away and use a competitor

I didn't have any of these web UI issues with Hetzner, but iirc OVH is cheaper for domain names, as well as having very reliable and fast DNS servers (measured various query types across some 6 months), and that's why I initially chose them — until my home ISP gave me a burned IP address and I needed an externally hosted server for originating email data (despite it coming from an old and trusted domain that permitlists the IP address) so now I'm with both OVH and Hetzner... Anyway, another thing I like in OVH is that you can edit the raw zone file data and that they support some of the more exotic record types. I don't know how Hetzner compares on domain hosting though

thenaturalist1y ago

I'd say stay clear of Ionos.

Bonkers first experience in the last two weeks.

Graphical "Data center designer", no ability to open multiple tabs, instead always rerouting to the main landing page.

Attached 3 IGWs to a box, all public IPs, GUI shows "no active firewall rules".

IGW 1: 100% packet loss over 1 minute.

IGW 2: 85% packet loss over 1 minute.

IGW 3: 95% packet loss over 1 minute.

Turns out "no active Firewall rules" just wasn't the case and explicit whitelisting is absolutely required.

But wait, there's more!

Created a hosted PostgreSQL instance, assigned a private subnet for creation.

SSH into my server, ping the URL of the created Postgres instance: The DB's IP is outside the CIDR range of the assigned subnet and unreachable.

What?

Deleted the instance, created another one, exact same settings. Worked this time around.

Support quality also varies extremely.

Out of 3 encounters, I had a competent person once.

The other two said outright that they have no idea what's going on.

ArtTimeInvestor1y ago

Is it not possible to configure the setup without the graphical interface?

Are there cloud providers you prefer?

mkesper1y ago

You could use their cloud API https://api.ionos.com/docs/cloud/v6/ or e.g. terraform provider: https://docs.ionos.com/reference/config-management-tools/con... I don't have any practical experience with this provider, though.

j16sdiz1y ago

I use Scaleway for my EU cloud needs.

This is a very low usage toy server, can't speak for performance/cost.

usrme1y ago

This is probably out of left field, but what is the benefit of having a naming scheme for nodes without any delimiters? Reading at a glance and not knowing the region name convention of a given provider (i.e. Hetzner), I'm at a loss to quickly decipher the "<region><zone><environment><role><number>" to "euc1pmgr1". I feel like I'm missing something because having delimiters would make all sorts of automated parsing much easier.

BillFranklinOP1y ago

Quicker to type and scan! Though I admit this is preference, delimiters would work fine too.

Parsing works the same but is based on a simple regex rather than splitting on a hyphen.

euc=eu central; 1=zone/dc; p=production; wkr=worker; 1=node id
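A regex along these lines would handle the scheme. This is a sketch of the idea, not the author's actual pattern: it assumes lowercase letters for region and role, digits for zone and node id, and a single-letter environment, which is what keeps the split unambiguous without delimiters.

```python
import re

# <region><zone><environment><role><number>, e.g. "euc1pwkr1".
# Assumption: environment is exactly one letter; the digits act as delimiters.
NODE_NAME = re.compile(
    r"^(?P<region>[a-z]+)"
    r"(?P<zone>\d+)"
    r"(?P<environment>[a-z])"
    r"(?P<role>[a-z]+)"
    r"(?P<number>\d+)$"
)


def parse_node_name(name: str) -> dict:
    """Split a delimiter-free node name into its named segments."""
    match = NODE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised node name: {name!r}")
    return match.groupdict()


print(parse_node_name("euc1pwkr1"))
# {'region': 'euc', 'zone': '1', 'environment': 'p', 'role': 'wkr', 'number': '1'}
```

Because the numeric segments use `\d+`, names such as "euc1pwkr12" still parse; it is only the environment segment that cannot grow without changing the pattern.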

usrme1y ago

Thanks for getting back to me! Now that you've written it out, it's plainly obvious, but for me the readability and flexibility of delimiters beats the speed of typing and scanning. Many a time I've been grateful that I added delimiters because then I was no longer hamstrung by any potential changes to the length of any particular segment within the name.

adastra221y ago

You can more easily double-click-select the full hostname when there are no delimiters.

stackskipton1y ago

Yea, not putting in delimiter and then us having to change our format has bitten me so many times. Delimiter or bust.

o11c1y ago

You can treat the numeric parts as self-delimiting ... that leaves only the assumption that "environment" is a single letter.

s3rius1y ago

That's a really good article. Actually, recently we were migrating as well and we were using dedicated nodes in our setup.

In order to integrate a load-balancer provided by hetzner with our k8s on dedicated servers we had to implement a super thin operator that does it: https://github.com/Intreecom/robotlb

If anyone is inspired by this article and wants to do the same, feel free to use this project.

aliasxneo1y ago

I’m planning on doing something similar but want to use Talos with bare metal machines. I expect to see similar price reductions from our current EKS bill.

threeseed1y ago

Depending on your cluster size I highly recommend Omni: https://omni.siderolabs.com

It took minutes to set up a cluster and I love having a UI to see what is happening.

I wish there were more products like this as I suspect there will be a trend towards more self-managed Kubernetes clusters given how expensive the cloud is becoming.

MathiasPius1y ago

I set up a Talos bare metal cluster about a year ago, and documented the whole process on my website. Feel free to reach out if you have any questions!

cedws1y ago

Any thoughts/feelings about Talos vs Bottlerocket?

MathiasPius1y ago

I've only used Bottlerocket in relation to EKS, and even then my interaction with it was pretty limited so I have no idea how it fares as a standalone operating system.

My one big experience with it was the recent bug which (as I recall) attempted to harden the system by marking memory pages as no-execute, which caused virtual runtime languages like Java to basically break entirely when running on a node using this version of Bottlerocket.

It was fixed pretty quickly, but it did feel like a weird thing to slip through...

p_l1y ago

Did bottlerocket get any closer to stable and usable outside AWS walled garden?

Last time I tried was admittedly in 2022, but in testing which distro to go with, bottlerocket lost because we couldn't set up local builds...

sureglymop1y ago

I went hetzner baremetal, set up a proxmox cluster over it and then have kubernetes on top. Gives me a lot of flexibility I find.

bittermandel1y ago

We're very happy to use Hetzner for our bare-metal staging environments to validate functionality, but I still feel reluctant to put our production there. Disks don't quite work as intended at all times and our vSwitch setup has gotten reset more than once.

All of this makes sense considering the extremely low price.

Scotrix1y ago

Very nicely written article. I’m also running a k8s cluster but on bare metal and qemu-kvms for the base load. Wonder why you would choose VMs instead of bare metal if you're looking for cost optimisation (additional overhead maybe?), could you share more about this or did I miss it?

BillFranklinOP1y ago

Thank you! The cloud servers are sufficiently cheap for us that we could afford the extra flexibility we get from them. Hetzner can move around VMs without us noticing but in contrast they are rebooting a number of metal machines for maintenance now and for the last little while, which would have been disruptive especially during the migration. I might have another look next year at metal but I’m happy with the cloud VMs currently.

karussell1y ago

Note, they usually do not reboot or touch your servers. But yes, the current maintenance of their metal routers (rare, like once every 2 years) requires you to juggle a bit with different machines in different datacenters.

mnming1y ago

I feel lots of the work described in the article can be automated by kops, probably in a much better way, especially when it comes to day 2 operations.

I wonder what is the motivation behind manually spinning up a cluster instead of going with more established tooling?

lucasrattz1y ago

We at Syself.com also have great experiences with Kubernetes on Hetzner. We built a platform on top of Cluster API and brought a managed Kubernetes experience to Hetzner. Now we have self-healing, automated updates and 100% reproducibility, with full bare metal support.

> Hetzner volumes are, in my experience, too slow for a production database.

That's true, though. To solve that we developed a way to persist the local storage of bare metal servers across reprovisionings. This way it's both faster and cheaper. Now we are adding an automated database deployment layer on top of it.

MuffinFlavored1y ago

https://github.com/puppetlabs/puppetlabs-kubernetes

What do the fine people of HN think about the size/scope/amount of technology of this repo?

It is referenced in the article here: https://github.com/puppetlabs/puppetlabs-kubernetes/compare/...

KaiserPro1y ago

Puppet's original design was that it was meant to be agent based on the things it was meant to configure. It was never very good at bringing up stuff before the agent could be connected.

The general flow was Imager->pre-configured puppet agent->connect to controller->apply changes to make it perform as x

Originally it never really had the capacity to kick off the imaging/instantiation. This meant that it scaled better (shared state is better handled than in ansible).

However ansible shined because although it was a bastard to get running on more than a couple of hundred hosts at any speed, you could tell it to spin up 100x EC2 (or equivalent) machines and then transform them into whichever role was needed. In puppet that was impossible to do in one go.

I assume that's changed, but I don't miss puppet.

bigfatkitten1y ago

It's not really that much of an embuggerance.

Kickstart or cloud-init to get the OS up and Puppet agent installed, then let Puppet do the rest.

mkesper1y ago

Honestly I was surprised to hear about puppet at all. Thought that was dead and buried, like chef.

bigfatkitten1y ago

Both are still extremely widely used.

czhu121y ago

Funnily enough, we made the exact same transition from Heroku to DigitalOcean's managed Kubernetes service, and saved about 75%. Presumably this means that had we moved from Heroku to Hetzner, it would have been 93% savings!

The costs of cloud hosting are totally out of control, would love to see more efforts that let developers move down the stack.
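The 93% figure is presumably two 75% reductions compounded: Heroku to DigitalOcean, then the article's DigitalOcean to Hetzner. A quick sketch under that assumption (the function name is made up for illustration):

```python
def combined_savings(*savings: float) -> float:
    """Compound successive fractional savings into one overall saving."""
    remaining = 1.0
    for saving in savings:
        remaining *= 1.0 - saving  # each step keeps (1 - saving) of the bill
    return 1.0 - remaining


# Heroku -> DigitalOcean (75%), then DigitalOcean -> Hetzner (75%, per the article title).
print(f"{combined_savings(0.75, 0.75):.2%}")  # 93.75%
```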

I’ve been humbly working on https://canine.sh which basically provides a Heroku like interface to any K8 cluster

Neil441y ago

When I first started hosting servers/services for customers I was using EC2 and Rackspace, then I discovered Linode and was happy it was so much cheaper with apparently no downside. After the first couple of interactions with support I started to relax. Then I discovered OVH, same story. I haven't needed the support yet though.

acac101y ago

// Taking another slant at the discussion: Why kubernetes?

Thank you for sharing your experience. I also have my 3 personal servers with Hetzner, plus a couple VM instances in Scaleways (French outfit).

Disclaimer: I’m a Googler, was SRE for ~10 years for GMail, identity, social, apps (gsuites nowadays) and more, managed hundreds of jobs in Borg, one of the 3 founders of the current dev+devops internal platform (and I focused on the releases,prod,capacity side of the platform), dabbled in K8s on my personal time. My opinions, not Google’s.

So, my question is: given the significant complexity that K8s brings (I don’t think anyone disputes this) why are people using it outside medium-large environments? There are simpler and yet flexible & effective job schedulers that are way easier to manage. Nomad is an example.

Unless you have a LOT of machines to manage, with many jobs (I’d say +250) to manage, K8s complexity, brittleness and overhead are not justifiable, IMO.

The emergence of tools like Terraform and the many other management layers on top of K8s that try to make it easier but just introduce more complexity and their own abstractions is in itself a sign of that inherent complexity.

I would say that only a few companies in the world need that level of complexity. And then they will need it, for sure. But for most, it is like buying a Formula 1 car to commute in a city.

One other aspect that I also noticed is that technical teams tend to carry on the mess they had in their previous “legacy” environment and just replicate in K8s, instead of trying to do an architectural design of the whole system needs. And K8s model enables that kind of mess: a “bucket of things”.

Those two things combined mean that nowadays every company has soaring cloud costs and is running things it knows nothing about but is afraid to touch in case something breaks. And an outage is more career-harming than a high bill that Finance will deal with later, so why risk it, right? A whole new IT area has been coined to deal with this: FinOps :facepalm:

I’m just puzzled by the whole situation, tbh.

KaiserPro1y ago

I too used to run a large clustered environment (VFX) and now work at a FAANG which has a "borg-like" scheduler.

K8s has a whole kit of parts which sound really grand when you are starting out on a new platform, but quickly become a pain when you actually start to implement it. I think that's the biggest problem: by the time you've realised that actually you don't need k8s, you've invested so much time into learning the sodding thing that it's difficult to back out.

The other seductive thing is helm provides "AWS-like" features (ie fancy load balancing rules) that are hard to figure out unless you've dabbled with the underlying tech before (varnish/nginx/etc are daunting, so is storage and networking)

this tends to lead to utterly fucking stupid networking systems because unless you know better, that looks normal.

p_l1y ago

I'll put it this way:

Every time I try to use Nomad, or any of the other "simpler" solutions, I hit a wall - there turns out to be a critical feature that is not available, and which if I want to retrofit into them, will be a hacky one-off that is badly integrated into API.

Additionally, I don't get US-style budgets or wages - this means that cloud prices which target such budgets are horrifyingly expensive to me, to the point that kubernetes pays for itself at the scale of a single server.

Yes, single server. The more I make it fit the proper kubernetes mold, the cheaper it gets, even. If I need to extend something, the CustomResourceDefinition system makes it easy to use a sensible common API.

Was there a cost to learning it? Yes, but honestly not so bad. And with things like k3s deploying small clusters on bare metal became trivial.

And I can easily wrap kubernetes API into something simpler for developers to use - create paved paths that reduce the amount of what they have to know, provide, and that will enforce certain deployment standards. At lowest cost I have encountered in my life, funnily enough.
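A minimal sketch of what such a "paved path" wrapper could look like, assuming developers only supply a name, an image, a port, and a replica count. The function name `simple_app`, the `REQUIRED_LABELS` constant, and the `paved-path` label value are hypothetical and not taken from the comment above; the manifest fields are the standard `apps/v1` Deployment and `v1` Service ones.

```python
import json

# Hypothetical organisation-wide labels that the wrapper enforces on everything.
REQUIRED_LABELS = {"app.kubernetes.io/managed-by": "paved-path"}


def simple_app(name: str, image: str, port: int, replicas: int = 1) -> list:
    """Expand a four-field app description into a Deployment and a Service."""
    selector = {"app.kubernetes.io/name": name}
    labels = {**selector, **REQUIRED_LABELS}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": selector},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "selector": selector,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return [deployment, service]


if __name__ == "__main__":
    # Example values only; the JSON output can be piped to `kubectl apply -f -`.
    for manifest in simple_app("hello", "nginx", 80):
        print(json.dumps(manifest, indent=2))
```

The point of the design is that deployment standards (labels here, but equally resource limits or security settings) live in one function instead of in every team's manifests.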

riku_iki1y ago

> Every time I try to use Nomad, or any of the other "simpler" solutions, I hit a wall - there turns out to be a critical feature that is not available

Maybe you could give an example of such a feature in the case of Nomad?

p_l1y ago

I will give an example of just a few things that literally bought me lots and lots of savings in hours spent working, all of which are in use on a "single server cluster":

1. Ingress and Service objects vs. Nomad/Consul Service Discovery + Templating

This one is big, as in really big thing. Ingress and Service API let me easily declaratively connect things with multiple implementations involved, and it's all handled cleanly with type-safe API.

For comparison, Nomad's own documentation tells you to mostly use text templating to generate configuration files for whatever load balancer you decide to use, or to use one of the two it points to that have specific nomad/consul integration. And even for those, configuring a specific application's connectivity happens through cumbersome K/V tags for apparently everything except the port name itself.

You might consider it silly, but the Ingress API, with its easy way to route different path prefixes to different services, or specify multiple external hosts and TLS, especially given how easily that integrates (regardless of the load balancer used) with LetsEncrypt and other automated solutions, is an ability you're going to have to pry from my cold dead hands.

Similarly the more pluggable nature of Service objects turns out critical when redirecting traffic to appropriate proxy, or doing things like exposing some services using one subsystem and others with another (example: servicelb + tailscale).

In comparison Nomad is like going back to Kubernetes 1.2 if not worse. Sure, I can use service discovery. It's very primitive service discovery where I have to guide the system by hand with custom glue logic. Meanwhile the very first kubernetes in production I set up had something like 60 Ingress objects setting up 250 domains which totaled about 1000 host/path -> service rules. And it was a puny two node cluster.
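To make the host/path -> service routing concrete, here is a rough sketch that builds one `networking.k8s.io/v1` Ingress with several hosts, path prefixes, and TLS. The helper name `ingress`, the host names, service names, and the `letsencrypt` issuer name are all made up for illustration; the `cert-manager.io/cluster-issuer` annotation assumes cert-manager is installed with a ClusterIssuer of that name.

```python
import json


def ingress(name, hosts, routes, tls_secret, issuer="letsencrypt"):
    """Build a networking.k8s.io/v1 Ingress manifest as a dict.

    routes maps a path prefix to a (service name, port) pair; the same
    routes are applied to every host in hosts.
    """
    paths = [
        {
            "path": prefix,
            "pathType": "Prefix",
            "backend": {"service": {"name": svc, "port": {"number": port}}},
        }
        for prefix, (svc, port) in routes.items()
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            # Assumption: cert-manager watches this annotation and fills tls_secret.
            "annotations": {"cert-manager.io/cluster-issuer": issuer},
        },
        "spec": {
            "tls": [{"hosts": list(hosts), "secretName": tls_secret}],
            "rules": [{"host": h, "http": {"paths": paths}} for h in hosts],
        },
    }


if __name__ == "__main__":
    manifest = ingress(
        "shop",
        ["shop.example.com", "www.shop.example.com"],
        {"/": ("shop-web", 8080), "/api": ("shop-api", 9000)},
        tls_secret="shop-tls",
    )
    print(json.dumps(manifest, indent=2))
```

Two hosts with two prefixes each already give four host/path rules from one object, which is how a few dozen Ingress objects can add up to the rule counts mentioned above.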

2. Persistent Storage handling

As far as I could figure out from Nomad docs, you can at best reuse CSI drivers to mount existing volumes to docker containers - you can't automate storage handling within Nomad, more or less you're being told to manually create necessary storage, maybe using terraform, then register it with Nomad.

Compared to this, Kubernetes' PersistentVolumeClaim system is a breeze - I specify what kinds of storage I provide through StorageClasses, then can just throw a PVC into definitions of whatever I am actually deploying. Setting up a new workload with persistent storage is reduced to me saying "I want 50G generic file storage and 10G database-oriented storage" (two different storage classes with real impact of performance/buck for both).

Could I just point to a directory? Sure, but then I'd have to keep track of those directories. OpenEBS-ZFS handles it for me and I can spend time on other tasks.
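As an illustration of the "50G generic file storage and 10G database-oriented storage" request, the sketch below builds the two PersistentVolumeClaim manifests. The storage class names `generic` and `database` and the claim names are hypothetical; they stand in for whatever StorageClasses the cluster actually defines (for example ones backed by OpenEBS-ZFS).

```python
import json


def pvc(name, size, storage_class):
    """Build a v1 PersistentVolumeClaim manifest as a dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Single-node read/write access, the usual mode for local volumes.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical class names: one tuned for bulk files, one for databases.
claims = [
    pvc("app-files", "50Gi", "generic"),
    pvc("app-db", "10Gi", "database"),
]

if __name__ == "__main__":
    for claim in claims:
        print(json.dumps(claim, indent=2))
```

The workload only names a class and a size; where the volume lives and how it is created is the provisioner's job.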

3. Extensibility, the dark horse of kubernetes.

As far as I know none of the "simpler" alternatives have anything like CustomResourceDefinition, or the very simple API model of Kubernetes that makes it easy to extend. As far as I understand Nomad's plugins are nowhere close to the same level of capability.

The smallest cluster I have currently uses the following "operators" or other components using CRDs: openebs-zfs (storage provisioning), traefik (easily trackable middleware configuration beyond the unreadable tags approach), tailscale (also provides an alternative Ingress and Service implementation), CloudNative PG (automated Postgres setup with backups, restores, easy access with psql, etc.), cert-manager (LetsEncrypt et al., in more flexible ways than what is embedded into traefik), external-dns (lets me integrate global DNS updates with my service definitions), k3s' helm controller (sometimes makes life easier when loading external software).

There's more but I kept to things I'm directly interacting with instead of all CRDs currently deployed. All of them significantly reduce my workload, all of them have either no alternative under Nomad or very annoying options (stuffing configuration for traefik inside service tags)

And last, some stats from my cluster:

  4, soon to be 5 or 6, "tenants" (separate namespaces), without counting system ones or ones that provide services like OpenEBS
  Runs 2 VPN services with headscale, 3 SSOs, one big java issue tracker, 1 Git forge (gitea, soon to get another one with gerrit), one nextcloud instance, one dumb webserver (using Caddy). Additionally runs 7 separate postgres instances providing SQL database for aforementioned services, postfix relays connecting cluster services with sendgrid, one vpn relay connecting gitea with VPN, some dashboards, etc.

And because it's kubernetes, my configuration to set up, for example, a new Postgres looks like this:

  // Helper libraries local to this setup.
  local k = import "kube.libsonnet";
  local pg = import "postgres.libsonnet";
  local secret = k.core.v1.secret;
  {
    local app = self,
    local cfg = app.cfg,
    local labels = app.labels,
    // Hidden field (::): reused below but not rendered into the output on its own.
    labels:: {
      "app.kubernetes.io/name": "gitea-db",
      "app.kubernetes.io/instance": "gitea-db",
      "app.kubernetes.io/component": "gitea"
    },
    // The Postgres cluster itself: storage size, namespace, labels,
    // initial database, backup bucket and backup retention.
    dbCluster: pg.cluster.new("gitea-db", storage="20Gi") +
      pg.cluster.metadata.withNamespace("foo") +
      pg.cluster.metadata.withLabels(app.labels) +
      pg.cluster.withInitDb("gitea", "gitea-db") +
      pg.cluster.withBackupBucket("gs://foo-backups/databases/gitea", "gitea-db") +
      pg.cluster.withBackupRetention("30d"),
    // Secret named "gitea-db" holding the database and backup credentials.
    // The password below is a placeholder value.
    secret: secret.new("gitea-db", null) +
      secret.metadata.withNamespace("foo") +
      secret.withStringData({
        username: "gitea",
        password: "FooBarBazQuux",
        "credentials.json": importstr "foo-backup-gcp-key.json"
      })
  }

And this is an older version that I haven't updated (because it still works). If I were to set up the specific instance it's taken from today, it would take even less writing.

bigfatkitten1y ago

> Unless you have a LOT of machines to manage, with many jobs (I’d say +250) to manage, K8s complexity, brittleness and overhead are not justifiable, IMO.

Because it looks amazing on my CV and in my promo pack.

0xbadcafebee1y ago

Same reason they'll make 10 different microservices for a single product that isn't even 5K LoC. People chase trends because they don't know any better. K8s is a really big trend.

kakoni1y ago

Anybody running k3s/k8s on Hetzner using cax servers? How's that working?

lucasrattz1y ago

We have been running them at https://syself.com for six months now, after adding support for them on our platform. So far so good.

james_sulivan1y ago

For those considering Hetzner, there is also Contabo which is another German hosting company that is also good, at least in my experience

devops0001y ago

Did you try Cloud66 for deploy?

cjr1y ago

What about cluster autoscaling?

BillFranklinOP1y ago

I didn’t touch on that in the article, but essentially it’s a one line change to add a worker node (or nodes) to the cluster, then it’s automatically enrolled.

We don’t have such bursty requirements fortunately so I have not needed to automate this.

preisschild1y ago

Works rather well. I use CAPI + Cluster-Autoscaler + Talos and new nodes are provisioned and ready within 2-3 minutes.

awinter-py1y ago

cut my kube bill 100% on GKE by switching from regional to zonal cluster bc the first zonal cluster is free

aravindputrevu1y ago

Do you know that they are cutting their free tier bandwidth? I did not read too much into it, but heard a few friends were worried about it.

At the end of the day, they are a business!

lucasrattz1y ago

It seems to be only for the US-based servers. Sounds like they talked with a pricing consultant :p

segmondy1y ago

Great write up Bill!

Iwan-Zotow1y ago

this is good

well, running on bare metal would be even better

lucasrattz1y ago

You could take a look at https://syself.com - 100% support for bare metal (I'm an employee).

Iwan-Zotow1y ago

thanks, will look

postepowanieadm1y ago

Lovely website.



bigbones1y ago

> dedicated 10G network to connect your servers

adamcharnock1y ago

It is under their costing for 'additional hardware'[1]. You need to factor in the switch, uplink for each server, and the NIC for each server.

[1]: https://docs.hetzner.com/robot/general/pricing/price-list-fo...

nh21y ago

Hetzner does not charge for internal bandwidth.

bambambazooka1y ago

> 5) If you want network redundancy you can create a 1G vSwitch (VLAN) on the 1G ports for internal use. Give each server a loopback IP, then use BGP to distribute routes (bird).

Are you willing to share example config for that part?

adamcharnock1y ago

I don't have one I can share publicly, but if you send me an email I'll see what I can do :-) Email is in my profile.

You'll need a bit of baseline networking knowledge.

bc569a80a344f9c1y ago

It's not rocket science, but it is complex, and building something complex you don't fully understand for production services can be a very bad idea.

lucasrattz1y ago

> The initial investment to do this does take time. I'd put it at 2-4 months of undistracted skilled engineering time.

Perhaps you could take a look at https://syself.com (Disclaimer: I'm an employee there). We built a platform that gives you production-ready clusters in a few minutes.

sureIy1y ago

> I'd put it at 2-4 months of undistracted skilled engineering time.

How much is that worth to your company/customer vs a higher monthly bill for the next 5 years?

As a consultancy company, you want to sell that. As a customer, I don't see how that's worth it at all, unless I expect a 10k/month AWS bill.

xkcd comes to mind: https://xkcd.com/1319/

adamcharnock1y ago

> As a consultancy company, you want to sell that. As a customer, I don't see how that's worth it at all.

Well I do rather agree, but as a consultancy I'm biased.

The appeal of a consultancy is that we'll remove the up-front investment, provide the skills, de-risk the whole endeavour, even put engineers within your team, but you'll _only_ save 50%.

The latter option is much more appealing in terms of hiring, risk, and cash-flow. But if your company has the skills, the cash, and the risk tolerance then maybe the former approach is best.


tutfbhuf1y ago

I have experience running Kubernetes clusters on Hetzner dedicated servers, as well as working with a range of fully or highly managed services like Aurora, S3, and ECS Fargate.

From my experience, the cloud bill on Hetzner can sometimes be as low as 20% of an equivalent AWS bill. However, this cost advantage comes with significant trade-offs.

In essence, with AWS, your DevOps workload is reduced by a significant factor, while on Hetzner, your cloud bill is significantly lower.

supriyo-biswas1y ago

This is definitely some ChatGPT output being posted here and your post history also has a lot of this "While X, Y also does Z. Y already overlaps with X" output.

I'd like to see your breakdowns as well, given that a 2 vCPU, 4GB configuration (as an example) is priced much higher on AWS than a similar configuration on Hetzner.

There's also https://github.com/kube-hetzner/terraform-hcloud-kube-hetzne... to reduce the operational burden that you speak of.

tutfbhuf1y ago

It is my output, but I use ChatGPT to fix my spelling and grammar. Maybe my prompt for that should be refined in order to not alter the wording too much.


lproven1y ago

> I use ChatGPT to fix my spelling and grammar

I have a better suggestion, which will save time, energy, money, and human work.

Don't.

Write it yourself. If you can't, don't post.


0xFF01231y ago

While I agree that your characterisation is true for a lot of chatgpt output, it can also be true for a human explaining their nuanced point of view.

ratg131y ago

Most humans don't say a couple sentences and then re-summarize them 3 more times unless they are speaking to someone with a learning disability.

MathMonkeyMan1y ago

I've never operated a kubernetes cluster except for a toy dev cluster for reproducing support issues.

A real kubernetes operator would have tooling in place to automatically upgrade certs and who knows what else. I imagine a company would have to pay such an operator.

_bare_metal1y ago

This.

It's great for the devs but I wouldn't want to operate a cluster.

[0]: https://www.BareMetalSavings.com

declan_roberts1y ago

That's just not how it works on any scale other than "toy"

MathMonkeyMan1y ago

Right, but certs get out of date unless somebody does something about it, that was my point.

KaiserPro1y ago

mfld1y ago

I saw that Hetzner is beta testing ceph-based object storage. This could make the setup much easier. Anyone tested this already?

sgarland1y ago

KaiserPro1y ago

Oh it probably is prod ready, I just wouldn't use it unless I had to (ie I had the staff to look after it and no money to buy something better)

Yes, it has inline FEC (https://www.ibm.com/docs/en/storage-ceph/7?topic=components-...), but it's lots of layers to get to a file system.

Personally I'd have a redundant array of storage nodes and be done with it. It's easier to debug a single server than 3 layers of ceph weirdness.

freedomben1y ago

mountainriver1y ago

Manually updating k8s clusters is a huge tradeoff. I can’t imagine doing that to save a couple bucks unless I was desperate

TheDong1y ago

Originally it was ansible, and so spinning up a new node or updating all nodes was editing one file (k8s version and ssh node list), and then running one ansible command.

Now I'm using nixos, so updating is just bumping the version number, a hash, and typing "colmena apply".

Even migrating the k8s cluster from ansible to nixos was quite easy, I just swapped one node at a time and it all worked.

People are so afraid of just like learning basic linux sysadmin operations, and yet it also makes it way easier to understand and debug the system too, so it pays off.

dijit1y ago

if the "couple of bucks" ends up being the cost of an entire team, then hire a small team to do it.

Then get mad at them because they don't "produce value", and fold it into a developers job with an even higher level of abstraction again. This is what we always do.

lucasrattz1y ago

We at https://syself.com have made a platform with "one-click updates". 100% vanilla Kubernetes on Hetzner.

p_l1y ago

The "couple bucks" in my experience were the difference between a viable business and a bankrupt one - including time spent on maintaining k8s!

spwa41y ago

Second if you have a "big" cost, whatever it is, bandwidth, disk space (essentially anything but compute), cost savings will dominate faster.

stackskipton1y ago

Sure, you can get away with legoing some K3S stuff together for a while but one major outage later, and that cost saving might have entirely disappeared.

srockets1y ago

More than that: the more you use, the more discounts you can get from a major CSP, which would also reduce the TCO for using a managed service.

UltraSane1y ago

Even a short outage can completely wipe out any savings.

kshri241y ago

Is it just me or do the last 3 paragraphs feel like ChatGPT output?

tutfbhuf1y ago

I used GPT4o to fix all my spelling and grammar mistakes, maybe it went a little too far, but this is 100% my comment

lproven1y ago

> this is 100% my comment

No, it is not.

runeks1y ago

Isn't the point of ChatGPT to mimic sentences written by humans?

perching_aix1y ago

Kind of. But which humans? It's a bit like how the average person doesn't exist, except in the LLM world, now it does.

murderfs1y ago

GPT-4 is, but ChatGPT is fine-tuned to emit sentences that get rated well (by humans, and by raters trained to mimic human evaluation) in a conversational agent context.

andai1y ago

Yeah, I was wondering the same thing.

dvfjsdhgfv1y ago

If you don't want to be bothered with the setup, you can use one of many modern packages such as Pigsty: https://pigsty.cc/ (not affiliated but a huge fan).

threeseed1y ago

There are plenty of options for running a database on Kubernetes whilst using local NVMe storage.

The options are either pinning the database pods to specific nodes and using a LocalPathProvisioner, or distributed solutions like JuiceFS, OpenEBS etc.

BillFranklinOP1y ago

Thanks, hadn’t heard of pigsty. As you say, I had to use nvme ssds for the dbs, the performance is pretty good so I didn’t look to get metal nodes.

lucasrattz1y ago

I've had great experiences with using the bare metal server's local storage.

This is the guide I wrote for our customers: https://syself.com/docs/hetzner/apalla/how-to-guides/storage...

gourneau1y ago

Thanks for the Pigsty link. I have been a big fan of running Postgres on metal machines.

mythz1y ago

[1] https://servicestack.net/posts/kamal-deployments

Voultapher1y ago

inemesitaffia1y ago

I don't see how traffic in Ashburn is more expensive than Frankfurt and Amsterdam.

It's the sort of place where people say Transit is cheaper than paid peering. (For eyeball networks at least).

I think carrying traffic from Europe for some images and videos might make sense financially. But there's always bulk CDN's

kuschku1y ago

> I don't see how traffic in Ashburn is more expensive than Frankfurt and Amsterdam.

The vast majority of Hetzner's traffic in europe (and tbh, anyone's traffic) is free peering. Telekom is the one major exception.

0xbadcafebee1y ago

Kubernetes is an entire car (and a complex one). It's really not worth doing the maintenance yourself, I promise you. Unless you're just doing it for fun.

theappsecguy1y ago

0xbadcafebee1y ago

p_l1y ago

Unless you find out that calling AA for everything and going to a big name mechanic's garage every time costs you a significant portion of your budget.

A lot of it is finding balance between what to do yourself, what to outsource, and it's not as easy or clean as some people here like to claim.

infecto1y ago

wvh1y ago

Depends on one: how interested/motivated are you, now and down the line; and two: how likely is your dependency on a third party going to screw you over in the long run.

My opinion, from the viewpoint of a consultant often involved in Kubernetes, is to get initial help and a persistent help line, but get somebody internally interested enough to ride along and learn.

jonas211y ago

This is an interesting writeup, but I feel like it's missing a description of the cluster and the workload that's running on it.

How many nodes are there, how much traffic does it receive, what are the uptime and latency requirements?

And what's the absolute cost savings? Saving 75% of $100K/mo is very different from saving 75% of $100/mo.
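The difference between those two cases can be sketched as a payback calculation. The 75% figure and the two bill sizes come from the comment above; the one-off migration cost of $30,000 is purely an assumed number for illustration and is not from the article or this thread.

```python
def monthly_savings(monthly_bill: float, savings_fraction: float) -> float:
    """Absolute saving per month for a given bill and relative saving."""
    return monthly_bill * savings_fraction


def payback_months(monthly_bill: float, savings_fraction: float, migration_cost: float) -> float:
    """Months until the one-off migration cost is recovered by the savings."""
    return migration_cost / monthly_savings(monthly_bill, savings_fraction)


if __name__ == "__main__":
    ASSUMED_MIGRATION_COST = 30_000  # hypothetical engineering cost in dollars
    for bill in (100_000, 100):
        saved = monthly_savings(bill, 0.75)
        months = payback_months(bill, 0.75, ASSUMED_MIGRATION_COST)
        print(f"bill ${bill}/mo: saves ${saved:.0f}/mo, payback in {months:.1f} months")
```

Under that assumption the large bill pays back the migration in well under a month, while the small one would take decades, which is why the absolute numbers matter more than the percentage.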

jpgvm1y ago

In my experience no one bothers unless they are using GPUs or they are already at 100k/mo.

I do think 100k/mo is the tipping point actually, that is $1.2M/yr.

wrd831y ago

If you want to do HA kubernetes, you need oncalls and at least 10 engineers to get a stable rotation.

If you do that in Europe you have to pay them during standby hours.

400k/year seems very low to me.

jpgvm1y ago

You really don't need all 10 people on-call to know k8s to that level. They just need to know enough as to when to wake someone else up.

slillibri1y ago

You get what you pay for, and all that.

SoftTalker1y ago

Any free hosting service will be overwhelmed by spammers and fraudsters. Cheap services the same but less so, and the more expensive they are the less they will be used for scams and spams.

thwarted1y ago

Tragedy of the Commons Ruins Everything Around Me.

haroldp1y ago

UltraSane1y ago

AWS tries hard to keep its public IPs from getting on banlists.

oblio1y ago

They could put the backend on Hetzner, if it makes sense (for example queues or batch processors).


mzhaase1y ago

I had to try multiple floating IPs on hcloud before I got one that wasn't blacklisted on the k8s repos.

Keyframe1y ago

srockets1y ago

At 0.02$/GB, it rarely does.

wvh1y ago

There is no right answer here, just trade-offs.

Volundr1y ago

mkreis1y ago

not_elodin1y ago

I've used this to set up a cluster to host a dogfooded journalling site.

In one evening I had a cluster working.

preisschild1y ago

I've also been using Cluster-API + Cluster-API-Provider-Hetzner

https://github.com/syself/cluster-api-provider-hetzner

works rock solid

maestrae1y ago

i recently read an article about running k8s on the oracle free tier and was looking to try it. i'm curious, are there any specific pain points that are making you think of switching?

Volundr1y ago

maestrae1y ago

Got it, thanks for the clarification! I’ll be using it for a personal project so that sounds great.

davidgl1y ago

I've got a couple of free arm machines setup as a cluster for learning k8 + a few LB in front of it. I use k3s, with pg rather than etcd. Been a great learning experience.

no_carrier1y ago

> While DigitalOcean, like other providers, offers a free managed control plane, there is typically a 100% markup on the nodes that belong to these managed clusters.

czhu121y ago

The actual nodes are still way more expensive on digital ocean than they are in Hetzner. That’s probably the main reason.

8GB RAM, shared cpu on hetzner is ~$10

Equivalent on digital ocean is $48

lucasrattz1y ago

Besides what czhu12 mentioned, DOKS charging extra for HA control planes makes me feel as if the platform is not production-grade.

If you want a managed experience on Hetzner, you could take a look at https://syself.com

Disclaimer: I'm an employee there


chipdart1y ago

I loved the article. Insightful, and packed with real world applications. What a gem.

- vCPUs for bursty workloads,

- bare metal nodes for the performance-oriented workloads required as base-loads,

- on-premises nodes for spiky performance-oriented workloads, and dirt-cheap on-demand scaling.

What I believe will be the primary unknown is egress costs.

Has anyone ever toyed around with the idea?

mhuffman1y ago

For dedicated they say this:

>All root servers have a dedicated 1 GBit uplink by default and with it unlimited traffic.

>Inclusive monthly traffic for servers with 10G uplink is 20TB. There is no bandwidth limitation. We will charge € 1/TB for overusage.

lyu072821y ago

20TB egress on AWS runs you almost $2,000 btw. one of the biggest benefits of Hetzner
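A back-of-the-envelope version of that comparison, as a sketch only. The AWS tier prices below are assumed approximate public list prices for internet egress (per GB, 1 TB taken as 1,000 GB) and should be checked against the current price list; the Hetzner figures (20 TB included on a 10G uplink, € 1/TB beyond that) are the ones quoted in the parent comment.

```python
# Assumed tiers: (tier size in GB, price in USD per GB). Not authoritative.
ASSUMED_AWS_TIERS = [
    (10_000, 0.09),
    (40_000, 0.085),
    (100_000, 0.07),
    (float("inf"), 0.05),
]


def tiered_cost(gb: float, tiers=ASSUMED_AWS_TIERS) -> float:
    """Cost of gb gigabytes when each tier is billed at its own rate."""
    cost = 0.0
    remaining = gb
    for size, price in tiers:
        used = min(remaining, size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


def hetzner_10g_overage_eur(tb: float, included_tb: float = 20, eur_per_tb: float = 1.0) -> float:
    """Overage charge in EUR on a 10G uplink, per the terms quoted above."""
    return max(0.0, tb - included_tb) * eur_per_tb


if __name__ == "__main__":
    print(f"AWS, 20 TB: ${tiered_cost(20_000):.0f}")
    print(f"Hetzner 10G uplink, 20 TB: EUR {hetzner_10g_overage_eur(20):.0f} overage")
    print(f"Hetzner 10G uplink, 25 TB: EUR {hetzner_10g_overage_eur(25):.0f} overage")
```

With these assumed tiers the 20 TB figure lands in the high hundreds to low thousands of dollars, the same ballpark as the "almost $2,000" above.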

pdpi1y ago

jorams1y ago

They do have a fair use policy on the 1GBit uplink. I know of one report[1] of someone using over 250TB per month getting an email telling them to reduce their traffic usage.

[1]: https://lowendtalk.com/discussion/180504/hetzner-traffic-use...

chipdart1y ago

> We will charge € 1/TB for overusage.

It sounds like a good tradeoff. The monthly cost of a small vCPU is equivalent to a few TB of bandwidth.

adamcharnock1y ago

Of course you could always move the data-science compute workloads to the cluster, but my gut says that bringing the data closer to the people that need it would be the ideal.

threeseed1y ago

> Has anyone ever toyed around with the idea?

Sidero Omni have done this: https://omni.siderolabs.com

But at least it shows it's technically possible so maybe open source options exist.

SOLAR_FIELDS1y ago

The sibling comments recommendation, Nebula, does something similar with a slightly different approach.


chipdart1y ago

> They run a Wireguard network between the nodes so you can have a mix of on-premise and cloud within one cluster.

Interesting.

A quick search shows that some people already toyed with the idea of rolling out something similar.

https://github.com/ivanmorenoj/k8s-wireguard

nullify881y ago

I believe the Cilium CNI has this functionality built in. Other CNIs may do also.

sneak1y ago

Slack’s Nebula does something similar, and it is open source.

oblio1y ago

I'm a bit sad the aggressive comment by the new account was deleted :-(

The comment was making fun of the wishful thinking and the realities of networking.

It was a funny comment :-(

bdcravens1y ago

Enable "showdead" on your profile and you can see it.

rad_gruchalski1y ago

It wasn’t funny. I can still see it. The answer was vpn. If you want to go fancy you can do istio with vms.

ffsm81y ago

And if you wanna be lazy, there is a tailscale integration to run the cluster communication over it.

https://tailscale.com/kb/1236/kubernetes-operator

They've even improved it, so you can now actually resolve the services etc via the tailnet dns

https://tailscale.com/learn/managing-access-to-kubernetes-wi...

I haven't tried that second part though, only read about it.


jillesvangurp1y ago

Yes, there is some added value in the level of convenience provided. But maybe with a bit more competition, pricing could be more competitive. A lot more competitive.

surrTurr1y ago

tehlike1y ago

Additionally, hetzner has an IOPS limit of 5000 and write limit of some amount that does not scale with the size of database.

50G has the same limits as 5TB.

For this reason, people are sometimes using different table spaces in postgres for example.

Ceph puts another burden on top of already-ceph-based cloud volumes, btw, so don't do that.

p_l1y ago

RAID10 on local SSDs is pretty performant option, but yeah, it's per node.

merpkz1y ago

hipadev231y ago

Be careful with Hetzner, they null routed my game server on launch day due to false positives from their abuse system, and then took 3 days for their support team to re-enable traffic.

By that point I had already moved to a different provider of course.

danpalmer1y ago

teitoklien1y ago

where did you move, asking to keep a list of options for my game servers, i’m using ovh game servers atm

hipadev231y ago

I went back to AWS. Expensive but reliable and support I can get ahold of. I’d still like to explore OVH someday though.


ronsor1y ago

Reading comments from the past few days makes it seem like dealing with Hetzner is a pain (and as far as I can tell, they aren't really that cheaper than the competitors).

gurchik1y ago

> (and as far as I can tell, they aren't really that cheaper than the competitors)

Can you say more? Their Cloud instances, for example, are less than half the cost of OVH's, and less than a fifth of the cost of a comparable AWS EC2 instance.

lurking_swe1y ago

even free servers are of no use if it’s not usable during a product launch. :) You get what you pay for i guess.

But i do agree, it is much cheaper.


victorbjorklund1y ago

I don't think so. We see the outliers. Those happens at Linode, Digital Ocean, etc also. And yes even at Google Cloud and AWS you sometimes get either unlucky or unfairly treated.

jgalt2121y ago

> they aren't really that cheaper than the competitors

This is demonstrably false.

jjeaff1y ago

What competitors are similar to Hetzner in pricing? Last I checked, they seemed quite a bit cheaper than most.

Frotag1y ago

Forum for cheap hosts:

https://lowendtalk.com/

Wouldn't recommend any of these outside of personal use though.

riku_iki1y ago

OVH is a larger provider, and its servers are usually not significantly more expensive than Hetzner's.

jacooper1y ago

Honestly, Hetzner support has been outstanding in my experience. They are always there and very responsive over email.

jpgvm1y ago

Needless to say when I encounter damn good support I stay even when it costs more.


esher1y ago

As far as I see, no one is mentioning sustainability AKA environmental impact or 'green hosting' here. Don't you care about that?

I believe that Hetzner data centers in Europe (Germany, Finland) are powered by green energy, but not the locations in US.

preisschild1y ago

They probably just use the local power grid. You can use ElectricityMaps to look up the average carbon intensity per kWh of those grids

https://app.electricitymaps.com/

kuschku1y ago

Hetzner is using 100% green hydro and wind power for that, which is as sustainable as any grid-connected company can be.

sofixa1y ago

> They probably just use the local power grid

A lot of EU datacenter providers specifically pick green electricity providers/sources, and pride themselves on it, and use it in advertising their sustainability.

Scaleway in particular are 100% no-CO2 (they have it easy, most of their DCs are in France where it's easy to be fully nuclear+renewable). Hetzner are the same.

huijzer1y ago

Data centers used 460 TWh, or about 2% of total worldwide electricity use, according to IEA in 2022.

In comparison, 30% of total energy (energy! Not electricity) goes to transport!

As another point of comparison, transport in Sweden in 2022 used 137 TWh [1]. So the same order of magnitude as total datacenter energy use.

And datacenters are powered by electricity which increases the chance that it comes from renewable energy. Conversely, the chance that diesel comes from a renewable source is zero.

[1]: https://www.energimyndigheten.se/en/energysystem/energy-cons...

davedx1y ago

Any energy use that's measurable in whole percentage points of global emissions needs addressing. That includes data centers.

alt2271y ago

> our country shouldn't care about emissions when China is the biggest emitter

To be fair, until China does something about their emissions, the rest of us are just pissing in the ocean.


huijzer1y ago

> Your argument is the same as those who argue "our country shouldn't care about emissions when China is the biggest emitter".

China and the US are in the same order of magnitude in emissions. So NO that's absolutely not the argument I am making.

> Any energy use that's measurable in whole percentage points of global emissions needs addressing

But it isn't! That's my point. Electricity use is about 20% of total energy use. So if we talk about global emissions, data center is only about 20% * 2% = 0.4% of total energy use.

And then if we talk about total emissions, it's even lower because 40% of electricity is generated from nuclear and renewables.
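The arithmetic in the last two paragraphs, written out as a sketch. The three input shares are the ones stated in this comment; the final step additionally assumes that data centers draw the average electricity mix, which is a simplification introduced here and not a claim from the comment.

```python
# Shares as stated in the comment above.
electricity_share_of_energy = 0.20       # electricity as a share of total energy use
datacenter_share_of_electricity = 0.02   # data centers as a share of electricity use
low_carbon_share_of_electricity = 0.40   # nuclear + renewables share of generation

# Data centers as a share of total energy use: 20% * 2%.
datacenter_share_of_energy = electricity_share_of_energy * datacenter_share_of_electricity

# Assumption: data centers use the average grid mix, so only the remaining
# 60% of their electricity comes from fossil sources.
fossil_datacenter_share = datacenter_share_of_energy * (1 - low_carbon_share_of_electricity)

if __name__ == "__main__":
    print(f"data centers / total energy: {datacenter_share_of_energy:.2%}")
    print(f"of which fossil-generated:   {fossil_datacenter_share:.2%}")
```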

> just look at how much time it's taken for EV penetration to be in any way significant

Yes so let's focus on that instead of data centers. Data centers are not the problem!


esher1y ago

But on the other side, to bring down CO2 levels, fast change everywhere is required. As far as I see data center energy consumption continues to grow, specifically with AI.

If I am not mistaken, data centers produce more CO2 than aviation.

And sure, most 'green hosting' is probably 'green washing', yet I would still support and link initiatives such as: https://www.thegreenwebfoundation.org/

postepowanieadm1y ago

> I believe that Hetzner data centers in Europe (Germany, Finland) are powered by green energy, but not the locations in US.

Green lignite.

kuschku1y ago

While fans of nuclear energy like to meme about the German power grid, Hetzner is — in so far as anyone with a grid connection can be — powered by 100% green wind and hydro energy.

You can see the paperwork here:

- https://cdn.hetzner.com/assets/Uploads/oekostrom-zertifikat-...

- https://cdn.hetzner.com/assets/Oomi-sertifikaatti-tuuli+vesi...

esher1y ago

This does not apply to Hetzner US data centers, as far as I know. That's just for Germany and Finland.

Hetzner_OL1y ago

ArtTimeInvestor1y ago

Can anybody speak to the pros and cons of Hetzner vs OVH?

There ain't many large European cloud companies, and I would like to understand how they differentiate.

Ionos is another European one. Currently, it looks like their cloud business is stagnating, though.

Aachen1y ago

thenaturalist1y ago

I'd say stay clear of Ionos.

Bonkers first experience in the last two weeks.

Graphical "Data center designer", no ability to open multiple tabs, instead always rerouting to the main landing page.

Attached 3 IGWs to a box, all public IPs, GUI shows "no active firewall rules".

IGW 1: 100% packet loss over 1 minute.

IGW 2: 85% packet loss over 1 minute.

IGW 3: 95% packet loss over 1 minute.

Turns out "no active Firewall rules" just wasn't the case and explicit whitelisting is absolutely required.

But wait, there's more!

Created a hosted PostgreSQL instance, assigned a private subnet for creation.

SSH into my server, ping the URL of the created Postgres instance: The DB's IP is outside the CIDR range of the assigned subnet and unreachable.

What?

Deleted the instance, created another one, exact same settings. Worked this time around.

Support quality also varies widely.

Out of 3 encounters, I had a competent person once.

The other two said straight out that they had no idea what was going on.

ArtTimeInvestor1y ago

Is it not possible to configure the setup without the graphical interface?

Are there cloud providers you prefer?

mkesper1y ago

j16sdiz1y ago

I use Scaleway for my EU cloud needs.

This is a very low usage toy server, can't speak for performance/cost.

usrme1y ago

BillFranklinOP1y ago

Quicker to type and scan! Though I admit this is preference, delimiters would work fine too.

Parsing works the same but is based on a simple regex rather than splitting on a hyphen.

euc=eu central; 1=zone/dc; p=production; wkr=worker; 1=node id
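
A rough sketch of what that regex-based parsing could look like. The example hostname `euc1pwkr1` is reconstructed from the legend above, and the pattern, field names and `parse_hostname` helper are assumptions for illustration, not the article's actual code. It relies on the numeric parts acting as delimiters and on the environment being exactly one letter.

```python
import re
from typing import Optional

# Assumed layout: <region letters><zone digits><env letter><role letters><node digits>
HOSTNAME_RE = re.compile(
    r"^(?P<region>[a-z]+)"  # e.g. "euc" = eu central
    r"(?P<zone>\d+)"        # zone / datacenter number
    r"(?P<env>[a-z])"       # single-letter environment, e.g. "p" = production
    r"(?P<role>[a-z]+)"     # e.g. "wkr" = worker
    r"(?P<node>\d+)$"       # node id
)


def parse_hostname(hostname: str) -> Optional[dict]:
    """Split a delimiter-free hostname into its fields, or return None if it does not match."""
    match = HOSTNAME_RE.match(hostname)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric parts so they compare and sort as numbers.
    fields["zone"] = int(fields["zone"])
    fields["node"] = int(fields["node"])
    return fields


if __name__ == "__main__":
    print(parse_hostname("euc1pwkr1"))
```

If the environment ever needs more than one letter, the pattern becomes ambiguous and the scheme has to change, which is the usual argument for delimiters.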

usrme1y ago

adastra221y ago

You can more easily double-click-select the full hostname when there are no delimiters.

stackskipton1y ago

Yeah, leaving out a delimiter and then having to change our format has bitten me so many times. Delimiter or bust.

o11c1y ago

You can treat the numeric parts as self-delimiting ... that leaves only the assumption that "environment" is a single letter.

s3rius1y ago

That's a really good article. Actually, recently we were migrating as well and we were using dedicated nodes in our setup.

In order to integrate a load balancer provided by Hetzner with our k8s on dedicated servers, we had to implement a super thin operator that does it: https://github.com/Intreecom/robotlb

If anyone is inspired by this article and wants to do the same, feel free to use this project.

aliasxneo1y ago

I’m planning on doing something similar but want to use Talos with bare metal machines. I expect to see similar price reductions from our current EKS bill.

threeseed1y ago

Depending on your cluster size I highly recommend Omni: https://omni.siderolabs.com

It took minutes to set up a cluster and I love having a UI to see what is happening.

I wish there were more products like this as I suspect there will be a trend towards more self-managed Kubernetes clusters given how expensive the cloud is becoming.

MathiasPius1y ago

I set up a Talos bare metal cluster about a year ago, and documented the whole process on my website. Feel free to reach out if you have any questions!

cedws1y ago

Any thoughts/feelings about Talos vs Bottlerocket?

MathiasPius1y ago

I've only used Bottlerocket in relation to EKS, and even then my interaction with it was pretty limited so I have no idea how it fares as a standalone operating system.

It was fixed pretty quickly, but it did feel like a weird thing to slip through...

p_l1y ago

Did Bottlerocket get any closer to stable and usable outside the AWS walled garden?

Last time I tried was admittedly in 2022, but when we were testing which distro to go with, Bottlerocket lost because we couldn't set up local builds...

sureglymop1y ago

I went Hetzner bare metal, set up a Proxmox cluster over it and then have Kubernetes on top. Gives me a lot of flexibility, I find.

bittermandel1y ago

All of this makes sense considering the extremely low price.

Scotrix1y ago

BillFranklinOP1y ago

karussell1y ago

mnming1y ago

I feel lots of the work described in the article can be automated by kops, probably in a much better way, especially when it comes to day 2 operations.

I wonder what the motivation is behind manually spinning up a cluster instead of going with more established tooling?

lucasrattz1y ago

> Hetzner volumes are, in my experience, too slow for a production database.

MuffinFlavored1y ago

https://github.com/puppetlabs/puppetlabs-kubernetes

What do the fine people of HN think about the size/scope/amount of technology of this repo?

It is referenced in the article here: https://github.com/puppetlabs/puppetlabs-kubernetes/compare/...

KaiserPro1y ago

Puppet was originally designed to be agent-based, with an agent running on each machine it was meant to configure. It was never very good at bringing up stuff before the agent could be connected.

The general flow was Imager->pre-configured puppet agent->connect to controller->apply changes to make it perform as x

Originally it never really had the capacity to kick off the imaging/instantiation. This meant that it scaled better (shared state is better handled than in Ansible).

I assume that's changed, but I don't miss Puppet.

bigfatkitten1y ago

It's not really that much of an embuggerance.

Kickstart or cloud-init to get the OS up and Puppet agent installed, then let Puppet do the rest.

mkesper1y ago

Honestly I was surprised to hear about puppet at all. Thought that was dead and buried, like chef.

bigfatkitten1y ago

Both are still extremely widely used.

czhu121y ago

The costs of cloud hosting are totally out of control; I would love to see more efforts that let developers move down the stack.

I’ve been humbly working on https://canine.sh which basically provides a Heroku-like interface to any K8s cluster

Neil441y ago

acac101y ago

// Taking another slant at the discussion: Why kubernetes?

Thank you for sharing your experience. I also have my 3 personal servers with Hetzner, plus a couple of VM instances at Scaleway (French outfit).

Unless you have a LOT of machines to manage, with many jobs (I’d say +250) to manage, K8s complexity, brittleness and overhead are not justifiable, IMO.

I would say that only a few companies in the world need that level of complexity. And they will need it, for sure. But for most, it is like buying a Formula 1 car to commute in a city.

I’m just puzzled by the whole situation, tbh.

KaiserPro1y ago

I too used to run a large clustered environment (VFX) and now work at a FAANG which has a "borg-like" scheduler.

This tends to lead to utterly fucking stupid networking systems because unless you know better, that looks normal.

p_l1y ago

I'll put it this way:

Was there a cost to learning it? Yes, but honestly not so bad. And with things like k3s, deploying small clusters on bare metal became trivial.

riku_iki1y ago

> Every time I try to use Nomad, or any of the other "simpler" solutions, I hit a wall - there turns out to be a critical feature that is not available

Maybe you could give an example of such a feature in the case of Nomad?

p_l1y ago

I will give an example of just a few things that have literally saved me many hours of work, all of which are in use on a "single server cluster":

1. Ingress and Service objects vs. Nomad/Consul Service Discovery + Templating

This one is big, as in a really big thing. The Ingress and Service APIs let me easily and declaratively connect things with multiple implementations involved, and it's all handled cleanly with a type-safe API.

2. Persistent Storage handling

Could I just point to a directory? Sure, but then I'd have to keep track of those directories. OpenEBS-ZFS handles it for me and I can spend time on other tasks.

3. Extensibility, the dark horse of kubernetes.

And last, some stats from my cluster:

  4, soon to be 5 or 6, "tenants" (separate namespaces), without counting system ones or ones that provide services like OpenEBS
  Runs 2 VPN services with headscale, 3 SSOs, one big java issue tracker, 1 Git forge (gitea, soon to get another one with gerrit), one nextcloud instance, one dumb webserver (using Caddy). Additionally runs 7 separate postgres instances providing SQL database for aforementioned services, postfix relays connecting cluster services with sendgrid, one vpn relay connecting gitea with VPN, some dashboards, etc.

And because it's Kubernetes, my configuration to set up, for example, a new Postgres looks like this:

  local k = import "kube.libsonnet";
  local pg = import "postgres.libsonnet";
  local secret = k.core.v1.secret;
  {
    local app = self,
    local cfg = app.cfg,
    local labels = app.labels,
    labels:: {
      "app.kubernetes.io/name": "gitea-db",
      "app.kubernetes.io/instance": "gitea-db",
      "app.kubernetes.io/component": "gitea"
    },
    dbCluster: pg.cluster.new("gitea-db", storage="20Gi") +
      pg.cluster.metadata.withNamespace("foo") +
      pg.cluster.metadata.withLabels(app.labels) +
      pg.cluster.withInitDb("gitea", "gitea-db") +
      pg.cluster.withBackupBucket("gs://foo-backups/databases/gitea", "gitea-db") +
      pg.cluster.withBackupRetention("30d"),
    secret: secret.new("gitea-db", null) +
      secret.metadata.withNamespace("foo") +
      secret.withStringData({
        username: "gitea",
        password: "FooBarBazQuux",
        "credentials.json": importstr "foo-backup-gcp-key.json"
      })
  }

And this is an older version that I haven't updated (because it still works). If I were to set up the specific instance it's taken from today, it would take even less writing.

bigfatkitten1y ago

> Unless you have a LOT of machines to manage, with many jobs (I’d say +250) to manage, K8s complexity, brittleness and overhead are not justifiable, IMO.

Because it looks amazing on my CV and in my promo pack.

0xbadcafebee1y ago

Same reason they'll make 10 different microservices for a single product that isn't even 5K LoC. People chase trends because they don't know any better. K8s is a really big trend.

kakoni1y ago

Anybody running k3s/k8s on Hetzner using cax servers? How's that working?

lucasrattz1y ago

We have been running them at https://syself.com for six months now, after adding support for them on our platform. So far so good.

james_sulivan1y ago

For those considering Hetzner, there is also Contabo which is another German hosting company that is also good, at least in my experience

devops0001y ago

Did you try Cloud66 for deployment?

cjr1y ago

What about cluster autoscaling?

BillFranklinOP1y ago

I didn’t touch on that in the article, but essentially it’s a one line change to add a worker node (or nodes) to the cluster, then it’s automatically enrolled.

We don’t have such bursty requirements fortunately so I have not needed to automate this.

preisschild1y ago

Works rather well. I use CAPI + Cluster-Autoscaler + Talos and new nodes are provisioned and ready within 2-3 minutes.

awinter-py1y ago

cut my kube bill 100% on GKE by switching from regional to zonal cluster bc the first zonal cluster is free

aravindputrevu1y ago

Do you know that they are cutting their free tier bandwidth? I did not read too much into it, but I heard a few friends were worried about it.

At the end of the day, they are a business!

lucasrattz1y ago

It seems to be only for the US-based servers. Sounds like they talked with a pricing consultant :p

segmondy1y ago

Great write up Bill!

Iwan-Zotow1y ago

this is good

well, running on bare metal would be even better

lucasrattz1y ago

You could take a look at https://syself.com - 100% support for bare metal (I'm an employee).

Iwan-Zotow1y ago

thanks, will look

postepowanieadm1y ago

Lovely website.
