I was surprised to find this on Hacker News; I wanted to wait until the stable release before posting it on HN myself, but thank you for posting :)
This project is still in alpha stage, but please feel free to critique; I'd appreciate it.
Edit 1: After reading some of the comments, I want to clarify a few things:
- Because it is currently in the alpha stage, I do not host anything important on it.
- This is also my learning environment; I use Kubernetes in my day job.
- Yes it is overkill ;)
Edit 2: Wording
Mine is based around VMs using K3OS, but I have also been looking at ThinkStations and other cheap workstations with Intel T-series CPUs.
How do you find disk performance and reliability?
Of course none of that is necessary for a self-hosted home lab, but neither is GitOps.
This is a very nice example of how to set stuff up properly.
OP, I would love to see Robusta (https://robusta.dev) as part of this too. It's definitely in line with your vision of automating everything, as it lets you automate the response to alerts and other events in your cluster. (Disclaimer: I'm one of the maintainers)
This was before Kubernetes. Arguably, learning Kubernetes is harder than implementing auto-discovery and auto-healing yourself, because you have to learn a lot about cluster semantics, the various network providers, service accounts, role-based access control, integrations with secrets managers, and the very disparate implementations of custom operators that may operate in your environment.
Kubernetes is a platform, and I don't doubt it solves a lot of problems. But it's a complicated wee beastie, and I say that as a person who did it the easy way and used GKE on Google Cloud.
It might not matter on a small scale, but on a large scale that's what makes it robust. It's also not trivial to implement a distributed version of this in house at scale. At the very least you need something like etcd at the core
Have you considered something like Tailscale so you can securely access it from outside your home? I've been thinking about spinning up my own home server that way, seeing as Tailscale makes it easy to securely access it from my phone when I'm out and about.
Exposing Home Assistant over it allows me to do things like turn my lights off when I'm away from home, without exposing stuff to the public internet. https://www.blog.akhil.cc/shelly-dimmer
Done right, you get a pretty simple second line of defence - you can validate the client has an appropriate certificate (running your own CA is pretty straightforward for small-scale home use or for a small group of users). Without such a certificate, users can't access the web service.
If your goal around a VPN is a second line of defence against the application's own authentication logic failing, client certificate authentication might be worth a look. If your threat model needs to cover a major issue in your web server, you might still want to stick with a VPN-based setup.
(You can of course do both, and bind services to an internal-only IP that you can only reach via a VPN, then have certificate auth on that too if you so desire)
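If the service is fronted by ingress-nginx inside a cluster like the OP's, the client-cert check can even live at the ingress. A hedged sketch only - the host, backend service, and the "home-ca" secret (which would hold your CA's ca.crt) are all made up:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: home-assistant
      annotations:
        # require a client certificate signed by the CA stored in the "home-ca" secret
        nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
        nginx.ingress.kubernetes.io/auth-tls-secret: "default/home-ca"
        nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1"
    spec:
      rules:
        - host: ha.home.example
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: home-assistant
                    port:
                      number: 8123

Requests without a valid certificate get rejected before they ever reach the app's own login page.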
I think I will wait until a C variant appears after they've settled.
I really like the service; it works like a charm. Sometimes when I'm out I use one of my servers as an exit node, like a VPN.
For a small setup that's the big thing, but for anything a little more it also does key rotation, handles IPs for you, and offers "MagicDNS", which gives all your devices a nice DNS name when Tailscale is on.
I actually used to have a bunch of virtualization and Kubernetes and crap, but got rid of it because it ate literally half the systems' resources, in a death-by-a-thousand-containers type of way. There was also a lot of just jank; stuff was constantly breaking and I was always trying to piece together why from half a dozen out-of-date tutorials. Felt like herding cats.
The bare-metal server was the easiest to set up, obviously, but the hardest to maintain. Where did I put that random config? Wait, why did this break when I upgraded X? Switching to Docker (and later Compose) wasn't that difficult, and made maintaining services much easier. Going to k8s has been challenging, mostly because of how I'm doing it - a control plane and worker node on each physical node in Proxmox. Writing the YAML is, well, YAML.
I'm mostly looking forward to not having to keep my VMs updated. I automated a lot of that (VMs are Packer templates) with Ansible, but it's still annoying. Upgrading k8s nodes is a lot easier, IMO.
Maybe I don’t know what I don’t know, but my setup works for me and I don’t really have any problems maintaining it so I figure why add all the complexity?
It always feels weird to see threads and threads of people talking about dozens of software programs I’ve never even heard of, let alone used. Maybe I’m living in the past but to me a “stack” is: OS, server, database, application. Like LAMP. Wonder when this changed!
It makes me curious about what kinds of stuff people do in their home networks that I never even considered doing.
1. Unless you have crazy dynamic scaling needs, there's literally no point in using k8s or Nomad.
2. Unless you want RBAC and fine-tuned ACLs, there's literally no point in using k8s or Nomad.
I hate this idea that you need a Ceph cluster and a 3 node HA cluster to have a home lab.
Not saying don't do it because you want to play with/test/learn any of those platforms; just that if you are going to run a service, maybe consider whether you are doing it to learn a task orchestration system or because you just want to run XYZ reliably.
Agree that Ceph is ridiculously overkill though, I tried. Longhorn is much easier and perfectly fine for distributed storage.
I don’t want to build a good chunk of Kubernetes myself just to host my software. It’s nice to be able to Google for solutions to my problems or to ask on various forums/chat rooms (or in many cases, not have the problems in the first place because someone else committed the fixes to Kubernetes long ago).
Now I'm building a small platform to host a business I've started and I'm going back to ASGs with simple EC2 Instances.
There's really no need for the features Nomad/K8s offer at almost any scale outside of a few big companies.
I've had great success with Ansible+Proxmox. The Proxmox CLI is good enough that you can basically do a DIY Terraform-like setup straight against the CLI, without any agents, plugins, etc. (I know, declarative vs imperative, but hopefully the comparison makes sense.)
Lately I've been toying with a local GitLab CI building and deploying images directly into the environment, so that any changes to the code redeploy it (to prod... this is home after all haha).
Considered Terraform too, but I'm not a big fan of its mystery third-party plugin/provider system.
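To give a flavour of the Ansible-against-the-Proxmox-CLI approach: a play can drive qm directly over SSH. Just a sketch, with the template ID, VM ID, and name invented:

    - hosts: proxmox
      become: true
      tasks:
        # 'creates:' points at the config file Proxmox writes for the new VM,
        # so re-runs skip the clone instead of failing
        - name: Clone a VM from a template
          ansible.builtin.command:
            cmd: qm clone 9000 110 --name web01 --full
            creates: /etc/pve/qemu-server/110.conf

        # re-applying the same values is harmless, though Ansible will report "changed"
        - name: Set CPU and memory
          ansible.builtin.command: qm set 110 --cores 2 --memory 4096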
Sadly, I learned this the hard way by setting up a 4-node k8s cluster and realizing bare-metal support is abysmal and the maintainers simply don't care about you if you're not running a cluster on AWS or GCP. Unfortunately my day job still involves k8s, and people give me strange looks when I suggest bare metal as substantially less complex, cheaper, more performant, and easier to orchestrate and maintain.
A few months back I built a Nomad based cluster with Consul and was like: wow! This is super cool... but it's also expensive and complex.
I'm going to be building out the platform with an ALB and two EC2 instances in an Auto Scaling group, based on an AMI I bake with Ansible and Packer. I'll use Terraform for the entire infrastructure.
The release pipeline for content updates (it's an MkDocs/Material based book with videos blended in) will be me updating the AMI and firing off a `terraform apply`.
I won't be using Docker. I won't require or need an orchestration engine.
It's a static website delivered via a custom web server written in Go (as I need custom "Have you paid to view this content?" logic.) That's about as complex as it gets for now.
I'm actually looking forward to implementing it and watching as something so simple hopefully goes on to allow me to build and scale a business.
k8s and nomad are on my "maybe eventually later" list though.
I can only imagine how much time and trial and error this must've taken. This is my main issue with IaC: the concept really lends itself to any kind of modern infra, but I'm really put off by the sheer amount of time it takes me to whip up a bunch of Ansible playbooks and helper scripts, and on top of that make sure that all of them are idempotent.
Maybe I'm doing something wrong and this should be easy?
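For what it's worth, the idempotency usually comes for free when leaning on modules rather than shell tasks, since the modules carry the "check before doing" logic. A trivial sketch (the package and service names are just examples):

    - hosts: homelab
      become: true
      tasks:
        # the apt module only acts when the package is missing, so re-runs are no-ops
        - name: Ensure nginx is installed
          ansible.builtin.apt:
            name: nginx
            state: present

        # likewise, the service module leaves an already-running service alone
        - name: Ensure nginx is running and enabled on boot
          ansible.builtin.service:
            name: nginx
            state: started
            enabled: true

The pain mostly comes back when you drop to shell/command tasks, where you have to add the guards (creates:, changed_when:, etc.) yourself.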
What's often overlooked (I believe) is that when you're doing this work in your day job, you've got existing infra to support your infra, along with other folks you can rely on for help.
With home infra, you first must invent the universe (apologies to Carl). Having built 3 variations of home infra (Swarm on x64/Beelink, K3s on Pi, K3s on x64/Odroid), I've gained a strong admiration for anyone who takes this on, regardless of their relative success.
What I've learnt over time is to add as little accidental complexity as possible, which I think is what you're getting at. One incarnation of the Pi K3s was provisioned by Ansible (on its own standalone Pi that the cluster would netboot from). Was Ansible better than imaging the USB drives manually and running a small script on each of the nodes? Probably a wash. I did however learn a bit of Ansible.
I agree with the sentiment of your comment, but when was this ever not the case, other than the days when you built your own OS and tooling from scratch?
Cloud setups that just run containers, especially, are relatively easy to get idempotent with Terraform.
While it's challenging up front, I do enjoy being able to freshly install the latest version of Fedora, run my playbook, and be more or less up and running.
It feels cleaner and more reliable (at least until this week, when a broken SELinux policy made it to stable) than trying to upgrade packages across major release versions in place.
<hat material="tinfoil"> If I've somehow acquired secret hidden malware, or someone has opened up something to come back in later, that's also going to get flushed out at least once every six months.
Also I believe the tag filter is malfunctioning: https://kubesail.com/templates-by-tag/Media
And that's something you cannot do when you deploy tools worth millions of lines of code.
Complexity matters. Those popular products make sense only if you have 20 engineers on your team or you don't care about reliability.
Also it's important to use software packaged in distros like Debian in order to receive security updates for many years.
Well done. I was not aware of the Cloudflare solution. Is this something someone can use, _with_ their Cloudflare Access offering, for personal dev/lab envs without breaking the bank?
I'm using it for personal use at the moment, and I'm considering becoming a paid user so my friends and family can access Emby over the internet.
I did have a few dramas getting it to work for my LXC environments, but nothing a quick Google couldn't resolve for me.
Are you just running it on a system behind a firewall/router/NAT-ed network device or on a terminating device itself?
But, as usual, far behind on my personal projects ...
If you're looking for some nice stuff to develop on an environment like this, check out VS Code and Google's cloud code addon: https://marketplace.visualstudio.com/items?itemName=GoogleCl... It's not totally tied to their GCE cloud offerings and is actually just a VS Code integration of their Skaffold tool (a fantastic inner dev loop tool for k8s). It works great with minikube or any other k8s cluster, presumably this one too. You can very quickly get up and running with code in your IDE now running and debugging on your k8s environment.
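The inner loop is driven by a small skaffold.yaml in the repo; `skaffold dev` then rebuilds the image and redeploys whenever you save. A rough sketch only - the image name and manifest path are made up, and the apiVersion should match whatever your Skaffold version expects:

    apiVersion: skaffold/v2beta29
    kind: Config
    build:
      artifacts:
        # this image gets rebuilt whenever its source files change
        - image: my-homelab-app
    deploy:
      kubectl:
        manifests:
          - k8s/*.yaml

It deploys to whatever cluster your current kubecontext points at, so minikube, k3s, or a homelab cluster like this one all work the same way.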
Although it's still incomplete and not one click yet, that's the direction I'm heading in: anyone can try my homelab on their PC, and if they want to use it on their real hardware, they can continue with the full install.
I have had some home lab stuff going on over the years, and find that trying to do more with fewer watts has been the most fun.
For a single server deployment docker-compose is very useful.
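One nice thing is how little it takes; here's a hedged sketch with a made-up media service and paths:

    services:
      jellyfin:
        image: jellyfin/jellyfin
        ports:
          - "8096:8096"       # web UI
        volumes:
          - ./config:/config
          - ./media:/media:ro
        restart: unless-stopped

`docker compose up -d` brings it up, and `docker compose pull && docker compose up -d` is the whole upgrade story.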
Regarding my client connection while away from home: if my client doesn't have a connection, cloud services don't work any better. But a lot of apps (e.g. Bitwarden, some Jellyfin clients, etc.) are smart enough to work offline to some extent by operating on cached resources.
What has been an obstacle is the availability of officially maintained Docker images for some of the components I've been wanting to use - afaict neither Argo CD nor Rook have official armv7/aarch64 images (though it seems Argo will release one soon).
Until then, I'll hold off on that pet project until I get my hands on a reasonably priced x86 SFF PC (the ThinkCentre M700 Tiny from TFA looks interesting!).
How many RPi4 servers are we talking? I have never run Rook before (though I ran Ceph, back when it was called deis-store, before it was cool) and I always remember that you needed at least 3-5 servers, with at least one dedicated disk per node for OSDs, to make it work well, if at all!
I'm looking at synology-csi right now for my homelab, and wondering if there is hope that I can stop using local-path-storage this way. It looks a little bit outdated, though, so I'm afraid that may keep me from running the latest version of Kubernetes on it... unless I hack the support in myself :gasp: surely that's gonna void the warranty
Running your own registry is super simple.
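The official registry image is basically all you need. A sketch assuming Compose and local disk storage (you'd want TLS and auth in front before exposing it beyond the LAN):

    services:
      registry:
        image: registry:2
        ports:
          - "5000:5000"
        volumes:
          # /var/lib/registry is the image's default storage location
          - ./registry-data:/var/lib/registry
        restart: unless-stopped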
This looks like a great setup by the author, but difficult to maintain in the long run without significant time investment.
It was a relief. Coupled with watchtower the updates are automatic except for the two services I really rely upon.
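For reference, the watchtower piece of that is tiny - something like the sketch below (the interval is arbitrary, and anything you want to keep hands-off can carry the label com.centurylinklabs.watchtower.enable=false):

    services:
      watchtower:
        image: containrrr/watchtower
        volumes:
          # watchtower talks to the local Docker daemon to pull new images and restart containers
          - /var/run/docker.sock:/var/run/docker.sock
        command: --cleanup --interval 86400   # check once a day, prune old images
        restart: unless-stopped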
I used to be on Ubuntu for ages and moved to Arch a few days ago - my server only runs docker today.
I still want to keep access to a shell, so it is Arch and not Rancher or something (I did not research the bare-metal hypervisors much).
My maintenance is minutes every month if everything works fine, up to a max of an hour when Home Assistant broke things once two years ago.
DRP from bare metal is an hour.
Adding a service is a few minutes.
docker also adds a big attack surface.
The problem was, at some point Debian stable got so distant from modern infrastructure you had to patch it from the beginning with newer versions, so reluctantly I switched to testing (that was years ago). I was surprised to find out things were working just fine.
The problem is today everybody distributes software in their own way. Especially web software - native packages are more and more rare. So automatic updates that work are indeed a challenge.
Doing just the fun parts of self-hosting will not get you to infrastructure.
I'd love to prove Moxie Marlinspike wrong that "people don't want to run their own servers, and never will." (https://moxie.org/2022/01/07/web3-first-impressions.html) This is the key bottleneck in getting people to run their own servers.
All this work is nice and beautiful; the problem will come when you try to update the different components.
It amazes me how much of the internet doesn’t run on a single node, when it probably could.
If anyone has one handy, I'd appreciate a link.
Having my own personal CloudBox has allowed me to experiment and fail fast. I am ahead of the rest of my team in experience and knowledge as a result. I have a tool they do not. I realize it would be better if we all had a tool like this.
So that’s the pitch: a single-node “cloud in a box”, the CloudBox, for IT students or professionals to learn any aspect of IT.
Now I just need lots more time to actually turn my prototype into a product.
There's flatcar and k3os and fedora coreos and talos and lokomotive. There are maybe a dozen others as well, but those are the ones I know something about.
The real problem is that the orchestration of PXE boot, DHCP, name services, server registration, cluster bootstrapping, while simultaneously building a light distribution that makes the right calls on persistence, security, etc. is just *really hard*.
I took a stab at it myself (failed startup) and have a custom distribution based on Alpine, but the amount of work to go from there to everything is so large that it's tough to take on if you're small (and there is the constant desire to go enterprise because of the money).
I'd be satisfied with a home-user-oriented manual tutorial, like: install Debian with these packages, an nftables setup that firewalls everything but these 3 ports, how to set up auto-update, turn off root and password-only logins, and general things to be reasonably secure; as well as tips for ongoing maintenance and so on.
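To give a flavour of the kind of thing I mean, here's a hedged Ansible sketch of just two of those steps (hardening SSH and installing the auto-update/firewall packages); the host group and paths are placeholders, and the actual nftables ruleset is left out:

    - hosts: homeserver
      become: true
      tasks:
        - name: Install firewall, auto-updates and SSH server
          ansible.builtin.apt:
            name: [nftables, unattended-upgrades, openssh-server]
            state: present

        - name: Disable root login and password authentication over SSH
          ansible.builtin.lineinfile:
            path: /etc/ssh/sshd_config
            regexp: "^#?{{ item.key }} "
            line: "{{ item.key }} {{ item.value }}"
          loop:
            - { key: PermitRootLogin, value: "no" }
            - { key: PasswordAuthentication, value: "no" }
          notify: restart ssh

      handlers:
        - name: restart ssh
          ansible.builtin.service:
            name: ssh
            state: restarted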
Also, when you bump the image tag in a git commit for a given helm chart, how does that get deployed? Is it automatic, or do you manually run helm upgrade commands?
When you make a change in git, it is automatically deployed without the need for human intervention.
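For example, if the GitOps controller is Argo CD (just one possible choice; the repo URL, paths, and names below are made up), that behaviour comes from an Application with an automated sync policy - committing the new image tag is all the "deploy" there is:

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-app
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/example/homelab.git
        targetRevision: main
        path: apps/my-app          # the helm chart / manifests live here
      destination:
        server: https://kubernetes.default.svc
        namespace: my-app
      syncPolicy:
        automated:
          prune: true      # delete resources removed from git
          selfHeal: true   # revert manual drift back to what git says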
Good first version, I am excited for the beta!
I knew I’d want to change the LAN architecture, the services, and everything, but it was definitely intended as a learning experience for me.
Kinda looking forward to doing it over again with my new/refreshed skill set, automating everything.
> For a homelab it seems severely overkill
Isn't that the point of homelab? ;)
For some people (including me), the risk involved in any SaaS product suddenly either dropping or imposing unworkable restrictions on free tier is high enough to make the extra work involved in self-hosting worth it. (Granted, my current self-hosting setup is a lot simpler than the one described in this article, but even if mine were more complex I would still say the same thing.)
Most free tiers require a card on file. I shudder to think of having to check dozens or hundreds of such services regularly to make sure they have not sent out an email about how they are now charging you. Then I don't notice for 6 months and find out I spent $150 on something I could have set up myself in less than an hour that will never charge me.
I have had hundreds of such products subscribed to at work. At least every year or so, an entire day gets burnt updating credit card info: logging in and looking up the bespoke way every single site requires payment details to be updated.
Then there are the services that suddenly shut their doors and again require me to research and set up an alternative.
Services get acquired all the time and then switch to pay-only, and they already have your CC on file for easy billing.
Do you experiment with your tech stack? Swapping things in and out to see which apps are best?
I think the OP discovered my homelab through that post.
The nice thing about Kubernetes is that it’s probably not much harder (if at all) than the alternative (assuming you already know which tools to use to compose your bespoke platform) and you can easily find online resources and support (never mind all of the problems you don’t run into because they’re already solved in Kubernetes).
Or just a total DIY Xen or KVM hypervisor on Debian or CentOS approach, but with considerably more RAM in the host machine than 16GB (more like 64-128GB).
The Kubernetes part sort of makes sense if the purpose is also for the person who owns/operates it to test Kubernetes things, learn Kubernetes, or use it as a home lab version of some larger, similar environment that they have elsewhere.
Or just install KubeVirt on the existing cluster and manage KVM virtual machines with k8s.
But Kubernetes exists for a reason and makes a lot of things easier.
If you are familiar with the ecosystem, k3s is a great foundation for self-hosted setups.
Especially thanks to the many helm charts and operators that exist.
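One small k3s-specific convenience worth mentioning: its bundled Helm controller will install charts you declare as plain manifests dropped into /var/lib/rancher/k3s/server/manifests/. Rough sketch, with the chart and repo chosen arbitrarily as an example:

    apiVersion: helm.cattle.io/v1
    kind: HelmChart
    metadata:
      name: ingress-nginx
      namespace: kube-system
    spec:
      repo: https://kubernetes.github.io/ingress-nginx
      chart: ingress-nginx
      targetNamespace: ingress-nginx
      # chart values go inline here
      valuesContent: |-
        controller:
          service:
            type: LoadBalancer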
There are 19 stacks in this repository. 19 pieces of software that require their own maintenance, JUST TO RUN YOUR APPLICATIONS! The amount of extra work required just to host the software that views your pictures, plays your videos, and lets you chat with people is absolutely insane.
I'm not talking about this particular repo; it's just indicative of how complicated things have become that you must do the equivalent of building and maintaining your own operating system just to get even simple things done. And I believe that it's unnecessarily complicated (once again, not this repo, but rather the state of affairs in general). We're at an inflection point in the industry, and haven't come out the other side yet.
Bash as infrastructure+maintenance is almost like SQL for data.
Everyone agrees they're not fantastic, many have tried to improve on them, and yet SQL + Bash are still around, chugging along, doing their unsexy things just well enough :)
Never underestimate software that is 'just good enough' :)
Actually, I would find that not boring :)
Maybe it's unnecessary for tiny companies, but if you're dealing with infrastructure at scale then the complexity is unavoidable. The only question is whether you want to invent it in house or use a robust system that someone else built properly.
I elaborated more in my comment below
Systemd units and a bit of sh are enough if you just want your applications.
There are tools like Yunohost [0] that can save a lot of work and are focused on making the subsequent maintenance burden lighter too. There's an application catalogue [1] with a lot of popular self-hosted apps to choose from, and new ones being packaged all the time. I haven't used it myself yet, but I hear very good experiences from people who have, especially those who lack the more sysadmin-type tech skills and just wanna click-install & go.
There's also Freedombox [2]. I'd be interested to know about other such tools and your experience with them.
Now we've come full circle back to the bad old days where you need an entire team of dedicated people and arcane knowledge just to run your application software again.
I suspect that this will continue until we reach a point where the big players out of necessity come to an agreement for a kind of "distributed POSIX" (I mean in spirit, not actual POSIX). These are exciting times living on the edge of a paradigm shift, but also frustrating!
If your only goal is to serve Nextcloud, Plex or whatever to your family, you can get away with much less than that.
Plugging together all these different tools has become so much work though that in many organizations the platform team(s) who are mainly occupied with doing just that take up a lot of engineering resources.
In my opinion the next evolutionary step would be for all of this to be bundled and abstracted away. Funnily enough, we pretty much have that product already with autoscaled "serverless" cloud services, GitHub Actions/Azure DevOps pipelines etc.
The biggest problem is probably that things like AWS Lambda, SQS, and DDB lack versatility and user friendliness. If we get some improvements on that front, many organizations might opt for that instead of dealing with their own K8s deployments. Even better would be if we had something like a stripped-down version of OpenStack, just focussed on doing "serverless" right, and rebuilt from the ground up.
If this is for a home lab, where it won't actually affect you if any one of the services running on it (or even the whole host machine) goes belly up? Sure, okay, but that's self-hosting a home lab, not self-hosting actual infrastructure...
clearly the hardware shown in the image is meant to be small, not noisy, and not take up a lot of space, and operate in somebody's house.
but the people I know who seriously self host all their stuff have it on something like a few old Dell R620 1U servers with 1+1 dual hot swap power supplies, RAID-1 for the operating system boot, RAID-1 or RAID-6 (or something like mdadm raid6) for storage drives (again all hot swap capable), etc.
note that I am not talking about a lot of money here, you can buy the hardware I described in the preceding paragraph for $300 on eBay and then add your own choice of SSDs.
and not in a house, but in something like a small to medium sized ISP's colocation environment, with UPS, generator backed power, etc. also attached to network equipment and a DIA circuit that's considerably more serious than a tp-link unmanaged switch attached to a residential internet service.
I set up something like this to control some farm equipment when I was in college 20 years ago. The farmer's son called me a few months ago because they were having an issue with the equipment. Turned out oily dust had coated the thing and got inside - I had him pop the device open (old Wyse box running Slackware, iirc), clean up the goop, and hit it with rubbing alcohol. He dried it and everything is working again.