That sense of the term isn't loaded with any specific notion of how attack surfaces should work. I think modern "Docker"'s security properties are underrated†. But you still can't run multitenant workloads from arbitrary untrusted tenants on shared-kernel isolation. It turns out to be pretty damned useful to be able to ship application components as containers, but have them run as VMs.
If you’re building a system that’s handling classified information, there is probably not an accreditation authority in the world that would let you use containers or even hypervisors as a way to separate different information classes.
Other implementations like Podman get even better security by not running as root.
100% agree.
Docker and the CRI-du-jour strip off many "dangerous" system capabilities by default. The Linux kernel defines around 40 capabilities, and most container runtimes strip a root process down to roughly a dozen or so of them by default. Those numbers are approximate.
Stripping down the system level capabilities of your workload is assuredly a security improvement over running that workload "bare metal" on the system.
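You can see the capability machinery from any shell; this sketch only reads kernel state, no Docker required (the `--cap-drop` flags in the comments are the illustrative Docker side of it):

```shell
# CapBnd in /proc/<pid>/status is the capability bounding set: a hex
# bitmask in which each set bit is one capability the process may use.
grep CapBnd /proc/self/status

# cap_last_cap is the index of the highest capability this kernel defines.
last=$(cat /proc/sys/kernel/cap_last_cap)
echo "kernel defines $((last + 1)) capabilities"

# With Docker you would narrow the set explicitly, e.g. (illustrative):
#   docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp
```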
Ref: https://www.redhat.com/en/blog/secure-your-containers-one-we...
All those bargain basement OpenVZ "VPS" providers beg to differ :)
There's also gVisor
What I find interesting is that many uses of containers are just reinventing statically linked binaries in a more complicated form.
The answer to this is a container. Containers don't "reinvent" static linking, they solve problems that go beyond static linking.
I have found that to be true in at least one case—I had built a custom DNS server in Go (statically linked), and originally planned to run it in a container, but on further reflection realized the container brought no added value, and it was much simpler to write a systemd service control script than to bring in the extra baggage of a container ecosystem to run the DNS server.
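For the curious, the unit file for that kind of setup can be tiny. Everything here (service name, binary path) is hypothetical:

```ini
# /etc/systemd/system/mydns.service -- illustrative unit for a
# statically linked Go DNS server; path and name are assumptions
[Unit]
Description=Custom DNS server
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/mydns
Restart=on-failure
# Run as a throwaway dynamic user, but allow binding port 53
DynamicUser=yes
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target
```

Then `systemctl enable --now mydns` and you're done, with no container runtime in sight.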
What containerization does is confer some of the advantages of static linking on languages and libraries that don’t natively support it.
A Docker container is really just a chroot plus some namespaces and cgroup resource limits.
For example, resource isolation with the Solaris / Illumos container implementation (zones) works just as well as full-blown virtualization. You are just as well equipped to handle noisy neighbors with zones as you are with hardware VMs.
> Much as you’d likely choose to live in a two-bedroom townhouse over a tent, if what you need is a lightweight operating system, containers aren’t your best option.
So I think this is true for Docker but doesn't really do justice to other container implementations such as FreeBSD jails and Solaris / Illumos zones. Because those containers are really just lightweight operating systems.
In the end Docker started out and was designed to be a deployment tool. Not necessarily an isolation tool in all aspects. And yeah, it shows.
One could argue that zones are distinct from containers (a Linux implementation), with both being OS specific versions of jails.
Containers won dev mindshare because of the ease of packaging and distributing the artifacts. Somehow it was Docker, not the VM vendors, that came up with a standard for packaging, distributing, and indexing glorified tarballs, and it quickly took off.
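That standard lives on as the OCI image format: tar layer blobs plus content-addressed JSON metadata. A manifest is roughly shaped like this (digests and sizes are placeholders):

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:...",
    "size": 1234
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:...",
      "size": 56789
    }
  ]
}
```

The digests are what make registries, caching, and deduplication work; that's the part a plain tarball never had.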
IMO the important, catalyzing difference is that Docker containers have a standard interface for logging, monitoring, process management, etc which allow us to just think in terms of “the app” rather than the app plus the SSH daemon, log exfiltration, host metrics daemon, etc. In other words, Docker got the abstraction right: I only care about the app, not all of the ceremony required to run my app in a VM. These common interfaces allow orchestration tools to provide more value: they aren’t just scheduling VMs, they’re also managing your log exfiltration, your process management, your SSH connection, your metrics, etc, and all of those things are configurable in the same declarative format rather than configuring them with some fragile Ansible playbook that requires you to understand each of the daemons it is configuring, possibly including their unique configuration file/filesystem conventions and syntaxes.
Packaged VMs had existed for a while, with things like Vagrant on top, and there was also LXC, which leaned more into the VM concept. Where Docker made the difference, IMHO, is with Dockerfiles and the layered/cached build steps.
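A sketch of how that layer cache plays out in practice (base images and paths here are illustrative):

```dockerfile
# Each instruction below creates a cached layer, so editing app code
# only rebuilds from the second COPY down; dependencies stay cached.
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download          # cached until go.mod/go.sum change
COPY . .
RUN go build -o /app ./cmd/server

# Ship only the binary in a minimal runtime image
FROM gcr.io/distroless/static
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

Nothing like that incremental, content-hashed rebuild existed in the Vagrant/VM-image world.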
Calling container images glorified tarballs is like calling cars glorified lawnmowers.
Because with an apartment each tenant gets to share certain infrastructure like heating and plumbing from the apartment building, just like containers get to share things from the Linux host they run on. In the end both houses and apartments protect you from outside guests, just in their own way.
I went into this analogy in my Dive into Docker course. Here's a video link to this exact point: https://youtu.be/TvnZTi_gaNc?t=427, that video was recorded back in 2017 but it still applies today.
> Tents, after all, aren’t a particularly secure place to store your valuables. Your valuables in a tent in your living room, however? Pretty secure.
Containers do provide strong security features, and sometimes the compromises you have to make hosting something on a VM vs. a container will make the container more secure.
> While cgroups are pretty neat as an isolation mechanism, they’re not hardware-level guarantees against noisy neighbors. Because cgroups were a later addition to the kernel, it’s not always possible to ensure they’re taken into account when making system-wide resource management decisions.
Cgroups are more than a neat resource isolation mechanism: they work. That's really all there is to it.
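On a cgroup v2 host you can see the controllers doing that work; the Docker flags in the comment are illustrative:

```shell
# List the resource controllers available for enforcing limits.
# (Falls back gracefully on an older cgroup v1 layout.)
cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null || echo "cgroup v1 layout"

# Container runtimes drive these via flags, e.g. (illustrative):
#   docker run --memory=512m --cpus=1.5 myapp
```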
Paranoia around trusting the Linux kernel is unnecessary if at the end of the day you end up running Linux in production. If anything breaks, security patches will come quickly, and the general security attitude of the Linux community is improving every day. If you are really paranoid, perhaps run BSD, use grsec, or, the best choice IMO, use SELinux.
If anything, you will be pwned because you have a service open to the world, not because cgroups or containers let you down.
The author seems to largely ignore this. I would consider that a bit stronger than a "tent wall". Comparing it to a tent seems more akin to a plain chroot.
If I have my tent right next to someone else's, I can trivially "IPC" just by speaking out loud, which would be prevented by an IPC namespace (part of Docker's default container setup).
Also worth mentioning: turning a container into a VM (for enhanced security) is generally easier than trying to do the opposite. AWS Lambda basically does that, as do a lot of the minimal "cloud" Linux distributions that just run Docker with a stripped-down userland (like Container Linux and whatever its successors are).
I don't think virtualization really offers hardware-level guarantees against noisy neighbours either.
I agree somewhat, but there has been significant progress in sandboxing containers with the same security we'd expect from a VM. It isn't a ridiculous idea that VMs will one day be antiquated, but it probably won't happen for a few more years.
https://kubernetes.io/docs/concepts/policy/pod-security-poli...
> As of Kubernetes v1.19, you can use the seccompProfile field in the securityContext of Pods or containers to control use of seccomp profiles.
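A minimal Pod spec using that field might look like this (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault   # use the container runtime's default seccomp profile
  containers:
  - name: app
    image: nginx:stable
```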
If you're looking for a more general abstraction, there is gVisor and others as well.
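gVisor, for instance, registers as an alternative OCI runtime in Docker's daemon.json; the binary path here is an assumption about where runsc was installed:

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

After restarting the daemon, `docker run --runtime=runsc ...` runs the same image under the sandboxed kernel instead of the host kernel.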
I imagine CPU and memory namespaces eventually being implemented on top of hardware isolation features like VT-d IOMMUs and the like, effectively folding virtual machines into the kernel's sandboxing features.
Marketing. Because of Marketing.
The only reason I use Docker is that I can access the system design knowledge that is available in docker-compose.yml files. Latest example: GitLab. I could not get it running on unprivileged LXC using the official installation instructions; with Docker it was simply editing the `.env` and then `docker-compose up -d`. All of this in a local, non-public (IPsec-distributed) network. I often find myself creating a separate unprivileged LXC container->Docker nesting for each new container, because I do not need to follow the complicated and error-prone installation instructions for native installs.
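For a sense of scale, the whole "installation" can boil down to a file like this (image tag, ports, and volume paths are illustrative, not GitLab's official compose file):

```yaml
# docker-compose.yml -- illustrative sketch
services:
  gitlab:
    image: gitlab/gitlab-ce:latest
    restart: unless-stopped
    env_file: .env            # all site-specific settings live here
    ports:
      - "8080:80"
      - "2222:22"
    volumes:
      - ./config:/etc/gitlab
      - ./logs:/var/log/gitlab
      - ./data:/var/opt/gitlab
```

Edit `.env`, run `docker-compose up -d`, done.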
You should never login to the shell of a container for config. All application state lives elsewhere, and any new commit to your app triggers a new build.
If that's not for you, then while containers like proxmox/LXC can still be handy, you're just doing VM at a different layer.
The article was a bit hand-wavy about how "they" complain about containers, and then leans on the analogy more than explaining the problems and solutions.
I absolutely do this and think it works great. Fedora has built a tool called "toolbox" which is basically a wrapper on podman which can create and enter containers where you can install development tools without touching your actual OS.
I basically do all of my development inside a container where the source code is volume mounted in but git/ruby/etc only exist in the container.
This has the benefit of letting me very quickly create a fresh env to test something out. Recently I wanted to try doing a git pull using project access tokens on gitlab and containers let me have a copy of git which does not have my ssh keys in it.
This is somewhat of an edge case though, for a server deployment, yes you shouldn't rely on anything changed inside the container and should use volume mounts or modify the image.
I'd be interested in trying it out but I don't want to spend some hours reading documentation trying to get it working.
Become root.
Install debootstrap, which is in the Debian and Ubuntu repositories, at least.
Make a directory to contain your embedded system. It can be anywhere. Let's use /var/lib/machines/machinename.
This command will install a new, minimal Debian system in that directory:
debootstrap --include=systemd-container stable /var/lib/machines/machinename http://deb.debian.org/debian
It will download everything and, if I recall correctly, works unattended (doesn’t ask questions).
Enter the container with
systemd-nspawn -D /var/lib/machines/machinename/
and set the root container password with passwd.
Then do
echo 'pts/0' >> /etc/securetty
so the guest OS will let you log in after it's booted up. You may have to add other pts/x entries. I'm not sure about this part; it may be that if there is no /etc/securetty file that there is no problem.
Now log out of the container.
To boot up the guest OS, use
systemd-nspawn -b -D /var/lib/machines/machinename
You will see the familiar console messages.
You will find advice on the web to include the -U flag here, which causes files in the guest OS to only use UIDs known to the guest OS when determining ownership and permissions. This leads to headaches, because you have to set the ownership of any file you copy in from the host system. Leave it out, and you can have parallel users on the host and guest OSs, which is more convenient. But you may have to change the UIDs of the users on the guest OS so that they match.
Now, on the host OS, you can use the `machinectl` command to control all your guest OSs. `machinectl list` shows you what’s running, `machinectl login` lets you log in to them, and there are several commands for killing them with various levels of violence.
If you want your machine to be a long-running service, just `nohup` the spawn command, and direct output as desired.
If you want to be able to communicate with your machine from the internet, opening sockets from within the guest OS works, as they share the network interface. For a public-facing web service, you can install (for example) Apache and pick a port number to listen on, then set up a reverse proxy on the host OS, using a dedicated domain or a subdomain, so the users don’t have to use the custom port number. I’ve found that certificates for HTTPS need to be installed on both the host and guest OSs.
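The host-side reverse proxy can be as small as this (nginx shown; the domain, backend port, and certificate paths are all assumptions about your setup):

```nginx
# Host OS nginx config: forward a subdomain to Apache inside the guest,
# which is assumed to be listening on 127.0.0.1:8080.
server {
    listen 443 ssl;
    server_name guest.example.com;

    ssl_certificate     /etc/letsencrypt/live/guest.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/guest.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}
```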
Good luck!
More information: https://wiki.debian.org/nspawn
Virtualizing the kernel the way the Amazon Machine Image (AMI) virtualizes a chip core sounds great. But now, poof, all of those networking details that AWS keeps below the hypervisor line confront us: storage, load balancing, name services, firewalls. . .
Containers can solve packaging issues, but wind up only relocating a vast swath of other problems.
An analogy can go a long way. Both ways.
So, assuming I understood correctly, treating them like tents is infinitely the better choice.
What makes a given component – server, VM, container, whatever – a pet or cattle is not the runtime, but how you deal with it when it gets seriously ill. Pets are taken to the vet or hospital for treatment. Cattle are... well, read the article I linked.