That sense of the term isn't loaded with any specific notion of how attack surfaces should work. I think modern "Docker"'s security properties are underrated†. But you still can't run multitenant workloads from arbitrary untrusted tenants on shared-kernel isolation. It turns out to be pretty damned useful to be able to ship application components as containers, but have them run as VMs.
If you’re building a system that’s handling classified information, there is probably not an accreditation authority in the world that would let you use containers or even hypervisors as a way to separate different information classes.
Other implementations like Podman get even better security by not running as root.
The win of virtualization is that the machinery required to hypervise a kernel is much, much smaller than the kernel itself; to use the 70s terminology, it's a minimized trusted computing base.
100% agree.
The Docker/CRI du jour strips off many "dangerous" system capabilities by default. A root process on Linux gets something like 40 capabilities, and most container runtimes strip that down to around 14. Those numbers are not exact.
Stripping down the system level capabilities of your workload is assuredly a security improvement over running that workload "bare metal" on the system.
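To put rough numbers on this yourself: the kernel exposes a process's effective capability set as a hex bitmask on the `CapEff` line of `/proc/<pid>/status`. Below is a minimal sketch that counts the bits set in such a mask. `count_caps` is a hypothetical helper, and the example mask is an assumption (the value commonly reported inside a default Docker container), not something taken from this thread.

```sh
# count_caps: hypothetical helper that counts the capabilities set in a
# CapEff-style hex bitmask (the format used in /proc/<pid>/status).
# Assumes the mask fits in a positive 64-bit integer.
count_caps() {
    mask=$((0x$1))                  # parse the hex string as an integer
    n=0
    while [ "$mask" -gt 0 ]; do
        n=$((n + (mask & 1)))       # add the lowest bit
        mask=$((mask >> 1))         # move on to the next capability bit
    done
    echo "$n"
}

# Example mask (assumed): commonly seen inside a default Docker container.
count_caps 00000000a80425fb   # prints 14
```

On a Linux host, something like `count_caps "$(awk '/^CapEff/ {print $2}' /proc/self/status)"` should give the count for the current shell, which makes it easy to compare a root shell on the host against a shell inside a container.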
Ref: https://www.redhat.com/en/blog/secure-your-containers-one-we...
All those bargain basement OpenVZ "VPS" providers beg to differ :)
There's also gVisor
What I find interesting is that many uses of containers are just reinventing statically linked binaries in a more complicated form.
The answer to this is a container. Containers don't "reinvent" static linking, they solve problems that go beyond static linking.
The problems they solve might go beyond "static linking", but they are accidental-complexity problems that don't go beyond namespacing.
With proper namespacing support, one could trivially build "with GCC 11 and its associated standard libraries and toolchain" on a CI service that ships with "an older version of GCC".
Ideally, one would just need a local folder with GCC 11 and its dependencies, and at most an ENV entry for where to pick up deps (optimally not even that; the GCC 11 binary should give precedence to the local versions within the same folder by default).
But no, instead we need to juggle 20 folder locations, PATH and ENV variables, burn build paths into libraries and executables, and so on...
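The "local folder plus at most one ENV entry" idea can be roughly approximated today with a small wrapper. This is only a sketch: it assumes a hypothetical toolchain prefix laid out as `<prefix>/bin` and `<prefix>/lib`, `with_toolchain` is an illustrative name, and real toolchains often need more than `PATH` and `LD_LIBRARY_PATH` (which is rather the point of the complaint).

```sh
# with_toolchain: hypothetical wrapper that runs one command with a
# toolchain prefix taking precedence over the system-wide tools.
# Assumes the prefix contains bin/ and lib/ directories.
with_toolchain() {
    prefix=$1
    shift
    # env applies the variables to the wrapped command only, and the
    # modified PATH is also used to look that command up.
    env PATH="$prefix/bin:$PATH" \
        LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        "$@"
}
```

Usage would look like `with_toolchain /opt/gcc-11 gcc --version`, where `/opt/gcc-11` is a made-up location. Nothing outside the wrapped command sees the changed environment.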
This.
I would also point out that containers handle other critical features too, like their virtual network, which obviously goes way beyond what may be simplistically described as "static linking".
Containers as a form of static linking means that you ship one thing to prod and it has everything you need locally in it and it can’t be changed without you releasing a new one thing. If someone else upgrades the MySQL client version on the host, your code keeps using the version you tested with, like a static binary or like a Python venv with pinned versions or vendored dependencies. It is a lot simpler to manage dependencies this way; the downside is that if one of your dependencies has a security advisory, someone else can’t fix it for you by rolling out a new version. You have to update it, so unowned code becomes more expensive in that scenario.
I have found that to be true in at least one case—I had built a custom DNS server in Go (statically linked), and originally planned to run it in a container, but on further reflection realized the container brought no added value, and it was much simpler to write a systemd service control script than to bring in the extra baggage of a container ecosystem to run the DNS server.
With containers you can trivially ensure those are always present and with the correct versions.
Plus containers do give you some security benefits when compared to running natively.
What containerization does is let you confer some of the advantages of static linking on languages and libraries that don’t natively support it.
Not really. It seems the keyword "static linking" is being abused to refer to stand-alone executables, because that's what some people know. Yet, calling containers a kind of "static linking" is simplistic and incorrect, even taking the standalone executable interpretation into account.
If anything, container images are installers, and containers are the end result of installing and configuring these images, which goes barely noticed because it works so well, especially the networking part. More importantly, containers are designed both to be ephemeral and to support multiple instances running in parallel on the same machine.
Then there's also the support for healthchecks, which not only lets container engines determine when they should regenerate containers, but also provides out-of-the-box support for blue-green deployments.
And absolutely none of this fits the "static linking" metaphor.
A Docker image is really just a chroot + some cgroups resource limits.
No, because an image specifies nothing about the runtime. Just add a kernel and bootloader and one can boot most images. Further, most container runtimes include a lot more than chroot and resource limits: namespace isolation (process, user, network), seccomp rules, SELinux contexts, etc.
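The namespaces a runtime adds on top of a chroot can be seen under `/proc/<pid>/ns`, where each entry is a symlink naming the namespace type and its inode. Here is a small sketch for comparing two processes. `list_namespaces` and the `PROC_ROOT` override are hypothetical conveniences for illustration, not part of any tool.

```sh
# list_namespaces: hypothetical helper that prints "<type><TAB><link target>"
# for every namespace a process belongs to.
# PROC_ROOT is an assumed override (defaults to /proc) so the function can
# be pointed at a different tree when experimenting.
list_namespaces() {
    pid=$1
    for entry in "${PROC_ROOT:-/proc}/$pid/ns"/*; do
        [ -L "$entry" ] || continue         # skip if the glob matched nothing
        printf '%s\t%s\n' "${entry##*/}" "$(readlink "$entry")"
    done
}
```

Running `list_namespaces self` on the host and then against the PID of a containerized process (which usually needs root) should show different inode numbers for each namespace the runtime has unshared, and identical ones for anything still shared with the host.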
That's simply doing it wrong! If people are doing this, you can't point to containers as the problem.
For example, resource isolation with the Solaris / Illumos container implementation (zones) works just as well as full-blown virtualization. You are just as well equipped to handle noisy neighbors with zones as you are with hardware VMs.
> Much as you’d likely choose to live in a two-bedroom townhouse over a tent, if what you need is a lightweight operating system, containers aren’t your best option.
So I think this is true for Docker but doesn't really do justice to other container implementations such as FreeBSD jails and Solaris / Illumos zones. Because those containers are really just lightweight operating systems.
In the end Docker started out and was designed to be a deployment tool. Not necessarily an isolation tool in all aspects. And yeah, it shows.
One could argue that zones are distinct from containers (a Linux implementation), with both being OS specific versions of jails.
Containers won dev mindshare because of the ease of packaging and distributing the artifacts. Somehow it was Docker, not the VM vendors, that came up with a standard for packaging, distributing, and indexing glorified tarballs, and it quickly caught on.
IMO the important, catalyzing difference is that Docker containers have a standard interface for logging, monitoring, process management, etc., which allows us to just think in terms of “the app” rather than the app plus the SSH daemon, log exfiltration, host metrics daemon, etc. In other words, Docker got the abstraction right: I only care about the app, not all of the ceremony required to run my app in a VM. These common interfaces allow orchestration tools to provide more value: they aren’t just scheduling VMs, they’re also managing your log exfiltration, your process management, your SSH connection, your metrics, etc., and all of those things are configurable in the same declarative format, rather than being configured by some fragile Ansible playbook that requires you to understand each of the daemons it is configuring, possibly including their unique configuration file/filesystem conventions and syntaxes.
Packaged VMs had existed for a while already, with things like Vagrant on top; there was also already LXC, which leaned more into the VM concept. Where Docker made the difference, imho, is with Dockerfiles and the layered/cached build steps.
Calling container images glorified tarballs is like calling cars glorified lawnmowers.
Because with an apartment each tenant gets to share certain infrastructure like heating and plumbing from the apartment building, just like containers get to share things from the Linux host they run on. In the end both houses and apartments protect you from outside guests, just in their own way.
I went into this analogy in my Dive into Docker course. Here's a video link to this exact point: https://youtu.be/TvnZTi_gaNc?t=427. That video was recorded back in 2017, but it still applies today.
> Tents, after all, aren’t a particularly secure place to store your valuables. Your valuables in a tent in your living room, however? Pretty secure.
Containers do provide strong security features, and sometimes the compromises you have to make hosting something on a VM vs. a container will make the container more secure.
> While cgroups are pretty neat as an isolation mechanism, they’re not hardware-level guarantees against noisy neighbors. Because cgroups were a later addition to the kernel, it’s not always possible to ensure they’re taken into account when making system-wide resource management decisions.
Cgroups are more than a neat resource isolation mechanism; they work. That's really all there is to it.
Paranoia around trusting the Linux kernel is unnecessary if at the end of the day you end up running Linux in production. If anything breaks, security patches will come quickly, and the general security attitude of the Linux community is improving every day. If you are really paranoid, perhaps run BSD, use grsec, or (the best choice IMO) use SELinux.
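For a sense of what that mechanism looks like from the inside: cgroup v2 exposes limits as plain files, and `memory.max` holds either a byte count or the literal string `max`. A rough sketch, assuming cgroup v2 is mounted at `/sys/fs/cgroup`; `mem_limit` is a hypothetical helper name.

```sh
# mem_limit: hypothetical helper that reports the memory limit of a
# cgroup v2 directory (default: /sys/fs/cgroup, assuming cgroup v2).
mem_limit() {
    dir=${1:-/sys/fs/cgroup}
    if [ ! -r "$dir/memory.max" ]; then
        echo "unknown"              # no memory.max here (e.g. the root cgroup)
        return 1
    fi
    limit=$(cat "$dir/memory.max")
    if [ "$limit" = "max" ]; then
        echo "unlimited"            # "max" means no limit is configured
    else
        echo "$((limit / 1024 / 1024)) MiB"
    fi
}
```

Inside a container started with a memory limit, `mem_limit` should report that limit; on the host's root cgroup the file typically does not exist, so the function prints `unknown` and returns non-zero.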
If anything, you will be pwned because you have a service open to the world, not because cgroups or containers let you down.
The author seems to largely ignore this. I would consider that a bit stronger than a "tent wall". Comparing it to a tent seems more akin to a plain chroot.
If I have my tent right next to someone else's, I can trivially "IPC" just by speaking out loud, which would be prevented by an IPC namespace (which is part of Docker's current default container setup).
Also worth mentioning: turning a container into a VM (for enhanced security) is generally easier than trying to do the opposite. AWS Lambda basically does that, as do a lot of the minimal "cloud" Linux distributions that just run Docker with a stripped-down userland (like Container Linux and whatever its successors are).
I don't think virtualization really offers hardware-level guarantees against noisy neighbours either.
I agree somewhat, but there has been significant progress in sandboxing containers with the same security we'd expect from a VM. It isn't a ridiculous idea that VMs will one day be antiquated, but it probably won't happen for a few more years.
https://kubernetes.io/docs/concepts/policy/pod-security-poli...
> As of Kubernetes v1.19, you can use the seccompProfile field in the securityContext of Pods or containers to control use of seccomp profiles.
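For reference, the field from the quoted passage looks roughly like this in a manifest. The sketch below prints a Pod spec from a shell heredoc; the pod name and image are placeholders supplied by the caller, and `render_pod` is a hypothetical helper.

```sh
# render_pod: hypothetical helper that prints a minimal Pod manifest using
# the pod-level securityContext.seccompProfile field (Kubernetes v1.19+).
# RuntimeDefault selects the container runtime's default seccomp profile.
render_pod() {
    name=$1
    image=$2
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: $name
      image: $image
EOF
}
```

Something like `render_pod demo nginx | kubectl apply -f -` would submit it to a cluster. The same `seccompProfile` block can also be set per container instead of for the whole Pod.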
If you're looking for a more general abstraction, there is gVisor and others as well.
Which leaves you to use something like Firecracker or gVisor, which are either virtualization solutions or the next closest thing, in that they intermediate all of your syscalls?
*Not a security person or an expert on Singularity, but it advertises that it doesn’t do file system or user isolation by default.
I imagine CPU and memory namespaces eventually being implemented on top of hardware isolation features like VT-d IOMMUs and the like, thus folding virtual machines into some sandboxing feature.
Marketing. Because of Marketing.
The only reason I use Docker is that I can access the system design knowledge that is available in docker-compose.yml files. Last example: GitLab. Could not get it running on unprivileged LXC using the official installation instructions; with Docker it was simply editing the `.env` and then `docker-compose up -d`. All of this in a local, non-public (ipsec-distributed) network. I often find myself creating a single, separate unprivileged LXC container->Docker nesting for each new container, because then I do not need to follow the complicated and error-prone installation instructions for native installs.
You should never log in to the shell of a container for config. All application state lives elsewhere, and any new commit to your app triggers a new build.
If that's not for you, then while containers like Proxmox/LXC can still be handy, you're just doing VMs at a different layer.
The article was a bit hand-wavy about how "they" complain about containers, and then leans on the analogy more than it explains the problems and solutions.
I absolutely do this and think it works great. Fedora has built a tool called "toolbox", which is basically a wrapper around Podman that can create and enter containers where you can install development tools without touching your actual OS.
I basically do all of my development inside a container where the source code is volume mounted in but git/ruby/etc only exist in the container.
This has the benefit of letting me very quickly create a fresh env to test something out. Recently I wanted to try doing a git pull using project access tokens on GitLab, and containers let me have a copy of git which does not have my SSH keys in it.
This is somewhat of an edge case, though. For a server deployment, yes, you shouldn't rely on anything changed inside the container and should use volume mounts or modify the image.
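The workflow described above can be sketched with a plain `podman run`. This is a hedged illustration, not how toolbox itself works: `dev_shell`, the `/src` mount point, and the `RUNNER` dry-run hook are all hypothetical.

```sh
# dev_shell: hypothetical helper that opens a throwaway shell in a container
# with a source tree mounted at /src. Tools such as git or ruby come from
# the image, not from the host.
# RUNNER is an assumed hook: set RUNNER=echo to print the arguments that
# would be passed instead of running podman.
dev_shell() {
    image=$1
    src=${2:-$PWD}
    # --rm discards the container on exit; -v bind-mounts the source tree;
    # -w makes the mounted tree the working directory.
    "${RUNNER:-podman}" run --rm -it -v "$src":/src -w /src "$image" /bin/sh
}
```

Usage would be `dev_shell <image>` from inside a project directory. On SELinux hosts such as Fedora, the volume argument usually needs a `:Z` suffix so the container is allowed to read the mounted files.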
I'd be interested in trying it out but I don't want to spend some hours reading documentation trying to get it working.
Become root.
Install debootstrap, which is in the Debian and Ubuntu repositories, at least.
Make a directory to contain your embedded system. It can be anywhere. Let's use /var/lib/machines/machinename.
This command will install a new, minimal Debian system in that directory:
debootstrap --include=systemd-container stable /var/lib/machines/machinename http://deb.debian.org/debian
It will download everything and, if I recall correctly, works unattended (doesn’t ask questions).
Enter the container with
systemd-nspawn -D /var/lib/machines/machinename/
and set the container's root password with passwd.
Then do
echo 'pts/0' >> /etc/securetty
so the guest OS will let you log in after it's booted up. You may have to add other pts/x entries. I'm not sure about this part; it may be that if there is no /etc/securetty file, there is no problem.
Now log out of the container.
To boot up the guest OS, use
systemd-nspawn -b -D /var/lib/machines/machinename
You will see the familiar console messages.
You will find advice on the web to include the -U flag here, which causes files in the guest OS to only use UIDs known to the guest OS when determining ownership and permissions. This leads to headaches, because you have to set the ownership of any file you copy in from the host system. Leave it out, and you can have parallel users on the host and guest OSs, which is more convenient. But you may have to change the UIDs of the users on the guest OS so that they match.
Now, on the host OS, you can use the `machinectl` command to control all your guest OSs. `machinectl list` shows you what’s running, `machinectl login` lets you log in to them, and there are several commands for killing them with various levels of violence.
If you want your machine to be a long-running service, just `nohup` the spawn command, and direct output as desired.
If you want to be able to communicate with your machine from the internet, opening sockets from within the guest OS works, as they share the network interface. For a public-facing web service, you can install (for example) Apache and pick a port number to listen on, then set up a reverse proxy on the host OS, using a dedicated domain or a subdomain, so the users don’t have to use the custom port number. I’ve found that certificates for HTTPS need to be installed on both the host and guest OSs.
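The bootstrap, enter, and boot commands from the steps above can be collected into a few shell functions. This is a sketch under assumptions: `MACHINE_DIR`, `DRY_RUN`, and the function names are hypothetical, and by default the commands are only printed so they can be reviewed before being run as root. The interactive steps (setting the root password, editing /etc/securetty) are left out.

```sh
# Location of the guest system; same path as used in the steps above.
MACHINE_DIR=${MACHINE_DIR:-/var/lib/machines/machinename}

# run: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install a minimal Debian system into MACHINE_DIR.
bootstrap_machine() {
    run debootstrap --include=systemd-container stable "$MACHINE_DIR" http://deb.debian.org/debian
}

# Open a shell in the guest without booting it (e.g. to run passwd).
enter_machine() {
    run systemd-nspawn -D "$MACHINE_DIR"
}

# Boot the guest OS with its own init.
boot_machine() {
    run systemd-nspawn -b -D "$MACHINE_DIR"
}
```

Calling `bootstrap_machine` as-is just prints the debootstrap command; running it with `DRY_RUN=0` as root executes it for real.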
Good luck!
More information: https://wiki.debian.org/nspawn
Virtualizing the kernel like the Amazon Machine Image (AMI) virtualizes a chip core sounds great. But now, in the "puff", all of those networking details that AWS keeps below the hypervisor line confront us. Storage, load balancing, name services, firewalls...
Containers can solve packaging issues, but wind up only relocating a vast swath of other problems.
An analogy can go a long way. Both ways.
So, assuming I understood correctly, treating them like tents is infinitely the better choice.
What makes a given component – server, VM, container, whatever – a pet or cattle is not the runtime, but how you deal with it when it gets seriously ill. Pets are taken to the vet or hospital to get treatment. Cattle are... well, read the article I linked.