With the shell script, you can literally read it in an editor to make sure it isn't doing anything that weird. A single pass through shellcheck would likely tell you if it's doing anything that is too weird/wrong in terms of structure.
Auditing a docker container is way more difficult/complex.
"Dockerize all the things", especially in cases when the prereqs aren't too weird, seems like it wastes space, and also is harder to maintain - if any of the included components has a security patch, it's rebuild the container time...
Multus sits at the demarc line between the container and the NIC channel. I'm not saying it's possible or ever been done but if I were going to set up a traffic mirror somewhere it'd logically have to be there or after the NIC..
I wrote it 5 years ago. I have no idea what version of multus it's running but even today it's getting pulls, last pull 19 days ago. Overall pulls over 5 years is over 10k.
These containers would spin up every time a container starts on k8s that attaches an ovf interface. So, it's pretty much guaranteed that this is in use somewhere in someone's scaling infra. I don't know if I SHOULD delete the image and potentially take down someone's infra or just let them keep chugging along. I'm not paying for Dockerhub.
https://hub.docker.com/repository/docker/swozey/multus/gener...
edit: Looks like it's installing the latest multus package so not AS terrible but .. multus is not something to play loose with versioning..
Also I really wish Dockerhub gave you more stats/analytics. It really means nothing in the end but I'm curious. They don't even tell you the number beyond 10k, it just says 10k+ downloads.
I assume you mean auditing docker images. In which case, sure. That's why you grab their dockerfile and build it yourself.
Though using dive[1] it's pretty easy to inspect docker images too, as long as they extend a base image you trust.
Then you still didn't audit anything. What you need to do is inspect the Dockerfile, follow everything it pulls in and audit that, and finally audit the script itself that the whole container gets built for in the first place. Whereas when you just download the script and run that directly, you only need to do the last step.
Running anything without understanding what it does is more dangerous than trying to understand it before running it.
I'm arguing for less complexity and easier auditing, instead of a series of complex layers that each add to a security story, but make the overall result much harder to audit.
As long as it doesn't have access to outside of the container, who cares?
You check the dockerfile, see what access it allows, and build the container.
Besides a shell script can be 100s of lines, not very fun auditing it.
That was more snark than HN likes, but it feels like forgetting promises of the past in a dangerous way.
https://snyk.io/blog/cve-2024-21626-runc-process-cwd-contain...
> With the shell script, you can literally read it in an ...
Can't do that if all of the work is hidden in a Dockerfile's RUN statement. I commit shell scripts in shell script files, and the Dockerfile just runs that shell script. Then the shell script can be version controlled for purpose, static analyzed, and parsed in an IDE with a plugin supporting a shell script language server.
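That split can be as small as the following sketch (file names here are just illustrative, not from the original comment):

```dockerfile
# setup.sh lives in the repo, where it can be version controlled,
# shellcheck-ed, and parsed by an IDE's shell language server.
# The Dockerfile does nothing but copy it in and run it.
FROM alpine:3.19
COPY setup.sh /tmp/setup.sh
RUN sh /tmp/setup.sh && rm /tmp/setup.sh
```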
Yes, because inspection aside, at least with a docker invocation you can specify the volumes
chrooting the unknown script gets you 90% of the way there.
Or would they wrap it in yet another shell script that calls docker with a set of options, or a compose file, etc?
This quickly turns into complexity stacked on complexity...
As for patching, you can tell your Dockerfile to always pull the latest versions of the items you are most concerned about. At that point rebuilding the container is as simple as deleting it with "docker container stop <id> && docker container rm <id>" and then run your docker-compose command again.
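One blunt way to express "always pull the latest versions" in a Dockerfile is a cache-busting build argument, so the install step re-runs on demand and picks up current packages (the argument name here is my own):

```dockerfile
FROM alpine:3.19
# Changing CACHEBUST invalidates this layer and everything below it,
# forcing apk to fetch whatever is current in the repositories.
ARG CACHEBUST=1
RUN apk add --no-cache bash git fzf
```

A rebuild with fresh packages is then `docker build --build-arg CACHEBUST=$(date +%s) .`; an unchanged value keeps using the cached layer.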
There would already be implicit trust in whatever the local OS's package manager laid down, and trying to add another set of hard to audit binaries on top is not really an improvement.
Overall this approach results in an image so fragile I would never use the resulting product in a high-priority production environment or even just my local dev environment as I want to code in it, not have to fix numerous compatibility issues in my tools all over 14MB of space.
> copying the various standardized CLI tools and related library files into the image versus installing them with APK can introduce _many_ compatibility challenges down the road as new base Alpine versions are released which can be difficult to detect if they don't immediately generate total build errors
I'm maybe missing some context here, so you are saying that the default location of these binaries can change (the ones that get copied directly)? Or is it about the shared libraries getting updated and the tools depending on these libraries eventually breaking?
They could, Debian is in the process of unifying the bin directories, see: https://wiki.debian.org/UsrMerge
Realistically it's not much of an issue.
Given that you start out with a 31.4 MB image, I don't honestly think the introduced complexity in your build is worth it. It's a good lesson for people who don't know about build images and ship an entire build pipeline in their Docker image, but for a bash script and a <50 MB image the complexity is a bit weird.
If the underlying system has a newer version of git than the one freeze-dried into your container, repositories managed there by native-git might be in a new format which container-git can't handle. (There might be some new, spiffier way of handling packs, for instance, or they might have finally managed to upgrade the hash function.) And similar issues potentially arise for everything else you're packaging.
COPY --from=ugit-ops /usr/lib/libreadline* /usr/lib/
COPY --from=ugit-ops /lib/libc.musl-* /lib/
COPY --from=ugit-ops /lib/ld-musl-* /lib/
No, what I'm saying is you're blanket copying fully different versions of common library files into operating system lib folders as shown above, possibly breaking OS lib symlinks and/or wholly overwriting OS lib files themselves in the process for _current_ versions used in Alpine OS if they exist now or in the future, potentially destroying OS lib dependencies, and also overwriting the ones possibly included in the future by Alpine OS itself to get your statically copied versions of the various CLI tools your shell script needs to work. The same goes for copying bash, tr, git, and other binaries to OS bin folders. No No NO!
That is _insanely_ shortsighted. There's a safe way to do that and then there is the way you did it. If you want to learn to do it right and are deadset against static binary versions of those tools for the sake of file size, look at how Exodus does it so that they don't destroy OS bin folders and library dependency files in the process of making a binary able to be moved from one OS to another.
Exodus: https://github.com/intoli/exodus
This is why I'm saying your resulting docker image is incredibly fragile and something I would never depend on long-term as it's almost guaranteed to crash and burn as Alpine OS upgrades OS bins and lib dependency files in the future. That it works now in this version is an aberration at best and in reality, there probably are things that are broken in Alpine OS that you aren't even aware of because you may not be using the functionality you broke yet.
OS package managers handle dependencies for a reason.
[...]
COPY --from=ugit-ops /usr/bin/tr /usr/bin/tr
COPY --from=ugit-ops /bin/bash /bin/
COPY --from=ugit-ops /bin/sh /bin/
# copy lib files
COPY --from=ugit-ops /usr/lib/libncursesw.so.6 /usr/lib/
COPY --from=ugit-ops /usr/lib/libncursesw.so.6.4 /usr/lib/
COPY --from=ugit-ops /usr/lib/libpcre* /usr/lib/
COPY --from=ugit-ops /usr/lib/libreadline* /usr/lib/
[...]
For me, insane sh*t like this proves that those who do not learn from distribution and package management infrastructure engineering history are condemned to reinvent it, poorly.

I understand that you might have some context about package managers that I am missing. Would genuinely like some resources about your comment or maybe a bit of explanation.
Thanks
I am in a bit of a rush right now (which is why I try my absolute best to keep procrastinating on HN at the absolute minimum, I swear! ;)), but I will try to share some insight later (potentially as a comment on your blog).
On an unrelated note, the --chmod parameter of the COPY instruction provides a way to avoid additional layers just to set the executable bits:
# instead of
COPY ugit .
RUN chmod +x ugit && mv ugit /usr/local/bin/
COPY --from=builder /usr/local/bin/ugit /usr/bin/
# could just be this
COPY --chmod=755 ugit /usr/bin/
In all seriousness though, that Dockerfile is basically one big uglified red flag. Please don't do this, people.

One easy low-hanging fruit I see a LOT for ballooning image sizes is people including the kitchen-sink SDK/CLI for their cloud provider (like AWS or GCP) when they really only need 1/100 of it. The full versions of both of these tools are several hundred MB each.
FROM alpine
RUN chmod -R a+rwx bin dev home lib media mnt opt root run sbin srv tmp usr var

I think it would be more accurate to say that, in the Alpine ecosystem, it is generally not advised to pin versions of packages at all. Actually, this is not so much a recommendation as it is a statement of impossibility: you can't pin package versions (without your Docker builds starting to fail in a week or two), period. In other words: don't use Alpine if you want reproducible (easily cacheable) Docker builds.
I had to learn this the hard way:
- There is no way to pin the apk package sources ("cache"), like you can on Debian (snapshot.debian.org) and Ubuntu (snapshot.ubuntu.com). The package cache tarball that apk downloads will disappear from pkgs.alpinelinux.org again in a few weeks.
- Even if you managed to pin the sources (e.g. by committing the tarball to git as opposed to pinning its URL), or if you decided to pin the package versions individually, package versions that are up-to-date today will likely disappear from pkgs.alpinelinux.org in a few weeks.
- Many images that build upon Alpine (e.g. nginx) don't pin the base image's patch version, so you get another source of entropy in your builds from that alone.
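The failure mode described above can be sketched in two lines (the package and version pin here are made up for illustration):

```dockerfile
# This builds today, but starts failing with "unable to select packages"
# once 2.43.0-r0 is superseded and rotates out of the Alpine mirrors.
FROM alpine:3.19
RUN apk add --no-cache git=2.43.0-r0
```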
Personally, I'm very excited about snapshot images like https://hub.docker.com/r/debian/snapshot where all package versions and the package sources are pinned. All I, as the downstream consumer, will have to do in order to stay up-to-date (and patch upstream vulnerabilities) is bump the snapshot date string on a regular basis.
Unfortunately, the images don't seem quite ready for consumption yet (they are only published once a month) but see the discussion on https://github.com/docker-library/official-images/issues/160... for a promising step in this direction.
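If I understand the snapshot images correctly, the workflow would look roughly like this (the tag format here is assumed from the repo description, so treat it as a sketch):

```dockerfile
# The date in the tag pins the entire package universe; bumping it on a
# regular basis is the whole upgrade story.
FROM debian/snapshot:bookworm-20240101
RUN apt-get update && apt-get install -y --no-install-recommends git
```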
Agreed, should have been clear with my sentiment there. Thanks for stating this :)
> Personally, I'm very excited about snapshot images like https://hub.docker.com/r/debian/snapshot where all package versions and the package sources are pinned. All I, as the downstream consumer, will have to do in order to stay up-to-date (and patch upstream vulnerabilities) is bump the snapshot date string on a regular basis.
This is really helpful, thanks for sharing. Looks like it will be a good change, fingers crossed.
I think my experience in similar pursuits would have led me to stop very early on - 31.4 MB is already pretty good, to be fair. Looking at the amount of potential maintenance required in the future, for example if the original ugit tool starts to need more dependencies which then have to be wrangled and inspected, makes me think that the size I didn't reduce is worth the tradeoff. Since the dependencies can be managed with package managers, without having to think too much, and as the author says, Linux is pretty awesome about these things already.
True, 31.4 MB is definitely a reasonable stopping point. But the nerd inside me kicked in and wanted to know what "exactly" is required to run ugit. It was a fun experience.
We can definitely go smaller than 20MB and six layers.
Here's a solution that compresses everything into a single 8.7MB layer using tar and an intermediate staging stage: https://gist.github.com/carlosonunez/b6af15062661bf9dfcb8688...
Remember, every layer needs to be pulled individually and Docker will only pull a handful of layers at a time. Having everything in a single layer takes advantage of TCP scaling windows to receive the file as quickly as the pipe can send it (and you can receive it) and requires only one TCP session handshakes instead of _n_ of them. This is important when working within low-bandwidth or flappy networks.
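The rough shape of the idea (this is not the linked gist verbatim, just a hedged sketch) is to assemble everything in a staging stage and then land it in the final image with one instruction, since each COPY produces exactly one layer:

```dockerfile
FROM alpine:3.19 AS staging
RUN apk add --no-cache bash git fzf \
 && mkdir /staged \
 && cp -a /bin /lib /usr /staged/

FROM scratch
# A single COPY from the staging stage yields a single layer.
COPY --from=staging /staged/ /
```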
That said, in a real-world scenario where I care about readability and maintainability, I'd either write this in Go with gzip-tar compression in the middle (single statically-compiled binaries for the win!) or I'd just use Busybox (~5MB base image) and copy what's missing into it since that base image ships with libc.
Hey this looks interesting, will try it out. Thanks for writing it!
> That said, in a real-world scenario where I care about readability and maintainability, I'd either write this in Go with gzip-tar compression in the middle (single statically-compiled binaries for the win!) or I'd just use Busybox (~5MB base image) and copy what's missing into it since that base image ships with libc.
Agreed, rewriting was not an option (as mentioned in the beginning). Moreover, it would have taken longer to build a nice TUI interface than it took to dockerize it.
In that case, maybe it could be helpful, but to make it convenient, don't I need a script that stays in my main system and invokes the docker run command for me?
So if you do that and just give me a one liner install command to copy paste then I guess this actually makes sense. A small docker container could eliminate a lot of potential gotchas with trying to install dependencies in arbitrary environments.
Except it's a bash script. I guess it would make more sense to get rid of the dependency on fzf or something nonstandard. Then they can just install your bash script.
For cases where you have more dependencies that really can't be eliminated then this would make more sense to me.
Why does it need fzf? Is it intended to run the container interactively?
You can refer to usage guidelines on dockerhub https://hub.docker.com/r/bhupeshimself/ugit
> So if you do that and just give me a one liner install command to copy paste then I guess this actually makes sense. A small docker container could eliminate a lot of potential gotchas with trying to install dependencies in arbitrary environments.
Yes, that was also an internal motivation behind doing this.
> Why does it need fzf? Is it intended to run the container interactively?
Hey, fzf is required by ugit (the script) itself. I didn't want to rely on CLI arguments to give users the ability to undo a matching git command. Adding a fuzzy-search utility makes it easier for people to search what they can undo about "git tag", for example.
I don't see what value the author's side project is bringing other than adding complexity to a simple task (or, more likely, bolstering their resume).
Sponsorship for a 500 line shell script. Wow!
If someone has no inherent interest in doing something, is not otherwise obligated to do it, and it is not done as a favor to friends or something, paying that person to do the job anyway is a very accepted practice in our society. Almost all of our employers pay us to do things we might otherwise not do.
alias lsa='ls -a'
Sponsor me!
I find it mildly ironic, however, that bundling the dependencies of a shell script is - in some ways - the exact opposite of saving space, even if it is likely to make running your script more convenient.
Unfortunately, I don't have a great alternative to offer. The obvious approach is to either let the users handle dependencies (which you can also do with ugit) or write package definitions for every major distribution. And if I were the author, I wouldn't want to do that for a small side project either.
Well... There's nix. Complete packaging system, fully deterministic results, lots of features, huge number of existing packages to draw from, works on your choice of Linux distro as well as Darwin and WSL. All at the tiny cost of a little bit of your sanity and being its own very deep rabbit hole.
I'd argue writing a Nix derivation isn't that different from writing a package definition for any one Linux distribution. It solves the distribution problem for people who use that particular distribution/tool, not everyone. Now, Nix can be installed on any distribution, but if I was going for widespread adoption, I might point to Nix being a solution, but I probably wouldn't advertise it as the main one.
We can't pass around bash scripts anymore. Every system has to be fungible, reproducible en masse and as agnostic to the underlying technology its on as possible.
I love going and making containers smaller and faster to build.
I don't know if it's useful for alpine, but adding a --mount=type=cache argument to the RUN command that `apk add`s might shave a few seconds off rebuilds. Probably not worth it, in your case, unless you're invalidating the cached layer often (adding or removing deps, intentionally building without layer caching to ensure you have the latest packages).
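For apk specifically, the cache mount could look something like this (BuildKit syntax; the symlink trick is needed because apk only keeps a download cache when /etc/apk/cache exists — consider this a sketch rather than a tested recipe):

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.19
# Keep apk's download cache on a BuildKit cache mount so rebuilds that
# invalidate this layer can still reuse previously downloaded packages.
RUN --mount=type=cache,target=/var/cache/apk \
    ln -sfn /var/cache/apk /etc/apk/cache \
 && apk add bash git fzf
```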
Hadolint is another tool worth checking out if you like spending time messing with Dockerfiles: https://github.com/hadolint/hadolint
Unless your tool is converted to a service how would anyone ever use this? Do you expect them to run their project inside of your container?
This is very bizarre.
In the best case, that first layer will be reused. Meaning that creating a different base layer may actually increase the size in the end, even if the image in isolation may appear smaller!
These screeds get more and more random.
The standard advice was always to just not let a program in Bash get beyond X lines. Then move to a real programming language. Like Python (est. 1991).
And man I love it. It's totally against the 12 microservice laws and should NOT be done in most cases, but when it comes to troubleshooting - I can exec into a container anywhere and restart services, because supervisord sits there monitoring for the service (say mysql) to exit and will immediately restart it. And because supervisord is pid 1, as long as that never dies your container doesn't die. You get the benefit of containerization and servers without the pain of both, like having to re-image/snapshot a server once you've thoroughly broken it enough, vs just restarting a container. I can sit there for hours editing .conf files trying to get something to work without ever touching my dockerfile/scripts or restarting a container.
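A minimal supervisord config for this kind of all-in-one container might look like the following (program names and paths are illustrative, not from the comment above):

```ini
; supervisord runs in the foreground as PID 1; each [program] is
; monitored and restarted when it exits.
[supervisord]
nodaemon=true

[program:mysql]
command=/usr/bin/mysqld_safe
autorestart=true

[program:redis]
command=/usr/bin/redis-server
autorestart=true
```

The Dockerfile then just ends with something like `CMD ["supervisord", "-c", "/etc/supervisord.conf"]`.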
I don't have to make some changes, update the entrypoint/dockerfile, push build out, get new image, deploy image, exec in..
I can sit there and restart mysql, postgres, redis, zookeeper, as much as I want until I figure out what I need done in one go and then update my scripts/dockerfiles THEN prepare the actual production infra where it is split into microservices for reliability and scaling, etc.
I've written a ton of these for our QA teams so they can hop into one container and test/break/qa/upgrade/downgrade everything super quick. It doesn't give you FULL e2e, but it's not like we'd stop doing the tests we already do now.
I mention this because it was something I did once a long, long time ago but completely forgot was something you could do, until I recently went that route again, and it really does have some useful scenarios.
https://gdevillele.github.io/engine/admin/using_supervisord/
I'm also really tired of super tiny containers that are absolute nightmares to troubleshoot when you need to. I work on prod infra so I need to get something online immediately when a fire is happening and having to call debug containers or manually install packages to troubleshoot things is such a show stopper. I know they're "attack vectors" but I have a vetted list of aliases, bash profiles and troubleshooting tools like jq mtr etc that are installed in every non-scratch container. My containers are all standardized and have the exact same tools, logs, paths, etc. so that everyone hopping into one knows what they can do.
If you're migrating your architecture to ARM64, those containers spin up SO fast that the extra 150-200MB of packages, to have a sane system to work on when you have a fire burning under you, is worth it. For some scale the cross datacenter/cluster/region image replication would be problematic, but you SHOULD have a container caching proxy in front of EVERY cluster anyway. Or at least per datacenter/rack. It could be a container ON your clusters with its storage volume on a single CEPH cluster, etc.
…with that said, building those nix images on a Mac is still a bit rough — there are some official docs and work on getting a builder VM set up, but it has rough edges.
I did this by tracking the output of the ldd command and moving only needed libraries into the container.
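A rough sketch of that approach (the helper name and staging path are my own, not the original commenter's): parse ldd's output for the resolved library paths and copy each one, plus the binary itself, into a staging tree destined for the image.

```shell
#!/bin/sh
# Stage a binary plus every shared library ldd reports, preserving the
# original paths under $dest so the tree can be COPY'd into an image.
copy_with_deps() {
  bin="$1"
  dest="$2"
  mkdir -p "$dest$(dirname "$bin")"
  cp "$bin" "$dest$bin"
  # ldd lines look like:  libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x...)
  # plus the loader line: /lib64/ld-linux-x86-64.so.2 (0x...)
  ldd "$bin" 2>/dev/null \
    | awk '$2 == "=>" { print $3 } $1 ~ /^\// { print $1 }' \
    | while read -r lib; do
        [ -f "$lib" ] || continue
        mkdir -p "$dest$(dirname "$lib")"
        cp "$lib" "$dest$lib"
      done
}

# Example: stage /bin/sh and its libraries under /tmp/stage
copy_with_deps /bin/sh /tmp/stage
```

From there, a `COPY /tmp/stage/ /` (or equivalent build-context copy) brings only the needed files into the container.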
Why is docker so big?
For the last goddamn time: Docker is not a package manager!
https://xeiaso.net/blog/i-was-wrong-about-nix-2020-02-10/
I’ve got plenty of gripes with nixlang, but being worse than Dockerfile-lang isn’t one of them.
Most of it seems to be from git:
$ nix path-info --closure-size --human-readable nixpkgs#legacyPackages.x86_64-linux.gitMinimal
/nix/store/f7b2yl226nbikiv6sbdhmaxg2452c8h5-git-minimal-2.42.0 112.9M
$ nix path-info --closure-size --human-readable nixpkgs#legacyPackages.x86_64-linux.pkgsMusl.gitMinimal
/nix/store/25807yw3143a94dpr3a3rffya7vg5r24-git-minimal-2.42.0 73.5M
Apparently "gitMinimal" is not all that minimal:
$ ls /nix/store/25807yw3143a94dpr3a3rffya7vg5r24-git-minimal-2.42.0/libexec/git-core/ | wc -l
172
[0]: https://gist.github.com/jbboehr/3a5d0dd52a0c1139ce88b76ab82a...
Could you maybe modify stage 2 to:
FROM scratch as stage2
COPY ...
COPY ...
...
and finally at the end:
FROM scratch
COPY --from=stage2 / /

I wasn't able to dig deep enough on why that was the case, considering the "env" utility was coming from busybox, which on copy averages close to 900KB.
When using a shebang line, the reason for 'env' is actually something different.
You can just leave out 'env' and do a shebang with 'bash' directly like this:
#! /usr/bin/bash
But the problem with that is portability. On different systems, the correct path may be /bin/bash or /usr/bin/bash. Or more unusual places like /usr/local/bin/bash. On old Solaris systems that came with ksh, bash might be somewhere under /opt with all the other optional software.

But 'env' is at /usr/bin/env on most systems, and it will search $PATH to find bash for you, wherever it is.
If you're defining a Docker container, presumably you know exactly where bash is going to be, so you can just put that path on the shebang line.
TLDR: You don't have to have a shebang, but you can have a shebang at no cost because your shebang doesn't need an env.
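The two styles can be demonstrated side by side (the file names are mine, and the hardcoded path assumes a typical Linux layout with bash at /bin/bash):

```shell
#!/bin/sh
# One script finds bash through env's $PATH search, the other hardcodes
# the interpreter path, which is fine inside a container you control.
printf '#!/usr/bin/env bash\necho via-env\n' > /tmp/via_env.sh
printf '#!/bin/bash\necho direct\n' > /tmp/direct.sh
chmod +x /tmp/via_env.sh /tmp/direct.sh

/tmp/via_env.sh   # env walks $PATH to locate bash
/tmp/direct.sh    # kernel execs /bin/bash directly
```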
Fly.io, deliver us.