The package definition describes how to fetch the source code from a source (like a Git repo or a hosted archive) and build it. The built result only contains what is necessary at runtime.
A sizeable number of packages don't even fetch source code but a prebuilt binary, which is then fixed up to work with Nix.
There is a source cache, but it is optional.
As an example, check out ripgrep [1]. It uses `fetchFromGitHub` to retrieve the code.
[1] https://github.com/NixOS/nixpkgs/blob/3bb54189b0c8132752fff3...
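For readers who haven't seen one, here is a minimal sketch of what such a source fetch looks like. The owner, repo and tag are hypothetical placeholders (not the values from the ripgrep expression), and `lib.fakeHash` is a deliberate dummy:

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.fetchFromGitHub {
  # Hypothetical repository and tag, for illustration only.
  owner = "example-owner";
  repo = "example-repo";
  rev = "v1.0.0";
  # Dummy hash: the fetch fails with a hash mismatch and reports the hash
  # it actually got, which you then paste in here.
  hash = pkgs.lib.fakeHash;
}
```

Once the real hash is filled in, Nix refuses any download whose content doesn't match it.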
I'm currently doing some work for ML and data science companies where full reproducibility and introspection are very much desired.
So you need to run your own source cache to provide that guarantee, because you can't count on cache.nixos.org still providing the source code for a package built 4 years ago.
But that's why I love the IPFS cache efforts. [1] Running your own node to pin all required sources should then be relatively easy.
It also contains source code.
Also, they are only available when hydra builds the package anyways, right? So if some package is not built by hydra (like how it used to be for the texlive packages), it'll still download the sources from the various places they are hosted.
As for the hash, it's good that the source code is hashed, but my main concern was that it was downloading from external sources in the first place. This is bad for privacy, as those hosts know I'm downloading from them, as well as for reliability, because the hosts might not have as good uptime as a debian package mirror.
>Also, they are only available when hydra builds the package anyways, right? So if some package is not built by hydra (like how it used to be for the texlive packages), it'll still download the sources from the various places they are hosted.
Yes.
>As for the hash, it's good that the source code is hashed, but my main concern was that it was downloading from external sources in the first place. This is bad for privacy, as those hosts know I'm downloading from them, as well as for reliability, because the hosts might not have as good uptime as a debian package mirror.
That's a true and valid concern, but note that it's the same situation as with Debian: If the package is built upstream by Debian/the NixOS Hydra instance, then you have reliable, private access to its source code so you can rebuild it. If it's not built/packaged upstream, then you need to get the source from somewhere else.
The discrepancy is just that there are packages in Nixpkgs which are not built upstream, and which get built only locally on your machine or your own Hydra instance. There are not many of these, but yeah, it would be nice to fully get rid of them.
Or, an interesting option would be to build the source for more packages on Hydra, without actually building the binary for the package. That wouldn't be too hard, if someone adds an expression for doing it.
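Roughly what I have in mind, as an untested sketch: the file name `sources.nix` and the choice of packages are made up, and this is not an existing Hydra jobset. It relies on the fact that a package's `src` attribute is itself a derivation, so it can be built on its own:

```nix
# sources.nix (hypothetical file name)
{ pkgs ? import <nixpkgs> { } }:

let
  lib = pkgs.lib;
  # Hypothetical selection; a real jobset would cover far more packages.
  packages = { inherit (pkgs) hello ripgrep; };
in
# Keep only packages that expose `src`, then map each one to that source
# derivation instead of the package itself.
lib.mapAttrs (_name: pkg: pkg.src)
  (lib.filterAttrs (_name: pkg: pkg ? src) packages)
```

Something like `nix-build sources.nix` should then fetch only the sources, without compiling anything.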
Fixed-output derivations are used for sources (they are content-addressed in the store), so the latter.
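To make that concrete, a small sketch using `fetchurl`, which produces a fixed-output derivation. The URL is a made-up placeholder and the hash is a dummy:

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.fetchurl {
  # Hypothetical URL, for illustration only.
  url = "https://example.org/example-1.0.tar.gz";
  # The expected hash of the output is declared up front. That is what
  # makes the derivation "fixed-output" and lets it use the network.
  # lib.fakeHash is a placeholder; the real hash goes here.
  hash = pkgs.lib.fakeHash;
}
```

Because the output is identified by its declared hash, any cache or mirror that serves matching content will do.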
It starts to become a bit of a grey area in some cases though. For instance: Java packages. Is a .jar a binary? Probably. But so many Java applications rely on pulling loads of .jars down from Maven. Are we going to sit down and figure out how to build all those jars from source? It's not uncommon for there to be literally hundreds.
I agree that it would be nice to tag (with meta) FOSS packages that aren't built from source, though. Every instance of that is a bug, IMO...
Unfortunately this isn’t the case with some languages that have their own package managers, the prime example being Java, as the parent commenter mentioned. It’s so close to impossible to build Java applications without fetching tons of binary jars from Maven that Debian just gives up on providing its own package in many cases[1]. While Nix does build Java applications from source, the dependencies are fetched from Maven in binary form.