Python packages traditionally use setup.py to install code, and setup.py is all executable code under the installed package's control.
Native Ruby Gems execute arbitrary code via extconf.rb.
Pre .NET Core, NuGet packages could ship scripts like `install.ps1`. That's been removed, but they can still ship `.targets` and `.props` files that are incorporated into your build (and so can run code at build time).
PHP Composer packages can ship install scripts or configure themselves as Composer plugins.
The venerable .tar.gz approach to packaging, covering decades of C and C++ code, is all about executing code during installation.
There are measures that can help (e.g., PHP Composer doesn't run install scripts of _transitive_ dependencies) but the JS space is adopting measures that can help too (like pnpm's approve-builds).
But nowadays people prefer pyproject.toml, and most install pre-built distributions (wheels) for their architecture from PyPI, so they don't execute arbitrary code to install packages.
> PHP Composer packages can ship install scripts
Which requires the user to say yes to running them, but they can also say they only want a specific package to run scripts with something like "composer -n config allow-plugins.foo/bar true && composer -n require foo/bar"
> The venerable .tar.gz approach to packaging
Which most people don't install directly, but have already had built for them by their distro.
As more and more languages get "package managers", there's an expectation that installing what should just be inert package/library code should not run commands. Sometimes generated files are needed, and the direction seems to be that these package managers should be like distro package managers, where they take the risk of running the build instructions and generate those files for you, serving up os/architecture-specific builds.
This is the direction npm ought to take. Furthermore, it shouldn't allow things like electron being a small bundle of JavaScript code that fetches large lumps of binary code from somewhere else on the internet at install time. It should all be uploaded to, and sourced from, NPM.
Yes, and these are positive changes. But they aren't security boundaries, and they don't mean that pip won't execute arbitrary code: a malicious release could ship an sdist instead of wheels, a malicious pyproject.toml could name an arbitrary-code `build-backend`, etc., and pip would still function as designed.
I appreciate the clarifications/corrections on PHP.
> Which most people don't install directly, but have already had built for them by their distro.
Yes, but the original claim was that npm is "particularly susceptible to these attacks" because "npm can execute code after install and most package managers don't do that." I don't think that's accurate: we've seen hundreds of NPM packages compromised in multiple high-profile attacks over the last several months, while .tar.gz was used for decades with nowhere near the same number of compromises.
Rather, I suspect it's a combination of factors: early JS had a relatively anemic standard library, and NPM made code reuse dramatically simpler than before. This normalized the use of large and deep dependency trees among JS projects. And the extreme popularity of JS, the centralization of NPM + GitHub, and increased usage of automation make attacks more practical and more lucrative.
Taking a step back from that particular debate, I'm very much in favor of changes like what you describe.
Taking still another step back, I'm not sure that even those will be enough. If I download a package, it's because I intend to run its code at some point: if it's malicious, I may be less automatically hosed than if its postinstall script runs, but I'm still hosed at execution time. I trust my distro packages, not because they don't execute arbitrary code on installation (RPMs and .debs both do), but because I _trust my distro_. NPM et al. simply cannot vouch for every package they host.
Thanks for the reply!
This always seems like a very convenient excuse. C also has a very small standard library, and unless you're doing system programming, you often have to find a utility library. It's just that those libraries try to solve their whole domain instead of splitting themselves into molecules. Before npm, we had good JS libraries too, with jQuery as a fundamental one, plus backbone.js, dropzone.js, ... where you imported a few files (and vendored them in your project) and were done with it.
The issue with NPM is that it led to the creation of weird ecosystems like babel, webpack, eslint, ... where instead of having a good enough solution, it was plugins ad infinitum. And other maintainers started doing the same thing, splitting their libraries, and writing libraries where a gist or a blog post would be enough[0]. Cargo is suffering from the same problem[1].
[0]: https://github.com/Rob--W/proxy-from-env/blob/master/index.j...
[1]: https://docs.rs/is_executable/latest/src/is_executable/lib.r...
Couldn't you accomplish the same thing by adding a malicious [build-system] to a pyproject.toml file? You can pull in arbitrary code by providing exact URLs for requirements:
[build-system]
requires = ["hatchling @ https://files.pythonhosted.org/packages/8f/8a/cc1debe3514da292094f1c3a700e4ca25442489731ef7c0814358816bb03/hatchling-1.27.0.tar.gz"]
build-backend = "hatchling.build"
Technically true, but wheels can include a `.pth` which will run arbitrary code as soon as Python is started, which is only marginally less dangerous. Recently exploited in the LiteLLM attack.
We could then add the philosophical question of asking what's the difference between:
1. Adding malicious code to a package's .pth file that's evaluated automatically on every python invocation
2. Adding malicious code to the package itself that's evaluated automatically on every python invocation _that uses that package_
Packaging systems that don't run arbitrary code when you install a package are more trustworthy than ones that do, but there's still the essential trust you have to place in all code you're installing, directly and indirectly.
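For anyone who hasn't seen the `.pth` behavior mentioned above, here is a harmless sketch that reproduces it in a temporary directory rather than in a real site-packages. The function name `demo_pth_execution` and the `PTH_DEMO_RAN` marker are made up for this demo. It relies on the standard library's `site.addsitedir`, which processes `.pth` files the same way interpreter startup does for site-packages, and executes any line that starts with `import`.

```python
import os
import site
import sys
import tempfile


def demo_pth_execution() -> bool:
    """Return True if an `import` line in a .pth file was executed by `site`."""
    marker = "PTH_DEMO_RAN"  # hypothetical name, chosen for this demo only
    os.environ.pop(marker, None)
    with tempfile.TemporaryDirectory() as sitedir:
        # A .pth line beginning with "import" is exec'd, not treated as a path.
        with open(os.path.join(sitedir, "demo.pth"), "w", encoding="utf-8") as fh:
            fh.write(f"import os; os.environ[{marker!r}] = '1'\n")
        before = list(sys.path)
        site.addsitedir(sitedir)  # processes every *.pth file in the directory
        sys.path[:] = before      # undo the path entry added for the temp dir
    return os.environ.get(marker) == "1"


if __name__ == "__main__":
    print("code in .pth executed:", demo_pth_execution())
```

Nothing here imports the "package" that shipped the file, which is the difference between cases 1 and 2 above: the `.pth` line runs for every interpreter that processes that directory.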