
joshkel 7d ago

I'm far from an expert, but this feels like an oversimplification.

Python packages traditionally use setup.py to install code, and setup.py is all executable code under the installed package's control.

Native Ruby Gems execute arbitrary code via extconf.rb.

Before .NET Core, NuGet packages could ship scripts like `install.ps1`. That's been removed, but they can still ship `.targets` and `.props` files that are incorporated into your build (and so can run code at build time).

PHP Composer packages can ship install scripts or configure themselves as Composer plugins.

The venerable .tar.gz approach to packaging, covering decades of C and C++ code, is all about executing code during installation.

There are measures that can help (e.g., PHP Composer doesn't run install scripts of _transitive_ dependencies) but the JS space is adopting measures that can help too (like pnpm's approve-builds).


amiga386 7d ago

> Python packages traditionally use setup.py

But nowadays prefer pyproject.toml, and most people use pre-built distributions (wheels) for their architecture from PyPI, so don't execute arbitrary code to install packages.

> PHP Composer packages can ship install scripts

Which requires the user to say yes to running them, but they can also say they only want a specific package to run scripts with something like "composer -n config allow-plugins.foo/bar true && composer -n require foo/bar"

> The venerable .tar.gz approach to packaging

Which most people don't install directly, but have already had built for them by their distro.

As more and more languages get "package managers", there's an expectation that installing what should just be inert package/library code should not run commands. Sometimes generated files are needed, and the direction seems to be that these package managers should be like distro package managers, where they take the risk of running the build instructions and generate those files for you, serving up os/architecture-specific builds.

This is the direction npm ought to take, and furthermore shouldn't allow things like electron being a small bundle of javascript code that fetches large lumps of binary code from somewhere else on the internet to install. It should all be uploaded to, and sourced from, NPM.

joshkel (OP) 7d ago

> But nowadays prefer pyproject.toml, and most people use pre-built distributions (wheels) for their architecture from PyPI, so don't execute arbitrary code to install packages.

Yes, and these are positive changes. But they aren't security boundaries, and they don't mean that pip won't execute arbitrary code: a malicious release could ship an sdist instead of wheels, a malicious pyproject.toml could name an arbitrary-code `build-backend`, etc., and pip would still function as designed.
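To make the sdist point concrete: pip can be asked to refuse source distributions with `--only-binary=:all:`, in which case a release that ships only an sdist fails to install instead of running a build. A minimal sketch, assuming you drive pip from Python; `wheel_only_install_command` is a hypothetical helper name, and this narrows the install-time path only (it does nothing about what the installed code does later).

```python
import sys


def wheel_only_install_command(requirement: str) -> list[str]:
    """Build a pip command line that refuses to build from source."""
    return [
        sys.executable, "-m", "pip", "install",
        # Accept wheels only; pip errors out rather than falling back to an sdist.
        "--only-binary=:all:",
        requirement,
    ]


if __name__ == "__main__":
    # "example-package" is a placeholder name, not a real recommendation.
    print(" ".join(wheel_only_install_command("example-package")))
```

Pass the resulting list to `subprocess.run(..., check=True)` if you want to actually run it.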

I appreciate the clarifications/corrections on PHP.

> Which most people don't install directly, but have already had built for them by their distro.

Yes, but the original claim was that npm is "particularly susceptible to these attacks" because "npm can execute code after install and most package managers don't do that." I don't think that's accurate: we've seen hundreds of NPM packages compromised in multiple high-profile attacks over the last several months, while .tar.gz was used for decades with nowhere near the same number of compromises.

Rather, I suspect it's a combination of factors: Early JS had a relatively anemic standard library in the early days, and NPM made code reuse dramatically simpler than before. This normalized the use of large and deep dependency trees among JS projects. And the extreme popularity of JS, the centralization of NPM + GitHub, and increased usage of automation make attacks more practical and more lucrative.

Taking a step back from that particular debate, I'm very much in favor of changes like what you describe.

Taking still another step back, I'm not sure that even those will be enough. If I download a package, it's because I intend to run its code at some point: if it's malicious, I may be less automatically hosed than if its postinstall script runs, but I'm still hosed at execution time. I trust my distro packages, not because they don't execute arbitrary code on installation (RPMs and .debs both do), but because I _trust my distro_. NPM et al. simply cannot vouch for every package they host.

Thanks for the reply!

skydhash 7d ago

> Early JS had a relatively anemic standard library in the early days, and NPM made code reuse dramatically simpler than before. This normalized the use of large and deep dependency trees among JS projects.

This always seems like a very convenient excuse. C also has a very small standard library, and unless you're doing systems programming, you often have to find a utility library. It's just that those libraries try to solve their whole domain instead of splitting themselves into molecules. Before npm, we had good JS libraries too, like jQuery as a fundamental one, backbone.js, dropzone.js, ..., where you'd import a few files (and vendor them in your project) and be done with it.

The issue with NPM is that it led to the creation of weird ecosystems like babel, webpack, eslint, ..., where instead of having a good enough solution, it was plugins ad infinitum. And other maintainers started doing the same thing, splitting their libraries, and writing libraries where a gist or a blog post would be enough[0]. Cargo is suffering from the same problem[1].

[0]: https://github.com/Rob--W/proxy-from-env/blob/master/index.j...

[1]: https://docs.rs/is_executable/latest/src/is_executable/lib.r...


optionalsquid 7d ago

> But nowadays prefer pyproject.toml

Couldn't you accomplish the same thing by adding a malicious [build-system] to a pyproject.toml file? You can pull in arbitrary code by providing exact URLs for requirements:

  [build-system]
  requires = ["hatchling @ https://files.pythonhosted.org/packages/8f/8a/cc1debe3514da292094f1c3a700e4ca25442489731ef7c0814358816bb03/hatchling-1.27.0.tar.gz"]
  build-backend = "hatchling.build"

amiga386 7d ago

That's a very visible Ken Thompson style attack. The modern expectation is that PyPI would be evaluating this build-system section and would only accept build-systems that they trust to turn package distributions into wheels, and the end users only need the wheels. If you need a specific version of hatchling that they know of, that's fine. If you need something they haven't heard of, they should say no.

bakkoting 7d ago

> most people use pre-built distributions (wheels) for their architecture from PyPI, so don't execute arbitrary code to install packages

Technically true, but wheels can include a `.pth` which will run arbitrary code as soon as Python is started, which is only marginally less dangerous. Recently exploited in the LiteLLM attack.
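For anyone who hasn't run into this: the `site` module executes any line in a `.pth` file that starts with `import`. A harmless sketch of the mechanism, assuming a throwaway directory processed with `site.addsitedir` in place of a real site-packages; `run_pth_demo` and the `PTH_DEMO` variable are made-up names for illustration:

```python
import os
import site
import sys
import tempfile
from typing import Optional


def run_pth_demo() -> Optional[str]:
    """Write a harmless .pth file and let the site module process it."""
    with tempfile.TemporaryDirectory() as tmp:
        pth_path = os.path.join(tmp, "demo.pth")
        with open(pth_path, "w", encoding="utf-8") as fh:
            # Lines beginning with "import" are executed, not treated as paths.
            fh.write("import os; os.environ['PTH_DEMO'] = 'executed'\n")
        # addsitedir applies the same .pth handling that site-packages
        # directories get at interpreter startup.
        site.addsitedir(tmp)
        # Undo the sys.path change so the demo leaves no dangling entry.
        if tmp in sys.path:
            sys.path.remove(tmp)
    return os.environ.get("PTH_DEMO")


if __name__ == "__main__":
    print(run_pth_demo())
```

Nothing in the demo imports the "package"; the side effect happens purely because the `.pth` file was processed.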

amiga386 7d ago

That appears to be an exploitable feature of the language, not the package manager per se.

We could then add the philosophical question of asking what's the difference between:

1. Adding malicious code to a package's .pth file that's evaluated automatically on every python invocation

2. Adding malicious code to the package itself that's evaluated automatically on every python invocation _that uses that package_

Packaging systems that don't run arbitrary code when you install a package are more trustworthy than ones that do, but there's still the essential trust you have to place in all code you're installing, directly and indirectly.
