With its ability to install Python and non-Python packages (including binaries), conda is my go-to for managing project environments and dependencies. Between the bioconda[2] and conda-forge[3] channels, it meets the needs of many on the computational side of the biological sciences. Being able to describe a full execution environment with a yaml file is a huge win for replicable science.
1. https://conda.io/miniconda.html (installation guide: https://conda.io/docs/user-guide/install/index.html)
Also, a conda package is not a replacement for a distutils/setuptools package. When building a conda package, one still calls setup.py. So every python conda package has to be a distutils/setuptools package anyway.
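To illustrate that point, here is a minimal sketch of a conda recipe (package name, version, and paths are all hypothetical): the build step still shells out to setup.py, so the setuptools packaging has to exist regardless.

```yaml
# meta.yaml -- sketch of a conda recipe for a hypothetical package "mypkg"
package:
  name: mypkg
  version: "0.1.0"

source:
  path: ..

build:
  # conda-build just runs setup.py, so the setuptools package still exists
  script: python setup.py install --single-version-externally-managed --record=record.txt

requirements:
  build:
    - python
    - setuptools
  run:
    - python
```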
The fact that it's not a community-driven project might be one of the reasons.
Meanwhile, in a galaxy far away, people are also using buildout.
- There is Miniconda, which doesn't force you to install all the PyData packages.
- Virtual envs and the needed packages are all defined in a simple yaml file.
- It works well with pip. So if a package isn't in a conda repository, you can install it from pip. The annoyance here is that you must try conda, fail, and then try pip.
- You can easily clone envs. So you can have some base envs with your usual packages (or one for Python 2 and another for Python 3), and just clone them to start a new project.
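For example, a base env defined in yaml plus cloning might look like this (env and package names are just illustrative):

```yaml
# environment.yml -- a base env you can recreate or clone
name: base-py3
channels:
  - conda-forge
dependencies:
  - python=3.6
  - numpy
  - pip
  - pip:
      - some-pypi-only-package   # falls through to pip, as described above
```

Then `conda env create -f environment.yml` recreates the env, and `conda create --name newproject --clone base-py3` starts a new project from the base env.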
I "feel" like it is overkill to use anaconda for things unrelated to 'data science' and the likes, but I'm not sure why I feel that way.
It kind of makes sense to use it for other projects as well, since you don't need to import all the things conda offers.
Do you have scipy/numpy/keras or cython somewhere in the deps? pipenv lock is slow, but not 20-30 mins slow unless there's a very very large download and/or a long compilation somewhere in there.
This wasn't for complex pipenv operations either. A simple command: pipenv run python main.py took progressively longer to execute.
I've never seen anything like that on a number of fairly large apps – a minute or two, at most. Are some of those dependencies extremely large or self-hosted somewhere other than PyPI?
I rely on it for pretty much everything and I haven't run into any game-breaking problems.
Anyway, for me the most annoying thing about Python project architecture (and perhaps about Python as a whole) is that you can't split a module into multiple files. You either have to import everything manually all over the project while trying to avoid circular imports, or put everything in a single huge source file. I usually choose the latter, and I hate it. The way namespaces and scopes work in C# feels so much better.
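For what it's worth, the manual-re-export workaround the comment alludes to can be sketched like this (the `geometry` package and its function names are hypothetical; the script builds the package on disk just to keep the example self-contained):

```python
import os
import sys
import tempfile

# Build a throwaway package on disk to demonstrate the pattern.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "geometry")
os.makedirs(pkg)

# Implementation split across two private files...
with open(os.path.join(pkg, "_circle.py"), "w") as f:
    f.write("def circle_area(r):\n    return 3.14159 * r * r\n")
with open(os.path.join(pkg, "_square.py"), "w") as f:
    f.write("def square_area(s):\n    return s * s\n")

# ...stitched back together by re-exports in __init__.py,
# so callers see one flat namespace.
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from ._circle import circle_area\n"
            "from ._square import square_area\n")

sys.path.insert(0, root)
import geometry

print(geometry.square_area(4))   # 16
```

It works, but every symbol has to be re-exported by hand in `__init__.py`, which is exactly the busywork the comment is complaining about.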
There's no concept of explicit versus implicit dependencies. You install one package, and end up with five dependencies locked at exact versions when you do `pip freeze`. Which of those was the one you installed, and which ones are just dependencies-of-dependencies?
If you're consistent and ALWAYS update your requirements.txt first with explicit versions, and NEVER use `pip freeze`, you might be okay, but it's more painful than most of the alternatives that let you separate those concepts.
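To illustrate (with hypothetical 2018-era version numbers): installing a single package leaves you with several exact pins and no record of which one you actually asked for.

```text
$ pip install requests
$ pip freeze
certifi==2018.4.16
chardet==3.0.4
idna==2.7
requests==2.19.1
urllib3==1.23
```

Nothing in that output distinguishes `requests` (the explicit dependency) from the other four (its transitive dependencies).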
Interesting, this has never been a problem for me. I've built some large tools and while it isn't fast, it's always completed in a few minutes.
pipenv lock 5.65s user 0.29s system 77% cpu 7.639 total
I think 7.6 seconds is fine for an operation that you'd rarely do. It would probably take ages at work, though: just opening a WSL terminal takes several seconds there, whereas it's predictably instantaneous (<100 ms) on Fedora Linux at home.
After trying to replace pip with Pipenv, we had to stop. The dependency resolution time for 20 declared dependencies (that in turn pull down > 100 components) takes well over 5 minutes. With poetry - it takes less than 33 seconds on a clean system. The times are consistent for both Ubuntu 16.04 and Mac OS X.
Our only goal was to get to the point we're at now: tracking dependencies, and separating dev requirements (like ipython and pdbpp) from our other requirements. Poetry made it fast and simple, and made me an addict.
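With Poetry (2018-era schema), that dev/runtime split is just two sections of pyproject.toml. This is a hedged sketch, not our actual file: the dev packages mirror the comment, and the other names and versions are illustrative.

```toml
[tool.poetry]
name = "ourapp"              # hypothetical project name
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.6"
requests = "^2.19"           # illustrative runtime dependency

[tool.poetry.dev-dependencies]
ipython = "^6.4"
pdbpp = "^0.9"
```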
Over two days, I moved our entire codebase and every single (active) personal project I had to poetry. I don't regret it :)
There's a certain joy working with tools when it's clear that the person making those tools actually cares about the developer and making it work well.
Man, the Python packaging ecosystem is one of those things that really brings me down about the state of Python, because there is such an extremely high barrier to breaking backwards compatibility, and nothing really works.
The JS ecosystem is far better in this regard. Pipenv was most promising because it followed in Yarn's footsteps, but it didn't go all the way in replacing pip (which it really should have). So now there's still a bunch of stuff handled by pip, which pipenv does not / cannot know about, and this isn't really fixable.
The end result is that instead of telling people about pip + virtualenv, we now have pip, virtualenv and pipenv to talk about. And people who don't understand the full stack, and the exact role of each tool, can't really understand how to properly do the tasks we choose to recommend delegating to each one of them.
There are three separate-but-related use cases:
- "Installing a library" (npm install; pip install).
- "Publishing a library" (setup.py. Or Twine if you're using a tool. Both use setuptools.).
- "Deploying a Project", local dev or production (pipenv. Well, if it's configured with a pipfile, otherwise virtualenv, and who knows where your dependencies are, maybe requirements.txt. Pipenv does create a virtualenv anyway, so you can use that. Anyway you should be in docker, probably. Make sure you have pip installed systemwide. Yes I know it comes with python, but some distributions remove it from Python. Stop asking why, it's simple. What do you mean this uses Python 3.6 but there's only Python 3.5 available on Debian? Wait, no, don't install pyenv, that's not a good idea! COME BACK!)
The JS ecosystem manages to have two tools, both of which can do all of this. I don't know how we keep messing up when we have good prior work to look at.
This makes the situation sound a lot more complex than it actually is by conflating separate layers: the system distribution issue is exactly the same for both Python and JS (if Debian ships an old v8 you either need to install a new one, perhaps using Docker to make that easy and isolated). Similarly, the question of whether you install the app using pip or pipenv is a different layer from whether you're using Docker or not, just as Docker is unrelated to the question of whether you use npm or yarn.
For a new project in 2018, you can simply say “Use pipenv. Deploy in Docker, using pipenv.” and it works as well as the JS world. People sometimes choose to make their projects too complicated or to manage things at the wrong level but that's a social problem which is hard to solve with tooling.
Most Node developers grew up with
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash
or brew install node
or using one of the dozen other ways to install Node. Distinct versions and per-project packages were the norm from day one. That was not true with Python. As a developer, I definitely see the advantage of this approach and realize that it trivially solves some very hard problems (and causes a whole bunch of different problems if you're running Windows...). As an end user, I'm not super thrilled about the prospect of each tool I use coming in its own docker container and needing to spin up 20 different containers each time I want to do anything.
Agreed. You know you really blew it when even PHP does it better, and composer is unquestionably better than anything we've got in Python.
Are you, or are you not, using explicit versions supplied by e.g. pip freeze?
This article seems well-written and well-intentioned. Despite reading it, I don't know why I would not have loose dependencies in setup.py and concrete, pinned dependencies in requirements.txt. It's never felt hard to manage or to sync up - the hard part is wading through all the different tools and recommendations.
How does that work? How would someone else coming to work on your project use them?
> concrete, pinned dependencies in requirements.txt
How do you maintain that requirements.txt? And while that might work for applications, what do you do for libraries?
pip install -e .
in a virtual environment. I thought this was quite well-established. Is there a problem with it that I'm not aware of? pip freeze > requirements.txt
for requirements.txt generation. For libraries, just omit this? I'm not sure I understand the question. The article also mentions that several of the new tools aren't appropriate for libraries anyway.

For libraries, I've been using Poetry for molten[1] and pure setuptools for dramatiq[2], and, at least for my needs, pure setuptools seems to be the way to go.
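A minimal sketch of that split (project name and version ranges are hypothetical): loose ranges in setup.py for the library, exact pins regenerated into requirements.txt from the virtualenv for the deployment.

```python
# setup.py -- abstract, loose requirements for the library itself
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "requests>=2.18,<3",   # a compatible range, not an exact pin
    ],
)
```

Then `pip install -e .` in a virtualenv, followed by `pip freeze > requirements.txt`, produces the concrete pins (`requests==...` plus its transitive dependencies) for reproducible installs.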
In my limited experience, Clojure's Leiningen is a far more pleasant way to solve these problems. I'm sure there are many other examples in other languages, but in the few I've used, nothing comes close. Each project has versioned dependencies, and they stay in their own little playground. A REPL started from within the project finds everything. Switch directories to a different project, and that all works as expected, too. It's a dream.
I haven’t used Pipenv yet, but it works with pyenv to create virtual envs with a specified Python version as well as all the correct packages.
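Concretely, the Pipfile lets you declare the interpreter, and pipenv can ask pyenv to install it if it's missing (the version and package here are just examples):

```toml
# Pipfile
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"

[requires]
python_version = "3.6"
```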
Then I start reading (these comments)... mayyybe I should try Julia... or anything else, at least while I’m still getting started.
"pip install --user pkg" is good enough for beginners and writing libraries with few deps.
Part of the issue is how well Python integrates with non-Python dependencies. Before conda, when I wanted to upgrade some Python projects, I'd get errors complaining about my Fortran compiler. These days, most of the major projects upload precompiled binaries for major platforms to PyPI, but when it was just source code...
Using pip and virtualenv is usually fine, or using pipenv.
Perhaps “Primer” was the wrong word, but I believe the sentiment is valid. Reading the comments, there is simply no consensus. If code readability is important because code will be read more than it is written, then packaging is important because, in many scenarios that count, code will be distributed more than it will be read.
It is simply frustrating that the typical response to comments like my original comment is some form of “it’s not as bad as you think”. Look, we have a problem here. A problem many other languages deem important enough to solve upfront. It’s been a problem for a long long time.
In the future — you have a set of entry points to your program, these are crawled by the language aware tool chain to identify and assemble all the requirements for the program (including 3rd party functionality). There’s no need for separate tools to manage packages, caches, and virtual environments — let’s just put all this logic into the compiler(s) — where necessary let the application describe the necessary state of the external world and empower language toolchains to ensure that it’s so ... let’s live in the future already ...
I have seen poetry is working on their bootstrapping story. I could not get their current solution to work on Ubuntu. Maybe what they are developing towards will work.
Often there is a very simple way, even in Python packaging and deploying. In my situation the easiest way began along these lines --
1. Install python3 for the local user from the source distribution (make sure you have compilers etc that the configure check lists out)
2. After compiling the sources and finishing with 'make install', make Python available in your local search path
3. And use pip with this magical --user flag as needed. No virtual env, conda, etc etc.
4. Leave HOMEPATH etc alone as this conflicts with the setup of the admin's system wide installs (when you su)
Things can go smoothly with pip alone.
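The steps above, sketched as shell commands (the install prefix, Python version, and example package are all illustrative; adjust to your system):

```shell
# 1. build CPython from source into your home directory
cd Python-3.6.5/
./configure --prefix="$HOME/.local"
make
make install

# 2. put it on your search path (add to ~/.bashrc to persist)
export PATH="$HOME/.local/bin:$PATH"

# 3. per-user installs, no virtualenv/conda needed
python3 -m pip install --user requests
```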
For library development I target pure pip/setuptools but still use pipenv during development phase. There have been a few cases where pipenv had problems and I had to either remove my virtualenv and reinitialize it or even remove my pip-file/lockfile, but since I still have my setup.py it's not a big deal for me.
As for uploading etc I use twine but I wrap everything in a makefile to make handling easier.
A problem I noticed recently was a case where one of my developers used a tool which was implicitly installed in the testing environment since it was a subdependency of a testing tool but it was not installed into the production image. This resulted in "faulty" code passing the CI/CD and got automatically deployed to the live development environment where it broke (so it never reached staging). Caused a little bit of a headache before I found the cause.
That's what Go and tools like Bazel allow for: static builds, which force you to modularize the project into smaller independent components.
In the case of static builds, the protocol between components is the C ABI or an RPC protocol, but it could be a mesh of microservices too.
What is currently happening with the explosion of tools around Python is the result (take it with a grain of salt, only my opinion) of people only working with Python and not exploring enough outside of it.
A Python project does not only depend on Python modules, but non-Python modules as well. Beyond Python, conda helps manage your other dependencies, like your database. I use Miniconda instead of Anaconda, to avoid the initial mega-download.
I recently wrote about it as blog post, using conda within containers has solved almost every pain point we had with python packaging and how to get things into production reliably.
Now that most projects have wheels, pip is pretty damn good.
The conda CLI is also just terrible. It's good for ad-hoc research, but for big deployments? No thanks, I've had enough pain using it.
The biggest pain point of Pipenv for me is that it cannot as yet selectively update a single dependency without updating the whole environment.
There are even languages like C++ where the community as a (w)hole has given up on the topic and instead opts for building every tool by first manually building up its underlying libraries.
Considering all this, who can actually beat Python at this point? Java maybe? Is Ruby still competing? How is NodeJS doing?
Currently with what I see around me (mostly Go and C++) I don't feel too bad about setuptools+pip+virtualenv anymore.
What should the alternative be now?
Edit: I'm reading about twine right now, but I cannot begin to comprehend why it's not bundled directly if this is what they are intending for us to use to upload packages.
If the problem is that stdlib cannot move as fast as PyPI-related development requires, maybe that should be fixed, rather than trying to bypass all quality checks and then relying on obscure shared knowledge to navigate the ecosystem. Maybe there should be a system where specific network-sensitive stdlib modules could be updated faster than the rest.
> Maybe there should be a system where specific network-sensitive stdlib modules could be updated faster than the rest.
This is essentially what `setuptools` does, by putting a package on PyPI that monkeypatches/plugs in to the stdlib.
The reason for this is that right now, that command comes from `distutils`, which is part of the standard library. There is a huge disadvantage to bundling this functionality with your Python distribution, namely that it can only get upgraded when you upgrade your Python distribution. A lot of folks are still running versions of Python from several years ago, which is fine, but it means that they are missing out on anything new that's been added in the meantime.
For example, earlier this year, we released a new package metadata version which allows people to specify their package descriptions with Markdown. This required a new metadata field, which old versions of `distutils` know nothing about.
Upgrading `distutils` to support it would require that these changes go though the long process of making it into a Python release, and even then they would only be available to folks using the latest release.
Moving this functionality from `distutils` to a tool like `twine` means that new features can be made available nearly immediately (just have to make a release to PyPI) and that they're available to users on any Python distribution (just have to upgrade from PyPI).
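In practice the replacement workflow is short (2018-era commands): build with setuptools, then upload with twine.

```shell
python setup.py sdist bdist_wheel   # build source and wheel distributions
pip install --upgrade twine
twine upload dist/*                 # replaces the old `setup.py upload`
```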
The `distutils` standard library module comes from a time when we didn't have PyPI and thus, didn't have a better way to distribute this code to users. We have PyPI now though, so bundling `distutils` with Python is becoming less and less useful.
[1]: https://packaging.python.org/tutorials/packaging-projects/#u...
The article mentions Pipsi is designed to make command-line apps globally accessible, and I'll try it out.
Additionally, a layout like git/src/package/module.py may be fine when you're using an IDE, but when browsing in a file manager you must navigate three directories deep to even see any source files, which seems to be trending towards the inconvenience and pain of Java projects.
I just pin the project's direct dependencies in the setup.py file and install the folder directly. I know it might cause bugs with different developers (or the CI) using different versions of the upstream dependencies but I guess I trust the developers who create each library I'm using. The moment I directly import something from what used to be an upstream dependency, I pin it too.
So far this approach hasn't given me trouble, but I'll still take a look at poetry based on what I read in the comments here.
edit: I requested an exemption but corp IT staff came back and said there's definitely been malware identified on that site. So... be careful with your clicks.
edit2: Well who knows where the malware alert is coming from, might be an ad or something.
Being able to select P2 or P3 environments is great.
Unfortunately it decided all my packages were in /var/mail.
No patience to debug it, so I gave up on it.