* Downloading any new dependencies to a cached folder on the server (this was before wheels had really taken off)
* Running `pip install -r requirements.txt` from that cached folder into a new virtual environment for that deployment (`/opt/company/app-name/YYYY-MM-DD-HH-MM-SS`)
* Switching a symlink (`/some/path/app-name`) to point at the latest virtual env.
* Running a graceful restart of Apache.
Fast, zero downtime deployments, multiple times a day, and if anything failed, the build simply didn't go out and I'd try again after fixing the issue. Rollbacks were also very easy (just switch the symlink back and restart Apache again).
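A minimal sketch of the symlink flip and rollback described above, as shell functions; the directory layout and the Apache reload command are assumptions taken from the comment:

```shell
# Each deploy creates a fresh virtualenv under a releases directory
# (e.g. /opt/company/app-name/YYYY-MM-DD-HH-MM-SS); the app server only
# ever follows the "current" symlink, so switching releases is one
# symlink replacement.

switch_release() {
    releases=$1    # directory holding timestamped virtualenvs
    current=$2     # symlink the app server resolves
    newest=$(ls -1 "$releases" | sort | tail -n 1)
    ln -sfn "$releases/$newest" "$current"
    # apachectl graceful   # reload workers after the flip
}

rollback_release() {
    releases=$1
    current=$2
    previous=$(ls -1 "$releases" | sort | tail -n 2 | head -n 1)
    ln -sfn "$releases/$previous" "$current"
    # apachectl graceful
}
```

Because the timestamps sort lexicographically, `sort | tail` always finds the newest release; rollback just repoints the symlink at the previous one and reloads.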
These days the things I'd definitely change would be:
* Use a local PyPI rather than a per-server cache
* Use wheels wherever possible to avoid re-compilation on the servers.
Things I would consider:
* Packaging (deb / fat-package / docker) to avoid having any extra work done per-machine, plus easy promotions from one environment to the next.
Even at the time I thought Docker would be a great solution to the problem, but the organization was vehemently against using modern tech to manage servers and deployments, so I ended up writing that tool in bash instead. Good times.
We're moving to the Docker approach, which is really nice, but it does change the shape of the whole deploy pipeline, so it's going to take some time.
> Use a local PyPI rather than a per-server cache
I still prefer a per-server cache. A local PyPI is another piece of infrastructure you need to keep alive. You don't have to worry about the uptime of an rsync playbook.
Their first reason (not wanting to upgrade a kernel) is terrible, considering that they'll eventually be upgrading it anyway.
Their second is slightly better, but it's really not that hard. There are plenty of hosted services for storing Docker images, not to mention that "there's a Dockerfile for that."
Their final reason (not wanting to learn and convert to a new infrastructure paradigm) is the most legitimate, but ultimately misguided. Moving to Docker doesn't have to be an all-or-nothing affair. You don't have to do random shuffling of containers and automated shipping of new images; there are certainly benefits to going wholesale Docker, but it's by no means required. At the simplest level, you can just treat the Docker container as an app and run it as you normally would, with all your normal systems (i.e. replace "python example.py" with "docker run example").
If they're running Ubuntu 12.04 LTS, they can keep the 3.2 kernel until late 2017. That's 2 more years. And they wrote "did not", so it was likely the situation months ago, not yesterday.
> (not wanting to learn and convert to a new infrastructure paradigm) is the most legitimate, but ultimately misguided
It depends on the amount of stuff they deploy. If they handle everything using Ansible (and from the list it looks like they do), then it's months of work to migrate to something else. They may need the right users / logging / secret management in the app itself, not outside of it.
It's not. It would be months of work if they wanted to convert all their Ansible code to Docker, but that's by no means required.
Docker and Ansible can easily coexist peacefully.
Edit: found https://py2deb.readthedocs.org/en/latest/comparisons.html
that said, for python files and simple packages it works well enough!
One of the significant tradeoffs to this approach is you lose the carefully-crafted tree-of-dependencies that the distros favor, so it makes the package pretty much automatically unacceptable to package maintainers.
However, being able to have install instructions that amount to "yum/apt-get install <package>" is pretty great.
I am hoping for an app/container convergence at some point, but we might need to drop the fine-grained dependency dream and have them be more self-contained, like Mac OS X apps.
We also incorporate a set of meta packages which means we can have multiple codebase versions installed and switch the "active" one by installing the right version of the meta-package. There's also meta-packages for each service running off the same codebase, which deals with starting/stopping/etc.
Basically, what it comes down to is a build script that builds a deb with the virtualenv of your project, versioned properly (build number, git tag), along with any other files that need to be installed (think init scripts and an about file describing the build). It should also do things like create users for daemons. We also use it to enforce a consistent package structure.
We use devpi to host our python libraries (as opposed to applications), reprepro to host our deb packages, standard python tools to build the virtualenv and fpm to package it all up into a deb.
All in all, the bash build script is 177 LoC and is driven by a standard build script we include in every application's repository, defining variables and optionally overriding build steps (if you've used portage...).
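A sketch of the fpm step of such a script, under stated assumptions (the package name, version scheme, and install prefix are illustrative; the real script would also handle users, init scripts, and the meta-packages):

```shell
# Package a built virtualenv as a deb whose version carries the build
# number / git tag, so multiple codebase versions can coexist on disk.
build_deb() {
    app=$1        # package name, e.g. myapp
    version=$2    # e.g. 1.4.2+build117
    venv_dir=$3   # the virtualenv produced by the build
    fpm -s dir -t deb \
        -n "$app" -v "$version" \
        --prefix "/opt/$app" \
        --deb-user root --deb-group root \
        -C "$venv_dir" .
}
```

`-C "$venv_dir" .` packages the virtualenv's contents, and `--prefix` relocates them under `/opt/<app>` on the target machine.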
The most important thing is that you have a standard way to create python libraries and applications, to reduce friction on starting new projects and getting them into production quickly.
https://www.datadoghq.com/blog/new-datadog-agent-omnibus-tic...
It's more complicated than the solution proposed by Nylas, but ultimately it gives you full control of the whole environment and ensures that you won't hit ANY dependency issue when shipping your code to weird systems.
Also, are there seriously places that don't run their own PyPI mirrors? Places that have people who understand how to integrate platform-specific packages but can't be bothered to deploy one of the several PyPI-in-a-box systems or pay for a hosted PyPI?
Yes. I've seen them, and they've been huge shops.
Only in cases where you don't have wheels depending on external libraries. If you do, you should still package with the right dependency constraints. Otherwise you can install a wheel which does not work (because of missing .so)
Deploys are harder if you have a large codebase to ship. rsync works really well in those cases. It requires a bit of extra infrastructure, but is super fast.
I come from the same island as you, trust me. But the more you learn about this, the more you see how complex it is. You can't even say that one solution is better than another (like apt vs yum). Each and every one of them has its pros and cons. And more often than not, architectural decisions make it impossible to get both solutions working together in the same system.
rsync is not deploying. It's syncing files. But even if you have a 1:1 copy of your development computer on a server, it still might not work, because on that server package xyz is still at version 1.4.3b and not 1.4.3c. Deployment is getting it there AND getting it to work nicely and maintainably alongside the other things that run on that computer/VM.
I've been bundling libs and software into a single virtual environment like package that I distribute with rsync for a long time - it solves loads of problems, is easy to bootstrap a new system with, and incremental updates are super fast. Combine that with rsync distribution of your source and a good tool for automating all of it (ansible, salt, chef, puppet, et al) and you have a pretty fool-proof deployment system.
And a rollback is just a git revert and another push away -- no need to keep build artifacts lying around if you believe your build is deterministic.
- how do you know which version you're running right now?
- how do you deploy to two environments where different deps are needed?
- how do you tell when your included dependencies need security patches?
For server-side apps like this, that usually means a Deb or an RPM. These systems handle upgrades, rollbacks, dependencies, etc.
Just because some people decide that writing an RPM specfile or running dh_make is too hard to work out, doesn't mean that the solution doesn't exist.
For someone trying out building python deployment packages using deb, rpm, etc. I really recommend Docker.
forget virtualenv; forget package dependencies on conflicting versions of libxml; forget coworkers that have 3 different conflicting versions of requests scattered through various services, and goddamnit I just want to run a dev build; forget coworkers that scribble droppings all over the filesystem, and assume certain services will never coexist on the same box
just use docker. It's going to go like this:
step 1: docker
step 2: happy
Indeed, we actually use Docker to build packages. Blog post coming soon, maybe.
In the meantime you can get a taste with Lattice[0].
one of which was just silly (kernel version -- are you living on that point release forever?)
one of which was valid (necessity to maintain method for distributing docker images), but probably dumb: you only get so many innovation points per company, and innovating on a problem docker just solves means you are supporting your in-house solution ad infinitum
and one of which definitely sounds painful (docker vs extant ansible playbooks)
On the app end we just build a new virtualenv, and launch. If something fails, we switch back to the old virtualenv. This is managed by a simple fabric script.
Bitbucket and GitHub are reliable enough for how often we deploy that we aren't all that worried about downtime from those services. We could also pull from a dev's machine should the situation be that dire.
We have looked into Docker, but that tool has a lot more growing to do before "I" would feel comfortable putting it into production. I would rather ship a packaged VM than Docker at this point; there are too many gotchas that we don't have time to figure out.
git clone --depth=1 path/to/repo
when doing a clone for a deploy, since you don't need the history.

Edit: but yes, cloning as a developer will take a long time. But, if it really gets out of hand, I can hand new devs a HDD with the repo on it, and they can just pull recent changes. Not ideal, but pretty workable
It's really not hard to deploy a package repository. Either a "proper" one with a tool like `reprepro`, or a stripped one which is basically just .deb files in one directory. There's really no need for curl+dpkg. And a proper repository gives you dependency handling for free.
For example, I found the --instdir option to dpkg, but the package would still have to be downloaded from the other host, unless of course the folder was mounted somehow.
You can set a different base path in debian/rules with export DH_VIRTUALENV_INSTALL_ROOT=/your/path/here
Do people really do that? Git pull their own projects onto the production servers? I spent a lot of time putting all my code in versioned wheels when I deploy, even if I'm the only coder and the only user. Deployment and development are, and should be, two different worlds.
/etc/default/mycoolapp.conf
Debian packages have the concept of 'config' files. Files will be automatically overwritten when installing a new version of package FOO, unless they're marked as config files in the .deb manifest. This allows you to have a set of sane defaults, but not to lose customisations when upgrading.
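A minimal sketch of how that's declared, assuming the package ships the `/etc/default/mycoolapp.conf` mentioned above (with debhelper, `dh_installdeb` marks anything installed under `/etc` as a conffile automatically; for files elsewhere you list them yourself):

```
# debian/conffiles
/etc/default/mycoolapp.conf
```

On upgrade, dpkg then keeps the locally modified copy (or prompts, depending on options) instead of silently overwriting it.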
When I used this approach with a Django site years ago using RPM[1] we used the pattern vacri mentioned or the reverse one where you have an Apache virtualhost file which contains system-specific settings (hostname, SSL certs, log file name, etc.) and simply included the generic settings shipped in the RPM.
In either case the system-specific information can be set by hand (this was a .gov server…), managed with your favorite deployment / config tool, etc. and allows you to use the same signed, bit-for-bit identical package on testing, staging, and production with complete assurance that the only differences were intentional. This was really nice when you wanted to hand things off to a different group rather than having the dev team include the sysadmins.
1. http://chris.improbable.org/2009/10/16/deploying-django-site...
1. Create a python package using setup.py
2. Upload the resulting .tar.gz file to a central location
3. Download to prod nodes and run pip3 install <packagename>.tar.gz
Rolling back is pretty simple - pip3 uninstall the current version and re-install the old version.
Any gotchas with this process?
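The three steps (and the rollback) can be sketched as shell functions; the bucket name, package name, and use of S3 as the "central location" are assumptions:

```shell
build_and_publish() {
    python3 setup.py sdist                        # 1. produces dist/myapp-<version>.tar.gz
    aws s3 cp dist/*.tar.gz s3://my-releases/     # 2. push to the central location
}

install_on_node() {                               # 3. run on each prod node
    version=$1
    aws s3 cp "s3://my-releases/myapp-$version.tar.gz" /tmp/
    pip3 install "/tmp/myapp-$version.tar.gz"
}

rollback_to() {                                   # uninstall current, reinstall old
    pip3 uninstall -y myapp
    install_on_node "$1"
}
```

Since the node installs from a local tarball, neither PyPI nor a Git server is in the critical path at deploy time.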
So at some point, as you know, you'll need to move on.
There are no git dependencies in the process I describe above.
The pip drawback discussed in the post is PyPI going down. In the process described above there is no PyPI dependency. Storing the .tar.gz package in a central location is similar to Nylas storing their deb package on S3.
I vaguely remember .deb files having install scripts, is that what one would use?
- your app user doesn't need rights to modify the schema
- you need to handle concurrency of schema upgrades (what if two hosts upgrade at the same time?)
- if your migration fails, it may leave you in a weird installation state and not restart the service
Ideal solution: deploy code which can cope with both pre-migration and post-migration schema -> upgrade schema -> deploy code with new features.
If your migration system is smart enough (or you can easily check the migration status from a shell script) you could also do this in a multi-app-server environment too.
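A sketch of that ordering, with hypothetical helper commands (`deploy_app`, `migration_status`, and `run_migrations` stand in for whatever your stack provides):

```shell
deploy_with_migration() {
    deploy_app compat-release        # 1. code that copes with old AND new schema
    if ! migration_status | grep -q 'up-to-date'; then
        run_migrations               # 2. skip if another host already migrated
    fi                               #    (a real lock is still needed for races)
    deploy_app final-release         # 3. code that requires the new schema
}
```

The status check only narrows the window for concurrent upgrades; true mutual exclusion still needs a lock (e.g. an advisory lock in the database itself).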
So how is this solving the first issue? If PyPI or the Git server is down, this is exactly like the git & pip option.
How has your experience with Ansible been so far? I have dabbled with it but haven't taken the plunge yet. Curious how it has been working out for you all.
I'm looking to do something pretty similar, but with RPMs. I found rpmvenv, which seems to work in the same fashion. https://pypi.python.org/pypi/rpmvenv/0.3.1
If a company wants to use Docker, that's their choice, but I don't think it's at all reasonable to insist on or only support that environment as a software vendor. If it works on Debian, give me a .deb or, even better, an Apt repo to use.
With that said, Conda is not a perfect solution. One thing that can be frustrating is that a package can include compiled code (shared objects/dylibs) that may be incompatible with your system. Unfortunately, while you can indicate dependencies on other conda packages, python versions, etc there isn't currently a convenient way to indicate things like GLIBC dependencies.
cf push some-python-app
So far it's worked pretty well. Works for Ruby, Java, Node, PHP and Go as well.
You'd use it for one in your own data centre, or Pivotal Web Services[0], or BlueMix. You point it at an API and login, then off you go.
If you need something more cut-down to play with, Lattice[1] is nifty, but currently doesn't do buildpack magic.
No, the state of the art where I'm handling deployment is "run 'git push' to a test repo where a post-update hook runs a series of tests and if those tests pass it pushes to the production repo where a similar hook does any required additional operation".
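The promotion part of that hook might look roughly like this (`run_tests` and the `production` remote are assumptions; the production repo's own hook performs the actual deploy):

```shell
# Called from hooks/post-update in the bare test repo: run the suite
# against the pushed ref, and only on success forward it to production.
promote_if_green() {
    ref=$1
    if run_tests "$ref"; then
        git push production "$ref"
    else
        echo "tests failed; not promoting $ref" >&2
        return 1
    fi
}
```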
Looks like these guys never heard of things like CI.
This is the core of how we deploy code at Nylas. Our continuous integration server (Jenkins) runs dh-virtualenv to build the package, and uses Python’s wheel cache to avoid re-building dependencies.