The JankStack, in which one piles Python environment jank on top of Ubuntu/Darwin jank, piles Docker jank on top of that, piles Docker Compose jank on top of that, until you finally arrive at Jank-as-a-Service via something like ECS or EKS, gives a terrifyingly comforting illusion of reproducibility with roughly the risk profile of speedballing hard drugs while simultaneously free climbing El Capitan. It has the nice ancillary benefit of subsidizing some combination of mega-yachts and private space programs, so that's cool I guess.
Sooner or later you're going to link CUDA, or glibc, or some other thing that just doesn't play like that. And then you are capital-F fucked if you didn't invest early on in some heavy-metal hermetic shit and some gazed-into-a-Palantir foresight around feature flags.
None of those things is a silver bullet though. They are among the best tools for that class of job, but while Nix and Bazel both have facilities for coping with multi-platform builds with difficult dependencies, you still work hard for it. Nix/NixOS will get you just about perfect builds on a given OS/CPU pair, and Bazel has a lot to offer there too, but when targeting `aarch64-darwin` for local development on Macs and `x86_64-linux` on the deployment target, it's not a free lunch.
The documentation and OSS community support and stuff like that have gotten better (a lot better) in the last year or two for both of those things, but I still can't in good conscience recommend either to a shop that doesn't already have experienced staff (or staff that is passionate about getting that experience): it's a steep learning curve because the problem is fundamentally insane in practice if not in theory.
For folks who want code to run the same in production as in development, can't categorically commit to languages and libraries that religiously eschew diverse native dependencies, but still don't want to go through the looking glass with e.g. Nix, the best bet in my experience is to provision dev servers adjacent to the production infrastructure and lean on the massively improved remote development experience in e.g. VSCode compared to even a few years ago.
Don’t demand your language do the job of the operating system and come with a package manager. Your distro has packages that are guaranteed to play well together, use them.
Off the top of my head I can't think of any language other than Python that manages to be a constant and persistent cavalcade of self-inflicted environment and dependency chemical fires on absolutely every single one of them. Pointing this out usually generates a bunch of "pipenv/poetry/uv/whatever works fine for me", and for those folks I am very happy that they don't deal with heavily native extension-backed requirements. Mazel tov.
It's a meme: https://xkcd.com/1987/
I’ve never worked in an org that had all machines 100% unified on a single distro from dev to prod.
That means writing a test should be easy and running it should be fast (including any compilation steps). As soon as something takes more than 30 seconds, you've lost a lot of people. They've switched tabs. They're on HN or reddit or they've pulled out their phone.
You've broken the flow.
Some people can work effectively in an environment where it takes 1-10+ minutes to build something and then you run enough code to test a bunch of changes at once. You might even have multiple clients open at once and switch between them. This doesn't suit me and it doesn't suit a lot of people I've worked with.
Where does local storage fit in? Any test you write will probably need data. You don't want to mock out your data storage. You just want to use the same API and have it be backed by a hash map (or whatever) and have it easy to populate that with data.
Once you have that, local data for something interactive like a website becomes a natural extension.
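The pattern described above can be sketched in a few lines: a shared storage interface, with the test implementation backed by a plain dict. The names (`UserStore`, `get`/`put`) are hypothetical, just to illustrate the shape:

```python
from typing import Optional, Protocol


class UserStore(Protocol):
    """Storage API shared by production and tests (hypothetical interface)."""
    def get(self, user_id: str) -> Optional[dict]: ...
    def put(self, user_id: str, record: dict) -> None: ...


class InMemoryUserStore:
    """Test double backed by a hash map -- same API, no mocking."""
    def __init__(self) -> None:
        self._rows: dict = {}

    def get(self, user_id: str) -> Optional[dict]:
        return self._rows.get(user_id)

    def put(self, user_id: str, record: dict) -> None:
        self._rows[user_id] = record


# Populating test data is trivial:
store = InMemoryUserStore()
store.put("u1", {"name": "Ada"})
```

The production implementation speaks to the real database behind the same interface, so tests stay fast and test data is just ordinary code.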
This made me switch back to JS and Python. The frictionlessness of "I made this change and I see the result in the same breath" is very compelling to me. Fortunately there are fast TypeScript compilers now, which make the process more or less tolerable (in watch mode).
I seem to be in the minority here, preferring instant responsiveness. When I run Windows XP in a VM and press Win+E, an explorer window is presented to me in the next frame, fully rendered. When I do this on the host OS (Win10), it takes several hundred ms for the OS to draw a window, and then I watch with sadness for several moments as the UI elements are drawn one by one. If they had done nothing for twenty years, it would still be perfect.
I'm also reminded of Casey Muratori filling out the Visual Studio questionnaire. It asked how long you think it should take to launch. The lowest option was "ten seconds", and he found this comical and depressing. VS used to be much faster, too.
But I'm with you on fast compilation and at Google this was the case, even with C++. There was a ton of infrastructure made around this that wouldn't be easy to replicate but I could certainly compile and run a C++ unit test in <20 seconds most of the time.
With JS and Python, I can iterate on code in a REPL. Iteration time is measured in milliseconds.
The somewhat-tested code then goes into the code file to be actually tested as part of the product.
ADHD can make this worse since a time delay means getting distracted for much longer.
I have been working on testing a single issue for months. I wrote the fix the first day, but it can only be tested in the remote development environment: an environment that I have zero insight into, and one that requires coordinating cross-team members to work together for several hours to test the fix! ...And if we need to make a change, pushing takes twenty minutes to build and test the new environment.
> That means writing a test should be easy and running it should be fast (including any compilation steps). As soon as something takes more than 30 seconds, you've lost a lot of people. They've switched tabs. They're on HN or reddit or they've pulled out their phone.
And here I am, compiling Tailwind CSS on a Blade-based template consisting of hundreds of partials. ~30 seconds, then refresh. FML.
So we develop the code and deploy into QA, often breaking things. Cycle times are measured in hours to days. Breaking QA is sometimes a fireable offense. Lol.
The leads and architects are correct in my case; it would be impossible and too expensive to do. This is because our services are built on hundreds of bespoke manual configurations made along the way to production. Discovering those configurations and pulling them into code/automation would be a whole months- or years-long project in itself.
That said there are ways of developing locally without running everything locally. Pull in the code of the service you want to work on locally and just point to QA for its dependencies. Most times it takes some finagling of code but it usually works.
Even if everything was running locally, often generating usable data is the biggest barrier. In QA, thousands of hands have generated partly usable data. It's a hard problem to solve, since I don't want to have to know about the data requirements of every service.
That being said, you should also have a dev environment. QA isn't for development. It'll certainly be cheaper than firing people.
But it does mean that you have to be religious about building links and connection-strings in your code.
By “local” I mean self-contained. My “local” environment is right now a single cloud Linux VM with kind for Kubernetes; all sorts of local k8s monitoring / cert issuing / stubbed things; MetalLB and OpenVPN for access (with a dhcpcd local to my computer for DNS resolution); and so on, all building from local sources and orchestrated by a set of Python scripts. All of this works great, has saved me so many times, and has made development so much faster compared to teams that rely on CI/CD even for basic development (where every commit takes an hour-plus to deploy due to tests, builds, and so on).
The only issue with this approach can be services / software that has hard dependencies on specific cloud products you can’t run “locally” but even in that case you can always spin up / tear down super small versions of them to use for the setup. Python has been great to glue orchestrating everything with easy to use kubernetes and docker libraries, and aws/azure cli commands as needed.
If you have a product that you can’t easily build or deploy in an automated manner you’re going to have a bad time at disaster recovery time or when the one person that knew how that one piece gets installed is laid off.
When I first joined the travel company Orbitz, they had a build that took just over 90 minutes. It was a Java-first environment, so when I needed to test a small one-line change to the UI (that had absolutely nothing to do with Java), I still had to wait 90 minutes. So I just planned on doing nothing all day.
In my personal software I start crying if my builds take longer than 12 seconds. The difference is that I really enjoyed getting paid to watch YouTube for 90-minute stretches 5 or 6 times a day. It wasn't my time. It was their time. With my own software, though, it absolutely is my time.
Small improvements to trivial items are a cost savings that adds up. It's like a flash flood. A raindrop isn't going to drown you. A bunch of raindrops is still insignificant, but when there are too many raindrops you are under water.
Just think about how floods work, because it's still just tiny raindrops. One improvement does almost nothing on its own, but it enables other things to occur more frequently. When that happens everywhere, you have a performance flood. Suddenly you can test-automate several hundred features of your application end-to-end in less than 10 seconds. When that occurs it changes people's behavior and their perception of risk, because now all options are discoverable in a nearly cost-free manner for everyone. You will never ever get to experience that freedom if you are drowning in dependency and framework stupidity.
On the one hand, maintaining local development environments that work reliably across a larger team of developers is a HUGE cost in terms of time and effort. Developers are amazing at breaking their environments in unexpected ways. It’s common for teams to lose many hours a week to problems related to this.
So I love the idea of cloud-based development environments where, if something breaks, a developer can click a button on a website, wait a few minutes and have a fresh working environment ready to go again.
But… modern laptops are getting SO fast now. It’s increasingly difficult to provide a cloud-based environment that’s remotely as performant as a M2 or M3 laptop.
* in every single case I’ve seen, the developers who broke their local environments that badly had significant skill deficits which affected their general productivity. Investing in training paid dividends far beyond not having to deal with their local environment getting hosed since they also stopped creating massive security and performance problems in production.
* building a cloud-based development environment that someone can refresh automatically means you’ve identified a reliable process for installing and configuring everything. That same process can be run locally, probably using the same container definitions for everything except the parts they’re focused on.
I mirror this sentiment.
Another one that I've commonly seen: it's a bit of a yellow flag if a developer is struggling with Git. It becomes a red flag if the issues continue to happen after training. Usually it means they don't understand other technical fundamentals.
I worked on a large project that needed to support development on many platforms. The amount of scripting for the git hooks and build hooks alone required its own development team. When subtly incompatible changes in git hooks rolled out, you would see a few people that ran commands in just the wrong way to bork their repository. These commands are usually fine, but when combined with an otherwise invisible change in the hooks, would just break things. Now the actual breakage rate was low, like 1 in 1000 in a given month, but with as many developers as were working on this project that was always a couple of people that had to nuke their local repo from orbit (it's the only way to be sure).
While I am very pro-containers as a deployment target I am also very resistant to containers in the critical feedback loop of local dev, chiefly due to iteration speed and performance/battery overhead, which was much more glaring when the Intel chips were all we had.
I can't endorse Nix in any capacity these days but it did seek to address multi platform environmental consistency, without containerization as your LCD.
With cloud, they can't. They could drop their laptop out of a window and still be up and running productively a few minutes later (given access to a computer with a keyboard and a web browser).
Until the cloud development environment inevitably breaks, and your entire development team’s productivity drops to zero.
I assert that such a team will be a lot more productive than the exact same sized team dedicated to local development support instead.
Ideally both options would be available using a wrapper that ensures consistency across CI, cloud dev, and laptop dev environments, but I haven't had that luxury anywhere.
With NixOS, you can have a single default.nix and serve from a binary cache.
This has the added benefit of ensuring that developers don't interfere with each other during development. Your Kafka messages will only be picked up by your processes. Your database changes don't impact anyone else until they're pushed.
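One common way to get that per-developer isolation on shared infrastructure like Kafka is to namespace topics (and consumer groups) by developer. A tiny sketch of the idea; the `dev.<user>.` prefix convention is an assumption, not a Kafka feature:

```python
import getpass


def dev_topic(base: str, user: str = "") -> str:
    """Prefix a Kafka topic with the developer's username so messages
    produced from one dev environment are never consumed by another."""
    user = user or getpass.getuser()
    return f"dev.{user}.{base}"
```

The same trick works for database schemas or queue names: the application reads the prefix from configuration, so production simply runs with an empty prefix.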
It certainly helps, but there's still SO MUCH that can go wrong.
However the most productive I've been was on a team where local development, even on a single service, was eschewed for the most part. We had tests that would cover 90+% of the codebase - more importantly nearly 100% of what was worth testing - and would routinely deploy things into staging from master that had never even been executed as a whole app, let alone run in anger and tested with live dependencies. The coverage was good enough that everything actually held together really well.
I'd never shipped anything that efficiently before or since; it totally changed my view of TDD from a time-consuming but safe and conservative regime to something that, in my experience, dramatically sped up iteration.
The only thing that started gnawing at my brain (while admittedly operating a far lesser sized constellation of distributed services than, say, Facebook) was that there is no way of unit testing (or even statically verifying with TLA+ or something) the wider-scale structure of services, at least that I'm aware of. At some point I might knock something together involving specs and code generation but I dunno.
I have been arguing with myself on what a good dev-env is for as long as I have been working in enterprise software dev which by now is about 10 years.
All I know is that attempts to reproduce prod via tilt and similar have not been successful (I have witnessed 3 attempts). The promise of "one dev env for all teams" quickly falls apart.
The main problems with this are IMO:

1. There is a false sense of encapsulation of complexity: you don't need to know how the components of non-owned services work, until something breaks (and break it will), and then you really do.

2. #1 + Docker + k8s + Kafka + GraphQL etc. make complexity seem very cheap.

3. Add a minimum of deadline pressure on teams, and quickly they will stop caring about keeping the dev-env images up-to-date and/or working.
I would rather have intimate familiarity with what my services depend on, which is much easier to get by running these dependencies somewhat manually, close to natively. You can be sure that your colleagues will come knocking if complexity is not respected.
But this seems hard to package as a product...
Reproducible builds are what I've found to be the most important part of any local dev experience; standing up local databases, message queues etc required for whatever is being developed has always been relatively simple in comparison.
My favorite argument against local development, however, is that isolation is a bug, not a feature.
When I want to show another developer what I've built, or get help debugging an issue, I don't want to have to call them over to my laptop or do a screensharing session. I want them to have access to the same machine that I have access to, with the same data and configuration, and having cloud-based dev boxes enables that.
The local emulator suite is one of the best I've seen[0] (would love to see others). Powerful and easy to set up[1].
It includes a top notch emulator for auth which gives you a full SSO flow as if going through Google's OAuth flow and makes it easy to get otherwise complicated auth flows nailed down. The database and Functions runtime emulators are excellent and make it easy to prototype and ship. Comes with a Pub/Sub emulator to boot for more complex async processes. You can export the emulator state to disk so you can share it or bring it back up into the same state.
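Getting the suite running is mostly a matter of declaring which emulators you want in `firebase.json` and running `firebase emulators:start`. A minimal sketch (the ports shown are the suite's documented defaults; trim to the services you actually use):

```json
{
  "emulators": {
    "auth": { "port": 9099 },
    "functions": { "port": 5001 },
    "firestore": { "port": 8080 },
    "pubsub": { "port": 8085 },
    "ui": { "enabled": true }
  }
}
```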
If you need to interface with relational DBs, you just use a Pg or MySQL container.
Really phenomenal and would love to find other stacks with a similarly solid emulator suite. It's a strong recommend from me for any teams that value speed because it really allows much, much faster iteration.
Edit: Dear GCP team, please, please - never kill this thing.
[0] https://firebase.google.com/docs/emulator-suite
[1] https://github.com/CharlieDigital/coderev/blob/main/web/util...
If you really need a 3rd-party auth solution then it could be good for that. Otherwise I would recommend sticking to open-source tools on top of a hosting provider that gives you containers and a managed (Postgres or MySQL) database.
I run a mix; some apps are pure Firebase and Functions. Some run Cloud Run serverless containers connected to Supabase upstream and Pg container locally.
Firebase is quite flexible as a chunk of it is a facade for Google Cloud services (storage, auth API, CDN, Cloud Run).
Very flexible model of upscaling services progressively. Functions -> Cloud Run -> GKE Autopilot -> GKE
Attempting to do all of this on the local machine will mostly work, but it fails to exercise a lot of the networking concerns (public IP detection, port assignment, etc.), and weird edges crop up as latency grows beyond 0ms. It also makes it impossible to test with other players on other LANs without reaching for complicated networking setups that add even more confusion when things go wrong.
I could write a bunch of bandaid "if editor attached" code throughout, but I also like the idea that I am testing the final thing on the ~final hardware and there isn't going to be any weird dragon fight after this.
For doing end to end tests, that is likely required! But as the project matures over time hopefully you can carve out code which can be tested without the network stack.
Also, these days, I use Cloudflare more and more. They're very affordable, and deployment is a breeze for the simplest cases. But local development seems to be an afterthought. I built a service that uses some of their dev offerings. Some work locally (using Miniflare), and some can only work remotely (dev environment in your Cloudflare account). Imagine when you need both kinds of offerings!
Local deployment is not negotiable.
And I still come back to Spolsky's "Joel Test": there must be 1 command with no extra steps that you run to get running version of the software on your local developer machine.
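That one command usually ends up being a thin entry-point script that runs each setup step in order and fails loudly. A minimal sketch; the individual step commands here are illustrative assumptions, not a real project's setup:

```python
import subprocess
import sys

# Hypothetical bootstrap steps -- each should be idempotent, so the
# single command is safe to re-run whenever an environment breaks.
STEPS = [
    ["docker", "compose", "up", "-d", "db", "queue"],
    ["python", "-m", "pip", "install", "-r", "requirements.txt"],
    ["python", "manage.py", "migrate"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order, aborting loudly on the first failure."""
    for cmd in steps:
        print("->", " ".join(cmd))
        if runner(cmd).returncode != 0:
            sys.exit(f"step failed: {' '.join(cmd)}")

# The script's entry point would simply call run_steps(STEPS).
```

The point is less the mechanism than the discipline: every new dependency has to be added to this script, or the one command stops working and someone notices immediately.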
It’s a constant fight though.
You've identified every single thing needed to stand up a functional version of your app and made the rails to do so in a repeatable fashion.
You should do that anyway, and whether the dev environment is true local or runs using some special k8s hooks to some dev clusters or what have you is immaterial.
I think fuck it. I run my own company where I develop software for clients, and the last thing I need is for my environment and tools to be controlled / selected by some corporate imbecile.
Or does everyone have their own server that they ssh into? Cause I consider that as close to local as you can get without running stuff on your own laptop. Would be very annoyed though those few times I work on a plane or a train with spotty 4G/5G.
I am not sure how much web development you have done, but the "works on my machine" problem can and does show up. And not just across machines: across different browsers and different browser versions. Unless you go very simple, you can get a graphic designer to point out plenty of problems for you.
Stress testing the application by stretching it like a rubber sheet to wrap around as many different operating systems as possible is a useful way to iron out various bugs that affect more than one system but may not have been triggered in an easier development process.
Running the application locally is also a way that many people first download and try it, so ensuring a reasonable experience on a laptop is quite important. Iterating on front-end code with a TLS certificate from mkcert provides access to all the JavaScript APIs you’d expect to see on the public internet under TLS.
Running the browser back-end on different OS and architecture targets is a good way to control for or “fuzz against” the quirks you might see in interactions between the operating system, file system layout, the behavior of common Linux-style command line utilities, and the creation of different users and running processes as them. Many of these things have slightly different behaviors across different operating systems, and stretching BrowserBox across those various targets has been one of the strong methods for building stability over time.
My respect for Bash as the lingua franca of scripting languages has grown immensely during this time, and I’ve felt validated by the Unix-style approach where commonly available tools are combined to handle much of the OS-level work. Adaptations are made as needed for different targets, while a lot of the business logic is handled in Node.js. Essentially, this approach uses Bash and Linux philosophy as the glue that sticks the Node.js application layer to the substrate of the target operating system. I made this choice a long time ago, and I’m very satisfied with it. I increasingly feel that it’s the validated approach because new features requiring interaction with the operating system, such as extensions or audio, have been well-supported by this design choice for building the remote browser isolation application.
An alternative approach might be to stick everything into first-class programming languages, seeking a Node library for each requirement and wrapping anything not directly supported in a C binding or wrapper. But I’ve never found that practical. Node is fantastic for the evented part of the application, allowing data to move around quickly. However, there are many touchpoints between the application and the operating system, which are useful to track or leverage. These include benefits like security isolation from user space, permissions and access control, and process isolation. The concept of a multi-user application integrated with a Unix-style multi-user server environment has been advantageous. The abundance of stable, well-established tools that perform many useful tasks in Unix, and targets that can run those tools, has been immensely helpful.
On the front end, the philosophy is to stay at the bleeding edge of the latest web features but only to use them once they become generally available across all major browsers—a discipline that is easier to maintain these days as browser vendors more frequently roll out major, useful features. There’s also a policy of keeping top-level dependencies to a minimum. Essentially, the focus is on the Node runtime, some WebRTC and WebSocket libraries, and HTTP server components. Most of the Node-side dependencies are actually sub-dependencies and not top-level. A lot of dependencies are at the operating system level, allowing us to benefit from the security and stability maintained by multiple package ecosystems. I think this is a sound approach.
Porting everything to a Windows PowerShell-type environment was a fascinating exercise. For the front end, having virtually no dependencies except some in-house tools fosters faster iteration, reduces the risk of breaking changes from external sources, and minimizes security risks from frequently updated libraries with thousands of users and contributors. Some of the ways we’ve approached security by design and stability by design include adopting a development process that is local-first but local-first across a range of different targets.