I think Google Cloud Run is obscenely ahead of its time. It's a product that's adjacent to so many competitors, yet has no direct competitor, and it has staked out that niche in a way that makes it such a valuable product.
It's serverless, but not "Lambda serverless" or "Vercel serverless", which force you to adopt an entirely different programming model. It's just Docker containers. But it's also not "serverless" in the way Fargate or ACS is; it still scales to zero.
There's a lot of competition in the managed infrastructure space right now (Railway, Render, Fly, Vercel, etc). But I haven't seen anyone trying to do what Cloud Run does. Cloud Run has its disadvantages (cold starts are bad; it could also be a great fit for background workers/queue consumers/etc., but Google hasn't added any way to scale replicas on anything other than incoming HTTP requests yet).
But the model is so compelling that I wish more companies would explore that space, rather than retreating to "how things have always been done" ("pay us $X/mo to run a process") or to the much more boring "custom serverless runtime": "your app is now an 'AWS Lambda app' and can't run anywhere else, congrats".
Fly Machines are more powerful than Google Cloud Run, IMO. You can treat them like Cloud Run, or manage them directly and implement your own serverless model.
Our PaaS orchestration is implemented entirely in the client CLI, and it manages Fly Machines directly: https://fly.io/docs/machines/
To go a bit further, I'm honestly quite intrigued by how GCP has sought to differentiate itself from the other two providers by offering this kind of "plug and play" feel to the cloud. Certainly there is value in the absolutely granular service offerings of AWS/Azure, but there's a point where it starts to feel like all I'm doing is building control towers for island landing strips.
I just want my cloud provider's ML service to talk to the data lake on the same cloud tenant without having to architect my way through 15 network NICs, 30 service accounts, and 4 VDIs...
The entire deployment can be defined easily in GitHub Actions. Combine that with Cloud Tasks and a Cloud SQL Postgres instance and you have a near-infinitely scalable solution.
I ran a system like this where over 30k servers across 7 different data centers all over the US were hitting cloud function endpoints 24/7 at 30-50+ RPS, and I never had a single failure or outage over multiple years. Even better, the whole thing never cost more than about $100/month.
DigitalOcean is not wildly far off it either.
ECS + Fargate is the closest AWS has to it, but you need to do IAM and networking to utilise it. If you're in AWS already, it's pretty good, albeit with some frustrating limits.
Correct me if I'm wrong, but these are actually not close to Cloud Run. Cloud Run's differentiator is its scaling metric; it scales with incoming requests, and has strict configuration to ensure that each replica only handles N concurrent requests. You could maybe get something like this set up on ACI or Fargate, but it'd require stringing together five or six different products. You can definitely wire those up to autoscale on CPU%, but (1) that's not scale-to-zero, and (2) CPU% kinda sucks as a scaling metric, right? Idk, I've never been happy with systems that autoscale on CPU%.
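That request-based scaling metric can be sketched as a tiny formula: desired replicas is the ceiling of in-flight requests over the per-replica concurrency limit. A minimal Python sketch of the model (illustrative only, not Google's actual autoscaler; the concurrency value and replica caps are made-up defaults):

```python
import math

def desired_replicas(in_flight_requests: int, max_concurrency: int,
                     min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Request-based, scale-to-zero scaling: each replica handles at most
    max_concurrency concurrent requests, so replica count tracks load directly."""
    if in_flight_requests <= 0:
        return min_replicas  # no traffic -> no replicas
    needed = math.ceil(in_flight_requests / max_concurrency)
    return max(min_replicas, min(needed, max_replicas))

print(desired_replicas(0, 80))    # 0 -- scaled to zero
print(desired_replicas(250, 80))  # 4 -- ceil(250/80)
```

Contrast with CPU%-based scaling, where idle-but-allocated replicas still report low CPU and the metric lags behind actual request load.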
It's usually not a huge step to add the RIC, but it's a bit more tied in to AWS than Cloud Run, which can run arbitrary Docker images, if I understand correctly.
Oh, the concept exists. I can make some infrastructure mostly-immutable, myself. But the cloud doesn't give me it out of the box. What the cloud gives me are APIs. If I write software to call those APIs, predict what the allowed values are, predict the failures I might see, write about 5,000 lines of code to handle the failures, attempt to reconcile differences, retry, store my artifacts, reference them, after implementing a build system, etc, I can get one or two things to be immutable. But for the vast majority of services it's actually impossible.
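To make the "reconcile differences" part concrete, here's a minimal sketch of the loop every configuration management tool runs, with a plain dict standing in for the cloud API. The real 5,000 lines come from replacing the dict mutations with API calls and handling all the ways those calls fail:

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Compute and apply the changes needed to make `actual` match `desired`.
    Returns a change log; a real reconciler would call the cloud API for each
    change, retry on transient errors, and handle partial failures."""
    changes = []
    for key, want in desired.items():
        if actual.get(key) != want:
            changes.append(("set", key, want))
            actual[key] = want  # stand-in for an API call
    for key in set(actual) - set(desired):
        changes.append(("delete", key))
        del actual[key]
    return changes

state = {"versioning": "Disabled", "cors": ["*"], "stray_rule": True}
log = reconcile({"versioning": "Enabled", "cors": ["*"]}, state)
print(log)    # [('set', 'versioning', 'Enabled'), ('delete', 'stray_rule')]
print(state)  # {'versioning': 'Enabled', 'cors': ['*']}
```

The point of the parent comment is that none of this lives server-side: every client has to reimplement it against mutable APIs.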
Take an S3 bucket. Can you make an S3 bucket immutable? The objects inside it might be versioned, sure. Can you roll back all the objects in the bucket to version 123? Can you roll back the bucket policy to revision 22? Can you also roll back the CORS rules? Can you diff all these changes and see a log of them? Can you tell the bucket to fix itself back to the correct expected version of itself? Can you tell it to instead adopt 3 new changes, as part of a version of the S3 bucket you tested somewhere else? The answer is "no".
You can fake it with a configuration management tool like Terraform. But that's as immutable as a file on your filesystem. Any program can overwrite your files at any time; you have to have Puppet configured to monitor your files and constantly fix them when they get changed, track the Puppet code in Git, keep your own log of changes, etc. That filesystem isn't immutable, it's mutable! If it were immutable you wouldn't have to use Puppet (or Terraform). And the sad thing is we're all stuck on Terraform, which is actually terrible as a configuration management tool, because it mostly refuses to reconcile inconsistencies (the way every other configuration management tool in history has). It just bombs out and says "Oh shit, that wasn't a change I planned, and you didn't write this HCL code to handle this weird condition, so I'm just gonna bail and not fix this. Good luck getting production working again." Puppet wouldn't stop working if something other than Puppet updated a file. But nobody seems to mind that we literally regressed in functionality, because a company made up new marketing terms for their tools.
Sadly this desired built-in immutability, and the declarative nature of it, won't be built into S3 or other tools for at least a decade or two. They would need to effectively build something akin to K8s just to manage their own components immutably and expose an entirely new API. So we are doomed to do Configuration Management in the cloud, until the cloud starts implementing immutability out of the box.
Now more than ever, since I started making an effort to self-host much more than before: the amount of scripts I have to write just to achieve idempotency, never mind immutability, is staggering, and I'm already questioning my approach. I'll likely start making use of ZFS or Btrfs snapshots, or, I don't know, I'll just start snapshotting the entire filesystem on my Linux machines manually (e.g. store all dir/file paths with their sizes and modification dates; it's a start, and you can diff against such "snapshots").
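That poor man's snapshot idea fits in a few lines of Python. A sketch, with the caveat that a content change which preserves both size and mtime slips through (you'd add hashing for that):

```python
import os

def snapshot(root: str) -> dict:
    """Map each file path under root to (size, mtime)."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            snap[os.path.relpath(path, root)] = (st.st_size, st.st_mtime)
    return snap

def diff(before: dict, after: dict) -> dict:
    """Exact breakdown of what changed between two snapshots."""
    return {
        "added":   sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(p for p in set(before) & set(after)
                          if before[p] != after[p]),
    }
```

Usage is exactly the workflow described above: snapshot, run the command you don't trust, snapshot again, and diff() tells you what it touched.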
I am just not comfortable running commands and having no idea what changed and where. It's insane that everyone just accepts this! I'm not okay with it; I want to see an exact breakdown of what changed, where, and how.
IMO working on this and bringing it to the mainstream is loooong overdue.
It's not apparent that problems X, Y and Z will be solved by immutability. Once it's applied everywhere, whole classes of problems just disappear. But until people see the problems disappear, they won't implement it. Catch-22.
You can build your entire app inside a plain HTML file which can be deployed online with something like GitHub pages.
I've built a few apps with it including a real-time chat app which supports both group chat, private 1-on-1 chat with an account system (with access control), OAuth via GitHub... The entire app is only 260 lines of HTML markup and fully serverless (no custom back end code). Access controls are defined via the control panel. All the app's code is in this file: https://github.com/Saasufy/chat-app/blob/main/index.html
You can try the app here (use the 'Log in with GitHub' link): https://saasufy.github.io/chat-app/index.html
Saasufy comes with around 20 generic declarative HTML components which can be assembled in complex ways: https://github.com/Saasufy/saasufy-components?tab=readme-ov-...
There is a bit of a learning curve to figure out how the components work but once you understand it, you can build apps very quickly. The chat app only took me a few hours to build.
I've also been helping a friend to build an application related to HR with Saasufy and I managed to get the basic search functionality working with only 160 lines of HTML markup.
Milliseconds is now possible: https://kraft.cloud/ (e.g., an NGINX web server in under 20 millis).
The basic idea is that FaaS is a leaky abstraction because (a) lots of runtimes are slow to start up and (b) isolation tech isn't good enough. So FaaS services start up VMs and containers and then the user's function, which might have to do a lot of init work, like loading reference data, and because that takes too long you have to keep idle capacity around. At that point the abstraction is broken.
So there's a two-part fix:
1. For Java users, the GraalVM native-image tool can pre-initialize and pre-compile a JVM app so that it starts up instantly (including with pre-loaded reference data).
2. Change the isolation model so VMs and containers don't need to be started up anymore. Containers alone can take hundreds of milliseconds to start.
There's also some interesting stuff there that takes advantage of Oracle Cloud's more "edgey" nature than other clouds, where it has more datacenters than others (but smaller).
The new isolation model works by exploiting new hardware features in CPUs that allow for intra-process memory isolation (Intel MPK), combined with hardware-enforced control flow integrity. This requires compiler support, but GraalVM knows about these features, so the cloud can just compile JVM apps to native for you. And what about other apps? Well, many languages run on GraalVM via Truffle, so those are covered (e.g. JavaScript), and for native code you can use a modified LLVM to compile and then do a static verification of any user-supplied binaries, like NaCl used to do.
If you put those things together then starting user code that's already available locally becomes just mmapping a shared library into a process, which is extremely fast. It can only exit the hardware/software enforced isolate by going via a trampoline that's equivalent to a syscall, but without needing an actual syscall. The Linux kernel isn't reachable at all.
With that you can have functions that start and stop in milliseconds.
If you're a naysayer in the comments, I would encourage you to go give it an honest try, and consider again why you think infra has to be done in harder ways.
Finally, I believe simple configuration can coexist with code.
P.S.: At dstack, we are building an open-source platform to manage AI infra – a more lightweight and AI-friendly alternative to Kubernetes.
Software Infrastructure 2.0: A Wishlist - https://news.ycombinator.com/item?id=26869050 - April 2021 (195 comments)
If something is "magically" easy, it either is a meaningful design/algo revolution or it overpromises the production case while showing off the trivial. Most of the time it's #2. Docker was #1.
I'm encouraged by the same ideas.
Sometimes you just want something that stays running and doesn't go down and can scale to zero and scale upwards, ideally with revenue.
I kind of want a special mega HTTP form endpoint for which I can define a pipeline: one that can write to the database, kick off background jobs, and feed into a mega API automatically.
- Cloud Run did a good job, but the autoscaling is too slow to avoid paying for idle
- Lambda is great, but I want to run way more complex workloads than simple functions
and how Modal exemplifies a lot of the ideas he's been looking for. Check it out, including our show notes!
> We are, like what, 10 years into the cloud adoption? Most companies (at least the ones I talk to) run their stuff in the cloud. So why is software still acting as if the cloud doesn't exist?
> As in, I don't want to think about future resource needs, I just want things to magically handle it.
'nuff said.
That said, it's still not as trivial as using managed SaaS but it's still easier than ever to basically spin up your own cloud of sorts, using the wealth of open source tech out there. K3S on Hetzner can do a pretty solid job for cheap. In that sense, the ecosystem around running your own cloud is only improving.
So, here are some thoughts on what seems to be the key points of the article:
* I want to go fast.
Well... yeah, sure, why not... but it's not very important. Lots of other goals will overshadow this one. Also, if we are talking in the context of whatever-as-a-service, there's very little incentive to work on the speed aspect as long as it isn't taking ages.
Also, reducing infrastructure to whatever-as-a-service is seriously hollowing out the definition. I've been in ops/infra for over a decade, and I've barely even touched the as-a-service aspect. Whenever I do come in contact with it, it's always awful, and I want to get away from it as fast as possible. Making it go faster won't help that, though. The disappointing parts are poor documentation, poor support, proprietary tech, overly narrow scope, etc.
* Testing in production
Why is this even a relevant issue? Anyways. OP needs to take a trip to the QA department; they obviously don't know why they have one. But it's also possible their QA department is worthless (ours is...). Still, having a worthless QA department isn't really something to wish for in Infrastructure 2.0. I don't see how this is a good goal.
So, the reason a QA department is necessary, and why CI can cover only a fraction of what can and should be done with testing, is that QA, among other things, needs to simulate plenty of different possible conditions in a controlled environment to be able to investigate and diagnose problems. Most of QA's work is spent on RCA, and then on figuring out how to present the problem, stripped of all unnecessary components, to the development team so it can be fixed. It's not possible to do good QA w/o the ability to isolate components, which calls for the creation of fake/artificial environments that are not like production.
* Calls to unleash the next order of developer productivity
This is such an MBA b/s... Just give it a break.
For you. For me, having to tinker with a repo full of YAML files just to get a Kafka topic provisioned (as happened to me this week) can kill, and has killed, motivation, to the point of not working at all for a day or two afterwards.
This stuff should be blindingly obvious, to the point a trained monkey should be able to do it.
I have the feeling that many agents are working against such a goal though. Vested interests and all.
You even kinda sorta agree with me by qualifying your statement with this, right after the previous quote:
> Also, if we are talking in the context of whatever-as-a-service, there's very little incentive to work on the speed aspect as long as it not taking ages.
Maybe to me time_it_should_take == X and to you X times 3 is fine, but in the end the brain schema is the same: have it take LongEnough™ (a subjective value) and the person responsible simply checks out mentally.
If I were a CTO or an IT manager I'd be very worried about stuff like this.
> But having a worthless QA department isn't really something to wish for in Infrastructure 2.0. I don't see how this is a good goal.
This is IMO not at all related to the article, nowadays QA depts are removed either because leadership wants to save money or because iteration would grind to a crawl, and many businesses need the next feature the next Wednesday. Nothing to do with infra management I'd think.
Though don't get me wrong, QA is hugely important per se. But I wonder if proper end-to-end automated frontend testing (e.g. with Playwright) won't eventually make them truly extinct. Who knows. I don't.
> This is such an MBA b/s... Just give it a break.
I'll always despise MBA speak but the point of programmer productivity is important. I have no problem churning out features and fixing bugs but give me a slow bureaucratic process and you'll find out what it's like to pay a salary to somebody who pushes to the GitHub repo 5 times a month with diffs like +30-20.
This is understandable, but this isn't about speed. Many YAML files may result in high provisioning speed or low provisioning speed, after all they only give instructions to the program doing the provisioning.
You could legitimately complain about the choice of YAML as a platform for infrastructure configuration for several reasons, like:
1. No built-in ability to describe templates. Lots of infrastructure wants some sort of polymorphic configuration, and when infra developers chose YAML to configure it, they didn't account for that. So instead they use various template engines that bolt polymorphism onto YAML. This was also indirectly mentioned by OP.
2. Poor structure, especially at large configuration sizes. It's easy to accidentally write something you didn't intend, and it's hard to search.
3. Being JSON in disguise, it inherits a lot of JSON's problems. Marshaling richer types and structures of data in and out of the program is severely impacted by the format's primitive and inflexible type system.
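Point 3 is easy to demonstrate with the standard library: the JSON data model underneath has only strings, numbers, booleans, lists, and maps, so richer types either fail to serialize or come back flattened (YAML adds a few tags of its own but has the same problem for application-level types):

```python
import json
from datetime import date

config = {
    "deploy_after": date(2021, 4, 1),  # a real type in the program...
    "ports": (8080, 8443),             # ...and a tuple, not a list
}

# A date doesn't survive serialization at all.
try:
    json.dumps(config)
except TypeError as e:
    print("dates don't fit the format:", e)

# A tuple survives, but comes back as a list.
flattened = json.loads(json.dumps({"ports": (8080, 8443)}))
print(type(flattened["ports"]))  # <class 'list'>
```

Every tool in the chain ends up re-encoding such values as strings and re-parsing them by convention, which is exactly the kind of marshaling friction described above.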
But, again, this isn't speed. This is just a different set of problems.
> If I were a CTO or an IT manager I'd be very worried about stuff like this.
Practice shows this is mostly irrelevant. It's hard to reach the point where provisioning speed starts to hurt so much it impacts business decisions. For instance, provisioning in MS Azure is on average twice as slow as it is in AWS. (And deprovisioning is probably four times as slow.) And nobody cares. So many other concerns will overshadow this particular aspect, that you'd feel uncomfortable to even bring it up, if you had to choose between two service providers. Primary driver is cost of running the infrastructure for a long time, overall as a system. Starting time does contribute to the total, but unless your business requires very frequent allocation and deallocation of resources, this won't make a difference. Also, cloud vendors don't bill you for the time that the infrastructure is being brought up, so, it's really hard to make a compelling case to choose the fast-to-provision infra over the slow one just based on that aspect alone.
It has crossed my mind several times recently that I want a word to describe this exact state of affairs. Where a thing has a defect so blatant that it is evident to any user that the creator of the thing has never tried using it.
Eg. an airbnb with no towels in it.
What's the word for this situation?
Otherwise it’s called an MVP and a promise of plugging the holes
In many cases it's "you are hired into this job, this is the tool we give you, if you don't like the tool, take a hike".
Even more so, a lot of software is developed not to be competitive, but to be exclusive. It's a lot easier to be the only choice for doing something than to compete with a different tool. I've seen countless examples of tools developed in exactly this paradigm, where the decision to use the tool wasn't made by anyone anywhere close to the users of the tool (e.g. a hospital procurement department buying a PACS, or a large avionics company ordering a custom-made budget-management program).
Most crappy software exists because of inertia and corporate policies. If people truly had a choice stuff like MS Teams could be phased out by the end of the next quarter.
When I have to describe to people who don't work with me my interactions with developers (especially of the crappy code like that) from a standpoint of someone who represents the QA side of things... I describe to them my interactions with my five y.o. son:
Me: How was school?
Son: Goooood!
Me: Did you behave?
Son: Yes!
Me: Did the teacher send you into timeout?
Son: Yes...
Me: So how come? You told me you behaved... What did you do?
Son: Played with Ryan!
Me: That doesn't seem like a good reason to send you into timeout.
And we go like this until I either discover that he was yelling in class or never learn why he was in timeout. This is also the pattern of denial I very frequently face when talking to the programmers who wrote the crappy code. Somewhere in the back of their minds they understand that they screwed up, but they will come up with all sorts of concocted reasoning to pretend that they either don't understand why the product sucks, or claim that it cannot be made any better, or attack me for not understanding how the product is supposed to work, etc. The most recent example would be (in slight adaptation):
Me: I discovered that we set the PYTHONPATH variable when loading a (Tcl) module.
Dev: I see no problems with that.
Me: The new feature we are releasing to the users is conda support. Conda will not work (well) when this variable is set.
Dev: Did the documentation tell users to load this module?
Me: No, but it's obvious that users would want the functionality provided by the module in addition to using conda. They are made to complement each other. Besides, the documentation doesn't say they shouldn't.
Dev: (summons PM)
And then the PM continues in the same spirit as the developer. My guess is that the reason is that nobody really wants to work too hard. There's no reward in making a better-quality product if that quality isn't immediately appreciated. Features like latency, throughput, size, etc. are immediately visible to the user and are an easy sell. Features like internal consistency in the face of more sophisticated usage might never matter, and the user might never know that they were protected from their system collapsing on them by a substantial development effort. So commercial companies de-prioritize quality. And that's how we get crappy programs.

There is certainly a lot of that, but it gets even worse: in many companies you get actively punished for doing good work. You end up making other people work: asking managers around for product requirements (which are of course barely written down anywhere, if at all), or reminding that sysadmin that they half-arsed the deployment and now must add another k8s resource, or asking another dev why they did X with the Y library... You want to make sure not to screw something up, but you just end up annoying them.
And sadly these things get brought up in meetings. And many over-zealous managers will scold you because they don't like the boat being rocked (even if they would actually welcome your initiative; but that assumes they'd have made an effort to understand the situation, which is not a given).
It's no surprise that many talented people just end up checking in, doing the bare minimum, and clocking out. The equation is extremely easy to solve: "work 3X, get scolded, get no promotions, accumulate hostility from colleagues" vs. "work X and have peace and quiet".
Nothing anyone does with software will help.
> I'm not asking for milliseconds! Just please at least get it to less than a second.
What do we measure "less than a second" times in?
> I can set up a static website in AWS, but it takes 45 steps in the console and 12 of them are highly confusing if you never did it before
Anything can be confusing and take time if you've never done it before. Getting productive takes time and practice. If your goal is only to set up a static site, AWS is overkill.
> It's sad this is the current state of infrastructure.
It’s sad that some people still haven’t learned to pick the right tool for a problem.
> I could go on, but I won't. I'm dreaming of a world where things are truly serverless.
I don’t even understand what the author wants here. There is no such thing as “truly serverless”. Your code will be executed by a server. Period. Serverless is just a fancy marketing term for ephemeral lightweight VMs.
> If I make a change in the AWS console, or if I add a new pod to Kubernetes, or whatever, I want that to happen in seconds
The author obviously doesn’t have any knowledge about distributed systems.
> My deep desire is to make it easy to create ephemeral resources. Do you need a database for your test suite? Create it in the cloud in a way so that it gets garbage collected once your test suite is done.
Fortunately we have Terraform that’s made this possible for a decade(?).
> Code not configuration
Terraform, Pulumi, countless of client libraries for all of the cloud providers.
If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
Well, he built https://modal.com , one of the coolest things since sliced mangoes, and before that https://github.com/spotify/luigi
This is nit-picky. "Serverless" refers to the "dev", not the "ops", and has done for a while.
> Fortunately we have Terraform that’s made this possible for a decade(?).
Setting up production-grade DBs in Terraform is easy?
The author does make some weird arguments and seems to be creating an emotional setting for something, like his own product you guys mentioned.
My pods ARE ready in seconds. Wondering why his are not.
Oh, yes, it is. Setting up the resources is actually the easiest part; most of the problems originate from the phenomenon that as developers start to use more and more "serverless" things, they know less about how the underlying technology works: how to use indexes, how to structure the database, how replication or transactions work. Production readiness is not just how a resource is configured. It is about how the application uses a resource efficiently.
> This is nit-picky. "Serverless" refers to the "dev", not the "ops", and has done for a while.
There is no "dev" and "ops" serverless. Your application will run on one or more CPUs and will use memory, disk, and the network. When you write the application, all of these matter: memory management, network communication, CPU caches, parallel execution, concurrency, disk access. It does not matter if you call it serverless, cloud, bare metal, etc. The basics are the same.
Honestly, the experience of building Beaker Studio made me bearish on AWS. They price gouge and the DX is so bad teams pretty much need CDs. Once I get the time I want to update Beaker Studio so people can deploy to any old Linux box instead. Teams deserve so much better than AWS/Google/Azure.
The author says what they want. It's literally their next sentence:
"As in, I don't want to think about future resource needs, I just want things to magically handle it."
and they have four bullet points with examples of what this means to them earlier.
I think it's fair to argue about the desirability, achievability, etc of this. I don't think it's fair to act as if the author is just spewing buzzwords without explanation.
- Why do I have to think about the underlying pool of resources? Just maintain it for me.
- I don't ever want to provision anything in advance of load.
- I don't want to pay for idle resources. Just let me pay for whatever resources I'm actually using.
- Serverless doesn't mean it's a burstable VM that saves its instance state to disk during periods of idle.
This article was written in 2021.
AWS Lambda, introduced in 2014, fulfilled all of the requirements in those bullet points you mentioned. Google App Engine is the same; it was introduced in 2008.
So again, this article tells only one thing: that the author does not know what he is talking about.