Bezos, of all people, was like "make it happen." And it happened. It was basically work for no reason except future-proofing. Having someone up the food chain OK that much work for the future (with no hard-dollar benefit) is highly unusual.
And beyond that, they've done some incredible things with their infrastructure, like authorization. Distributed authorization is really hard, but at AWS it's completely invisible. Remove a permission from an IAM role and the change propagates through AWS really, really fast. It's totally magic. Anyone who was abused by CORBA knows how hard that is to do well.
Their newer stuff (like Cognito) is sort of weird, but other services are surprisingly solid given how big AWS is. Small shops have trouble shipping feature-complete software, and BigCorps can be even worse. AWS has gotten really good at it.
As for AWS, as far as I remember, Bezos was initially against the idea. It was the brainchild of one Andy Jassy, who, along with Rick Dalzell, convinced a reluctant board to try it out. They realized that they had been unintentionally building this cloud platform for years in order to provide sellers with computing resources. Opening it up to public users was just a small sales move. Whether they did it or not, they were going to keep investing in their cloud platform, and nothing would change as far as their technical direction was concerned, so the board finally relented.
It's unfortunate that only Amazon themselves can add new permissions to IAM to secure their services. Why can't our applications add new permissions to IAM and query those? This is going to be a shameless plug, but it was this very problem that caused my cofounders and me to quit our jobs and start a company. Together (and now with a community of hundreds of users and contributions from a few well-known companies) we built SpiceDB[0], the culmination of state-of-the-art distributed-systems and authorization technology, developed in the open instead of behind closed doors at a hyperscaler. We were mostly inspired by Google's internal system, which is actually more powerful than AWS's or Google Cloud's IAM services, despite a fork of it actually powering GCP's IAM.
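For a flavor of the relationship-based model SpiceDB implements (it's inspired by Google's Zanzibar paper), here's a toy sketch in Python. The tuples, schema, and `check` function are illustrative only, not SpiceDB's actual API:

```python
# Zanzibar-style relationship tuples: (resource, relation, subject).
# A permission is computed from relations, e.g. read = reader + writer.
tuples = {
    ("document:readme", "writer", "user:alice"),
    ("document:readme", "reader", "user:bob"),
}

def check(resource: str, permission: str, subject: str) -> bool:
    """Toy permission check: 'read' is the union of reader and writer."""
    if permission == "read":
        return any((resource, rel, subject) in tuples
                   for rel in ("reader", "writer"))
    return (resource, permission, subject) in tuples

assert check("document:readme", "read", "user:alice")      # writer implies read
assert check("document:readme", "read", "user:bob")        # reader implies read
assert not check("document:readme", "read", "user:carol")  # no relation, deny
```

The real system stores these tuples in a distributed datastore and computes unions/intersections from a declared schema, but the data model is this simple at its core.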
For the purposes of authorization, services integrate with a library that handles retrieving and caching policies based on caller identity. Services create a context that includes all of the relevant metadata (service, operation, resources, etc.), and the library evaluates the policy and returns allow or deny.
Doing it all in the application means that if the control/distribution systems for auth go down, most things that are in motion remain in motion, and that the authentication/authorization code deploys out at per-service granularity, which also scopes the blast radius.
There are some pretty obvious pain points (doing anything as a library means updating the world for new features), but it has nice degradation properties and is relatively straightforward to grok as a service owner.
At some level every API call is authorized (and tracked).
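As a rough illustration of the library-plus-context pattern described above (all names are hypothetical, not AWS internals):

```python
from dataclasses import dataclass

@dataclass
class AuthContext:
    """Metadata the calling service assembles for each request."""
    caller: str     # identity of the calling service
    service: str    # service being called
    operation: str  # operation being invoked
    resource: str   # resource being acted on

class PolicyLibrary:
    """Sketch of an in-process library that caches policies per caller
    and evaluates them locally, so checks survive a control-plane outage."""
    def __init__(self):
        # caller -> set of (service, operation) pairs it may invoke
        self._cached_policies: dict[str, set[tuple[str, str]]] = {}

    def load_policy(self, caller: str, rules: list[tuple[str, str]]) -> None:
        # In reality this would be fetched and refreshed from a policy
        # distribution system; here we just load it directly.
        self._cached_policies[caller] = set(rules)

    def evaluate(self, ctx: AuthContext) -> bool:
        # Deny by default; allow only on an explicit matching rule.
        allowed = self._cached_policies.get(ctx.caller, set())
        return (ctx.service, ctx.operation) in allowed

lib = PolicyLibrary()
lib.load_policy("orders-service", [("shipments", "GetShipment")])

ok = AuthContext("orders-service", "shipments", "GetShipment", "shipment/123")
assert lib.evaluate(ok)
bad = AuthContext("orders-service", "shipments", "DeleteShipment", "shipment/123")
assert not lib.evaluate(bad)
```

Because the policy lives in a local cache, an unreachable distribution system only delays updates; in-flight evaluation keeps working, which is the degradation property the comment describes.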
To be honest, this is one of the secret sauces that makes AWS go. Someone once told me that they're not doing anything exciting, just caching, but I'm pretty sure they didn't really know what was going on.
Essentially I think we've gone too far: service-oriented architectures turned into "micro" services, which come with a lot of complexity and distributed-systems issues. I think for most small companies monoliths are right, for medium-sized companies (say 50+) it makes sense to carefully introduce a few separate services, and only for large companies (say 300+) do many services (which may or may not be "micro") start to pay off. I've heard it said that "microservices solve a people problem, not a technical one", and I think that's true.
That last part, to me, is the key to success: getting the whole business to do things in a new way. That is fucking hard. If you can get your business to do it, you have an invaluable superpower. The more things you can reinvent, and the faster you can reinvent them, the more superpowers you gain. It's one thing to change your architecture. But also imagine getting every employee to change how they deal with vacations, suppliers, customers, finance, or entirely new industries. The easier it is to adapt and change, the longer you survive and the more you thrive. Evolution, baby.
Interesting example. Why would changing distributed computing architecture have an impact on vacation policy?
Maybe you work at a company that sometimes works with the government. As a result, the whole company might develop a hiring process that is very slow, very detailed, and excludes certain people from being hired. But probably only a very small number of employees actually have to conform to those government requirements. You can apply them to all new hires "for simplicity", but that makes it harder to hire for non-government positions. So changing how you hire, to make it easier and faster to hire people from a wider range of backgrounds, benefits your organization. If your org can't easily make those changes, it will be disadvantaged.
I feel like the first time I heard the term was the early 2000s, and wasn't it a mainframe thing first? Dunno, just wondering.
Anyhow, it's nicely written, very concise, and worth noting how the original author focuses more on "What kind of realistic options do we have?" than winning the A vs. B vs. C argument in one fell swoop.
Ironically, when .NET was launched, Microsoft's vision was web services everywhere, with orchestration servers like BizTalk.
We got there eventually, only using REST (aka JSON-RPC) and gRPC instead.
It is really interesting to see a recent(ish) trend away from this three tier design and back towards tighter coupling between application layers. Usually due to increased convenience & developer ergonomics.
We've got tools that 'generate' business layers from/for the data layer (Prisma, etc).
We've got tools that push business logic to the client (Meteor, Firebase, etc).
The thing about Amazon's systems is that they are horrendously complex. In ~2016 I was working on the warehousing software, and it was a set of some hundreds of microservices in that space, which also communicated (via broad abstraction) with other spaces (orders, shipments, product, accounting, planning, ...), themselves abstractions over hundreds of other microservices.
So what I observed at the time was a broad increase in abstraction horizontally, rather than vertically. This manifesto describes splitting client-server into client-service-server; the trend two decades later was splitting <a few services, one for each domain> into <many services, one for each slice of each domain>, often with services that simply aggregated the results of their subdomains for general consumption in other domains.
I'm sure things have only gotten more complicated since then (in particular, a large challenge at the time was the general difficulty in producing maintainable asynchronous workflows, so lots of work was being done synchronously or very-slightly-asynchronously that should have been done in long-running or even lazy workflows).
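The horizontal split described above, one service per slice of a domain plus a service that aggregates them for other domains, might look like this in miniature (service names and payloads are made up for illustration):

```python
# Each "slice" service owns one piece of a domain and answers only for it.
def inventory_slice(order_id: str) -> dict:
    return {"in_stock": True}

def shipping_slice(order_id: str) -> dict:
    return {"carrier": "UPS", "eta_days": 2}

def order_status_aggregator(order_id: str) -> dict:
    """Fans out to the subdomain services and merges their results into
    one response for general consumption by other domains."""
    result = {"order_id": order_id}
    for slice_fn in (inventory_slice, shipping_slice):
        result.update(slice_fn(order_id))
    return result

assert order_status_aggregator("42") == {
    "order_id": "42", "in_stock": True, "carrier": "UPS", "eta_days": 2}
```

In the real thing each function is a network call, which is where the complexity (retries, partial failure, latency) comes from.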
Of course, there’s some cargo culting around services where people jump to that architecture before they need it, but for most apps YAGNI. It’s cool that their architecture was driven by clear needs “just in time”, allowing them to continue to scale.
For example,
> In the case of DC processing, customer service and other functions need to be able to determine where a customer order or shipment is in the pipeline. The mechanism that we propose using is one where certain nodes along the workflow insert a row into some centralized database instance to indicate the current state of the workflow element being processed.
definitely doesn’t seem to reflect the hiding of a database behind an interface. (From a workflow node’s perspective, rows in that centralized database should be an implementation detail it has no knowledge of.)
Then again, this is part of their pitch for workflow processing, not service-oriented architecture.
There are companies started later than 2010 where this was still the case. It's interesting to think about how shipping things quickly is so different from scaling them up.
“Distributed Computing Manifesto
Created: May 24, 1998”
> Amazon's distributed computing manifesto (1998) (2022)