The words "it's a miracle it works at all" routinely popped up in those conversations, which is... something you don't want to hear about any sort of power generation - especially not nuclear - but it's true. It's a system basically built to produce "common accidents". It's amazing that it doesn't on a regular basis.
Funny thing is, those are the exact words I use when talking to people about networking. And realistically, any time I dig deep into the underlying details of any big enough system, I walk away with that impression. At scale, I think any system is less “controlled and planned precision” and more “harnessed chaos with a lot of resiliency to the unpredictability of that chaos”.
Components aren’t reliable. The whole thing might be duct tape and popsicle sticks. But the trick for SRE work is to create stability from unreliable components by isolating and routing around failures.
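To make the "stability from unreliable components" point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names (`call_with_failover`, `flaky`, `combined_availability`) are hypothetical, and the arithmetic assumes replicas fail independently, which real systems rarely guarantee.

```python
import random
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Raised when every replica failed on every attempt."""


def combined_availability(per_replica: float, replicas: int) -> float:
    # Probability that at least one replica is up, ASSUMING independent failures.
    return 1.0 - (1.0 - per_replica) ** replicas


def call_with_failover(replicas: Sequence[Callable[[], str]],
                       attempts_per_replica: int = 2) -> str:
    # Try each replica in order; route around the ones that fail.
    errors: List[Exception] = []
    for replica in replicas:
        for _ in range(attempts_per_replica):
            try:
                return replica()
            except ConnectionError as exc:
                errors.append(exc)
    raise AllReplicasFailed(errors)


def flaky(name: str, failure_rate: float, rng: random.Random) -> Callable[[], str]:
    # Hypothetical stand-in for an unreliable backend.
    def call() -> str:
        if rng.random() < failure_rate:
            raise ConnectionError(f"{name} failed")
        return f"ok from {name}"
    return call


if __name__ == "__main__":
    rng = random.Random(42)
    backends = [flaky("a", 0.5, rng), flaky("b", 0.5, rng), flaky("c", 0.5, rng)]
    print(call_with_failover(backends))
    print(combined_availability(0.9, 3))
```

Under the independence assumption, three components that are each up 90% of the time give roughly 99.9% combined availability. Correlated failures (shared power, shared config, shared bugs) are exactly what eats into that number.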
It’s part of what made chaos engineering so effective. From randomly slowing down disk/network speed to unplugging server racks to making entire datacenters go dark - you deliberately introduce all sorts of crazy failure modes to break things and make sure the system remains metastable.
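A toy version of that kind of experiment, as a sketch only: `ChaosProxy`, `resilient_lookup` and `run_experiment` are hypothetical names, and real chaos tooling injects faults at the infrastructure level rather than in-process. The idea is the same though: inject faults on purpose, then check that the user-visible behavior still holds.

```python
import random


class ChaosProxy:
    """Wraps a callable and randomly injects timeouts (hypothetical helper)."""

    def __init__(self, target, failure_rate, seed=0):
        self.target = target
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so the experiment is repeatable
        self.injected = 0

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            self.injected += 1
            raise TimeoutError("injected fault")
        return self.target(*args, **kwargs)


def lookup_price(item):
    # Stand-in for a real dependency.
    return {"apple": 3, "pear": 4}[item]


def resilient_lookup(primary, cache, item):
    # Use the primary; fall back to the last known value if it times out.
    try:
        value = primary(item)
        cache[item] = value
        return value
    except TimeoutError:
        return cache[item]  # a KeyError here would expose a resilience gap


def run_experiment(failure_rate, requests=1000):
    primary = ChaosProxy(lookup_price, failure_rate)
    cache = {"apple": 3, "pear": 4}  # assumption: cache is pre-warmed
    served = sum(resilient_lookup(primary, cache, "apple") == 3
                 for _ in range(requests))
    return served, primary.injected


if __name__ == "__main__":
    served, injected = run_experiment(failure_rate=0.3)
    print(f"served {served} requests correctly despite {injected} injected faults")
```

Dropping the pre-warmed cache is a quick way to see the experiment find a real weakness: the first injected fault on a cold cache surfaces as a `KeyError`.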
Seek only to understand it well enough to harness the chaos for more subtle useful purpose, for from chaos comes all the beauty and life in the universe.
The synchronization of a power grid ... Wow.
Or the U.S. financial system. Or civilization in general.
I am an ex-scientist and an engineer, and I had a look at the books of my son, who studies finance at the best finance school in the world (I am saying this to highlight that he will be one of the perpetrators of this mess, possibly one with influence).
The things in there are crazy. There are whole blocks that are obvious but made to sound complicated. I spent some time on a graph just to realize that they were ultimately talking about solving a set of two linear equations (middle school level).
Some pieces were not comprehensible because they did not make sense.
And then bam! A random differential equation, explained as if it were the answer to the universe. With an incorrect interpretation.
And then there are statistics that would make "sociology science" blush. Yes, they are so bad that even the, ahem, experts who do stats in sociology would be ashamed (no hate for sociology, everyone needs to eat, it is just that I have reviewed theses there several times and I am still traumatized).
The fact that finance works at all is because we sit in some kind of magical "local minimum of finance energy" that the Trumps of this world have somehow not managed to break out of (fingers crossed) by disrupting the world too much.
Computer networking is not the same. Our networks will not explode. I will grant you that they can be shite if not designed properly, but then they end up running slowly or not at all. They will not combust or explode.
If you get the basics right for ethernet then it works rather well as a massive network. You could describe it as an internetwork.
Basically, keep your layer 2 broadcast domains to around 200-odd devices maximum per VLAN - that works fine for IPv4. You might have to tune MAC tables for IPv6 for obvious reasons.
Your fancier switches will have some funky memory for address-to-address translation tables, e.g. MAC to IP and VLAN and the like. That memory will be shared with other databases too, perhaps iSCSI, so you have to decide how to manage that lot.
EVPN uses BGP to advertise MAC addresses in VXLAN networks, which solves looping without magic packets, scales better and is easier to introspect.
And we didn't even get into the provider side which has been using MPLS for decades.
A problem with high-bandwidth networking over fiber is that, since light refracts within the fiber, some light will take a longer path than the rest. If the window is too short and you have too much scattering, you will drop packets.
So hopefully nobody bends your 100G fiber too much. If that isn't finicky, idk what is. DAC cables with twinax solve it more cheaply over short range, however.
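A rough back-of-the-envelope for the "longer path" effect, as a sketch under loud assumptions: this uses the textbook ray model for step-index multimode fiber, where the slowest guided ray arrives later than the axial ray by about L·n1·(n1 − n2)/(c·n2). The index values below are illustrative guesses, not from any datasheet. Real multimode fiber is graded-index, which cuts this spread dramatically, and single-mode fiber avoids modal spread altogether.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def modal_delay_spread(length_m: float, n_core: float, n_clad: float) -> float:
    # Arrival-time difference (seconds) between the axial ray and the
    # steepest guided ray in an idealized step-index multimode fiber.
    return (length_m / C) * n_core * (n_core - n_clad) / n_clad


def bit_period(bit_rate_bps: float) -> float:
    # Duration of one bit (the "window") at a given line rate.
    return 1.0 / bit_rate_bps


if __name__ == "__main__":
    # Assumed, illustrative values: 100 m run, core index 1.48, cladding 1.46.
    spread = modal_delay_spread(100.0, 1.48, 1.46)
    window = bit_period(10e9)  # 10 Gb/s per lane
    print(f"delay spread: {spread * 1e9:.2f} ns")
    print(f"bit period:   {window * 1e9:.2f} ns")
    print("pulses overlap" if spread > window else "pulses stay separate")
```

With those assumed numbers the spread is several nanoseconds against a 0.1 ns bit period, which is why nobody runs high-speed links over plain step-index fiber.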
What’s your source?
Perhaps the safest assumption is that system reliability ultimately depends on quite a lot of factors that are not purely about careful engineering.
Most operating systems are based on ambient authority, which is just a disaster waiting to happen.
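For anyone unfamiliar with the term, a small Python sketch of the difference (function names are made up for illustration). With ambient authority, any code running in the process can reach for whatever the process is allowed to touch. In capability style, code can only use what it was explicitly handed.

```python
import io
from typing import TextIO


def count_lines_ambient(path: str) -> int:
    # Ambient authority: this function can open ANY path the process can read.
    # A bug or a malicious caller-supplied path reaches the whole filesystem.
    with open(path) as f:
        return sum(1 for _ in f)


def count_lines_capability(source: TextIO) -> int:
    # Capability style: the caller passes an already-open handle, so this
    # function can only read what it was explicitly given.
    return sum(1 for _ in source)


if __name__ == "__main__":
    handle = io.StringIO("first\nsecond\nthird\n")
    print(count_lines_capability(handle))
```

Python itself does not enforce this (the second function could still call `open`), so this only shows the shape of the interface. Enforcement has to come from the OS or runtime.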