- When your device is in use in the field, the user will be too hot, too cold, too windy, too dark, too tired, too wet, too rushed, or under fire. Mistakes will be made. Design for that environment. Simplify controls. Make layouts very clear. Military equipment uses connectors which cannot be plugged in wrong, even if you try to force them. That's why. (Former USMC officer.)
- Make it easy to determine what's broken. Self-test features are essential. (USAF officer.)
- If A and B won't interoperate, check the interface specification. Whoever isn't compliant with the spec is wrong. They have to fix their side. If you can't decide who's wrong, the spec is wrong. This reduces interoperability from an O(N^2) problem to an O(N) problem. (DARPA program manager.)
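The O(N^2)-to-O(N) reduction is easy to see in numbers. A minimal sketch (the counts are just arithmetic, not from any real program):

```python
# Testing every pair of N implementations against each other is O(N^2);
# testing each implementation once against the spec is O(N).
def pairwise_tests(n):
    return n * (n - 1) // 2   # every implementation against every other

def spec_tests(n):
    return n                  # every implementation against the spec

for n in (5, 20, 100):
    print(n, pairwise_tests(n), spec_tests(n))
```

At 100 implementations that's 4950 pairwise tests versus 100 conformance tests, which is why the spec gets to be the referee.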
- If the thing doesn't meet spec, have Q/A put a red REJECTED tag on it. The thing goes back, it doesn't get paid for, the supplier gets pounded on by Purchasing and Quality Control, and they get less future business. It's not your job to fix their problem. (This was from an era when DoD customers had more clout with suppliers.)
- There are not "bugs". There are "defects". (HP exec.)
- Let the fighter pilot drive. Just sit back and enjoy the world zooming by. (Navy aviator.)
Aerospace is a world with many hard-ass types, many of whom have been shot at and shot back, have landed a plane in bad weather, or both.
Priceless. I've been trying to make that point for years, but nobody seems to want to listen.
A buggy device is much worse for any critical application - it appears to work under inspection, and even limited testing, but 1% of the time it develops some fatal data race condition that causes it to fail erratically and cause havoc, e.g. the Therac-25.
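A minimal sketch of why a data race survives inspection and limited testing: two threads doing an unlocked read-modify-write on a hypothetical counter, with the race window widened by a sleep so the lost update shows up reliably (in the wild the window is nanoseconds, which is exactly why it only bites 1% of the time):

```python
import threading
import time

counter = 0

def unsafe_increment():
    global counter
    local = counter      # read
    time.sleep(0.1)      # widen the race window for demonstration
    counter = local + 1  # write back, clobbering any concurrent update

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 1, not 2: one update was silently lost
```

Remove the sleep and the same code passes almost every test run, which is the Therac-25 failure mode in miniature.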
So buggy is a subset of defective but it's even worse?
Instrument training in FAA-land requires learners to understand the five hazardous attitudes: anti-authority ("the rules don't apply to me"), impulsivity ("gotta do something now!"), invulnerability ("I can get away with it"), macho ("watch this!"), and resignation ("I can't do anything to stop the inevitable"). Although the stakes are different, they have applicability to software development. Before a situation gets out of hand, the pilot has to recognize and label a particular thought and then think of the antidote, e.g., "the rules are there to keep me safe" for anti-authority.
Part 121 or scheduled airline travel owes its safety record to many layers of redundancy. Two highly trained and experienced pilots are in the cockpit talking to a dispatcher on the ground, for example. They're looking outside and also have Air Traffic Control watching out for them. The author mentioned automation. This is an area where DevSecOps pipelines can add lots of redundancy in a way that leaves machines doing tedious tasks that machines are good at. As in the cockpit, it's important to understand and manage the automation rather than following the magenta line right into cumulogranite.
Remember the importance of checklists in the grand scheme of things. They help maintain proper "authority" during operation and make sure you don't forget things. If you don't write it down and check it, someone will eventually forget something.
Also, the "Aviate, navigate, communicate" axiom (as mentioned by the author) is really helpful if you're trying to set up incident/crisis response structures. You basically get your guiding principles for free from an industry that has 100+ years of experience in dealing with crises. It's something I teach during every incident/crisis response workshop.
edit: Although it's not aviation specific, and a little light on the science, "The Checklist Manifesto" by A. Gawande is a nice introduction into using (and making) checklists.
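The "write it down and check it" discipline translates directly into code. A minimal pre-deploy checklist runner, with every item name purely hypothetical:

```python
# A minimal checklist runner: every item must be explicitly checked off,
# and a forgotten item blocks the run instead of being silently skipped.
CHECKLIST = [
    "tests green on CI",
    "migration applied to staging",
    "rollback plan written down",
    "on-call engineer notified",
]

def run_checklist(completed):
    """Raise unless every checklist item is in the completed set."""
    missing = [item for item in CHECKLIST if item not in completed]
    if missing:
        raise RuntimeError(f"checklist incomplete: {missing}")
    return "cleared for deploy"

# Forgetting an item fails loudly:
# run_checklist({"tests green on CI"})  -> RuntimeError
```

The point is the same as in the cockpit: the list, not anyone's memory, holds the authority.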
Of course in Aviation the 'authorities' are usually rational and fair. In many other areas of life they are neither, and are incompetent to boot. Being anti-authority is justified in such cases. i.e. there is a moral responsibility to disobey unjust laws.
As a new pilot myself, I can say with confidence that the FAA has some major flaws, and the US Congress has gotten its dirty hands into aviation policy, enacting rules that did not originate from NTSB recommendations.
WRT development, I wonder if there are attitudes that can be applied to software and hardware design that combat bad systems.
For example, cars with touchscreens instead of individual controls.
Thank you for recommending the NTSB reports :-)
a major focus has been on encouraging less-experienced members of the flight crew to speak up if they notice something wrong, and ensuring that the more-experienced pilots are open to receiving that feedback instead of adopting an "I'm more senior, I know what I'm doing, don't question it" attitude.
the latter's videos tend to have clickbait-y titles to make the YouTube algorithm happy, but the content is excellent.
0: https://admiralcloudberg.medium.com/
1: https://www.reddit.com/r/AdmiralCloudberg/comments/e6n80m/pl...
[1] https://publications.americanalpineclub.org/about_the_accide...
A few pro-tips:
- Slow is smooth. Smooth is fast.
- Simplicity/speed/efficiency is good.
- Strong is good. Redundancy is good/better.
- Equalization and Extension are a trade off.
- Not everything is a nail so NOT every tool is a hammer.
- Tools that have multiple uses are good.
Now you've jinxed it!
I also recommend Admiral Cloudberg. https://admiralcloudberg.medium.com/drama-in-the-snow-the-cr...
NTSB reports for general aviation tend to focus on individual mistakes since that's most often solo pilots with no ground crew, but for commercial flights it's generally a more complex series of mistakes made in a team.
The hydraulic actuators (rams) have an input and an output port. Connecting the hydraulic lines to the wrong port results in control reversal. To defend against that:
1. One port has left handed threads, the other right handed threads
2. The ports are different sizes
3. The ports are color coded
4. The lines cannot be bent to reach the wrong port
5. Any work on it has to be checked, tested, and signed off by another mechanic
And finally:
6. Part of the preflight checklist is to verify that the control surfaces move the right way
I haven't heard of a control reversal on airliners built this way, but I have heard of it happening in older aircraft after an overhaul.
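The same "can't connect it wrong" idea has a software analogue: give the two ends distinct types so a swapped connection is rejected before it does damage. A sketch in Python (class and function names are illustrative; since Python's type hints aren't enforced at runtime, the check is done explicitly, though a static checker would catch it earlier):

```python
# Distinct types for the two ports, like left- vs right-handed threads:
# the wrong connection is refused instead of silently reversing control.
class InputPort:
    pass

class OutputPort:
    pass

def connect(src, dst):
    """Connect an OutputPort to an InputPort; reject anything else."""
    if not isinstance(src, OutputPort) or not isinstance(dst, InputPort):
        raise TypeError("lines connected to the wrong ports")
    return "connected"

connect(OutputPort(), InputPort())    # fine
# connect(InputPort(), OutputPort()) -> TypeError, like mismatched threads
```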
Safe Systems from Unreliable Parts https://www.digitalmars.com/articles/b39.html
Designing Safe Software Systems Part 2 https://www.digitalmars.com/articles/b40.html
Some time ago I shipped a product running an RTOS which unfortunately had a subtle scheduler bug where it would randomly crash periodically. The bug was pretty rare (I thought), only affecting part of the system, and reproducing the bug took several days each time.
In my infinite genius, rather than waste weeks of valuable time up to release, I set up the watchdog timer on the processor to write a crash dump and silently reboot. A user would maybe see a few seconds of delayed input and everything would come back up shortly.
Unfortunately, I had accidentally set the watchdog clock divider the wrong way, resulting in the watchdog not activating for over 17 hours after a hang!
The bug became much more widely noticeable after the product was released; it was only by sheer luck that many people never noticed it.
I eventually fixed the scheduler bug in an update, but the useless watchdog configuration was set in stone and not fixable. Taught me to never assume a rare bug would stay rare when many tens of thousands of people use something in the field.
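A sketch of how a divider mistake produces that kind of blowup, with entirely hypothetical numbers (a 32.768 kHz watchdog clock and a multiply-where-divide-belonged error), plus the one-line sanity check that would have caught it before release:

```python
# Hypothetical watchdog timing: timeout = reload_count * prescaler / clock_hz.
# Numbers are illustrative, not from any real MCU.
CLOCK_HZ = 32_768  # low-speed watchdog clock

def wdt_timeout_s(reload_count, prescaler):
    return reload_count * prescaler / CLOCK_HZ

intended = wdt_timeout_s(reload_count=4096, prescaler=64)     # 8.0 seconds
botched  = wdt_timeout_s(reload_count=4096, prescaler=2**19)  # 65536 s, ~18 h

# The sanity check that turns a silent misconfiguration into a build failure:
assert wdt_timeout_s(4096, 64) < 30, "watchdog timeout implausibly long"
```

Because the watchdog "worked" either way (it eventually fired), nothing obvious flagged the bad configuration; only an assertion on the derived timeout, not on the raw register value, makes the mistake visible.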
Nit: A is written as Alfa in the NATO alphabet [0] because that spelling makes the pronunciation unambiguous. For the same reason J is written as Juliett (two t's), because in some languages a final t can be silent.
And I hate systems that don't let you say "ignore *this* warning" without turning off all warnings. I have some Tile trackers with dead batteries--but there's no way I can tell the app to ignore *that* dead battery yet tell me about any new ones that are growing weak. (We haven't been using our luggage, why should I replace the batteries until such day as the bags are going to leave the house again?)
I fit this myself: I grew up playing flight simulators, studied computer science as an undergrad, was a military helicopter pilot for a while, and then went to grad school for computer science. Along the way, I've personally met at least half a dozen other academic computer scientists with a pilot's license or military aviation background. Is it just selective attention / frequency illusion for me, or is there more to this?
I bet that a large part of why is that people here tend to have reasonably high incomes, and flying is an expensive hobby. I'm sure that flying would be an incredibly popular hobby across all demographics if it were affordable.
The thing you need to be rich in is free time. You can have all the money in the world but if you don’t have the time to put in, you’re not getting into this hobby.
This is important, but I'm not sure everybody necessarily agrees on what "fail safely" means.
Fail safely can mean one of:
- It doesn't fail silently
- It doesn't cause cascading failures
- It doesn't cause infinite failure loops
- It doesn't fail in ways that corrupt data
- It doesn't fail in ways you lose money
- You can safely retry
- You can safely retry anytime (not just today, or just this month)
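The last two bullets usually come down to idempotence: retrying the same request must not apply the side effect twice. A minimal in-memory sketch (function and key names are hypothetical; real systems persist the key store):

```python
# Idempotent operation: a retry with the same key returns the first result
# instead of repeating the side effect.
processed = {}  # idempotency_key -> result of the first successful attempt

def charge(idempotency_key, amount):
    if idempotency_key in processed:   # retry path: no second side effect
        return processed[idempotency_key]
    result = {"charged": amount}       # the real side effect happens once
    processed[idempotency_key] = result
    return result

first = charge("order-42", 100)
retry = charge("order-42", 100)  # safe to retry: no double charge
assert first is retry
```

"Safely retry anytime" additionally requires that the key store outlive today's cache, which is exactly the "not just today, or just this month" distinction.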
A large passenger aircraft does not solely consist of Level A software. There’s plenty of not-flight-safety-critical software on any airplane you ride as a civilian passenger, but there is some Level A software that could cause the worst consequences if it fails.
Think about what pieces of your software are critical to your company/team’s mission, and which aren’t so bad if they fail. Not every line of code you write, or system you build, will wreak havoc on your company’s primary mission.
Let me give you a simple and easy to understand example: an MP3 decoder performs the boring task of transforming one bunch of numbers into another bunch of numbers. This second bunch of numbers is then fed into a DAC, which in turn feeds into an amplifier. If your software malfunctions it could cause an ear-splitting sound to appear with zero warning while the vehicle that your MP3 decoder has been integrated into is navigating a complex situation. The reaction of the driver can range from complete calm all the way to a panic, including involuntary movements. This in turn can cause loss or damage of property, injury, and ultimately death.
Farfetched? Maybe. But it almost happened to me, all on account of a stupid bug in an MP3 player. Fortunately nothing serious happened but it easily could have.
So most of us should try harder to make good software, because (1) there should be some pride in creating good stuff and (2) you never really know how your software will be used once it leaves your hands so better safe than sorry.
I make video games. _Everything_ in games is a trade-off. There are areas of my code that are bulletproof, well tested, fuzzed, and rock solid. There are parts of it (running in games people play, a lot) that will whiff if you squint too hard at them. Deciding when to employ the second technique is a very powerful skill, and knowing which corners to cut can result in software or experiences that handle the golden path so much better that you decide it's worth the trade-off of cutting said corner.
I'll let you know when I find the right balance.
QA: ”Look, if that integer overflows here, your software is going to fail.” Dev: “Well, it’s a cooking recipe app. Nobody’s gonna die!” How low of an opinion you must have of your own profession if you’re going to excuse yourself this way!
My own contribution is to recommend reading the RISKS Digest:
A lot of the difficulty boils down to an inverse NIH syndrome: we outsource monitoring and alerting … and the systems out there are quite frankly pretty terrible. We struggle with alert routing, because alert routing should really be a function that takes alert data in and figures out what to do with it … but Pagerduty doesn't support that. Datadog (monitoring) struggles (struggles) with sane units, and IME with aliasing. DD will also alert on things that … don't match the alert criteria? (We've still not figured that one out.)
“Aviate, Navigate, Communicate” definitely is a good idea, but let me know if you figure out how to teach people to communicate. Many of my coworkers lack basic Internet etiquette. (And I'm pretty sure "netiquette" died a long time ago.)
The Swiss Cheese model isn't just about having layers to prevent failures. The inverse axiom is where the fun starts: the only failures you see, by definition, are the ones that go through all the holes in the cheese simultaneously. If they didn't, then by definition, a layer of swiss has stopped the outage. That means "how can this be? like n different things would have to be going wrong, all at the same time" isn't really an out in an outage: yes, by definition! All this, of course, assumes you know what holes are in your cheese, and often the cheese has far more holes than people seem to think it does.
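The "n things at once" intuition is worth putting in numbers, under the (unrealistic) assumption that the layers fail independently:

```python
# If each of four defense layers independently misses a fault 1% of the
# time, a fault slips through all of them roughly once per 10^8 attempts.
p_hole = 0.01
layers = 4
p_outage = p_hole ** layers
print(p_outage)  # ~1e-08

# But holes are rarely independent: one bad deploy or shared assumption can
# open the same hole in several layers at once, which is why "all n things
# went wrong simultaneously" is the normal shape of an outage, not a freak
# event.
```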
I'm always going to hard disagree with runbooks, though. Most failures are of the "it's a bug" variety: there is no possible way to write the runbook for them. If you can write a runbook, that means you're aware of the bug: fix the bug, instead. The rest is bugs you're unaware of, and to write a runbook would thus require clairvoyance. (There are limited exceptions: sometimes you cannot fix the bug, e.g. when it lies in a vendor's software and the vendor refuses to do anything about it¹; then you're just screwed, and have to write down the next best workaround, particularly if that workaround is hard to automate.) There are other pressures, like PMs who don't give devs the time to fix bugs, but in general runbooks are a drag on productivity, as they're manual processes you're following in lieu of a working system. Be pragmatic about when you take them on (if you can).
> Have a “Ubiquitous language”
This one, this one is the real gem. I beg of you, please, do this. A solid ontology prevents bugs.
This gets back to the "teach communication" problem, though. I work with devs who seem to derive pleasure from inventing new terms to describe things that already have terms. Communicating with them is a never ending game of grabbing my crystal ball and decoding WTF it is they're talking about.
Also, I know the NATO alphabet (I'm not military/aviation). It is incredibly useful, and takes like 20-40 minutes of attempting to memorize it to get it. It is mind boggling that customer support reps do not learn this, given how shallow the barrier to entry is. (They could probably get away with like, 20 minutes of memorization & then learn the rest just via sink-or-swim.)
(I also have what I call malicious-NATO: "C, as in sea", "Q, as in cue", "I, as in eye", "R, as in are", U, as in "you", "Y, as in why")
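For anyone who wants the 20-minute version, the whole alphabet fits in a lookup table (spellings per the NATO standard, including the Alfa and Juliett forms mentioned above):

```python
# The NATO phonetic alphabet as a lookup table; spell a string for a call.
NATO = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}

def spell(word):
    """Spell out the letters of a word, skipping non-letters."""
    return " ".join(NATO[c] for c in word.upper() if c in NATO)

print(spell("HN"))  # Hotel November
```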
> Don’t write code when you are tired.
Yeah, don't: https://www.cdc.gov/niosh/emres/longhourstraining/impaired.h...
And yet I regularly encounter orgs or people suggesting that deployments should occur well past the 0.05% BAC equivalent mark. "Unlimited PTO" … until everyone inevitably desires Christmas off and then push comes to shove.
Some of this intertwines with common PM failure modes, too: I have, any number of times, been pressed for time estimates on projects where we don't have a good time estimate because there are too many unknowns in the project. (Typically because whoever is the PM … really hasn't done their job in the first place of having even the foggiest understanding of what's actually involved, inevitably because the PM is non-technical. Having seen a computer is not technical.) When the work is then broken out and estimates assigned to the broken-out form, the total estimate is rejected, because PMs/management don't like the number. Then inevitably a date is chosen at random by management. (And the number of times I've had a Saturday chosen is absurd, too.) And then the deadline is missed. Sometimes, projects skip right to the arbitrary-deadline step, which at least cuts out some pointless debate about, yes, what you're proposing really is that complicated.
That's stressful, PMs.
¹ cough Azure cough excuse me.