https://s3.documentcloud.org/documents/24410269/report_dca24...
I'd be curious to know how many non-conformances they typically see during assembly of a plane and whether management is actually allowing the quality department sufficient independence to investigate these issues and fully resolve them. I'm guessing that the production personnel are under tremendous time constraints and are constantly pressuring the quality assurance people to sign off on whatever paperwork is holding up the line, no matter the safety implications.
Also, I think a lot of middle- and upper-level management needs to lose their jobs over this. I hope this mess ends up in textbooks and gets beaten into the head of every MBA student in the country.
Very likely that number is meaningless. I suspect this is the kind of environment that incentivises hiding non-conformances whenever possible.
For example, better quality control usually results in an increase in the number of recorded defects, at least temporarily. But that's just because a large portion of these defects went undetected before.
So... you are looking at a number that you have nothing to compare to, that depends on how closely the process is monitored, and that also depends a lot on the definition of what counts as a non-conformance.
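A toy model makes the point concrete. All numbers here are hypothetical: the true defect count stays fixed and only the inspection's detection rate changes, yet the recorded count rises anyway.

```python
def recorded_defects(true_defects: int, detection_rate: float) -> int:
    """Expected number of non-conformances that actually get written up."""
    return round(true_defects * detection_rate)

# Hypothetical: the same 100 real defects under weak, medium, and strong inspection.
TRUE_DEFECTS = 100
for rate in (0.3, 0.6, 0.9):
    print(f"detection rate {rate:.0%}: {recorded_defects(TRUE_DEFECTS, rate)} recorded")
```

The product did not get worse between the first and last line of output; only the monitoring got better, which is why a raw count says little without knowing the detection rate behind it.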
It is like trying to give an answer to "what is the length of Britain's coastline?" Everybody knows that you can get whatever answer you want depending on how long the ruler is.
That, by itself, should be the kind of thing that shuts down a company permanently. Remember, this is the aviation industry, where they track the mine the ore for a bolt came from, who tightened the bolt, and with which torque wrench.
Given that a worker who raised this problem or took the initiative to resolve it would probably be punished, I completely agree.
Reminds me so much of how, once upon a time, there seemed to be actual engineering management in a cooperatively adversarial relationship with business managers, but not anymore. Now any sort of engineering in business seems to be completely business-managed and business-minded. I'm sure it's great for profits while it lasts, but I haven't observed engineering getting better, and I suspect business is suffering from overextending itself too; I just don't have any solid observations.
(Well, maybe one: my brother does warehouse/logistics management and says that, despite every reason in the world to do it, he has never seen the accounting software and the inventory software successfully and productively linked. So there's a big opportunity there for a serious player, but maybe it's not profitable enough compared to the difficulty?)
Apparently, there are two ticketing systems: one for the "history of the plane" (Boeing-internal) and one for "day-to-day onsite work" (visible to contractors and Boeing management). The work to fix the rivets was logged in the day-to-day system, but management and the onsite staff managed to convince themselves that merely opening the plug to fix the vacuum-seal trim did not constitute "removing" the plug. Since the history-of-plane log only had an entry for removing, not opening, they didn't log it there, whereas the intent was: "there's no entry for 'just opening' because there's no such thing as 'just opening'; breaching the pressure vessel at all constitutes 'removal of plug'."
The final inspection that should have caught the error would have been triggered by the update in the history-of-plane ticketing queue.
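The gap described above can be sketched as a tiny workflow model. This is a hypothetical illustration: the class, the action names, and the rule that only a history-of-plane entry schedules the inspection are assumptions based on the comment, not Boeing's actual systems.

```python
from dataclasses import dataclass, field

# Hypothetical: the history-of-plane log only knows about removal, not "opening".
HISTORY_ENTRY_TYPES = {"remove_plug"}

@dataclass
class Tracker:
    day_to_day: list = field(default_factory=list)
    history: list = field(default_factory=list)
    inspections_scheduled: list = field(default_factory=list)

    def log_work(self, action: str) -> None:
        # Every job lands in the day-to-day queue.
        self.day_to_day.append(action)
        # Only actions with a matching entry type reach the history-of-plane log...
        if action in HISTORY_ENTRY_TYPES:
            self.history.append(action)
            # ...and only a history-of-plane update triggers the final inspection.
            self.inspections_scheduled.append(f"inspect after {action}")

t = Tracker()
t.log_work("open_plug")  # treated as "opening", not "removal"
print(t.inspections_scheduled)  # []
```

The work is visible in the day-to-day queue, but because the trigger keys off a single entry type in the other system, no inspection is ever scheduled.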
(And as for 'how many non-conformances,' the same source claims that Spirit is one of the few subcontractors with on-site staff at the factory, because their employer delivers such consistently shoddy, out-of-compliance product that they are continuously doing final warranty work on site. So maybe "fire that vendor" should be on the docket too.)
From the portions of the report that hint at corroboration,
> Documents and photos show that to perform the replacement of the damaged rivets, access to the rivets required opening the left MED plug (see figure 15). To open the MED plug, the two vertical movement arrestor bolts and two upper guide track bolts had to be removed.
> Records show the rivets were replaced per engineering requirements on Non-Conformance (NC) Order 145-8987-RSHK-1296-002NC completed on September 19, 2023, by Spirit AeroSystems personnel. Photo documentation obtained from Boeing shows evidence of the left-hand MED plug closed with no retention hardware (bolts) in the three visible locations (the aft upper guide track is covered with insulation and cannot be seen in the photo)
If management in the aerospace industry works like management in the software industry, then I guess they are pushing for results as aggressively as possible without much concern for safety or anything else.
In some software projects, the level of rush, and the fact that bugs would sometimes leak into production, was kinda horrifying. It would have been far more so if it had been the kind of project that could kill people when it failed, as happened at Chernobyl with nuclear reactors, or at Boeing with planes.
I can't really imagine what these engineers feel when they rush this kind of work knowing what's at stake.
If you believe in quality engineering, but refuse to engage in the political and business dimensions of an enterprise to fight for that view, then you are just virtue signaling — since you’re refusing to engage with the tools needed to make it happen.
> As a result, this check job that should find minimal defects has in the past 365 calendar days recorded 392 nonconforming findings on 737 mid fuselage door installations (so both actual doors for the high density configs, and plugs like the one that blew out). That is a hideously high and very alarming number, and if our quality system on 737 was healthy, it would have stopped the line and driven the issue back to supplier after the first few instances.
Source:
https://leehamnews.com/2024/01/15/unplanned-removal-installa...
Well, the report says "During the build process, one quality notification (QN NW0002407062) was noted indicating the seal flushness was out of tolerance by 0.01 inches."
So I'd say they've had about 2,407,062 quality issues :)
The planes they worked on did not share an assembly line with the 737 but another Boeing model…
I didn't follow how that issue evolved.
Lol, they said the door plug "departed" instead of "blew the f* off"
So this all just serves to confirm that report and what people suspected for a while: the bolts were simply missing, removed so the door plug could be opened for the rivet rework and never reinstalled.
There were apparently two (identical) procedures, one for "removing" the door plug and one for "opening" it, but only one of them actually called for paperwork and a quality inspection afterwards.
It seems they decided to "open" the door plug instead of "removing" it to work on the sealing issue, and therefore the documentary record of the work performed and of any inspection afterwards is lacking.
It would be useful IMO to explore what led the team to prefer "opening" over "removing", whether the subtle difference in documentation requirements was a consideration in choosing one over the other, and whether that points to deeper, pervasive cultural issues that favor less work over giving safety and quality primary importance.
I remember my mom mentioning when I was small that building doors in the US are required to open outward, a rule that does not exist in Europe AFAIK. So there you have it: you have a significantly higher chance of crashing in a Boeing, but when it happens, you can leave the plane 2 seconds faster.
There must be a middle ground here: the paradox is that Google, Apple, etc. have the ability to generate user-friendly software and hardware at scale, but they aren't considered "battle-proven". The expensive proprietary systems used instead tend to be hard to use and brittle, so what's the middle ground?
Read https://www.airlinepilotforums.com/safety/146074-boeing-inte...
And then this from the doc: "The investigation continues to determine what manufacturing documents were used to authorize the opening and closing of the left MED plug during the rivet rework."
https://s3.documentcloud.org/documents/24410269/report_dca24...
I mean, there is already a ton of documentation and process surrounding the construction of an airplane. Adding more process doesn't safety make. Having a safety culture without the fear of retaliation, on the other hand, makes a world of difference.
I don't know if this should be considered "adding more process" because it has been standard process for a very long time. All work done on an airplane is authorized, by someone, and after completion is recorded, by someone. Discrepancies and deviations from this standard operating procedure are a big deal.
Ooofff. No bolts at all! How did this pass Boeing QA?
https://www.airlinepilotforums.com/safety/146074-boeing-inte...
https://www.youtube.com/watch?v=xIAfCupuZ3w
Edit: Also, that entire culture and dynamic between Boeing and Spirit on the production floor seems very toxic and driven by misaligned incentives. There should be zero place for bickering, aggressiveness, and finger-pointing in that SAT channel. If something needs to be fixed, Spirit needs to fix it with a smile. If Boeing and Spirit need to review who is responsible for what, and what needs fixing and what doesn't, that should happen in a review in a different setting. The production crews need to be able to execute their processes and focus on the quality of the product without timeline and budget concerns seeping into their day-to-day.
I like how that comment is from an anonymous source, but now that the NTSB preliminary report is out, it seems thoroughly corroborated to me. The dates of certain events and the reason for the door's removal—er, "opening"—both match the comment in your link.
Thanks.
Reading through that post gave me nightmares of dealing with outsourcing software teams where you send them a small issue to fix and their fix breaks 3 existing items.
[0]: https://www.airlinepilotforums.com/safety/146074-boeing-inte...
I'm going to be contrarian and say that this is exactly the sort of thing that happens when you train humans to be robots: they lose all common sense and critical thinking, and worse, they still have their inherent imperfection on top of that. Normally the former would counteract the latter, but not if you only make them rigidly follow some process all the time. They stop thinking about what they're doing. They stop paying attention to all the other things in their environment they would've noticed, and even if they do notice, they won't question it, because they'll just assume someone else following a rigid process will take care of it. They won't think "this door plug should've been bolted in place now that the work that needed it opened is done, but where are the bolts?"
I'm not saying to throw out all the process and make them figure everything out, but I think there has to be a balance, similar to how overautomation and overreliance on it have also led to avoidable incidents in aviation.
The process was there so that people would know work was being done on the doors even if they weren't there for it. If you see unfinished work from a previous shift, it does not mean you can start messing with it; there might be context you don't know.
That is why such things are supposed to be noted in the appropriate ways. It's also why aviation has so many procedures everywhere: because we know and understand that sometimes you miss things, for any human reason, not just mismanagement. The process gives you a reliable place to double-check against.
This is different from overreliance on automation, which is arguably less an issue of automation itself (it's just more visible in such areas) than of falling out of practice because you don't encounter certain situations often. 96 people died in one crash where, among a long chain of deviations, the crew had never trained to fly an IFR landing without ILS, with or without the autopilot.
The process is the part that says "yeah, I haven't done this in a long time, I need to train, and here is documentation proving we need to do it and can't delay."
Similarly, CMES is supposed to track "work was done on this part of the ticket; now different work needs to be done; do not assume it will be done by other teams."
But the CEO who started the moves etc. was a Boeing lifer.
There was noticeable damage to the door plug's mechanical fittings from the violence of it being blown out of the plane. But the holes where the bolts belonged were pristine. That would not have been true if the holes had had bolts in them.
>The CVR was downloaded successfully; however, it was determined that the audio from the accident flight had been overwritten. The CVR circuit breaker had not been manually deactivated after the airplane landed following the accident in time to preserve the accident flight recording.
Classic. If they use CD quality audio at 1411kbps, they can store 2 hours of audio in about 1.2 GB. Given how cheap flash is these days, why not 20x that so that we don't have to rely on people pulling circuit breakers after accidents? If there's some concern about robustness and recertification, why not require all aircraft to carry two CVRs, one of the old "robust" style for kinetic accidents, and one that's less robust but has 20x the capacity, so we can record a full day after less violent accidents?
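The storage arithmetic checks out. A quick back-of-envelope sketch, assuming uncompressed CD-quality stereo PCM as the comment does (real recorders' channel layouts and encodings differ, so treat this as an upper-bound estimate under that assumption):

```python
# CD quality: 44.1 kHz sample rate, 16-bit samples, 2 channels = 1,411,200 bit/s.
CD_BITRATE_BPS = 44_100 * 16 * 2

def storage_bytes(hours: float, bitrate_bps: int = CD_BITRATE_BPS) -> int:
    """Uncompressed PCM storage needed for `hours` of audio."""
    return int(hours * 3600 * bitrate_bps // 8)

two_hours = storage_bytes(2)
print(f"2 h: {two_hours / 1e9:.2f} GB ({two_hours / 2**30:.2f} GiB)")
print(f"40 h (20x): {storage_bytes(40) / 1e9:.1f} GB")
```

Two hours comes to about 1.27 GB (1.18 GiB), and 20x that is roughly 25 GB, which is small by consumer-flash standards; whether it can be made crash-survivable is a separate question.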
Theory: having less privacy makes things easier for accident investigators, post-mortem.
Reality: In this case, the pilots did their job and got the plane down safely despite rapid depressurization and literally having their headsets sucked off of their heads. It is extremely unlikely to be pilot-error that a door-plug ripped off the airframe at 16,000' or that investigators would learn anything significant from the process in the flight-deck before or after the incident. At least nothing that would root-cause this incident.
Sincerely, someone who received death threats partly thanks to a manipulated audio recording: it was made in good faith during an investigation, then leaked and edited by a third party who gained access to it 5 years later.
School counselor: Max, I'm… I'm sorry, I… I really can't discuss this. You wouldn't want me talking to any other students about you, right?
Max: If I were dead and it would help catch the killer, then yeah, I most definitely would.
https://subslikescript.com/series/Stranger_Things-4574334/se...
As my first boss, a young woman working as a meat clerk, told me, “shit rolls downhill.” More powerful people tend to get shitted on less. It was a motivation to move up.
But I still think it’s shitting on people to expect or accept constant recording of every mundane thing while awaiting the exceptional [screw up]. Pilots are more powerful than Amazon warehouse workers, but recording every breath, every whisper, every fart is undoubtedly shit in a warehouse or a cockpit or an operating room.
Then again, the only way I could accept it is if everyone were recorded all the time and it were all public, or at least FOIA-able by many people, especially the government, universities, and Wall Street. Otherwise it’s just a way to control people and hang things over their heads.
As to the tipping grumpiness, I grew up partly in the 3rd world, where 50 cents was a great tip, and I’m cheap and didn’t/don’t make tech bro money. I found the ultimate solution was to just not eat out so much except for truly special occasions. I’m sure there’s a lesson in there somewhere too.
https://www.federalregister.gov/documents/2023/12/04/2023-26...
>Notice of proposed rulemaking (NPRM).
>[...]
>DATES:
>Send comments on or before February 2, 2024.
Seems like it's a proposal, and not actually enacted yet?
> Given how cheap flash is these days
How cheap is flash that will survive a sudden stop from 400 mph to 0 mph in no seconds flat, a post-crash fire, and/or submersion for years in salt water? Flash data retention at high temps is TERRIBLE (and gets worse for MLC/TLC/etc.); see any flash datasheet. It is NOT nearly as simple a problem as you might think.
Yes, it is a solvable problem, but please do not dismiss it outright as "trivial".
[1] https://mentourpilot.com/who-doesnt-want-25-hour-cockpit-voi...
Exposure to super-high temps occurs in a small set of circumstances, all of which overlap with the destruction of the recording device and the cessation of incoming data. So we only need the same 1.2GB (or whatever) of high-temperature-tolerant storage.
The 25 hour storage can be on normal flash, as if we're more than 2 hours past the incident and data is continuing to come in, then the incident of interest did not destroy the airplane, and the flash will have remained within its normal operating parameters.
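The two-tier idea can be sketched with a pair of ring buffers. Assumptions, not taken from any real recorder specification: audio arrives in one-minute chunks, the crash-protected tier keeps the most recent 2 hours, and the ordinary-flash tier keeps 25 hours.

```python
from collections import deque

CHUNKS_PER_HOUR = 60  # hypothetical: one chunk per minute of audio

class TieredRecorder:
    def __init__(self, protected_hours: int = 2, bulk_hours: int = 25) -> None:
        # deque(maxlen=...) drops the oldest chunk when full, i.e. a ring buffer.
        self.protected = deque(maxlen=protected_hours * CHUNKS_PER_HOUR)  # fire-proof tier
        self.bulk = deque(maxlen=bulk_hours * CHUNKS_PER_HOUR)            # ordinary flash

    def record(self, chunk: bytes) -> None:
        # Every chunk goes to both tiers; only the small tier must survive a crash.
        self.protected.append(chunk)
        self.bulk.append(chunk)

rec = TieredRecorder()
for minute in range(3 * 60):  # three hours of flight after an incident at minute 0
    rec.record(f"minute {minute}".encode())

print(rec.protected[0])  # b'minute 60': the incident has rolled off the protected tier
print(rec.bulk[0])       # b'minute 0': but it is still on the 25-hour tier
```

If the incident destroys the aircraft, recording stops and the protected tier still holds the last 2 hours; if the aircraft survives, the bulk tier never leaves its normal operating conditions and keeps the incident for 25 hours.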
It's a material science problem, and other forms of media are affected by high temperatures and physical deformation just as much as flash if not more.
Irritatingly, they didn't even pick the top-of-the-line machine from the vendor at that time. They picked a middling one. And then put an LTS OS version on it that didn't fully support the motherboard chipset. I spent way, way too much time and energy trying to get the software to run on the necessary timescales. It took me months to get anyone to let me talk to the vendor in order to sort out the fact that the storage was being run in legacy PATA mode, reducing our IO throughput by an order of magnitude and the application throughput by about a third.
Ten minutes on the phone and I got them to agree to give us a patch that aliased the chipset to one it was backward compatible with, which was actually supported by the OS. But they really wanted us to take the newer version of the OS that didn't have this problem.
That's not even the most hardware-crippled I'd ever been, but it was top three.
Ok
>Photos from the interior repair that show the lack of bolts
Huh. Well that's conclusive.
Interesting for terrorists. Cause a rapid decompression, and get easy access to the cockpit.
Also how do you cause a rapid decompression without a gun of some kind?
https://i.cbc.ca/1.7077373.1704733027!/fileImage/httpImage/g...
OXYGEN MASKS - DON
OXYGEN REGULATORS - Set to 100%
CREW COMMUNICATIONS - ESTABLISH
PRESSURIZATION MODE SELECTOR - MAN AC/MAN
OUTFLOW VALVE SWITCH - CLOSE
Hold in CLOSE until outflow Valve indicates fully closed
If Pressurization is Not Controllable
PASSENGER SIGNS - ON
PASSENGER OXYGEN SWITCH - ON
EMERGENCY DESCENT - ANNOUNCE
The pilot flying will advise the cabin crew, on PA system, of impending rapid descent. The pilot monitoring will advise ATC and obtain area altimeter setting.
PASSENGERS SIGN - ON
DESCENT - INITIATE
I do giggle a little at the thought of a door flying off, the air rushing out of the cabin, and the pilots responding by switching the seatbelt light on.
The plane was only at 16,000 feet when it lost its door and according to [2] you've got 20-30 minutes of 'useful consciousness' at such an altitude, even without your oxygen mask on. So there was no need for an abrupt dive.
[1] https://www.theairlinepilots.com/forumarchive/b737/b737memor...
[2] https://skybrary.aero/articles/time-useful-consciousness
Edit: Re-reading, it was more like 16K feet when it popped; 10K is what ATC assigned them when requested. That's still low enough not to be a critical emergency. Some people absolutely will get altitude sickness at that level, but it's likely to be mild. Many people climb mountains much taller.
Better to climb for a bit more as you get your oxygen mask on than to try to descend immediately and make some problem worse.
We know it was a door plug blowing out, but in the past it has been entire major sections of the airframe ripping off, in which case sudden extra stresses are not what you want.
Pilots can't look in a rearview mirror and see the whole plane. Accident reports on engine malfunctions routinely mention that someone had to check the engines' appearance through a passenger window and relay that to the pilots. In case of a blast so severe that the door flies off, it is safer to assume the worst: say, that part of the plane disintegrated because of a sudden collision. In such conditions, indications of what works and what doesn't are probably really messy and unreliable, and there may not be enough means to control the plane properly. Lower the nose too much, and you might not be able to pull it up any more.
Pilots probably did checklists with one eye on the instruments to check that they were not losing speed, that angles were correct, autopilot inputs resulted in stable flying, and so on, and deduced that everything still worked. By that time, they were probably informed that the plane was seemingly intact, although with a hole in its side.
Not really. The cockpit door was blown open, and the pilot's headsets were blown off. It was a pretty chaotic event, and when you are flying an airplane, you definitely don't want to figuratively "jerk the wheel" - you remain calm and start running checklists.
Looking at the track, they descend to 10000' until they start their downwind to base turn. Once they start that turn, they get a lower altitude (looks like 7000') until they are established on final and can fly an approach.
How the fuck is this still a problem on brand new aircraft?
Smells like CISCO
'Forgetting' to put in any of the screws holding a gas tank in place in a car?
'Missing' all welds in one of a skyscraper's lower columns?
An 'oversight' in providing redundant instruments in an airplane with a natural tendency to stall?
What a hopeless shitshow is going on there behind the company gates that these kind of things can happen in succession?
A duck forgot how to swim, an eagle forgot how to fly, Boeing forgot how to build airplanes?
Or just throwing the rebar in at random maybe? [0]
[0] https://www.gr-us.com/%E2%80%9Chorror-at-the-harmon%E2%80%9D...
WTF! How was there no punishment for everyone involved in this?
In addition to local storage, why isn't the audio (along with location, altitude, and some sensor information) also streamed via something like Starlink or Inmarsat to a secure location, where you can store more data more cheaply and with more redundancy?
There’s also bandwidth and satellite coverage not being magic of course.
That said, there’s a standard and reliable 25-hour cockpit voice recorder that solves this problem. But it’s only used outside the US. That’s a regulatory-inertia situation, and I suspect this incident will speed up changes in this area.
However, finally, and particularly in relation to your proposal of streaming cockpit voice recordings to some cloud server. There is some resistance to this (and to longer recordings in general) from air crew on privacy grounds. The privacy issue is less about how much personal info is revealed in a crash situation and more about how easy it would be for a bad actor in management —or whatever operations group runs the audio storage—to listen in on conversations. And you can be sure this would happen if something like your system were implemented without the appropriate regulatory controls (and tbh even with them it would probably still happen).
Got me curious how often this happened.
The last example I can find of a CVR being overwritten, rather than destroyed or missing, was in 2018 after an engine fire, similar to this case, where the flight had to make an emergency landing shortly after takeoff. Before that... well, a lot of complete failures ("not operative at time of flight") but not many like this scenario.
https://en.wikipedia.org/wiki/List_of_unrecovered_and_unusab...
And it also lists 17 more incidents where something happened in a flight and it took more than 2 hours to land so data from the incident was lost.
Not true.
https://arstechnica.com/information-technology/2022/10/starl...
https://cleantechnica.com/2022/07/06/fcc-approves-starlink-f...