Avionics software starts with writing comprehensive requirements. Once the software is developed from those requirements, it is tested against them: sometimes in a real functioning airplane, but more often in smaller airplane-cockpit-like rigs and in purely simulated environments.
Nobody is going to write a requirement that says "this avionics subsystem will function without error forever". Even if you thought you could make it happen, you can't test it. So there are going to be boundaries. You might say that the subsystem will function for X days. What happens after that? It may well run just fine for X+1 days, or 2X days, or 100X days. But it's only required to run for X days, and it's only tested and certified for running for X days.
I could easily imagine that this particular subsystem was required and certified for some value of X <= 51 days, and it just so happened that the subsystem started to fail once it ran longer than that. Or it could have been a genuine mistake.
But even if the intended X wasn't 51 days, there almost certainly was some intended, finite value for X. We might say, "well, my laptop has run for three years without needing a reboot". Great! Is that a guaranteed, repeatable state of operation that the FAA would certify? Probably not. And besides that, do we really want to have to endure a three-year verification test?
In most software, we are happy to say, "it should run indefinitely". For avionics software, that's insufficient. We instead say "it will run at least for some specific predetermined finite amount of time" and then back up that statement with certifiable evidence.
Also, uptime is a factor. I've seen what Windows looks like when it runs out of GDI objects; it's strange. But once you see it, you can explain to the customer the importance of regular reboots/restarts.
I do not know if this particular 51-day limitation was intentional or not.
I highly doubt it was intentional. Boeing has already had to issue an AD for similar behavior on the 787: https://www.engadget.com/2015-05-01-boeing-787-dreamliner-so...
If they knew about it there'd be no need for an AD. Boeing tried to become the aviation equivalent of a fabless chip designer with the 787 and it didn't go well at all. Turns out they had little-to-no experience managing external development and manufacturing teams. I don't know anything about the 51-day bug, but the 248-day bug caused critical failures that you really wouldn't want happening in flight.
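The commonly cited explanation for the 787's 248-day figure (an inference from public reporting, not something the AD itself spells out) is a signed 32-bit counter incremented every hundredth of a second. The arithmetic lines up:

```python
# Hypothetical reconstruction: a signed 32-bit counter ticking every
# 10 ms (hundredths of a second) wraps after 2**31 ticks.
SECONDS_PER_DAY = 24 * 60 * 60

ticks_to_overflow = 2**31            # one past the max positive int32 value
seconds = ticks_to_overflow / 100    # 100 ticks per second
days = seconds / SECONDS_PER_DAY
print(f"{days:.2f} days")            # ~248.55 days
```

Which matches the "248 days of continuous power" threshold in the AD almost exactly.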
These time limits could at least be pegged to real-life intervals at which the system is going to be shut down anyway. If the system continues to be operated past that point, skipped maintenance intervals could be identified as the cause.
Not by testing, but by using formal methods.
Isn't it surprising that modular arithmetic, already employed successfully in TCP sequence numbers and the like, still gets implemented incorrectly today? What's more disappointing is seeing all the other incredible systemic complexity they've added, and yet the plane appears to have no mechanical backup instruments?
> and yet the plane appears to have no mechanical backup instruments[?]
This is unlikely in a modern aircraft because mechanical instruments to back up e.g., the artificial horizon / attitude indicator or directional gyro (DG) / heading indicator are:
1) Mechanically complex - the attitude indicator and DG use gyroscopes spinning at up to 24,000 RPM, along with other mechanisms. They are typically powered by vacuum or electric motors, which consume relatively more power (or require vacuum lines and a vacuum pump).
2) Expensive to maintain - see (1); they need to be serviced somewhat regularly.
3) Heavier than their solid-state counterparts.
4) Have [dramatically] different failure modes - instead of a display going dark, a DG will slowly drift as the gyroscope precesses, giving erroneous values. Same with the artificial horizon. This can lead to catastrophic results under instrument meteorological conditions (IMC), where the pilots rely solely on instruments to maintain essentials such as heading and level flight.
5) Because of (4), they require additional redundancy so that instruments can be cross-checked with one another. This compounds (2) and (3).
"Glass" standby instruments come with significant upside and not much downside, which is why they've been preferred in larger/more expensive aircraft for a while. There is nothing inherently more or less reliable about them, being fully isolated and redundant just as old-timey mechanical backups are, and they offer a much richer presentation (typically like a small PFD). However, new things are usually more expensive, which IIUC is why they were adopted first in larger, more expensive aircraft. They were considered a luxury in GA until fairly recently.
It's just not a workable idea in general. There are checklists for stuff like instrument failure which can probably recover from a software bug like this.
It's sort of like how you don't need RAID for your offsite backup disks, just some parity for bit-rot.
The mechanical instruments would be the (additional) redundancy. The additional weight/lines/service is indeed burdensome even without redundant mechanical systems.
Even TCP sequence number comparison can be implemented incorrectly.
https://engineering.skroutz.gr/blog/uncovering-a-24-year-old...
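The correct comparison is short but easy to get wrong. A sketch of the standard trick (the same idea as Linux's before()/after() macros for TCP sequence numbers): take the unsigned difference modulo 2^32 and ask whether it falls in the lower half of the space, so ordering survives wraparound. The function name here is mine, not from any particular codebase:

```python
def seq_lt(a: int, b: int, bits: int = 32) -> bool:
    """True if sequence number a is 'before' b, modulo 2**bits.

    Interprets the unsigned difference (b - a) mod 2**bits as a
    signed value: a precedes b iff that difference is strictly
    between 0 and 2**(bits - 1).
    """
    half = 1 << (bits - 1)
    diff = (b - a) % (1 << bits)
    return 0 < diff < half

# Ordering survives the wrap at 2**32:
assert seq_lt(0xFFFFFFF0, 0x00000010)       # just before vs. just after wrap
assert not seq_lt(0x00000010, 0xFFFFFFF0)
```

A naive `a < b` gives the opposite (wrong) answer for the pair above, which is exactly the class of bug these articles describe.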
https://www.cnet.com/culture/windows-may-crash-after-49-7-da...
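The 49.7 days in that Win9x bug is just a 32-bit millisecond tick counter wrapping: 2^32 ms is about 49.71 days. Elapsed-time code that compares raw tick values breaks at the wrap; computing the difference modulo 2^32 does not. A minimal sketch (my own illustration, not the actual Windows code):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
wrap_days = 2**32 / MS_PER_DAY
print(f"{wrap_days:.2f} days")   # ~49.71 days

def elapsed_ms(start: int, now: int) -> int:
    """Milliseconds from start to now on a 32-bit millisecond tick
    counter, correct across a single wraparound."""
    return (now - start) % 2**32

# A naive 'now - start' goes negative across the wrap;
# the modular version stays correct:
assert elapsed_ms(0xFFFFFF00, 0x00000100) == 0x200
```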
Was it a cost issue?
Or was there an expectation that a regular maintenance check, involving a reboot for diagnostics, would occur within this time frame?