Secondly, the punishment meted out should be:
1. Proportional to the degree of carelessness (not much in this case: he accidentally hit a key adjacent to the right one; he didn't mow anybody down while driving drunk)
2. Inversely proportional to the likelihood of the error (in this case the likelihood was very high, since the reset-all key was (a) uncovered and single-press, and (b) right next to the single-workstation reset key).
3. Proportional to intention (this was a completely unintentional error)
If you say that the punishment should also depend on the degree of damage, I would say that the responsibility for managing the risk of such damage wasn't his but lay with the person responsible for implementing such a high-risk design. If such a person is not around, find the person who approved the design. Government departments are usually very good with paper trails.
So what did you do after you got fired from the embassy?
What I didn’t say is that it was the last day of my summer internship. The next summer they invited me back again. Everyone understood it was a mistake, but by officially firing me, someone had been punished … :)
So I guess it wasn't a big deal after all.

Your number 2 is particularly wrong. For punishment to work in affecting behavior, you can't punish for a very unlikely event, especially an accidental one.
Punishment changes behavior by making people anxious and afraid of the punishment. If you punish something that's very unlikely, it does no good. Suppose pushing ctrl-F restarted the workstations, and the guy had never pushed ctrl-F before and never been punished for it. It's very unlikely that he would ever push ctrl-F. But he happens to trip getting up and accidentally hits ctrl-F while catching his balance. Does that warrant a heavier punishment? It's more unlikely, certainly, but what would the punishment change about his behavior?
Punishment works because you are afraid of it. It works because you want to avoid it. But accidents don't happen because of defiance or a rational decision making process.
If you were punished moderately and frequently for a common mistake like hitting F7, it could correct behavior, because you would be more vigilant when reaching for F6. In terms of correcting behavior, making the punishment proportional to the degree of carelessness is not important: if someone is more careless, they will simply be punished more often. If the punishment is too strong, it will just make people fearful instead of correcting the behavior.
Firing someone who consistently makes mistakes is a corrective action, not punitive.
Punishment is generally more of a cultural thing and less a means of correcting an issue. Punishment is expected, so it's delivered. In Western culture we have a particular need to find someone responsible and punish them. Rarely, though, do you think "I don't want to get punished, so I am going to do this right," but it's not uncommon to think "I don't want to get punished, so I'll avoid this altogether."
Corrective action is better when it's not punitive. Look at the design of the software and correct that problem. Look at the systems that allowed this to happen and correct them. Work with the staff, find out why this could happen, and help them correct it. If people are punished for writing the software poorly, they're just going to cover up the flaws they find instead of bringing them to light to be corrected. If staff are punished for making mistakes, they're going to hide them instead of seeing if they can fix them.
Punishment is often just a game to abdicate responsibility. "Oh, it wasn't my fault. It was his fault. The proof that it is his fault is that he got punished for it. I've done my part to solve this problem."
Especially in complex environments like corporations and government, I think the last thing you should do is look for a person to blame. Instead of looking for the person responsible for implementing the design, or the person who approved it, look at why it was implemented and how it was approved. Instead of pinning it on an individual, pin it on a system.
I think you should only look at an individual if they are committing malfeasance to benefit themselves outside the system. If the person approved the design because they weren't aware of the potential risk, then find out why. If they approved it because there was supposed to be another safeguard to stop it from accidentally happening, find out why that wasn't there. If they approved it because they gave the contract to a friend who wasn't the best choice, and overlooked issues for a cut, then go ahead and blame them.
If there's a problem with the person, say the designer was just irredeemably bad, then remove him. If it's a problem with training, then train him. And if it was something he did as a greenhorn in the past, and he's much better now, then for God's sake don't punish him for a mistake he made years ago, when he was put on a project more important than the skills he was hired with, unless he grossly lied about those skills.
I understand that we don't want a culture that fires people for the sort of mistake anyone might make. But to be so careless on a day that is clearly an exception, when something more important than standard business procedure is going on? Can't you at least see why firing the intern for such a lack of mindfulness might make sense, even if you disagree with it?
I interned for a government organization that maintains hydroelectric dams and the software that controls them throughout the Southeastern US. A careless mistake could -- in the worst case -- cause blackouts, cost the company millions of dollars, or even cost lives (if the data-control feedback loop caused a turbine to spin up at the wrong time or to fail to shut off in an emergency). And, as is quite common in organizations with non-software-engineers running the show, the development processes were entirely haphazard. The environment was such that it would be really easy for me to push unreviewed code, or to make a stupid deployment mistake, or to be careless in a number of ways that the system didn't protect me against.
But it was OK, because they hired smart, competent people who understand the need to triple-check, if necessary, before committing. People who understood the gravity of the situation, and who didn't phone it in if they weren't feeling it that day. If I demonstrated that I wasn't one of those people, I would fully expect to be fired.
This equivocates on "consequences" of actions, though. It's obvious that the consequences of hitting F7 before the incident were understood by all responsible to be low enough that any intern could be expected to make the right decision. After the incident, the consequences of hitting F7 were sharply increased such that no future intern would ever be allowed to make that decision. But then you can't make an argument that assumes "consequences" were the same at both points in time.
We make this fallacy all the time probably because we're designed by evolution to reassess the morality of an action based on consequences. It works as a social heuristic for shaming or rewarding people but it makes no rational sense that the morality of an action should retroactively change based on future consequences. You can see similar behavior in our rewarding athletes for profound genetic advantages, or punishing criminals for profound genetic deficits. The consequences somehow redeem or condemn, and they should do neither.
In my experience that is not nearly sufficient for implementing any process that can't tolerate errors. It is necessary to have conscientious people of course, but they still are humans. Given the opportunity for 2,000 hours a year, year after year, they will screw up.
Humans are very bad at following procedures. For recent examples, consider the people operating our nuclear missiles and those protecting our bomb-grade nuclear materials. If even they don't have enough motivation to follow procedure ...
In this situation, the secretary playing the game is just as culpable as the intern, which is to say, not really responsible.
In one case accidentally pressing the wrong key deleted incredibly important data, while in your case you have plenty of time to review and ensure quality at your leisure.
Count to ten article:
> I, naturally, felt terrible and was, appropriately, fired.
Honesty Wins article:
> But, naturally, that day was my last day of work at the American Embassy. But, not because I was fired; although, I might have been fired if that day didn’t just happen to be the last scheduled day of my summer internship.
It is simply that, given the simplicity and the consequences of the error, it is the type of thing I generally see people beating themselves up over, unable to forget it.
(In case you are saying to yourself, "but he did remember the key": look at the two versions. In one he says F6 reboots the machine and F7 reboots everything; in the other, F7 reboots the machine and F8 reboots everything. So while I hope he knew the keys' functions at the time, he has since forgotten the exact key, or is substituting F keys for storytelling purposes.)
Think about it: Unix is just as "insane". If you're the guy on the console who meant to clean out some junk directory and accidentally typoed "rm -rf /", causing an international crisis, you're going to get fired too.
Then years later HN will call for Dennis Ritchie to get fired instead.
I imagine that someone wanted someone's head, so whose head should it have been? The guy who wrote the system couldn't be fired; he was in a different company. And maybe a macro had been assigned to that key, so it wasn't his fault anyway.
lrwxrwxrwx 1 root root 20 Apr 27 17:02 cc -> /etc/alternatives/cc
Guess what happens when you paste a whole list of those into a console as root?
Far from perfect - it depended on the order of deletion - but a more general solution than --preserve-root. Of course it still requires the user to mark the things they consider "important".
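For illustration, here's a minimal sketch of that "mark important things" idea: a wrapper that refuses to remove anything listed in a protection file. The wrapper name and the ~/.rm_protected path are made up, not what was actually used:

    #!/usr/bin/env bash
    # safe-rm: refuse to delete any path the user has listed as protected.
    PROTECTED_LIST="$HOME/.rm_protected"
    for target in "$@"; do
        case "$target" in -*) continue ;; esac        # skip rm's option flags
        abs=$(readlink -f -- "$target" 2>/dev/null) || continue
        if grep -qxF "$abs" "$PROTECTED_LIST" 2>/dev/null; then
            echo "safe-rm: '$abs' is marked protected; refusing." >&2
            exit 1
        fi
    done
    exec /bin/rm "$@"

Like the scheme described above, it's far from perfect (it doesn't protect anything reached indirectly), but it shows the shape of the idea.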
It is definitely a problem with Unix also.
You cannot fat-finger "rm -rf /" any more, though; modern GNU rm refuses to operate recursively on / unless you pass --no-preserve-root.
Responsibility flows upwards, not downwards. It's just unfortunate that the people at the bottom are often carrying the people above far more than they should...
An equally sufficient solution would have been to install a safety switch on any button with that much importance. Something like this, but probably smaller, or just a plastic cover that fits over the F7 key: http://www.thinkgeek.com/product/15a5/
A fireable offense would be lying about the action or trying to cover it up.
Should the lady who asked for the reset be fired for playing a game and asking him to reset the computer?
Should the technician that didn't install some sort of safety be fired for not foreseeing this issue?
He would be much less likely to make the same mistake in the future than the person who would replace him.
If there are terminals that could erase a presidential report and there is no backup available, you send a non-critical staff member to guard every one of those terminals, or at least put a sticky-note in the middle of the monitor.
I'd say several other people deserved to be fired for this, but the intern was not one of them.
Also, whether or not it was appropriate is completely irrelevant to the story being told.
I have to routinely create and drop databases on my local system. Our production databases, which I also have to connect to, contain hundreds, maybe thousands, of person-months of work. I realized that it would be a good idea, before issuing DROP DATABASE commands, to deliberately stop and double-check what server I'm connected to. Luckily, I haven't screwed that one up yet.
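One way to make that double-check mechanical instead of a matter of discipline is a wrapper that refuses known production hosts and makes you retype the database name. A minimal sketch, assuming psql; the wrapper name and host list are hypothetical:

    #!/usr/bin/env bash
    # dropdb-safe: guard rails in front of DROP DATABASE.
    set -euo pipefail
    DB="${1:?usage: dropdb-safe <database> [host]}"
    HOST="${2:-localhost}"
    PROD_HOSTS="prod-db-1 prod-db-2"    # hypothetical production servers

    # Refuse production hosts outright.
    for h in $PROD_HOSTS; do
        if [ "$HOST" = "$h" ]; then
            echo "refusing to drop '$DB' on production host '$HOST'" >&2
            exit 1
        fi
    done

    # Make the operator retype the name, not just hit Y.
    read -r -p "Drop database '$DB' on '$HOST'? Retype its name to confirm: " answer
    [ "$answer" = "$DB" ] || { echo "aborted"; exit 1; }

    psql -h "$HOST" -c "DROP DATABASE \"$DB\";"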
Based solely on the shortened account, it was not appropriate at all to fire him. Convincing him that it was is just doubly inappropriate. There may be more to the story, but as it is, it looks like angry scapegoating against a hapless, lowest-level employee.
Japan, I guess? I've never been there but the story was consistent with my impression of their work culture.
Also, in Japanese companies, it's basically impossible to fire people. They can, however, be assigned to a desk in a windowless room and be given nothing to do for several years, until they take the hint and "voluntarily" quit.
Or the US navy crew who received medals after shooting down the Iranian airline.
Once there is loss of life, it is 100% politics afterwards, with little to no practicality; just look at all the mass shootings after which there were zero changes. We simply do not value life; it is politics first.
You make it seem like they received the medals for having shot down the plane. In reality, those who were awarded medals received tour-of-duty medals for their time spent in a combat zone. I believe the distinction is important, particularly since that class of medal is routinely awarded to individuals during their time in the military.
My answer would be no, you failed at your job regardless.
Same thing with the military.
See John Allspaw's Swiss Cheese Theory : http://www.kitchensoap.com/2012/02/10/each-necessary-but-onl... .
[ Edit: I guess it's not Allspaw's model, but he applies it to systems engineering rather well - http://en.wikipedia.org/wiki/Swiss_cheese_model ]
"Accidents emerge from a confluence of conditions and occurrences that are usually associated with the pursuit of success, but in this combination—each necessary but only jointly sufficient—able to trigger failure instead."
The person who pushed the button is not at fault, the manager is not at fault, the guy who designed the button is not at fault - all are jointly responsible.
Blaming the intern does, however, reflect extremely poorly on Itoh and everyone else in the chain of command. A superior who demands retribution for a simple mistake that happened to cause him or her pain is basically worthless.
But, I forget, we're talking about Ronald Reagan.
No. This is something you would read about in The Design of Everyday Things, where Don Norman would thoroughly shame the engineers who made that system. Software shouldn't be designed on the assumption that no one makes errors.
Why didn't the backups work? System wasn't "robust" enough. (Did I just use the word "robust"?)
I appreciate that the OP was a part of the situation, but the conspiracy theories were not caused by his mistake.
It was a time of very high tension between the US and the Soviet Union. So when a plane veers off course into not just Soviet airspace, but into an explicitly cordoned-off top-secret area, ignores all communication attempts, ignores the presence of fighter jets, and just keeps on flying, the situation itself is fertile soil for conspiracy theories.
Through the glass of a yellow newspaper box, I saw the Miami News headline: the Soviets had shot down a plane carrying a Congressman. My first thought was "This is the war." Not 'a', but 'the'. The primary stance of the US military had been squared off against the USSR for more than 30 years.
Incidentally, "features" like this are why I don't trust systems that have some centralised control - IMHO giving any one individual (or organisation, in many cases these days) such power over others is not a good thing.
"Not long after I arrived in my office, I received a call from a secretary in the Agriculture Department who liked to play a computer game before her workday started. Her favorite game had a bug that regularly froze her workstation. [...] I realized that I had mistakenly hit F7 and reset all the workstations in the embassy. This realization didn’t bother me much, because no one except the Agriculture section secretary was usually on the computer system this early in the morning."
I'm sure I'd have thought something like: "Phew! Glad I made that mistake now, rather than at 11am when everyone was half-way through their morning's work. Likely no harm done at all, and I'm going to be really careful with that command in the future. Yup, definitely dodged a bullet there..."
Why didn't Reagan respond immediately? Well, he was waiting to hear from Chancellor Gorkon that the KAL flight had been successfully beamed aboard and was en route to Pluto, of course... Clearly they'd have their shit together better so it couldn't have been a 23-year old rebooting all the computers accidentally and wiping out hours of critical work -- that would just be ridiculous...
It's comforting to think that people can control the direction of every choice in the world, and that someone is at the helm.
It's uncomfortable to think about the daily series of random, unconnected decisions that drive the direction of our species.
If I had made the same mistake twice without any attempts to fix the situation long term, then, yes, I think that would have been a fire-able offense.
If you're working with people who care primarily about their own positions and egos without regard to the team as a whole, well, be prepared to be thrown under the bus when it comes time for those people to protect themselves.
Ugh.
Those with automation capabilities: keep this lesson in mind, because it will happen to you in production one day. 'dsh -a reboot' is incredibly easy to type and can have disastrous effects. Creating abstraction layers around common admin tasks can help catch simple mistakes and give prompts before dangerous behavior.
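As a sketch of what such an abstraction layer might look like (the wrapper and its confirmation string are made up, and dsh option details vary between implementations), you can require an explicit host group and make the all-hosts case deliberately hard to trigger:

    #!/usr/bin/env bash
    # reboot-hosts: wrap 'dsh ... reboot' so the all-hosts case
    # cannot be typed by accident.
    set -euo pipefail
    GROUP="${1:?usage: reboot-hosts <group|ALL>}"

    if [ "$GROUP" = "ALL" ]; then
        read -r -p "This reboots EVERY host. Type 'reboot-all-hosts' to proceed: " answer
        [ "$answer" = "reboot-all-hosts" ] || { echo "aborted"; exit 1; }
        dsh -a -- sudo reboot
    else
        dsh -g "$GROUP" -- sudo reboot
    fi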
... incompetence like that comes from having F6 next to F7, with no checks or authorisation needed for a potentially dangerous action, etc. Processes should be designed expecting people to make the common mistakes... it's what they do.
hmmm, I am pretty sure Mr. Itoh was not a Japanese national working in the American embassy. I am pretty sure he was American.
Fire the idiot who wrote that function.
"I know what I'm doing when I hit F7, but the damn system makes me sit there for 30 seconds before it does what I told it to do! Piece of junk."
The result was that software in that era tended to come with a lot more sharp edges. The age of the Recycle Bin that would save you from yourself didn't arrive until administering systems became something the general public was expected to do.
I recall that time I wrote a batch manager for the VAX 11/780 at Caltech High Energy Physics. It consisted of a program to monitor the batch queue and start jobs as scheduled ("BATch MANager", or "BATMAN"), and a program for users to submit jobs ("Run Overnight Batch INput", or "ROBIN").
The configuration file for BATMAN was stored in /etc/batman.
During development, I occasionally had to "rm /etc/batman". Of course, out of habit, as soon as I typed "/etc/" my fingers would automatically type "passwd", and once I did not catch this in time. Oops. It happened to be a Sunday morning at around 7AM, and I had to call the other admin, who handled backups, to come in and restore that file. He was annoyed.
The second time I did this, he was pretty pissed.
The third time, I fortunately had been working at the terminal we had in the machine room, and managed to shut down power to the machine before the write buffers were flushed, and the file was OK after fsck. I didn't have to deal with an angry co-admininstrator that time. Just angry physicists.
The other admin (Norman Wilson, in case anyone knows him or he reads HN) then made a link named /etc/safe_from_tzs to /etc/passwd to stop my nonsense once and for all.
That worked until the first time I wanted to overwrite /etc/batman instead of rm it.
That led to a cron job that maintained a copy of /etc/passwd in a separate file, and periodically checked to see if it were missing or misformatted, and restored it if so.
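A minimal sketch of that kind of watchdog (the backup path and the sanity check are illustrative; the original surely differed):

    #!/bin/sh
    # passwd-watchdog: keep a snapshot of /etc/passwd and restore it
    # if the live file goes missing or fails a basic sanity check.
    BACKUP=/var/backups/passwd.snapshot

    # Treat the file as healthy if root's entry is present.
    if grep -q '^root:' /etc/passwd 2>/dev/null; then
        cp /etc/passwd "$BACKUP"        # refresh the known-good copy
    elif [ -s "$BACKUP" ]; then
        cp "$BACKUP" /etc/passwd        # restore the last snapshot
        logger "passwd-watchdog: restored /etc/passwd from $BACKUP"
    fi

Run it from root's crontab every few minutes, e.g. */5 * * * * /usr/local/sbin/passwd-watchdog.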
No amount of training can prevent something like this. It's like today's browsers, where the tab can be closed with ctrl+w and the whole window with ctrl+q. It doesn't matter how many times you've done it or how used you are to the position of the 'w'. One day you will close the whole window by accident.
The power switch on the IBM PC was way at the back so that people couldn't unintentionally reset the computer. The same thinking went into Ctrl-Alt-Del, a combination that people wouldn't accidentally hit.
So having a system where F7 would reboot the entire system was pretty dumb, even in the early 80s.
As in: I hit F6, a "Do you want to reboot this?" dialog pops up, I hit 'Y'; a "Do you really, really want to reboot this?" dialog follows, and I hit 'Y' again.
Instead of actually reading what it says, you just press F6-Y-Y in quick succession.
Modern interfaces sometimes make you type some kind of string to confirm, but most either use a password (like sudo) or some hardcoded string that everyone eventually memorizes.
But even today, Windows 7 only makes you click that one button in UAC, and most people probably do it without even thinking about it.
I wonder if he had a supervisor sysadmin that he was working under, but given how he described his boss, that seems unlikely as well.
If something has changed over these years, it's the overall understanding of design principles and their popularization. (Thanks, Don Norman and other people in the field!)
Training has nothing to do with it. Even if you can train a person to work with a badly designed system without making mistakes (often), designing the system well in the first place is almost always significantly easier and cheaper.
For example, accidental key presses can be easily prevented by requiring the user to type a command of reasonable length. Typing "reboot-all-workstations" is not that difficult, but it would definitely prevent the incident described in the article.
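A sketch of what that looks like in practice; reset_all_workstations is a hypothetical stand-in for whatever F7 actually invoked:

    # Require the full phrase; a single stray keystroke can't trigger it.
    read -r -p "Type 'reboot-all-workstations' to confirm: " answer
    if [ "$answer" = "reboot-all-workstations" ]; then
        reset_all_workstations      # hypothetical hook for the real action
    else
        echo "Confirmation did not match; nothing was rebooted."
    fi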
"take a deep breath, count to ten"
Doesn't answer "who" put the button in there, I'm just thinking out-loud! :)
Or just the old rm -rf /
Older stuff asks for far fewer confirmations.
1. we know a helluva lot more about human factors design
2. we have a helluva lot more excess computer power that can be devoted to human factors
Remember that warning functions are easy to ignore. Never use a warning when you mean undo.