
0 points · koeselitz · 13y ago · 0 comments

Yes, the purpose of RAID is not to fail silently; but it's hardware, and hardware can be flawed. That's just a fact. There are different levels of RAID precisely because there are different levels of redundancy - that is, different extents to which the possibility of failure is minimized. No hardware is flawless, though.

Please note that I have not said "it is likely to fail" or "you should expect that it will probably fail." I agree that it shouldn't be something that keeps a person up at night. But the simple fact is that, when data is important, you should prepare for that possibility (and others) by backing up. RAID does not solve all problems, and it is not guaranteed, as unlikely as failure might be.

Moreover - in saying that it simply isn't RAID if it ever fails silently, you're attempting to define away a nonsemantic problem. The point of a starter motor on a car is to start the engine. If the starter motor fails to start the engine, I guess I could make an Aristotelian argument that it has ceased to be a starter motor, or even perhaps that it was never a starter motor in the first place. But what practical good does that do anybody?

All hardware has the potential to fail. Yes, people should buy hardware that is less likely to fail. I'm pretty sure they already do that, though.


its_so_on · 13y ago

Hi,

You might read this first:

http://news.ycombinator.com/item?id=4057912

you can reply to that as well here if you want.

I think we're in very general agreement. Although you yourself did not say "it is likely to fail" or "you should expect that it will fail", this is exactly the sentiment I was replying to.

Regarding your "all hardware possibly failing" and the example of a starter motor to imply that I am trying to disappear a technical problem with a semantic argument, I think I am (especially in that cousin reply) being quite a bit more specific.

Basically, when it comes to safety mechanisms that exist as a layer on top of a process and aren't necessary at all, I simply shouldn't have to even think about reinventing another safety mechanism on top of the safety mechanism. Get one that isn't defective.

A hard-drive isn't defective just because it fails: it's expected to. A RAID controller is also expected to fail...JUST NOT SILENTLY.

In the seatbelt example: should you even think about having to tie your seatbelt to the buckle with sturdy rope, for real safety in case the seatbelt just doesn't buckle when it seems to, or comes undone like a ripped shirt button at the slightest firm tug?

No. You should get an actual seatbelt.

Basically, the standard you hold a control layer to is different from the standard you hold an underlying process to.

It would be like the difference between your brake failing and your (for added safety) handbrake failing, which you only engage on top of the motor's brake anyway. If the motor brake fails you would start rolling (if you're on a bit of an incline). But you shouldn't even have to think about a handbrake 'just failing' in the same condition.

Sure it can fail if you are being towed without being lifted, or whatever, in an extreme situation. But in a normal situation?

Basically, it is a difference of both category/kind AND of degree.

I am certainly not saying that a parking brake can never fail. I am not saying a RAID controller can never fail.

I am saying that both of these, when they are layers on top of a normal process, should be out of sight, below your threshold of having to control for it. If they're not, you need to get a different one.

You don't get six insurance policies against the same earthquake possibility, hoping that they won't ALL decide to out-lawyer you or go bankrupt. You get real insurance that's properly reinsured. Check up on them. Find a real one.

RAID failure is fine. Silent RAID failure is not fine.

(Checksum failure with an exception is fine. Checksum failure with no exception, warning, or error, where a random checksum is just produced or a check randomly passes when the checksum doesn't match the one you provided, is not okay. Fix your checksum; get a real one. Don't build another layer on top for the cases where your checksum is a randomized print statement, or where your insurance policy is a monthly donation from you to a non-charitable organization that sets aside a portion to out-lawyer you if you try to make a claim, with the rest spent on advertising or kept as profit. That's not an insurance policy, that's a scam.)
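
To make the checksum point concrete, here is a minimal sketch of the "fails, but not silently" behavior, assuming Python and SHA-256. The names `verify` and `ChecksumMismatch` are made up for illustration; this is not any particular RAID controller's or library's API.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Hypothetical error type: the data does not match its expected checksum."""


def verify(data: bytes, expected_hex: str) -> bytes:
    """Return the data unchanged if its SHA-256 matches, otherwise raise.

    The point is only that a mismatch is reported loudly instead of
    being swallowed or passed off as success.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex:
        raise ChecksumMismatch(f"expected {expected_hex}, got {actual}")
    return data
```

A caller that gets `ChecksumMismatch` knows to go to the backup or the other mirror. A `verify` that returned the data anyway would be the "randomized print statement" case.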

koeselitz (OP) · 13y ago

Yeah, I think we're in general agreement. With RAID that is the way RAID is supposed and expected to be, the chance of silent failure is very, very small.
