Even if you hand-wrote some assembly, carefully managing where data is stored and wiping registers after use, you still end up with information leakage. Typically the CPU cache hierarchy is going to end up with some copies of keys and plaintext. You knew that? OK, then did you know that typically a "cache invalidate" operation doesn't actually zero its data SRAMs, and just resets the tag SRAMs? There are instructions on most platforms to read these back (if you're at the right privilege level). Timing attacks are also possible unless you hand-wrote that assembly knowing exactly which platform it's going to run on. Intel et al have a habit of making things like multiply-add take a "fast path" depending on the input values, so you end up leaking the magnitude of inputs.
Leaving aside timing attacks (which are just an algorithm and instruction selection problem), the right solution is isolation. Often people go for physical isolation: hardware security modules (HSMs). A much less expensive solution is sandboxing: stick these functions in their own process, with a thin channel of communication. If you want to blow away all its state, then wipe every page that was allocated to it.
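A minimal sketch of that sandboxing idea (all names are mine, and the XOR "cipher" is a stand-in for real crypto): the key only ever exists in a forked child process, and the parent can only ask it to transform buffers over a pipe. Killing the child blows away every page that held the key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

/* Hypothetical sketch: the key never exists in the parent's address space.
   The child is the "crypto process"; the pipes are the thin channel.
   XOR stands in for a real cipher purely for illustration. */
#define BUF_LEN 16

static int crypt_in_child(const uint8_t *in, uint8_t *out) {
    int to_child[2], from_child[2];
    if (pipe(to_child) || pipe(from_child)) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                        /* child: owns the secret key */
        close(to_child[1]);
        close(from_child[0]);
        uint8_t key[BUF_LEN], buf[BUF_LEN];
        memset(key, 0x5A, sizeof key);     /* stand-in key material */
        (void)read(to_child[0], buf, sizeof buf);
        for (size_t i = 0; i < sizeof buf; i++) buf[i] ^= key[i];
        (void)write(from_child[1], buf, sizeof buf);
        _exit(0);                          /* all the child's pages die here */
    }
    close(to_child[0]);
    close(from_child[1]);
    (void)write(to_child[1], in, BUF_LEN);
    (void)read(from_child[0], out, BUF_LEN);
    waitpid(pid, NULL, 0);
    return 0;
}
```

Obviously a real version would keep the child alive across requests and speak a real protocol; the point is just that the only way secrets cross the boundary is through that narrow channel.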
Trying to tackle this without platform support is futile. Even if you have language support. I've always frowned at attempts to make userland crypto libraries "cover their tracks" because it's an attempt to protect a process from itself. That engineering effort would have been better spent making some actual, hardware supported separation, such as process isolation.
I wonder if current VM implementations are doing this systematically.
A kernel API to request "secure" memory, with the kernel guaranteeing zeroing, seems like it would be useful. Without this I'm wondering if it's even possible for a process to ensure that physical memory is zeroed, since it can only work with virtual memory.
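Absent such an API, about the best a process can do is keep the secret's pages resident and wipe them itself. A sketch (POSIX; names are mine; this says nothing about cache lines, registers, or DMA): mlock(2) at least stops the pages being swapped out, and a volatile-qualified store loop does the wiping.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Sketch: best-effort "secure" buffer without kernel support.
   mlock keeps the pages resident so the secret never hits swap;
   it does nothing about cache lines, registers, or DMA. */
enum { SECRET_LEN = 64 };

static unsigned char *secret_alloc(void) {
    unsigned char *p = malloc(SECRET_LEN);
    if (!p) return NULL;
    (void)mlock(p, SECRET_LEN);  /* may fail under RLIMIT_MEMLOCK; best-effort */
    return p;
}

static void secret_wipe(unsigned char *p) {
    /* volatile-qualified stores cannot be elided as dead by the compiler */
    volatile unsigned char *vp = p;
    for (int i = 0; i < SECRET_LEN; i++) vp[i] = 0;
}

static void secret_free(unsigned char *p) {
    if (!p) return;
    secret_wipe(p);
    (void)munlock(p, SECRET_LEN);
    free(p);
}
```

This only guarantees something about the virtual pages the process can see, which is exactly the limitation the comment above is pointing at.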
Apologies to everyone suffering Mill fatigue, but we've tried to address this not at a language level but a machine level.
As mitigation, we have a stack whose rubble you cannot browse, and no ... No registers!
But the real strong security comes from the Mill's strong memory protection.
It is cheap and easy to create isolated protection silos - we call them "turfs" - so you can tightly control the access between components. E.g. you can cheaply handle encryption in a turf that has the secrets it needs, whilst handling each client in a dedicated sandbox turf of its own that can only ask the encryption turf to encrypt/decrypt buffers, not access any of that turf's secrets.
More in this talk http://millcomputing.com/docs/security/ and others on same site.
Wow. There's the Wheel of Reincarnation [1] in action. The Intel iAPX 432 microprocessor had similar ideas.[2] E.g. no programmer visible general purpose registers, "capability-based addressing" to control access to memory.
That was a mere 30+ years ago. Let's hope you're more successful than they were.
[1] http://www.catb.org/jargon/html/W/wheel-of-reincarnation.htm... [2] http://en.wikipedia.org/wiki/IAPX432#Object-oriented_memory_...
This processor didn't have general purpose registers, it had "workspaces" in RAM that served as register sets. The processor had a workspace pointer register that pointed to the workspace currently in use. This was cool because it meant that a context switch could be achieved by just changing the workspace pointer. However the downside is that RAM access is slower than register access.
There is no public SDK yet, and hardware is also under development.
We've had a simulator for a long time, and we show it off a bit in the Specification talk:
Alas: microcode, and unreadability, and the difficulty of carrying a provably correct implementation all the way down to bare metal by hand.
The proposed compiler extension, however, makes sense to me. Let's get it added to LLVM & GCC?
In other words, if you write a crypto library in x86 assembler, Intel don't guarantee that they won't introduce a side channel in their next chip model or stepping.
Until then, we do the best we can with turtles all the way down. Software running under that same undocumented pipeline is going to find it very hard to access or leak (accidentally or otherwise) internal registers, at least.
For the other avenue of attack (cold-boot attacks), it's also notable that registers, at least, have extremely fast remanence compared to cache, or DRAM - bit-fade is a very complex process, but broadly speaking, faster memory usually fades faster.
Digression along that vein: I basically pulled off a cold-boot attack on my Atari 520 STe in the early 1990s (due to my wanting to, ahem, 'debug' a pesky piece of software that played shenanigans with the reset vector and debug interrupts), with Servisol freezer spray pointed directly at the SIMMs in my Xtra-RAM Deluxe RAM expansion (and no, cold-boot attacks are not new, GCHQ's known about them for at least 3 decades and change under the peculiarly-descriptive cover name NONSTOP, I believe?). It just seemed sensible to me: cold things move slower, and they had a particularly long (and very pretty) remanence - I was able to get plenty of data intact, including finding where I needed to jump to avoid the offending routine and continue my analysis with a saner technique (i.e. one that didn't make me worry about blowing up the power supply or electrocuting myself)! It's harder these days - faster memory - but the technique incredibly still works and was independently rediscovered as such more recently: very much a "wait, this still works on modern RAM?" moment for me. (By the way, when I accidentally pulled out the SIMMs with the internal RAM disabled - whoops - and rebooted the Atari on my first try, it actually powered up with an effect that I can only describe as "pretty rainbow noise with double-height scrolling bombs" that would not have looked out of place in a demoscreen! I don't know if that was just mine, but... the ROM probably never expected to find RAM not working, and I guess the error-plotting routine had a very pretty and unusual error in that event?)
I've never seen or heard of anyone pulling off a NONSTOP on a register in a CPU, or actually even on an L1, L2 or L3 cache (maybe an L2 or L3 might be possible, depending on design?). They're fast - ns->µs remanence? - and cooling doesn't help much. I don't know if it's possible at all, but I'd tentatively suggest that it might be beyond practical attack - unless the attacker has decapped the processor and it's already in their lab (in which case you're fucked, no matter what!). That's what suggests that approaches like TRESOR (abuse spare CPU debug registers to store encryption key; use that key to encrypt keys in RAM), despite being diabolical hacks, actually work.
If you fancy giving it a try in the wild by the way, I think a Raspberry Pi might be a good modern test subject - the RAM's exposed on top of the SoC, there are no access problems, and it's cheap so if it dies for science, it's not such a problem. (Of course, you'd probably want to change bootcode.bin so that it dumps the RAM after it enables it but before it clears it.) The VideoCore IV is kind of a beast - and is frustratingly close to being able to do ChaCha20 extraordinarily efficiently, if I can just figure out how to access the diagonal vectors... or whether I even can, or whether I can fake it.
As an alternative, maybe write crypto algorithms in LLVM IR?
What I'd really like to see is qhasm put on github along with the syntax files he or others create. q files aren't really useful without the syntax files they were made for, and without a central repo, custom made syntaxes will be a mish-mash of random decisions and instructions.
For the stack, if you can guess how large the function's stack allocation can be (shouldn't be too hard for most functions), you could, after returning from it, call a separate assembly function which allocates a larger stack frame and wipes it (don't forget about the redzone too!). IIRC, openssl tries to do that, using a horrible-looking piece of voodoo code.
For the registers, the same stack-wiping function could also zero all the ones the ABI says a called function can overwrite. The others, if used at all by the cryptographic function, have already been restored before returning to the caller.
Yes, it's not completely portable due to the tiny amount of assembly; but the usefulness of portable code comes not from it being 100% portable, but from reducing the amount of machine- and compiler-specific code to a minimum. Write one stack- and register-wipe function in assembly, one "memset and I mean it" function using either inline assembly or a separate assembly file, and the rest of your code doesn't have to change at all when porting to a new system.
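One way to write that "memset and I mean it" function without any assembly at all is a volatile function pointer (a sketch; the names are mine):

```c
#include <assert.h>
#include <string.h>

/* The compiler cannot prove what a volatile function pointer points to at
   the moment of the call, so it cannot treat the call as a dead store and
   elide it the way it can a plain memset before a free/return. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

static void secure_memzero(void *p, size_t n) {
    memset_ptr(p, 0, n);
}
```

Same idea as the inline-asm barrier discussed elsewhere in the thread, just expressed in (almost) portable C; the cost is an indirect call per wipe.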
So stop doing that. Have a low-level system service (e.g., a hypervisor with well-defined isolation) do your crypto operations. Physically isolate the machines that need to do this, and carefully control their communication to other machines (PCI requires this for credit card processing, btw). Do end-to-end encryption of things like card numbers, at the point of entry by the user, and use short lifetime keys in environments you don't control very well.
The problem is much, much wider than a compiler extension.
So how do you get that isolated box onto everyone's computer and phone? How do you move these users' sensitive information onto that isolated box without leaving a trace on their non-isolated computer? How do you move their keys around?
When you use two systems to process sensitive information, you have at least two problems to solve...
And sometimes they don't worry enough either. They ought to fail a FIPS-style audit for that. But, well... they ought not to contain proprietary LFSR "crypto" algorithms, either. They are not as well audited, or as publicly designed, as they ought to be: many are as black-box closed-source as they could possibly be.
They tend to be based on extraordinarily old architectures with new bits glued on - think Intel 8051, that kind of era. If you're really lucky you might get an ARM, or at least a Thumb. People making them are notoriously hyper-conservative (most don't support ECC yet, and many don't even go above RSA-2048 or SHA-1 without going to firmware), and minimise any changes, perhaps for cost reasons, the effects of which are not always positive (actually, CFRG are discussing that general area right now in the context of side-channel defences for elliptic-curve crypto).
So, how would you think that environment translates to writing secure firmware, or designing secure, state-of-the-art hardware? ;-)
This is incorrect. The AES key schedule is bijective, which makes recovering the last round key as dangerous as recovering the first.
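To illustrate (a sketch of my own: AES-128, FIPS-197 key schedule, S-box computed on the fly): expand a key, throw everything away except the last round key, then walk the same recurrence backwards to recover the original key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t SBOX[256];

static uint8_t xtime(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

static void init_sbox(void) {
    uint8_t expt[256], logt[256];
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {        /* log/antilog tables, generator 0x03 */
        expt[i] = x;
        logt[x] = (uint8_t)i;
        x ^= xtime(x);
    }
    for (int b = 0; b < 256; b++) {
        uint8_t inv = b ? expt[(255 - logt[b]) % 255] : 0;  /* GF(2^8) inverse */
        uint8_t s = 0x63;                  /* affine transform: b ^ rotl1..4 ^ 0x63 */
        for (int i = 0; i < 5; i++)
            s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
        SBOX[b] = s;
    }
}

static const uint8_t RCON[10] = {1, 2, 4, 8, 16, 32, 64, 128, 0x1B, 0x36};

static uint32_t subrot(uint32_t w, uint8_t rcon) {
    w = (w << 8) | (w >> 24);              /* RotWord */
    uint32_t r = 0;
    for (int i = 0; i < 4; i++)            /* SubWord */
        r |= (uint32_t)SBOX[(w >> (24 - 8 * i)) & 0xFF] << (24 - 8 * i);
    return r ^ ((uint32_t)rcon << 24);     /* Rcon into the top byte */
}

/* Expand an example key, keep only W[40..43] (the last round key), invert
   the schedule; returns 1 iff the recovered key matches the original. */
static int last_round_key_recovers_first(void) {
    init_sbox();
    uint32_t W[44], R[44];
    for (int i = 0; i < 4; i++)            /* example key 00 01 02 ... 0f */
        W[i] = (uint32_t)(4u * i) << 24 | (4u * i + 1) << 16
             | (4u * i + 2) << 8 | (4u * i + 3);
    for (int j = 4; j < 44; j++) {
        uint32_t t = W[j - 1];
        if (j % 4 == 0) t = subrot(t, RCON[j / 4 - 1]);
        W[j] = W[j - 4] ^ t;
    }
    memcpy(&R[40], &W[40], 4 * sizeof(uint32_t));
    for (int j = 43; j >= 4; j--) {        /* the recurrence runs backwards too */
        uint32_t t = R[j - 1];
        if (j % 4 == 0) t = subrot(t, RCON[j / 4 - 1]);
        R[j - 4] = R[j] ^ t;
    }
    return memcmp(R, W, 4 * sizeof(uint32_t)) == 0;
}
```

The backward loop is literally the forward recurrence solved for W[j-4], which is why zeroing only the original key buffer while leaving any round key behind buys you nothing.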
Afaict there's no generic solution to these problems. 99.9% of what these code paths handle is just non-sensitive, so applying some kind of "secure tag" to them is just unworkable, and they're easily used without knowing it... it only takes one ancillary library to touch your data.
Similarly, if you encrypt all of your information from within a safe library before handing it out to unsafe libraries, they can't leak anything. This can add overhead and redundant encryption (and you still need to trust that the remote server processing your data is safe), but there are steps you can take to be safer.
I don't know enough of modern hardware, but on CPUs with register renaming, is that even possible from assembly?
I am thinking of the case where the CPU, instead of clearing register X in process P, renames another register to X and clears it.
After that, program Q might get back the old value of register X in program P by XOR-ing another register with some value (or just by reading it, but that might be a different case (I know little of hardware specifics)), if the CPU decides to reuse the bits used to store the value of register X in P.
Even if that isn't the case, clearing registers still is fairly difficult in multi-core systems. A thread might move between CPUs between the time it writes X and the time it clears it. That is less risky, as the context switch will overwrite most state, but, for example, floating point register state may not be restored if a process hasn't used floating point instructions yet.
Computer programs of all kinds are being executed on top of increasingly complicated abstractions. E.g., once upon a time, memory was memory; today it is an abstraction. The proposed attribute seems workable if you compile and execute a C program in the "normal" way. But what if, say, you compile C into asm.js?
Saying, "So don't do that" doesn't cut it. In not too many years I might compile my OS and run the result on some cloud instance sitting on top of who-knows-what abstraction written in who-knows-what language. Then someone downloads a carefully constructed security-related program and runs it on that OS. And this proposed ironclad security attribute becomes meaningless.
So I'm thinking we need to do better. But I don't know how that might happen.
I think we need, right at the base metal, a way of saying "this data needs to not be copied" and/or "if you do copy it you must remember all copy locations so we can sanitize them all." And then we require every abstraction on up to have a way of maintaining this, the same way all the abstractions are required to, say, let us read data.
Or I guess this is part of what HSMs are supposed to do -- do all your "secure" work in something that is very strictly controlled.
The point is, if you want security you need to look at the whole system and in the situation you describe you can't guarantee it, no.
I'm not going to say "So don't do that", but I am going to say "If you're going to do things like that, please realise that the assumptions the system security was built on no longer hold true".
I think to do it better we just need to pay a bit more attention. And try not to let ourselves get into situations (cough heartbleed cough) where memory zeroing is actually an important feature. I.e., by the time the attacker is able to read your process memory, you're probably already screwed.
Now, if your decryption hardware was an actual separate box, where the user inserts their keys via some mechanism and you can't run any software on it, but simply say "please decrypt this data with key X", then we'd be on to something. It could be just a small SoC which plugs into your USB port.
Or you could have a special crypto machine kept completely unconnected to anything, in a Faraday cage. You take the encrypted data, you enter your key in the machine, you enter the data and you copy the decrypted data back. No chance of keys leaking in any way.
These are dedicated boxes that just do crypto. You keep them on the network or attached via a serial port or... whatever. Accessible to your machines but not the outside world. Then you send them messages to ask them to encrypt and decrypt data for you. That way the keys never leave the box. The HSM doesn't accept new software, nor does it ever expose the keys to anyone.
They are, however, quite expensive.
Assembly is the simplest language you can write a computer program in, for a certain very textbook definition of "simple" - it's just that you actually have to do everything by hand that you normally wouldn't. And yes, that can be a pain in the ass, and yes, you do have to watch out for not seeing the wood for the trees - but one thing it most definitely is, is auditable.
Bearing in mind, say, the utter briar-patch that is OpenSSL: a crufty intractably complex library written in a high-level language with myriad compiler bug workarounds, compatibility kludges and where - despite it being open source, and "many eyes making bugs shallow" - few eyes ever actually looked, or saw, or wanted to see, and when attention was finally paid to it, it was found wanting... might not assembly be perhaps better for a compact, high-assurance crypto library? Radical, I know, but perhaps an approach that's worthy of consideration.
I understand you may well be more familiar with high-level languages, and I don't know if you're confident about your ability to audit that - but I must point out, if you're auditing it from source, you're trusting the compiler to faithfully translate it. So to actually audit the code, you need to include the compiler in that audit. Compilers have (lots of) bugs and oversights too (lots of OpenSSL cruft is compiler bug workarounds, it seems?): as the article points out, existing compilers just weren't really designed to accommodate writing secure code.
Meanwhile an assembler makes a direct translation from source assembly to object machine code - that is deterministic (a perniciously-hard process with compilers) and much more easily, and automatically, auditable and indeed directly reversible.
To be clear, I'm not suggesting we replace, say, libsodium with something written in assembly language tomorrow! There are good high-level language implementations. And inline assembly is already used in some places for certain functions, including this exact one (zeroing memory), to try to minimise the compiler second-guessing us. But as the article points out, that approach only takes us so far, and it's something we need to be guarded against when trying to write secure code.
Botan, for example, has something called a "SecureVector" which I have never actually verified as being secure, but it's the same idea.
#include <string.h>
void bar(void *s, size_t count)
{
memset(s, 0, count);
__asm__ ("" : "=r" (s) : "0" (s)); /* pretend to "use" s, so the memset can't be optimised away */
}
int main(void)
{
char foo[128];
bar(foo, sizeof(foo));
return 0;
}
gcc -O2 -o foo foo.c -g
gdb ./foo
...
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004003d0 <+0>: sub $0x88,%rsp
0x00000000004003d7 <+7>: mov $0x80,%esi
0x00000000004003dc <+12>: mov %rsp,%rdi
0x00000000004003df <+15>: callq 0x400500 <bar>
0x00000000004003e4 <+20>: xor %eax,%eax
0x00000000004003e6 <+22>: add $0x88,%rsp
0x00000000004003ed <+29>: retq
End of assembler dump.
(gdb) disassemble bar
Dump of assembler code for function bar:
0x0000000000400500 <+0>: sub $0x8,%rsp
0x0000000000400504 <+4>: mov %rsi,%rdx
0x0000000000400507 <+7>: xor %esi,%esi
0x0000000000400509 <+9>: callq 0x4003b0 <memset@plt>
0x000000000040050e <+14>: add $0x8,%rsp
0x0000000000400512 <+18>: retq
End of assembler dump.

However, they could indeed circumvent the permission system by figuring out what sensitive data your program left behind in uninitialized memory and in CPU registers.
Not leaving traces behind then becomes a serious issue. Could the kernel be tasked with clearing registers and clearing re-assigned memory before giving these resources to another program? The kernel knows exactly when it is doing that, no?
It would be a better solution than trying to fix all possible compilers and scripting engines in use. Fixing these tools smells like picking the wrong level to solve this problem ...
Malicious programs running with your program's privileges are a different scenario altogether, and usually they can do a lot of damage. Want sensitive information out of another process? Try gdb.
But yes, it is trivial for the kernel to zero a page before handing it out.
Also, wouldn't a wrapper function that performs the AES decryption and then manually zeroes the registers be a good enough work around?
Yes, you probably ought to be clearing xmm* registers touched by it, and that would I hope be good enough.
The point in the article is completely valid, though: compiled code very seldom touches the xmm* registers, so if you don't wipe them - is wiping them currently common practice? I haven't checked, but that feels like something that needs checking! - whatever's in there hangs around, and you might leak it.
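A sketch of what clearing them might look like (GCC/Clang inline asm, x86-64 only, compiled to a no-op elsewhere; the function names are mine, and I haven't surveyed what real libraries actually do):

```c
#include <stdint.h>

/* Sketch: explicitly zero xmm0-xmm15 after a crypto call.
   x86-64 with GCC/Clang inline asm only; does nothing on other targets. */
static void wipe_xmm(void) {
#if defined(__x86_64__)
    __asm__ volatile (
        "pxor %%xmm0,%%xmm0\n\t"   "pxor %%xmm1,%%xmm1\n\t"
        "pxor %%xmm2,%%xmm2\n\t"   "pxor %%xmm3,%%xmm3\n\t"
        "pxor %%xmm4,%%xmm4\n\t"   "pxor %%xmm5,%%xmm5\n\t"
        "pxor %%xmm6,%%xmm6\n\t"   "pxor %%xmm7,%%xmm7\n\t"
        "pxor %%xmm8,%%xmm8\n\t"   "pxor %%xmm9,%%xmm9\n\t"
        "pxor %%xmm10,%%xmm10\n\t" "pxor %%xmm11,%%xmm11\n\t"
        "pxor %%xmm12,%%xmm12\n\t" "pxor %%xmm13,%%xmm13\n\t"
        "pxor %%xmm14,%%xmm14\n\t" "pxor %%xmm15,%%xmm15"
        : : : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
              "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15");
#endif
}

/* Plant a value in xmm0, wipe, and read it back; returns 0 iff the wipe
   worked (trivially 0 on non-x86-64 targets). */
static uint64_t xmm0_after_wipe(void) {
#if defined(__x86_64__)
    uint64_t v = 0xDEADBEEFCAFEBABEULL, out;
    __asm__ volatile ("movq %0, %%xmm0" : : "r"(v) : "xmm0");
    wipe_xmm();
    __asm__ volatile ("movq %%xmm0, %0" : "=r"(out));
    return out;
#else
    return 0;
#endif
}
```

The clobber list matters: without it, the compiler is free to keep its own live values in those registers across the asm statement.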
I googled pretty hard for real-life examples of a timing attack being used, and of stale data in registers being used, but couldn't find anything. Does anyone know of examples of this actually being done?
These types of attacks might also become more of a problem as more sensitive computation is done on shared machines (i.e. cloud compute).
So, while there's no reason to panic just because these security features are hardly implemented anywhere, you can't let the issues sit unaddressed for long periods of time.
So, I've seen a lot of (conceptually) trivial exploits and combinations of trivial exploits, but I would love to see a real world example of someone collecting enough information from a 'bad RNG', registers, or timing, to do anything with it.
Some of those are fixed now, but the history stealing link redrawing one is still an issue as far as I know (or at least, this bug is still open https://bugzilla.mozilla.org/show_bug.cgi?id=884270 ).
"Remote Timing Attacks are Practical" https://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf
"RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis" http://www.tau.ac.il/~tromer/acoustic/
Because the compiler is perfectly within its rights to optimize that out!
The proposal seems good.
It would be consequential (but utterly impractical) to add another C-level primitive to prevent OS-level task suspension during critical code paths. Good luck getting that into a kernel without opening a huge DoS surface :)
If someone has the root privs to peek at your memory, they can also stop your process at any time and examine all the registers, whether they were swapped out to disk or not.
Moving the crypto code into the kernel and running with disabled interrupts doesn't help because the attacker is already assumed to have super-user privileges (they can peek at arbitrary RAM, after all). There are also non-maskable interrupts.
You basically cannot hide the machine state from someone who controls the machine: not without splitting the machine itself into additional privilege levels, such that there is a security level that is not accessible even to the OS kernel. The sensitive crypto routines run in that level. The manufacturer of the SoC provides these as firmware, and the regular kernel has no visibility to the internals.
ARM has a security model that supports this.
There is also something even more paranoid called TrustZone: http://en.wikipedia.org/wiki/ARM_architecture#Security_exten...
1) Zero the buffer.
2) Check that the buffer is completely zeroed.
3) If you found any non-zeros in the buffer, return an error.
Is the compiler still allowed to optimize away the zeroing in this case?
Yes, completely. In the snippet below, the compiler is allowed to eliminate all code after “leave secrets in array c”.
{
char c[2];
... /* leave secrets in array c */
memset(c, 0, 2);
c[0] = 0;
c[1] = 0;
memset(c, 0, 2);
if (c[0] || c[1]) exit(1);
}
The compiler is also allowed to compile the last three statements below as if they were “return 0;”:
{
char c[2];
... /* leave secrets in array c */
c[0] = 0;
c[1] = 0;
return c[0] + c[1];
}

gcc 4.4.5 doesn't, though (-O3): it still clears the stack once and performs the comparison.
I believe these optimizations can be defeated by declaring a global
volatile char fill = 0;
and using that instead of 0 in memset().

With 'volatile', generally not, modulo bugs. Without volatile, it would never return an error.
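A sketch of the volatile-fill trick described above (names are mine): because `fill` is volatile, the compiler must actually load it at runtime, so it can't prove the buffer will be all zeros and can't fold away either the memset or the check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The compiler cannot assume fill is still 0 at the call site, so the
   memset and the subsequent comparison both survive optimisation. */
static volatile char fill = 0;

static int wipe_and_check(char *buf, size_t n) {
    memset(buf, fill, n);
    for (size_t i = 0; i < n; i++)
        if (buf[i]) return 1;   /* error: wipe did not stick */
    return 0;
}
```

The cost is one extra volatile load per wipe, which is about as cheap as these workarounds get.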
I am still uncertain why people want to just 'zero' it. Filling with random data (just one random() call), then using an inline PRNG, summing the result, and storing it globally in a volatile would reliably 'zero' the data, but it's quite CPU-intensive.
User logs in by sending password. System transitions to authorized state. System wants to wipe password to avoid later leak. If you reboot at this point, the user will no longer be authorized.
What about Rust?