It's hard to get your head around how big a deal this is. This vulnerability is so bad they killed x86 indirect jump instructions. It's so bad that compilers --- all of them --- have to know about this bug and use an incantation that hacks "ret" the way an exploit developer would. It's so bad that to restore the original performance of a predictable indirect jump you might have to change the way you write high-level language code.
It's glorious.
It truly is difficult to predict all the ripple effects from this. I can't think of a single computer bug in the last 30 years that's similar in reach to this Intel Meltdown.
[EDITED following text to replace "Intel bug" with "Spectre bug" based on ars and jcranmer clarification. The Intel Meltdown can be fixed with operating system update patches for kpti instead of a complete recompile.]
Journalists like to overuse the bombastic metaphor "shaken the very foundations", but this Spectre bug actually seems to fit it. Off the top of my head:
- browsers like Chrome & Firefox have to compile with new defensive compilation flags because they run untrusted JavaScript
- cloud providers have to recompile and patch their code to protect themselves from hostile customer VMs
- operating systems like Linux/Windows/MacOS have to recompile and patch code to protect users from malware
Imagine the economics of all these mitigations. Also imagine that each of the cloud vendors AWS/Google/Azure/Rackspace had very detailed Excel spreadsheets extrapolating CPU usage for the next few years to plan for millions of $$$ of capital expenditures. Because of the severe performance implications of the bugfix (5% to 50% slowdown?), the CPU-utilization assumptions in those spreadsheets are now wrong. They will have to spend more than they planned to meet their workload-throughput goals.
There are dozens of other scenarios that we can't immediately think of.
Wrong bug. Intel Meltdown is bad, but not anywhere near as bad as Spectre, which affects everything! No AMD immunity here.
Not meaning to be rude, but this itself summarises (and the issue will perhaps shed more light on) how bad an idea it is to let everybody run untrusted code from other people, let alone third-party stuff like "privacy-intrusion-as-a-service" startups and the like.
From the spectre paper:
"A minor variant of this could be to instead use an out-of-bounds read to a function pointer to gain control of execution in the mis-speculated path. We did not investigate this variant further."
If I were developing processors, I'd be holding emergency meetings on crafting exploits to figure out where our processors' weaknesses are, all while being happy that Intel is getting all the bad PR for this and we're not.
Agreed. This is an entirely new class of vulnerabilities, and we're just at the beginning.
For any other non-sandboxed application you pretty much have to trust the code anyway. Privilege escalation is always a bad thing, of course, but for single-user desktop machines, an attacker getting user shell access means they can do pretty much anything they want.
As far as I can see, the only attack surface on my current machine would be a website running untrusted JS. For all other applications running on my machine, if one of them is actually hostile then I'm already screwed.
Frankly, I'm more annoyed at the ridiculous over-engineering of the Web than at the CPU vendors. Because in 2017 you need to enable a Turing-complete language interpreter in your browser in order to display text and pictures on many (most?) websites.
Gopher should've won.
It ends with the performance advantages of OOO execution being effectively negated by the workarounds to address the security issues it causes.
The following parable is edifying: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/E...
[Edit] Or, how far down does the rabbit hole go?
Additionally, it is quite fascinating to me to compare the complexity of modern CPUs with, say, a compiler.
I believe the generalized fix is to restore the entire CPU state after a mispredict. You’d either need to add an extra copy of the entire processor state (tens of megabits) for every simultaneous predict you support ($$$) or keep track of how to revert all changes and revert them one at a time ($, slow).
- we didn't have hyperconverged cloud infrastructures running arbitrary entities' code next to each other
What's new and surprising is the power of these side-channel attacks--you can use these, reliably, to exfiltrate arbitrary memory, including across privilege modes in some cases (apparently, some ARM cores are affected by the latter vulnerability, in addition to Intel).
Are you sure?
The right fix is to prevent speculatively executed code from leaking information.
Here that perhaps means associating cache lines with a speculative branch somehow so that they aren't accessible until/unless the speculative branch becomes the real branch. (I have no idea exactly how that would be done or what the performance cost might be... I'd really need to know the details of how speculative execution is implemented in a particular CPU to even be able to guess.)
Ouch! This is independent of other performance hurts, like from the kernel syscall overhead that was the hot topic yesterday. This is pretty crazy.
Will be intrigued to see how processor manufacturers respond to this. If they were even slightly relaxed about it prior to disclosure I expect there's going to be some very hurried attempts to engineer some solutions pronto. This is the sort of thing where it might even be worth throwing away all of your future roadmap plans and just getting a revision of the current chips out there ASAP, whatever that may do to the rest of your roadmap.
Although actually, we already are - binary distros already don't take into account per-microarchitecture scheduling, nor any ISA extensions above a common baseline (e.g. just SSE2, no autovectorising to AVX2 etc).
This might provide enough impetus to restructure how binary distros work and get the whole distro compiled with newer CPU flags (march={first corrected architecture}?), but in the short term I assume every package will take the hit.
Great time to learn about source-based distros!
> However, real-world workloads exhibit substantially lower performance impact.
I feel like you could have mentioned this.
I feel bad for all of the engineers currently working on performance sensitive applications in these languages. There's a whole lot of Java, .NET, and JavaScript that's about to get slower[1]. Enterprise-y, abstract class heavy (i.e.: vtable using) C++ will get slower. Rust trait objects get slower. Haskell type classes that don't optimize out get slower.
What a mess.
[1] These mitigations will need to be implemented for interpreters, and JITs will want to switch to emitting "retpoline" code for dynamic dispatch. There's no world in which I don't expect the JVM, V8, and others to switch to these by default soon.
Maybe I'm being naive, but would a simple modulo instruction work? Consider the example code from https://googleprojectzero.blogspot.com/2018/01/reading-privi...:
unsigned long untrusted_offset_from_caller = ...;
if (untrusted_offset_from_caller < arr1->length) {
unsigned char value = arr1->data[untrusted_offset_from_caller];
...
}
If instead we did: unsigned char value = arr1->data[untrusted_offset_from_caller % arr1->length];
Would this produce a data dependency that prevents speculative execution from reading an out-of-bounds memory address? (Ignore for the moment that a sufficiently smart compiler might "optimize" out the modulo here.)

You only need to wipe between syscalls that have side effects. Number-crunching AVX-heavy subroutines should never have to deal with safety once entered.
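To make the idea concrete, here's a minimal sketch of the clamped read (the `Array` struct and `read_clamped` are hypothetical stand-ins for `arr1`'s type in the Project Zero snippet). Note that the modulo involves an integer division, which is slow; the mitigation that actually landed in Linux, `array_index_nospec`, uses a branchless mask for the same effect.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for arr1's type in the snippet above.
struct Array {
    std::size_t length;
    unsigned char data[16];
};

unsigned char read_clamped(const Array* arr, std::size_t untrusted) {
    if (untrusted < arr->length) {
        // Even if the bounds check above is speculatively mispredicted,
        // the load address depends on the modulo result, so the access
        // stays inside data[] rather than reaching attacker-chosen memory.
        return arr->data[untrusted % arr->length];
    }
    return 0;
}
```

Architecturally the modulo is a no-op (the branch already guarantees `untrusted < length`); its only purpose is to put a data dependency on the load address that speculation cannot skip.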
The obvious drawback is that it effectively disables sharing code in memory; it would still allow sharing code on disk, though. So it would be a middle ground between dynamic and static linking.
https://www.technovelty.org/c/position-independent-code-and-...
* https://lkml.org/lkml/2018/1/3/797 (https://news.ycombinator.com/item?id=16066968)
What do people more knowledgeable in the field think about this?
All of these attacks assume you are running something you don't trust on your CPU, whether it is another user's program, a non-root executable, or a JavaScript program from a website.
When do we stop hacking processors, kernels, and compilers and revisit our assumptions about what we can and can't do securely?
Side channels have always been some of the most insidious exploits. Many are basically unsolvable (timing attacks are always going to leak some information, and compression is basically completely at odds with secure information storage), many more are easily enough overlooked that it would be easy to maliciously include them without raising any eyebrows, and the "fixes" for them almost always murder performance.
I think the only real, fully encompassing solution to this is a redesign of how we use computers. Either a massive step backwards in performance, turning off most automatic "optimizations" until they can be proven through a much more rigorous process (both in compilers and in hardware), or a significant change in how computers are architected, adding more hardware-level isolation for processes and systems running on the machine (just daydreaming now, but something like a cluster of isolated micro-CPUs that each run one application only).
You can do a lot by separating these speedup mechanisms across security boundaries. The biggest factor that makes this hard to mitigate is in-process security boundaries. Total isolation between processes is neither necessary nor sufficient.
https://marc.info/?l=openbsd-misc&m=118296441702631&w=2
(from 2007)
RISC-V has already had to fix its memory consistency model, so it is not without problems. But that is a spec bug, not an implementation bug. As far as I know, it is very unlikely that there is an out-of-order, speculatively executing RISC-V core in the wild that suffers from this. If there is, no doubt its designers have had a busy time lately.
I'd presume that the slowest RISC-V designs are immune due to not speculating enough, while any high-performance implementation is vulnerable.
>> While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.
Early processors had speculative execution? I thought this had been added to Intel/AMD/ARM about 20 years ago?
Trusting in a compiler you hope was used to build all the executables on your system isn't trustworthy enough to be the final solution.
[1] https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html
Unless the compiler is also patched to either disallow inserted assembly, or to modify the inserted assembly (this being both hard and dangerous), someone who wants to exploit the bug will just add their own inserted assembly code that exploits the bug, and a patched compiler won't help one bit in that case.
* https://lkml.org/lkml/2018/1/4/432 * http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a...
It appears that Skylake and later can actually predict through retpolines? Some hardware features called IBRS, IBPB, and STIBP (not a lot of details on these are out there) are supposedly coming in a microcode update.
There's a lot of prominence being given to all kinds of damage malicious users might inflict, and ways to prevent or mitigate, but little to the malice itself. Whence does it arise? What emotions drive those users? What unmet needs?
Meanwhile, when these slowing-down patches for Spectre and Meltdown arrive, I intend not to run them, to the extent possible. I intend to keep aside a VM with patches for critical stuff, like banking or others' data entrusted to me. But I don't want my machine to be slowed down just because someone, sometime, might invest effort in targeting these attacks at it. Given how transparent I want to be with my life, that's a risk I'm willing to take.
Sure, you might not have anything you want to hide in your life, but the drive-by javascript doesn't care about your secrets - it'll hack you anyway. Best-case scenario, you lose access to a bunch of accounts you used to use and need to create new identities from scratch. Worst-case, they clean you out financially, steal your identity, etc.
Also, any insight about performance impact here?
Most programs have indirect jumps somewhere. Higher-level languages with virtual function calls have lots of indirect jumps, because they parameterize functions: to get the "length" of the variable "foo", the function "bar" has to call one of 30 different functions, depending on the type of "foo"; the function to call is read out of a table at some offset from the base address of "foo". Or, another example is switch statements, which can compile down to jump tables.
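The switch-statement case is easy to see in a small sketch (function and case values are hypothetical; whether a compiler actually emits a jump table depends on how dense the cases are and on optimization settings):

```cpp
#include <cassert>

// With dense case values, compilers commonly lower a switch like this
// to a jump table: an array of code addresses indexed by `kind`,
// reached through a single indirect jump -- the very pattern that
// Spectre v2 mitigations have to rewrite.
int type_size(int kind) {
    switch (kind) {
        case 0:  return 1;  // e.g. "byte"
        case 1:  return 2;  // e.g. "short"
        case 2:  return 4;  // e.g. "int"
        case 3:  return 8;  // e.g. "long"
        default: return 0;
    }
}
```

Sparse case values, by contrast, usually become a chain of compares and direct branches, which ordinary branch prediction handles without any indirect jump.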
What we want, to mitigate Spectre, is to be able to disable speculative execution for indirect jumps. The CPU doesn't provide a clean way to do that directly.
So we just stop using the indirect jump instructions. Instead, we abuse the fact that "ret" is an indirect jump.
"Call" and "ret" are how CPUs support function calls. When you "call" a function, the CPU pushes the return address --- the next instruction address after the "call" --- to the stack. When you return from a function, you pop the return address and jump to it. There's a sort of "jmp %register" hidden in "ret".
You abuse "ret" by replacing each indirect jump with a call/mov/ret sequence, where the mov does a switcheroo on the saved return address so the "ret" jumps to the real target.
The obvious next question to ask here is, "why don't CPUs predict and speculatively execute rets?" And they do. So the retpoline mitigates this: instead of just call/mov/ret, it does call / ...pause; jmp... / mov / ret, where the middle sequence of instructions set off in "..." is jumped over and never architecturally executed, but captures the speculative execution that the CPU does --- the CPU expects the "ret" to return to the instruction after the original "call", and does not know how to predict around the fact that we did the switcheroo on the return address.
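For concreteness, the thunk that replaces a "jmp *%rax" looks roughly like this (following the published retpoline sequence; the thunk name matches the GCC/kernel convention, local label names are illustrative):

```asm
__x86_indirect_thunk_rax:
        call    .Lsetup          # pushes .Lcapture as the return address
.Lcapture:
        pause                    # speculative execution of the "ret" lands
        lfence                   # here and spins harmlessly in this loop
        jmp     .Lcapture
.Lsetup:
        mov     %rax, (%rsp)     # switcheroo: overwrite the saved return address
        ret                      # architecturally "returns" to *%rax
```

The return-stack predictor assumes "ret" goes back to `.Lcapture`, so any misprediction is trapped in the pause/lfence loop instead of running attacker-trained code.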
How'd I do?
Is the overhead of the retpoline such that it's no longer a benefit to compile switches to jump tables?
Here is an example of the most common programming patterns that end up causing indirect jumps/calls:
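A minimal C++ sketch of that pattern (types are hypothetical, for illustration):

```cpp
#include <cassert>

// A virtual call compiles to: load the vtable pointer from the object,
// load a function address at a fixed offset in that table, then an
// indirect call through it.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};
struct Triangle : Shape { int sides() const override { return 3; } };
struct Square   : Shape { int sides() const override { return 4; } };

int count_sides(const Shape& s) {
    return s.sides();  // indirect call through the vtable
}
```

Every call site like `s.sides()` is an indirect call the compiler cannot resolve statically, which is why retpoline-style mitigations touch so much ordinary application code.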
Imagine every virtual function call in a C++ program being mispredicted and taking twice as long.
(Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)
> (Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)
Wouldn't the performance impact be dramatic? In this[1] example there's a 6x slowdown between the cases with and without correct branch prediction.
[1]: https://stackoverflow.com/questions/11227809/why-is-it-faste...
Something like that could allow the CPU to speculate agressively while preventing information leak exploits.
The bug here is that the CPU does not abort the speculation when fetches occur to addresses marked as "access denied". Instead the fetch happens, and a line of normally inaccessible memory is put into the cache by code that should not normally be able to get it there.
One hardware fix would be to plug that hole. Speculative reads get blocked when they encounter permission denied errors from the paging system and do not change the cache state. That blocks the Meltdown attack, but not the Spectre attack.
Also, maybe context switching would need to be made faster, because you would need to do it whenever, e.g., JavaScript calls browser interfaces.
It also sets a very bad precedent: I understand people want to mitigate/fix as much as possible, but this is basically giving an implicit message to the hardware designers: "it doesn't matter if our instructions are broken, regardless of how widespread in use they already are --- they'll just fix it in the software."
What other options are there? It's hardware; it cannot be patched. Of course they will change chip designs going forward, but what else do you suggest folks do with the billions of chips that exhibit this problem?
> We built multi-tenant cloud computing on top of processors and chipsets that were designed and hyper-optimized for
> single-tenant use. We crossed our fingers that it would be OK and it would all turn out great and we would all profit.
> In 2018, reality has come back to bite us.
This is the root of all the problems.
Right now, many function calls don't safely wipe registers or the cache side channels identified by Spectre. There really need to be two kinds of function calls. Maybe a C pragma?
The compiler has parent-function call wiping as a flag; the code has pragmas that override the flag.
Wayback Machine: https://web.archive.org/web/20180104131631/https://reviews.l...
:)