Reading privileged memory with a side-channel

593 comments

hyperion20108y ago

An analogy that was useful for explaining part of this to my (non-technical) father. Maybe others will find it helpful as well.

Imagine that you want to know whether someone has checked out a particular library book. The library refuses to give you access to their records and does not keep a slip inside the front cover. You can only see the record of which books you have checked out.

What you do is follow the person of interest into the library whenever they return a book. You then ask the librarian for a copy of the books you want to know whether the person has checked out. If the librarian looks down and says "You are in luck, I have a copy right here!" then you know the person had checked out that book. If the librarian has to go look in the stacks and comes back 5 minutes later with the book, you know that the person didn't check out that book (this time).

The way to make the library secure against this kind of attack is to require that all books be reshelved before they can be lent out again, unless the current borrower is requesting an extension.

There are many other ways to use the behavior of the librarian and the time it takes to retrieve a book to figure out which books a person is reading.

edit: A closer variant. Call the library pretending to be the person and ask for a book to be put on hold. Then watch how long it takes them in the library. If they got that book they will be in and out in a minute (and perhaps a bit confused), if they didn't take that book it will take 5 minutes.

3148y ago

Your analogy is more apt for side-channel attacks in general. Here is a more specific version for Meltdown:

A library has two rooms, one for general books and one for restricted books. The restricted books are not allowed out of the library, and no notes or recordings are allowed to be taken out of the restricted room.

An attacker wants to sneak information out of the restricted room. To do this the pick up a pile of non-restricted books and go into the restricted room. Depending on what they read in there they rearrange the pile of non-restricted books into a particular order. A guard comes along and sees them, they are thrown out of the restricted room and their pile of non-restricted books is put on the issue desk ready to be put back into circulation.

Their conspirator looks at the order of the books on the issue desk and decodes a piece of information about the book in the restricted room. They repeat this process about 500000 times a second until they have transcribed the secret book.

pgt8y ago

I don't understand this explanation :/ Why is the room considered restricted if you can go inside? Do I know how all books that exist in the library? How does the order of the thrown out books pertain to the secret book?

1 more reply

lma218y ago

What is the analogy behind being able to go into the restricted room?

2 more replies

prab978y ago

I don't understand how this info can be used for getting what was inside the book? If my understanding of your explanation is correct, book name is analogous to memory address. When the victim (legit process) returned the book with name X (called free on the mem block X), the librarian (OS) erased all pages of the book and repurposed it for printing another book before handing it out to the evil dude(snoopy process).

myrmi8y ago

My attempt, assuming that the books only contain one character each:

The librarian has a list of books you're not allowed to take out. You request one of those books (book X), but it takes a while for search to run to see whether you're allowed to or not. While you're waiting, you say "actually, I'm not really interested in taking out book X, but if the content of that book is 'a', I'd like to take out book Y. If the content of that book is 'b', I'd like to take out book Y+1, and so on".

The librarian is still waiting for the search to complete to see if you can take out book X, but doesn't have anything better to do, so looks inside it, sees that the letter is 'b', and goes and gets book Y+1 so she can hand it over to you.

Now, the original check to see if you can take the first book out completes, and the librarian says "I'm sorry, I can't let you have book X, and I can't give you the book I fetched that you are allowed to take out, otherwise you'd know the content of the forbidden book."

Now, you request book 'Y', which you are allowed. The librarian goes away for a few minutes, and returns with book 'Y', and hands it over to you. You request book 'Y+1', and she hands it over immediately. You request book 'Y+2', and she goes away for a few minutes again, and hands it over.

You now know that Y+1 was (probably) the book she fetched when you made the forbidden request, and therefore that the letter inside the forbidden book was 'b'.

3 more replies

tamriel8y ago

The person checking out the book is a program, so they aren't the brightest.

They check out the book called "how to go to facebook.com". Then they check out "how to type a password". Then they check out "Typing '1234' for Dummies".

I bet you'll never figure out how to get into their facebook account.

pgt8y ago

Fantastic explanation of cache timing attacks. This morning I was explaining spectre to non-technical people and let me tell you, "leaking L1 CPU cache memory," is a real party starter. So I'm using there librarian example going forward.

1 more reply

technics2568y ago

Thank you for this. Would you say this applies to both Spectre and Meltdown, or one and not the other?

fooker8y ago

This is a general explanation of side channel attacks, as I understand.

3 more replies

jotux8y ago

Papers describing each attack:

https://meltdownattack.com/meltdown.pdf

https://spectreattack.com/spectre.pdf

From the spectre paper:

>As a proof-of-concept, JavaScript code was written that, when run in the Google Chrome browser, allows JavaScript to read private memory from the process in which it runs (cf. Listing 2).

Scary stuff.

jhallenworld8y ago

"Meltdown" is an Intel bug.

"Spectre" is very bad news and affects all modern CPUs. Mitigation is to insert mfence instructions throughout jit generated sandboxed code making it very slow, ugh. Otherwise assume that the entire process with jit generated code is open to reading by that code.

Any system which keeps data from multiple customers (or whatever) in the same process is going to be highly vulnerable.

kibwen8y ago

> Mitigation is to insert mfence instructions throughout jit generated sandboxed code making it very slow, ugh.

Here's the synchronized announcement from Chrome/Chromium: https://sites.google.com/a/chromium.org/dev/Home/chromium-se...

"Chrome's JavaScript engine, V8, will include mitigations starting with Chrome 64, which will be released on or around January 23rd 2018. Future Chrome releases will include additional mitigations and hardening measures which will further reduce the impact of this class of attack. The mitigations may incur a performance penalty."

Chrome 64 will be hitting stable this month, which means that it ought to be possible to benchmark the performance penalty via testing in Chrome beta. Anybody tried yet?

dannyw8y ago

The mitigations are to disable SharedArrayBuffer and severely round performance.now(). Not good that there aren’t other less intrusive ways to mitigate.

6 more replies

simion3148y ago

From the article it seems that is not 100% sure AMD and ARM are not affected by metldown, only that they could not trigger the issue, but authors mention this

"However, for both ARM and AMD, the toy example as described in Section 3 works reliably, indicating that out-of-order execution generally occurs and instructions past illegal memory accesses are also performed."

jhallenworld8y ago

I would think that any sane implementation would not transmit privileged data to waiting instructions.

Look at their Listing 2: Instructions 5 - 7 will be waiting for the privileged data from line 4 (they are not speculatively executed since they have a data dependency on line 4).

So why is Intel releasing the privileged data to the waiting instructions? An answer could be that violation checking is delayed until retire, but other implementations are possible.

Anyway, so it could be that AMD and ARM are vulnerable, but it's possible that they are not.

2 more replies

asveikau8y ago

> Mitigation is to insert mfence instructions throughout jit generated sandboxed code making it very slow, ugh. Otherwise assume that the entire process with jit generated code is open to reading by that code.

It seems like keeping untrusted code in a separate address space would be a suitable workaround? A lot of comments here seem to be implying that meltdown-style reading of separate address spaces is possible via Spectre, and my read is that it wouldn't.

1 more reply

Certhas8y ago

After skimming the articles it sounds like a lot hinges on just how hard Spectre is to pull off in practice/in the wild. Anyone have any insights on that?

labster8y ago

I don't know much about this particular flaw, but I imagine it'll be pretty hard until someone releases an exploit kit, and then pretty easy after that.

1 more reply

user59944618y ago

They say they can reliably read memory around 120kB/s with one vulnerability and 1kB/s with the other. It just works, all the time. Some of the PoC takes a few minutes to initialize.

I'd say difficulty level is easy.

2 more replies

FranOntanaya8y ago

It doesn't look that hard, and in any case, we will suffer the mitigation consequences either way.

1 more reply

olliej8y ago

it isn't. There have been a few PoC referenced on twitter, and the spectre paper itself reference a PoC in browser hosted javascript (e.g. random ad can scan memory)
it's obviously not a free + zero time activity, but I'm going to assume someone making an ad to scan memory isn't super concerned about end user cpu usage or battery life..

binarycrusader8y ago

"Spectre" is very bad news and affects all modern CPUs

It's not yet clear whether it affects all modern CPUs, notably I have yet to see any mention of modern POWER/MIPS/SPARC-based designs. If someone has pointers, those particular cases would probably be quite interesting.

2 more replies

ErikAugust8y ago

I've thrown the C code in the Spectre paper up if anyone wants to feel the magic: https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9...

dpwm8y ago

Just tested this on systems of varying age.

Works on processors going back as far as 2007 (the oldest I have access to now is an Athlon 64 X2 6000+), but the example code relies on an instruction that the Atom D510 does not suport.

Because Spectre seems to be an intrinsic problem with out-of-order execution, which is almost as old as the FDIV bug in intel processors, I would be very surprised if the Atom D510 did not turn out to be susceptible using other methods as outlined in the paper.

EDIT: I originally suspected this instruction was CLFLUSH and erroneously claimed the D510 doesn't support sse2. It does support sse2, so it must be that it does not support the RDTSCP instruction used for timing.

EDIT: This gets very interesting. I made some modifications to use a CPUID followed by RDTSC, which now runs without illegal instructions and works everywhere the previous version worked. Except on the D510, this runs but I cannot get the leak to happen despite exploring values of CACHE_HIT_THRESHOLD. Could the Atom D510 really be immune from Spectre?

2 more replies

ojiikun8y ago

Thanks for this. Would love an annotated version of this if anyone is up for it. My C is pretty good, but some high level "what is being done here" and "this is what shouldn't work" comments would be cool to see.

JetSpiegel8y ago

Worked on a Intel(R) Celeron(R) CPU 847 @ 1.10GHz.

With "gcc (GCC) 7.2.1 20171128", remove the parenthesis from CACHE_HIT_THRESHOLD macro[1] to compile correctly.

[1]: https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9...

airesQ8y ago

Worked on my system.

(Set cache_hit_threshold to the default value of 80, my cpu is an Intel i7-6700k.)

diyseguy8y ago

I thought it was supposed to be exploitable by javascript? If you can get to the machine and run c code, well, that doesn't seem like an exploit?

7 more replies

leni5368y ago

Work's on Intel® Core™2 Duo Processor T7200. Had to replace __rdtscp(&junk) with __rdtsc(), the Core 2 doesn't have the former.

sqldba8y ago

Works on MacOS 10.13.2.

Looking back, the Mac patches were to address KPTI (Meltdown) which is separate to Spectre.

bspammer8y ago

Terrifyingly, this seems to work on a DigitalOcean droplet. I'm assuming this means people could potentially read memory from other VMs on the same system, albeit with a great deal of difficulty.

diyseguy8y ago

So... it reads a string that was declared at the top of the file?

theptip8y ago

I think this means we should consider all browser processes to be completely insecure, until mitigations are applied (e.g. Chrome's Site Isolation: https://www.chromium.org/Home/chromium-security/ssca).

Looks like any session token/state could be exfiltrated from your Gmail tab to a malicious JS app running in-process, for example.

Am I overreacting here?

kibwen8y ago

> Am I overreacting here?

Still skimming the paper, but the JS attack appears to be processor-intensive (please chime in if you interpret it differently!). Any widespread, indiscriminate use of such an attack in the wild seems like it would eventually be detected as surely as client-side cryptocurrency mining was discovered. If you aren't a valuable target, if you don't visit sites that are shady enough to discreetly mine bitcoin in your browser, and if you use an adblocker to defang rogue advertisers, then you probably shouldn't lose too much sleep over this (which is not intended to diminish how awesome (in the biblical sense) this attack is).

That said, if there were ever a time to consider installing NoScript, now's it: https://addons.mozilla.org/en-US/firefox/addon/noscript/

jwilk8y ago

And if you're a web developer, now it the good moment to make sure your site works correctly when JS is disabled.

2 more replies

voidmain8y ago

It seems like practical attacks rely on having a reasonably precise timer available. The spectre paper uses SharedArrayBuffer to synthesize a timer, which is a recent and obscure feature:

https://groups.google.com/a/chromium.org/forum/#!topic/blink...

https://groups.google.com/forum/#!topic/mozilla.dev.platform...

Chrome and Firefox's "intent to ship" posts both contain claims to the effect that there probably aren't any really serious timing channel attacks, which... seems to have been disproved. Why isn't SharedArrayBuffer already being disabled as a stopgap? I think users can turn it off in firefox, how about Chrome?

4 more replies

drdrey8y ago

Doesn't each tab run in a separate process?

2 more replies

InclinedPlane8y ago

This is so incredibly bad. Spectre is basically unpatchable. We can do better than we are now with patches but it's all just turd polishing, essentially. A proper fix will require new CPU hardware. And as a kicker? Leaks are basically undetectable.

bonzini8y ago

New CPU microcode is enough, though at a performance price. On pre-Zen AMD there is also a chicken bit to disable indirect branch prediction. (It feels good to be finally able to speak about this freely!!!)

I don't know for which processors Intel and AMD plan to release microcode updates.

1 more reply

minxomat8y ago

Getting flashbacks of brainsmoke's JS PoC: https://youtu.be/ewe3-mUku94?t=1766

Edit: Also, PoCs for unpatched Windows by pwnallthethings: https://github.com/turbo/KPTI-PoC-Collection

FLUX-YOU8y ago

I do also wonder if some speculative prediction / branching stuff can be controlled through undocumented CPU instructions: https://www.youtube.com/watch?v=KrksBdWcZgQ

waz0wski8y ago

rightfully so:

https://twitter.com/brainsmoke/status/948561799875502080

rsync8y ago

"As a proof-of-concept, JavaScript code was written that, when run in the Google Chrome browser, allows JavaScript to read private memory from the process in which it runs"

I am not sure what "the process in which it runs" means here ... do they mean private memory from within chrome ? Or within the child process spawned from chrome, or within the spawned JS sandbox or ... what ?

Practically speaking, I worry about a browser pageview that can read memory from my terminal process. Or from my 'screen' or 'sshd' process.

I think that is not a risk here, yes ?

marzell8y ago

Will it be nontrivial to detect or at least identify these types of exploits as they occur in the wild? Can protection software see these when they happen, assuming a best case scenario where the attack is carried out but doesn't specifically use these methods to hide or disable detection? Is there a general sense yet of whether this exploit is already being leveraged?

pogba1018y ago

Thanks for the links. As an undergrad with limited knowledge of this subject, I would love to see these annotated on Fermat's Library (https://fermatslibrary.com)

guftagu8y ago

Guys, can't we just detect a program doing spectre-like behavior and just kill it instead of having every other application suffer a performance hit by the proposed changes? Antivirus software already does similar stuff

mrmondo8y ago

"AMD chips are affected by some but not all of the vulnerabilities. AMD said that there is a "near zero risk to AMD processors at this time." British chipmaker ARM told news site Axios prior to this report that some of its processors, including its Cortex-A chips, are affected."

- http://www.zdnet.com/article/security-flaws-affect-every-int...

* Edit:

From https://meltdownattack.com/

Which systems are affected by Meltdown?

"Desktop, Laptop, and Cloud computers may be affected by Meltdown. More technically, every Intel processor which implements out-of-order execution is potentially affected, which is effectively every processor since 1995 (except Intel Itanium and Intel Atom before 2013). We successfully tested Meltdown on Intel processor generations released as early as 2011. Currently, we have only verified Meltdown on Intel processors. At the moment, it is unclear whether ARM and AMD processors are also affected by Meltdown.

Which systems are affected by Spectre?

Almost every system is affected by Spectre: Desktops, Laptops, Cloud Servers, as well as Smartphones. More specifically, all modern processors capable of keeping many instructions in flight are potentially vulnerable. In particular, we have verified Spectre on Intel, AMD, and ARM processors."

MikeHolman8y ago

Looks like everyone is vulnerable to arbitrary user memory reads, while Intel and ARM are vulnerable to arbitrary kernel memory reads as well.

mrmondo8y ago

Thanks for clarifying Mike, should be interesting to see how this actually pans out.

comex8y ago

Not just user memory reads. AMD CPUs won’t speculate loads from userland code directly to kernel memory, ignoring privilege checks (“Meltdown”). But they are still subject to the “Spectre” attack, which can disclose kernel memory by taking advantage of certain code patterns (which normally would be harmless) in kernel code.

1 more reply

buryat8y ago

That article links a commit [1] that contradicts this statement

> AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.

And Axios [2] that Zdnet quotes gave a comment from AMD:

> "To be clear, the security research team identified three variants targeting speculative execution. The threat and the response to the three variants differ by microprocessor company, and AMD is not susceptible to all three variants. Due to differences in AMD's architecture, we believe there is a near zero risk to AMD processors at this time. We expect the security research to be published later today and will provide further updates at that time."

And a comment from ARM: > Please note that our Cortex-M processors, which are pervasive in low-power, connected IoT devices, are not impacted.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/...

[2] https://www.axios.com/how-the-giants-of-tech-are-dealing-wit...

ethbro8y ago

My read is that vulnerable processors generally have to:

1. Have out of order execution

2. Have aggressive speculative memory load / caching behavior

3. Be able to speculatively cache memory not owned by the current process (either kernel or otherwise)

4. Have deterministic ways of triggering a speculative load / read to the same memory location

2 is probably the saving grace in ARM / low power land, given they don't have the power budget to trade speculative loads for performance (in the event they're even out of order in the first place).

Caveat: I'm drinking pretty strong Belgian beer while reading through these papers.

Pyxl1018y ago

How does that pertain to the vulnerabilities that involve eBPF? My understanding is that eBPF code executes within the kernel, and so would run at the same privilege level.

mtgx8y ago

"Intel suffers a Meltdown" should be an apt headline for tomorrow's headlines.

ece8y ago

Another good article: https://www.theregister.co.uk/2018/01/02/intel_cpu_design_fl...

"AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault."

michaelermer8y ago

Google's post is newer and has more insights, the registry article is now outdated.

ece8y ago

The register has more details. Worth a read.

1 more reply

mrmondo8y ago

Yeah that's a good one, hope they keep it updated / link to new info / posts as they come out.

endymi0n8y ago

Hard to find a good spot for this, but: Thanks to anyone involved! From grasping the magnitude of this vulnerability to coordinating it with all major OS vendors, including Open Source ones that do all of their stuff more or less „in the open“, it was almost a miracle that the flaw was leaked „only“ a few days before the embargo - and we‘ll all have patches to protect our infrastructure just in time.

Interestingly, it also put the LKML developers into an ethical grey zone, as they had to deceive the public the patch was actually fixing something else (they did a good and right thing there IMHO).

Despite all the slight problems along the way, kudos to any of the White Hats dealing with this mess over the last months and handling it super graceful!

pacavaca8y ago

Consider how many other of such "gray" patches could already be in the kernel ;)

1 more reply

tonmoy8y ago

I'm not that savvy with security so I need a little help understanding this. According to the google security blog:

> Google Chrome

> Some user or customer action needed. More information here (https://support.google.com/faqs/answer/7622138#chrome).

And the "here" link says:

>Google Chrome Browser

>Current stable versions of Chrome include an optional feature called Site Isolation which can be enabled to provide mitigation by isolating websites into separate address spaces. Learn more about Site Isolation and how to take action to enable it.

>Chrome 64, due to be released on January 23, will contain mitigations to protect against exploitation.

>Additional mitigations are planned for future versions of Chrome. Learn more about Chrome's response.

>Desktop (all platforms), Chrome 63:

> Full Site Isolation can be turned on by enabling a flag found at chrome://flags/#enable-site-per-process. > Enterprise policies are available to turn on Site Isolation for all sites, or just those in a specified list. Learn more about Site Isolation by policy.

Does that mean if I don't enable this feature using chrome://flags and tell my grandma to do this complicated procedure I (or she) will be susceptible to getting our passwords stolen?

013a8y ago

It probably means if you want mitigations right now, you can flip that flag. Otherwise wait for Chrome to auto-update with new versions that have mitigations enabled by default.

mintplant8y ago

Would I be correct in assuming a browser-level mitigation isn't necessary if you're running a patched OS?

bpye8y ago

The OS patch stops you reading kernel space from user space trivially (ie. without eBPF in the Project Zero example). You can still cause leakage from the same context, for example, the V8 JIT can read all of the processes memory, without site isolation that can include data on other web pages, passwords, cookies, etc.

londons_explore8y ago

Your OS needs patching, as do any programs which handle secret stuff like passwords, cookies, or tokens and interact with the internet (ie. web browsers).

tonfa8y ago

Wasn't there a PoC for a second issue of js reading memory from its own process? Could potentially be an issue (eg reading data from another website)

olliej8y ago

tytso8y ago

From a recently posted patch set:

Subject: Avoid speculative indirect calls in kernel

Any speculative indirect calls in the kernel can be tricked to execute any kernel code, which may allow side channel attacks that can leak arbitrary kernel data.

So we want to avoid speculative indirect calls in the kernel.

There's a special code sequence called a retpoline that can do indirect calls without speculation. We use a new compiler option -mindirect-branch=thunk-extern (gcc patch will be released separately) to recompile the kernel with this new sequence.

We also patch all the assembler code in the kernel to use the new sequence.

khc8y ago

Link?

language8y ago

Text and patch start here: https://lkml.org/lkml/2018/1/3/780

Also, see Linus' response here: https://lkml.org/lkml/2018/1/3/797

2 more replies

nlh8y ago

"Before the issues described here were publicly disclosed, Daniel Gruss, Moritz Lipp, Yuval Yarom, Paul Kocher, Daniel Genkin, Michael Schwarz, Mike Hamburg, Stefan Mangard, Thomas Prescher and Werner Haas also reported them; their [writeups/blogposts/paper drafts] are at"

Does anyone have any color/details on how this came to be? A major fundamental flaw exists that affects all chips for ~10 years, and multiple independent groups discovered them roughly around the same time this past summer?

My hunch is that someone published some sort of speculative paper / gave a talk ("this flaw could exist in theory") and then everyone was off to the races.

But would be curious if anyone knows the real version?

sprkyco8y ago

https://cyber.wtf/2017/07/28/negative-result-reading-kernel-...

Failed attempt in July which is being attributed as earliest work via https://twitter.com/lavados/status/948700783259811847

tdullien8y ago

Jann Horn's results & report pre-date the blog post though. The topic was "ripe", so to speak, so multiple parties investigated it at roughly the same time.

ehsankia8y ago

Yeah, the blog post says they knew since June 2017, with that blog post being from July.

> This initial report did not contain any information about variant 3. We had discussed whether direct reads from kernel memory could work, but thought that it was unlikely. We later tested and reported variant 3 prior to the publication of Anders Fogh's work at https://cyber.wtf/2017/07/28/negative-result-reading-kernel-....

gsnedders8y ago

AIUI, Anders Fogh has collaborated with people at TU Graz on various occasions previously: I'd assume they already knew about his work prior to the blog post.

y04nn8y ago

There was a paper "A Javascript Side-Channel Attack on LLC" in 2015 which seem similar to me, maybe it drove some research toward timing/caching mechanism at the CPU level and its exploitation with a 'side channel' attack.

agumonkey8y ago

Considering it includes most cpus from the last decade (or even last two), shouldn't they delay it a little bit longer so that not only cloud businesses but also more mainstream companies get the time to deploy patches and tests ?

rarudduck8y ago

Azure's response: https://azure.microsoft.com/en-us/blog/securing-azure-custom...

This part is interesting considering the performance concerns:

"The majority of Azure customers should not see a noticeable performance impact with this update. We’ve worked to optimize the CPU and disk I/O path and are not seeing noticeable performance impact after the fix has been applied. A small set of customers may experience some networking performance impact. This can be addressed by turning on Azure Accelerated Networking (Windows, Linux), which is a free capability available to all Azure customers."

boulos8y ago

Disclosure: I work on Google Cloud.

If you run a multitenant workload on a linux system (say you're a PaaS or even just hosting a bunch of WordPress side by side) you should update your kernel as soon as is reasonable. While VM to VM attacks are patched, I'm sure lots of folks are running untrusted code side by side and need to self patch. This is why our docs point this out for say GKE: we can't be sure you're running single tenant, so we're not promising you there's no work to do. Update your OSes people!

mike_hearn8y ago

No offence intended as I'm sure it's a bit of a madhouse there right now, but is your statement really correct? I read the Spectre paper quite carefully and it appears to be unpatchable. Although the Meltdown paper is the one that conclusively demonstrated user->kernel and vm->vm reads with a PoC, and Spectre "only" demonstrated user->user reads, the Spectre paper clearly shows that any read type should be possible as long as the right sort of gadgets can be found. There seems no particular reason why cross-VM reads shouldn't be possible using the Spectre techniques and the paper says as much here:

For example, if a processor prevents speculative execution of instructions in user processes from accessing kernel memory, the attack will still work.

and

Kernel mode testing has not been performed, but the combination of address truncation/hashing in the history matching and trainability via jumps to illegal destinations suggest that attacks against kernel mode may be possible. The effect on other kinds of jumps, such as interrupts and interrupt returns, is also unknown

There doesn't seem to be any reason to believe VM to VM attacks are either patched nor patchable.

My question to you, which I realise you may be unable to answer - how much does truly dedicated hardware on GCE cost? No co-tenants at all except maybe Google controlled code. Do you even offer it at all? I wasn't able to find much discussion based on a 10 second search.

2 more replies

willsr8y ago

Interesting that they left it this late.

boulos8y ago

Disclosure: I work on Google Cloud.

Like the AWS reboots, people will notice. So in the interest of the embargo, both Azure and AWS waited to update as late as they felt was safe. Since we do live migrations and host kernel updates all the time, nobody noticed us :).

webaholic8y ago

Someone correct me if I understood this wrong. The way they are exploiting speculative execution is to load values from memory regions which they don't have permission to a cache line, and when the speculation is found to be false, the processor does not undo the write to the cache line?

The question is, how is the speculative write going to the cache in the first place? Only retired instructions should be able to modify cache lines AFAIK. What am I missing?

Edit: Figured it out. The speculatively accessed memory value is used to compute the address of a load from a memory location which the attacker has access to. Once the mis-speculation is detected, the attacker will time accesses to the memory which was speculatively loaded and figure out what the secret key is. Brilliant!

violinist8y ago

Important to note that at this point they're only reading one bit at a time from kernel memory, but it could probably be changed to read more--exactly how many branches it could compare before the mis-speculation is detected is not discussed, and that could be an area for large speedups in the attack.

adriancooney8y ago

Wow, what a find for the Project Zero team. This team and idea can only be described as a success, well done.

chrisb8y ago

"These vulnerabilities affect many CPUs, including those from AMD, ARM, and Intel, as well as the devices and operating systems running them."

Curious. All other reports I've read state that AMD CPUs are not vulnerable.

MBCook8y ago

See the Twitter thread here: https://twitter.com/nicoleperlroth/status/948678006859591682

(Edit: there are 9 posts total, go to her user page to see them all)

Seems there are two issues. One, called Meltdown, only effects Intel and is REALLY bad, but the kernel page table changes everyone is making fixes it.

The other, dubbed Spectre, is apparently common to the way all processors handle speculative execution and is unfixable without new hardware.

I’d like to know more about that but I haven’t seen anything yet.

Whoever discovered this stuff on Google’s team deserves some sort of computer security Nobel prize.

Cyph0n8y ago

That's not even close to a thread...

You can see all the tweets here (courtesy of @svenluijten): https://twitter.com/i/moments/948681915485351938.

taurath8y ago

After reading that thread, I sort of wonder if this is the catalyst for the next tech bust. Prices on the basic building block of the modern tech industry (a server shard) going up 30%, or even more as shared/virtual services must be decommissioned for isolation? Surely it’s an alarmist thing to think and I don’t think it’s likely, but if you asked me yesterday the likeihood of an underlying security vulnerability effecting every processor since 1995 I’d have said probably not.

Major props to the teams working on this... now time for us all to hold onto our pants as we ask for budget increases that will make shareholders demand blood.

1 more reply

PuffinBlue8y ago

The linked thread suggests that Spectre doesn't have _any_ mitigation.

> The business/economic implications are not clear, since eventually the only way to eradicate the threat posed by Spectre is to swap out hardware.

Is this fully accurate, there's no software mitigation available now?

From [0], the above may be true:

> There is also work to harden software against future exploitation of Spectre, respectively to patch software after exploitation through Spectre .

There is 'work'? No current patch? So Spectre is unpatched?

This point doesn't seem to be being highlighted but appears particularly important.

[0] https://meltdownattack.com/#faq-fix

2 more replies

MBCook8y ago

I didn’t find out there were more than 4 posts until after I made my comment (thus the edit).

Thanks for the handy link.

londons_explore8y ago

I can't really see how it would be fixable even with new hardware.

Speculative execution is fundamental to getting decent performance out of a CPU. Without it you should probably divide your performance expectations by 5 at least.

Rolling back all state rather than just user visible state in the CPU is neigh on impossible. When you evict something from the cache, you delete it. Undeleting is hard. There are also a lot of other non-user-visible bits of state in a CPU.

3 more replies

chainsaw108y ago

> computer security Nobel prize

While they're not as big of a deal AFAIK, we do have the Pwnie Awards: https://pwnies.com/

thrill8y ago

We should have an annual vulnerability/amelioration award (the Cerberus?) and give one to those guys.

azurezyq8y ago

You can find the details below. They've tried AMD CPUs also.

https://googleprojectzero.blogspot.com/2018/01/reading-privi...

forgot-my-pw8y ago

"We reported this issue to Intel, AMD and ARM on 2017-06-01"

What!

cesarb8y ago

You know it's a bad one when Project Zero allows more than its usual 90-day deadline...

2 more replies

static_noise8y ago

How much in advance do the intel managers have to register a stock sell?

3 more replies

noncoml8y ago

We won't know until we have the full details. From the linux patches it looked like that AMD x86-64 processors were not affected.

But the sentence you quote adds AMD back into play. Maybe some of its ARM processors? e.g. AMD Opteron A1100?

MBCook8y ago

They weren’t effected by the really bad Intel big names Meltdown. They’re still susceptible to Spectre.

btilly8y ago

And yet it also says that AMD devices running Android are not vulnerable.

I'd be curious how those two statements should be reconciled.

userbinator8y ago

Not exactly; it says "we are unaware of any successful reproduction of this vulnerability that would allow unauthorized information disclosure on ARM-based Android devices."

nine_k8y ago

Links to descriptions of similar vulnerabilities in AMD and ARM processors would be very welcome.

lettergram8y ago

Here's a list of what google tested:

Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz (called "Intel Haswell Xeon CPU" in the rest of this document)

AMD FX(tm)-8320 Eight-Core Processor (called "AMD FX CPU" in the rest of this document)

AMD PRO A8-9600 R7, 10 COMPUTE CORES 4C+6G (called "AMD PRO CPU" in the rest of this document)

An ARM Cortex A57 core of a Google Nexus 5x phone [6] (called "ARM Cortex A57" in the rest of this document)

https://googleprojectzero.blogspot.com/2018/01/reading-privi...

digikata8y ago

So there's a bit of an unknown if AMD's most recent generation of processor has the Spectre vulnerability?

2 more replies

wintom8y ago

Sounds like maybe SOME ARM and SOME AMD are implicated, especially since the Android ARM CPUs appear to be fine...

revelation8y ago

There aren't really any special Android ARM CPUs, maybe they are confident it doesn't really work on Android because it's very difficult to get the timing precision and low-level assembly sequences in Java/ART compiled code. Though I wonder how that squares up with JNI.

I think the key to the statement is in any case that you need to differentiate between what is possible on the processor architecture level when you have full software control, and what is possible on an operating system level, where 3rd party applications are further restricted in various arbitrary ways such as only allowed to use Java, limited access to high resolution timing primitives, etc. that can make practical exploitation impossible, even if the flaw is present.

It's difficult to reason about because it's hard to tell if you can manipulate a JIT runtime into generating the code you need for the exploit to work - and as the JavaScript implementations show, the answer is often "yes".

mtanski8y ago

JIT engines (and compilers) often generate a familiar instruction patterns. Many JIT engines Target specific languages (like JS) and as result have "simpler" optimizers (less time to do this) and possibly more stable instruction patterns. So my money is on somebody fuzzing the required JS code.

bitmapbrother8y ago

You can develop Android applications in C/C++ using the NDK, thus, giving you full software control if needed.

justincormack8y ago

The ARM white paper goes into detail https://developer.arm.com/-/media/Files/pdf/Cache_Speculatio...

wnevets8y ago

yeah the intel response page is filled with people claiming intel is evil for even mentioning amd

https://news.ycombinator.com/item?id=16064545

tiles8y ago

To be fair, the Intel post alludes to collaborating with AMD/ARM on mitigating Spectre, but userspace memory leaking is wholly separate from kernel memory leaking (Meltdown, which only affects Intel processors).

noncoml8y ago

It's a developing story, but from the information we have so far, it does look like Intel involving AMD is a disingenuous since AMD processors are not affected by the most serious of the issues.

mtkd8y ago

It's too early to say which is ultimately the most real-world serious.

From the Spectre note (which does affect AMD):

In addition to violating process isolation boundaries using native code, Spectre attacks can also be used to violate browser sandboxing, by mounting them via portable JavaScript code. We wrote a JavaScript program that successfully reads data from the address space of the browser process running it.

How quickly are we going to see attacks targeting BTC/ETH wallets, apps etc. on clients and cloud hosted exchanges?

2 more replies

wslh8y ago

Has Google the best security team in the world? It seems like Google security is in a complete different league. I cannot imagine how this impacts companies handling fiat money or cryptocurrencies in the cloud like Coinbase in AWS.

gervase8y ago

Project Zero is very well known for things exactly like this. Partially, it's because they are incredibly talented, but there are also talented people in academia and in other security consultancies. The biggest difference with Project Zero is that their primary [0] goal is altruistic: find vulnerabilities, and let people who can fix them know (vs publishing papers, securing paying clients, auctioning zero-days, etc), in the interests of making the internet and computing as a whole a safer place.

[0] Their secondary goals are to protect Google products and services, and to provide excellent PR in line with what we're discussing right here.

cjbprime8y ago

Worth noting that in this case, many of the authors are in academia. It wasn't solely a Google project.

tonfa8y ago

I read it as it was an independent discovery by project zero and by academia researchers.

1 more reply

ComodoHacker8y ago

>their primary [0] goal is altruistic: ... making the internet and computing as a whole a safer place.

Which is, at the same time, highly rational: to secure their entire market.

It's nice to have big corp's incentives aligned with the public good. Too bad it happens so rarely.

kzrdude8y ago

Hm that's tricky. These awesome findings didn't exactly provide net value for google, not even on the not so short term (next 10 years?). They've created a large problem for Google! :-)

ehsankia8y ago

Yes and no. The cost of some malicious party figuring it out and using it on Google would potentially been far greater than anything this could cost.

oh_sigh8y ago

How much value would be destroyed if black hats discovered the bug first and people started discovering that information from their Google cloud VMs was getting leaked?

Sure, Google could just patch themselves, but the information to recreate the issue would surely be leaked by a xoogler, since it only takes a single sentence describing the vuln for a competent sec team to recreate it

make38y ago

Discovering a vulnerability is not creating it. A vulnerability exists even if it was not publicly disclosed. There were probably other people already exploiting it.

tanilama8y ago

Imagine the law suit if Google fails to detect this....

mason2408y ago

Suppose you have transportaion company that owns 80% of the market share for everything transported on the roads. By car, van, truck, semi, everything.

Suppose your company also has a team that inspects public bridges to make sure they don't collapse.

Is it really altruistic, or given your market share is it a cost of business?

asgioiobuio8y ago

Oh, come on. Their only goal is to make Google money. The fact that they do useful work is a nice side effect, but if they didn't improve Google's security and give good PR there's not a chance in Hell Google would keep them around.

1 more reply

enneff8y ago

I don't know how you would evaluate such a thing as "best security team," but Project Zero certainly attracts a high calibre of security expert. If you're into breaking things, why wouldn't you want to break things with other bright people and the support of a massive corporation?

jacksmith210068y ago

How about based on how many of the serious issues are found by Google. It has been one after another.

Godel_unicode8y ago

They're definitely world class, but they're also loud about it. Consider that other teams perhaps have a different model. For instance, Microsoft's internal team surely finds lots of clever bugs that never get talked about in Microsoft products.

1 more reply

ebikelaw8y ago

I can't imagine better marketing for cloud services than making it as clear as you can that the world is a very dangerous place for computers and if you don't have a crack team of hundreds of battle-hardened security engineers then you have no business hooking your computers up to the internet.

snowwrestler8y ago

The irony here is that a good ol' dedicated hardware web server is far less susceptible to Meltdown or Spectre than Google Cloud, because only your code is running on the CPUs.

I predict tonight's disclosures will lead to an uptick in interest in running websites on dedicated hardware, like we did back at the turn of the century.

1 more reply

jopsen8y ago

Hmm, or maybe it's a reason to go with bare-metal providers :)

walterbell8y ago

If true, how are edge computers going to connect to the cloud?

ebikelaw8y ago

Chromebooks, obv. Integrated marketing strategy!

1 more reply

oh-kumudo8y ago

Cloud services will benefit from this greatly.

Spectre occurs no matter you are in cloud or not, while cloud companies can advertise themselves to help customers proactively to mitigate such risks.

rurban8y ago

Did you forget the Technical University of Graz students who came up with rowhammer and KAISER in the first hand?

wslh8y ago

Not really, they are mentioned in the article. What is fascinating is the number of discoveries by the Google team. The Wikipedia page has a summary of the prominent ones: https://en.wikipedia.org/wiki/Project_Zero_(Google) these are not bugs in obscure pieces of software but on major services and operating systems.

rurban8y ago

Come on, Google completely "forgets" to mention the others, whilst the others do mention Google who detected it independently. And then look who wrote the papers, exploits and patches.

1 more reply

dzhiurgis8y ago

I wonder how much of it relates to their mastery in data science.

I keep wondering if they got some “””AI””” fuzzer that helps them a ton? Plus tons of compute power to spend (remember SHA-256 clash they found “just because”?)

Ar-Curunir8y ago

No collision has been found in SHA-256. SHA-1 is the broken hash function

static_noise8y ago

So, as I gather, one of the main culprits is that unwinding of speculatively executed commands is done incompletely. That is something that the people doing the unwinding must have noticed and known. Somewhere the decision must have been made to unwind incompletely for some reasons (performance/power/cost/time).

As for the difference between AMD and intel. (From other posts here, not this one.) The speculative execution can access arbitrary memory locations on intel processors while this is not possible on AMD. This means that on intel processors you can probe any memory location with only limited privileges.

As for the affected AMD and ARM processors I'm none the wiser. How are they affected? Which models are affected? Does it allow some kind of privilege escalation? The next days will surely stay interesting.

cesarb8y ago

You can't unwind completely. Once the cache is full, to load something on the cache, it has to evict something else. You might be able to evict what you just loaded, but you can't undo the earlier eviction.

static_noise8y ago

Only if your speculative reads do cause irreversible side-effects on those caches. You could implement them in a way that doesn't modify the caches... but that would be complicated and probably use more power and have lower performance.

webaholic8y ago

One of the main reasons for speculative execution is to fetch data into the caches ahead of them being needed. If you don't modify the cache, then you throw that away.

May be one way would be to use a smaller, separate cache for speculative execution and then copy that value to the regular cache once speculation is confirmed? This would add a one cycle latency for cache-to-cache transfer but there might be better ways.

3 more replies

gmueckl8y ago

There does not need to be a performance hit, but cache complexity must rise: speculative execution must use a separate cache for any data that was fetched speculatively. Only when that branch is truly accepted must that data enter the "real" cache. As long as speculative execution does not go on for too long, these secondary caches can stay really tiny (a handful of cache lines maybe).

2 more replies

richadams8y ago

https://spectreattack.com/

Information site with some more information, and links to papers on the two vulnerabilities, called "Meltdown" and "Spectre" (with logos, of course).

(https://meltdownattack.com/ goes to the same site)

hrpnk8y ago

Both domains were registered on 2017-12-22. Given the planned disclosure on 9th January that Google mentions and MS and others coding patches silently [1], do the early reports [2] of kernel patches, does this mean that due to coding in the open the whole disclosure procedure has been vastly accelerated?

I wonder how the timing relates to New Year and many companies having holidays in CW1.

[1] https://lists.freebsd.org/pipermail/freebsd-security/2018-Ja...

[2] https://news.ycombinator.com/item?id=16046636

_delirium8y ago

Accelerated, but not vastly. Google's post says "We reported this issue to Intel, AMD and ARM on 2017-06-01", so the embargo still ended up holding for 7 months, even with it ending a week early. The domain registration dates of 2017-12-22 seem to be just when Google started to prepare for releasing the publicity materials, not when the vulnerability was discovered.

blattimwind8y ago

The Google Security Blog post actually says that the open development did not cause the early breakdown of the embargo in the last 1-2 hours, but

> We are posting before an originally coordinated disclosure date of January 9, 2018 because of existing public reports and growing speculation in the press and security research community about the issue, which raises the risk of exploitation. The full Project Zero report is forthcoming.

olliej8y ago

The problem isn't "it's not bought forward by that much relatively" in as much as you have an agreed timeline to have coordinated patches (e.g so one org doesn't push a fix before other orgs have). So if you have a bunch of orgs set up to do a release on day X, and then publish on X-[whatever] then you are effectively zero-daying.

Is it super important in this case? shrug.

But imagine for the sake of argument there was some undocumented cpu behaviour "if instruction x,y,z are executed in that order with these constants then catch fire", then having anyone pre-empt the agreed update time could be bad.

rofex8y ago

Sorry to be daft, but hasn't the Google Zero team jumped the gun on the coordinated disclosure date by publishing their blog post 6 days in advance?

1 more reply

partiallypro8y ago

I feel like the Meltdown logo was done by a real designer, and Spectre was designed by a bored developer.

stock_toaster8y ago

From the site:

> Both the Meltdown and Spectre logo are free to use, rights waived via CC0. Logos are designed by Natascha Eibl.

steveklabnik8y ago

It says at the bottom they were both done by the same person.

partiallypro8y ago

That's funny, but also makes me wonder how you get contracted to do logos for things like this. Based strictly on her LinkedIn, she doesn't work for Google. Maybe a friend of someone? Kind of a cool gig though.

1 more reply

tarruda8y ago

It seems that Richard Stallman is not so paranoid after all:

> I am careful in how I use the Internet.

> I generally do not connect to web sites from my own machine, aside from a few sites I have some special relationship with. I usually fetch web pages from other sites by sending mail to a program (see https://git.savannah.gnu.org/git/womb/hacks.git) that fetches them, much like wget, and then mails them back to me. Then I look at them using a web browser, unless it is easy to see the text in the HTML page directly. I usually try lynx first, then a graphical browser if the page needs it (using konqueror, which won't fetch from other sites in such a situation).

Ref: https://stallman.org/stallman-computing.html

1 more reply

debt8y ago

Speculative execution seems like something that would be very intuitively insecure even to a layperson(relative to the field of course).

I'm wondering, was this vulnerability theorized first and later found out to be an actual vulnerability? Or was this something that nobody had any clue about?

I'm only saying this, because from a security perspective, I imagine somewhere at some point very early on someone had to have pointed out the potential for something like speculative execution to eventually cause security problems.

I just don't understand how chip designers assumed speculative execution wouldn't eventually cause security problems. Is it because chip designers were prioritizing performance above security?

baybal28y ago

Long time ago, in a far away office park there was an intense discussion over so called "flag early" and "flag after" designs.

Flag early camp argument was - protected pages should not be allowed to be fetched to begin with by any insecure execution flow, and we need to pagefault before speculative execution

The "flag after" camp was all for post-factum pagefaulting when the branch has finished execution, so you do not need to pagefault for every branch, and only do it for the branch that has "won"

Chip design magazines from nineties has all that well covered.

mark-r8y ago

Speculative execution isn't supposed to leak information; if the speculative instructions aren't supposed to execute, all traces of them should be rolled back. I'd be curious to see what the details of this bug really are. I'm not sure how much will be disclosed in the interests of keeping exploits from popping up.

dboreham8y ago

"all traces" includes timing differences in execution of non-privileged code, which it turns out are not rolled back.

blattimwind8y ago

Or side-effects by loading data into the cache hierarchy.

djsumdog8y ago

According to this comment, it has been theorized for quite some time:

https://news.ycombinator.com/item?id=16066165

With this particular computer scientist, who talked about this problem before, referenced in Google's paper:

http://www.cs.binghamton.edu/~dima/

im3w1l8y ago

https://news.ycombinator.com/item?id=14988652

debt8y ago

Incredible that in this day and age that chip designers do not prioritize performance over security.

userbinator8y ago

I think the chip designers never thought of security much:

https://news.ycombinator.com/item?id=16062223

It's a similar situation to other timing attacks, which have been around practically as long as caches.

im3w1l8y ago

I don't think this is the last we have seen of side-channels, it's just a ridicolously hard problem to get right. And for that reason I can't feel too angry at the procesor makers.

And I certainly expect to see more things like this (but at least hopefully with lower bandwidth).

anonfunction8y ago

AMD put out an announcement:

https://www.amd.com/en/corporate/speculative-execution

aeleos8y ago

Wow so intel comes and says what is all the panic about there is nothing wrong (despite knowing this) and then amazon drops the we are updating everything right now bomb and then google drops the mother of all cpu bugs. In a previous thread someone was asking if it really is all that bad and at this point I think it’s safe to say that yea, it is.

partiallypro8y ago

So, is AMD effected or not? This seems fairly important. The Google blog post sort of goes against itself in this regard. AMD itself has said:

"The threat and the response to the three variants differ by microprocessor company, and AMD is not susceptible to all three variants. Due to differences in AMD's architecture, we believe there is a near zero risk to AMD processors at this time."

So either AMD is lying or Google's blog post is wrong. Granted AMD's statement is a bit muddled, not sure if they mean they aren't susceptible to all THREE variants (as in only 1/3) or they aren't susceptible to ALL three variants (as in none of them.)

zAy0LfpBZLC8mAC8y ago

As almost everyone seems to be getting this wrong: It's "affected", not "effected". "effected" means roughly "caused", while "affected" means roughly "influenced".

caf8y ago

AMD is saying that:

  !(susceptible_v1 && susceptible_v2 && susceptible_v3)

They are not saying that:

  !susceptible_v1 && !susceptible_v2 && !susceptible_v3

(the latter would be rendered in English as: "AMD is not susceptible to any of the the three variants")

Havoc8y ago

You've successfully made it less clear.

There is a nice table on AMD's website though:

https://www.amd.com/en/corporate/speculative-execution

AsyncAwait8y ago

It seems like it is not affected by the most serious bug, but may be by a lesser one.

partiallypro8y ago

That's what I'm thinking, effected by Spectre, but not by Meltdown. But more clarity would be appreciated on Google and AMD's front. I mean from a pure PR angle, AMD has a lot to gain if they can clear the air more.

olliej8y ago

it is effected by spectre, but not by the other.

guardiangod8y ago

From the Spectre paper-

We have empirically verified the vulnerability of several Intel processors to Spectre attacks, including Ivy Bridge, aswell and Skylake based processors. We have also verified the attack’s applicability to AMD Ryzen CPUs. Finally, we have also successfully mounted Spectre attacks on several Samsung and Qualcomm processors (which use an ARM architecture) found in popular mobile phones.

So in other word, the researcher haven't tried it on AMD processors, but they think the attack would work. AMD, on the other hand, is saying the attack won't work.

Frankly, I believe in PoC||GTFO, so AMD is safe in my book for now.

civilitty8y ago

Spectre and Meltdown are two different exploits. In this case, the paper is talking about Spectre and they did have a POC that worked on all three major processor families (except for deterministic execution engines in microcontrollers and low end CPUs from ARM as well as a few early Atoms from Intel). AMD is saying that Meltdown, which at first glance seems like the most serious one, doesn't effect their processors.

1 more reply

mike_hearn8y ago

That seems like a rather fragile interpretation of the statement. I had interpreted it to mean they tried the attack on Ryzen and it worked. Given the general nature of their technique why would it not work on AMD chips?

adrianpike8y ago

Can someone with a little more experience this low-level let me know if this is as bad as I think it is?

Because this looks real bad:

> Reading host memory from a KVM guest

trevyn8y ago

"We wrote a JavaScript program that successfully reads data from the address space of the browser process running it."

Yeah, it's pretty bad.

madez8y ago

A perfect occasion to invite others into my current exercise of using the web without JavaScript.

userbinator8y ago

...and for those of us who leave JS off by default except for a few very trusted sites, the bar for turning on JS on a site that asks to just went up a lot higher.

VikingCoder8y ago

http://spectreattack.com

bloorp8y ago

So is speculative execution just inherently flawed like this, or can we expect chips in 2 years that let operating systems go back to the old TLB behavior?

AndrewBissell8y ago

Yeah I was wondering this myself. Even if there's some fiddly hardware fix to make speculative execution secure, how much of its performance gains will we have to give up to get there?

mtanski8y ago

Speculative execution as a concept should not be flawed. My take is that the results of illegal speculation should never be leaked in a visable way.

bloorp8y ago

As I read through the meltdown paper, it looks really difficult to have the security we want and the performance we want at the same time. It's pretty crazy, but here's my limited understanding:

There's a huge shared buffer between two threads. 256 * 4K. One thread reads a byte of kernel memory, literally any byte it wants, and it then reads one of those 4K pages from that buffer in order to cache that one memory page that corresponds to the byte it just read. Then at some point the CPU determines that the thread shouldn't be permitted to access the kernel memory location, and rolls back all of that speculative execution, but the cached memory page isn't affected by the rollback.

The other thread iterates through those 256 pages, timing how long it takes to read from each page, and the one page that Thread A accessed will have a different (shorter?) timing because it's cached already. It now understands one byte of kernel memory that it shouldn't. That's just one byte but the whole process is so fast that it's easy to just go nuts on the whole kernel address space.

So what would the fixes be? Disable speculative execution? Only do it if the target memory location is within userspace, or within the same space as the executing address? Plug all of the sideband information leak mechanisms? I dunno.

5 more replies

jerf8y ago

I can imagine some ways to armor the branch predictor, similar in principle to how languages like Perl have to include a random seed in their hash code (in some circumstances) to avoid being able to pre-compute values that will all hash to the same thing [1]. There should be some ways to relatively cheaply periodically inject such a randomization into the prediction system enough to prevent that aspect of the attack. This will cost something but probably not be noticeable to even the most performance-sensitive consumers.

But no solution leaps to mind for the problem of preventing speculative code from leaking things via cache, short of entirely preventing speculating code from being able to load things into the cache. If nobody can come up with a solution for that, that's going to cost us something to close that side channel. Not sure what though, without a really thorough profiling run.

And I'd put my metaphorical $5 down on someone finding another side channel from the speculative code; interactions with changing processor flags in a speculative execution, or interaction with some forgotten feature [2] where the speculation ends up incorrectly undone or something.

Yuck.

[1]: https://blog.booking.com/hardening-perls-hash-function.html

[2]: https://www.youtube.com/watch?v=lR0nh-TdpVg - The Memory Sinkhole - Unleashing An X86 Design Flaw Allowing Universal Privilege Escalation (Dec 29, 2015)

2 more replies

rurban8y ago

Only on Intel. Others restrict prefetches on permissions.

I think this might even be fixed by microcode patches on Intel, at least os specific, looking at the first address bit.

richardwhiuk8y ago

If they could have done microcode patches they would have done, suggesting they can't.

1 more reply

blattimwind8y ago

So this confirms the suspicion that the bug allows VM-to-VM disclosure of memory, which would conclusively explain the rush.

rconti8y ago

What are the odds that the NSA already knew about this? Roughly 100%?

InclinedPlane8y ago

This is a toughie. These bugs are basically very difficult to mitigate completely without fixes at the hardware level. One might imagine the NSA being coy and patching their own OS's et al to the degree they can while working to exploit the bug in the wild. However, the reality is that this bug is almost worse for the NSA than for most other folks, because they have the most to lose if their security is breached. And they have a lot of machines out there. The idea of a bug of this severity that leaves no traces is probably leaving a lot of people at the NSA in cold sweats right now. Meaning that if they did discover it before other researchers it's questionable whether they would have tried to exploit it vs. driving towards the most rapid possible mitigation and fix.

user59944618y ago

Pretty close to 100%.

Google zero and academia researchers found it independently, following some talk about the concept a while back.

The 3 letters agencies have people of the same calibre working full time on that. They could find it too.

djsumdog8y ago

I dunno. Potentially. But these are incredibly complicated bugs, which involve timing at the hardware level. There's nothing close to this in the Vault7/8 leaks.

opportune8y ago

If I were a betting man, I'd place my bet on them knowing about this for a long time, and possibly even being behind the bugs' introduction in the first place

zipwitch8y ago

I'm not familiar with the intricacies of CPU design. What are the odds the NSA somehow arranged for these vulnerabilities to exist?

TheAlchemist8y ago

I believe most crypto exchanges are running in the cloud. What could possibly go wrong ?

umanwizard8y ago

I just sold all my altcoins for BTC on Binance as soon as I saw this and transferred them to gdax. Hopefully I can sell them for USD on gdax and transfer to a real bank before they get hacked.

candl8y ago

Why would you do that? If you are concerned for the security of your coins, you should have moved them to a wallet you own that is not hosted on an exchange. The bank you transfer your dollars to is just as likely to get hit by the exact same vurnerability. In addition you have to pay a fee to move your coins, then to wire the dollars to your bank account. Moving from crypto to fiat is also liable to taxation. If the sole goal is to secure your coins then I don't think that the whole process is worth the hassle. Moving them to a private wallet would suffice.

krrrh8y ago

> Moving from crypto to fiat is also liable to taxation

Trading between cryptoassets as GP suggests he has done already makes them subject to taxation. The fiat step isn’t needed.

umanwizard8y ago

Most banks don't use cloud providers AFAIK.

Also, real money transactions are much more likely than blockchain transactions to be reversible if fraudulent.

mwgalloway8y ago

The majority of coins on Coinbase are in cold-storage and crypto on Coinbase is insured against this type of breach. I personally wouldn't panic to get my coins out.

1 more reply

woodson8y ago

at least the dollars are insured ;-)

ateesdalejr8y ago

So, basically CPUs will read instructions inside a branch even if the branch is eventually going to evaluate to false. Does the CPU do this to optimize branch instructions? The results of instructions that are executed ahead of time are stored in a cache. How exactly does this exploit read from the cache? I understand it uses timing somehow but I'm not quite sure exactly how that works. (I mostly do software.)

cesarb8y ago

The cache in question is not something which stores the result of these speculatively executed instructions, but the normal L1-L2-L3 caches we are used to. The result of these instructions is discarded, but as a side effect, they may load something from memory into the cache. The exploit detects whether or not a particular memory address was loaded into the cache (reading from something already in the cache is much faster than reading from the main memory).

ateesdalejr8y ago

Thanks for the helpful answer. :) Things make much more sense to me now.

rocqua8y ago

It's a timing attack against the cache. The speculative execution might need to do a read, which means something would need to be evicted from the cache. This makes a subsequent read against that evicted adres slower.

This way you can detect things based on speculative execution. I don't know how they go from that to reading memory though.

pwg8y ago

> I don't know how they go from that to reading memory though.

That was the second bit of the example source code:

unsigned long index2 = ((value&1)*0x100)+0x200;

This creates one of two different addresses, depending upon the value of bit zero of the memory location being attacked. The two different addresses are farther apart than the size of a cache line.

> unsigned char value2 = arr2->data[index2];

This actually does the read from one of the two different addresses (which results in the value located at one of them becoming resident in cache). Note that the value returned here is a "don't care" item.

Then, after everything unwinds from the speculation, the follow on code on the real path would read from both of the two possible addresses that were put into "index2". The read that returns data faster must have been in cache. Knowing which one was in cache, you now know the value of bit zero of the target address location.

Repeat the same block of code for bits 1-7 and you'll have read a whole byte. Continue and you can read as much as you like. You just gather data very slowly (the article mentioned about 2000 bytes per second).

1 more reply

caf8y ago

You arrange things so that the speculated execution loads from an address you provide (this is the target address you want to read), then uses the result of that load to calculate the address of another load (this one, into a location that aliases in the cache with an address you can load directly yourself).

You can then use cache timing to see which address was read in the second load, which means you can see part of the value that was read in the first load. Rinse, repeat.

The variants mostly amount to differences in how you arrange the first part (speculated execution loading from an address you get to provide).

AndyNemmity8y ago

First implementation I've seen on twitter.

https://twitter.com/pwnallthethings/status/94869396135866777...

cfeeley8y ago

One of the meltdown paper writers evidently has a sense of humor since "hunter2" [0] is one of the passwords they use in their demonstration [1]

[0] http://bash.org/?244321

[1] https://meltdownattack.com/meltdown.pdf (page 13, figure 6)

f2f8y ago

hunter2 is the industry's accepted PoC password.

pit28y ago

wasn't it dolphin?

1 more reply

Havoc8y ago

So what exactly are they going to do about spectre? Seems pretty unstoppable from what I can see.

Can they disable speculative exec completely for sensitive boxes or is this too baked in?

Filligree8y ago

There's no mitigation. We'll need new CPUs.

Meanwhile, don't ever run untrusted code in the same process as any kind of secret. Better yet, don't ever run untrusted code.

hinkley8y ago

I wonder what fraction of data inside a kernel is really ‘private’.

Obviously we want 100% of the data in the kernel not to be writeable, but if only a small amount shouldn’t be accessible at all then maybe the long term solution is to handle that data in a special way. Something that makes using it slower but doesn’t make every other syscall suffer as much as a consequence.

Or maybe the solution is to prioritize moving more and more code into userspace.

2 more replies

Havoc8y ago

>We'll need new CPUs.

I don't think that's an option either.

_qbxp8y ago

Can someone more knowledgeable than me in regards to this vulnerability tell me:

1. How to best protect my local personal data from being subject to this?

2. Whether I should seriously consider pulling all my cryptocurrency off of any exchanges?

avaika8y ago

from my understanding:

- install security updates for your OS - if it's not ready yet: disable JavaScript in your browser by default and enable it only for resources you trust. otherwise just skip the page. execute third party code with extra caution. any suspicious code should go away (even not inside vm)

2: as long as it's stored in a wallet on your own hardware which you fully control, it should be safe enough

iand8y ago

2. Don't ever store large values of cryptocurrency on an exchange. Keep them offline in paper or hardware wallets.

zitterbewegung8y ago

So how much legal liability are they exposed to due to this security flaw?

Since this affects legacy systems that may not be able to be upgraded it seems like this issue will be around for a very long time.

userbinator8y ago

Since this affects legacy systems that may not be able to be upgraded it seems like this issue will be around for a very long time.

It also only affects "legacy systems" which routinely run nontrusted code. If it's something like e.g. a server in a bank, chances are everything running on it has already been accounted for. This isn't like e.g. Heartbleed where you could just connect to any open server and read its memory --- you have to somehow get your code to run on it first.

djsumdog8y ago

Really makes the case against going to the "cloud" (using hosted VM solutions) versus just using colocated servers running VMWare that you fully own and administer.

gldalmaso8y ago

However, since it seems like there is not much anyone can do to identify what is being leaked and what process did it, this does increase the risk that someone might exploit this internally and get away with it.

perennate8y ago

I can't understand this paragraph from [1]:

> Cloud providers which use Intel CPUs and Xen PV as virtualization without having patches applied. Furthermore, cloud providers without real hardware virtualization, relying on containers that share one kernel, such as Docker, LXC, or OpenVZ are affected.

I take it to imply that hypervisors that use hardware virtualization are not affected. However, the PoC that reads host memory from a KVM guest seems to contradict this.

Is it because on Xen HVM, KVM, and similar hypervisors, only kernel pages are mapped in the address space of the VM thread (so a malicious VM cannot read memory of other VMs), but on these other hypervisors, pages from other containers are mapped? Yet the Xen security advisory [2] says:

> Xen guests may be able to infer the contents of arbitrary host memory, including memory assigned to other guests.

Relatedly, what sensitive information other than passwords could appear in the kernel memory? I'd expect that at the very least buffers containing sensitive data pertaining to other VMs may be leaked.

[1] https://meltdownattack.com/ [2] https://xenbits.xen.org/xsa/advisory-254.html

caf8y ago

The kernel memory map generally includes the 'direct map' of all physical memory - so, everything that is resident is potentially at risk.

nickysielicki8y ago

> Meltdown breaks all security assumptions given by address space isolation as well as paravirtualized environments and, thus, every security mechanism building upon this foundation.

> On affected systems, Meltdown enables an adversary to read memory of other processes or virtual machines in the cloud without any permissions or privileges, affecting millions of customers and virtually every user of a personal computer.

rtpg8y ago

Reading over this.... it sounds like ultimately the exploit in Linux still only works thanks to being able to run stuff in the kernel context through eBPF?

The first section states that even with the branch prediction you still need to be in the same memory context to be able to read other process's memory through this. But eBPF lets you run JIT'd code in the kernel context.

I guess this JITing is also the issue with the web browsers, where you end up getting access to the entire browser process memory.

But ultimately the dangerous code is still code that got a "privilege upgrade"? the packet filter code for eBPF, and the JIT'd JS in the browser exploit?

So if our software _never_ brought user's code into the kernel space, then we would be a bit safer here? For example if eBPF worked in... kernel space, but a different kernel space from the main stuff? And Site Isolation in Chrome?

caf8y ago

No. For that attack, the code that is speculatively executed does need to be in the target context, but that doesn't mean the code has to be attacker-supplied (that just makes it easier).

It's also possible to use existing code in the target context as the speculative execution path if it has the right form (and this is what P0's Variant 2 POC does, in that case by poisoning the branch predictor in order to make it speculatively execute a gadget that has the right form).

krylon8y ago

I should at first point out that I am by no definition an expert on CPU design, operating systems, or infosec.

But I just remembered that years ago the FreeBSD developers discovered a vulnerability in Intel's Hyperthreading that could allow a malicious process to read other processes' memory.[1]

To the degree that I understand what is going on here, that sounds very similar to the way the current vulnerabilities work.

For a while, back then, I was naive enough to think this would be the end of SMT on Intel CPUs, but I was very wrong about that.

So I am wondering - is this just a funny coincidence, or could people have seen this coming back then?

[1] http://www.daemonology.net/hyperthreading-considered-harmful...

makomk8y ago

The ARM whitepaper is also worth a read in terms of how it affects them and mitigations on that platform: https://developer.arm.com/support/security-update

KenoFischer8y ago

I'm really amazed by the simplicity of the meltdown gadget. After the initial blog post I played with a few variants, but always got the zeroed out register in the speculative branch. I guess what people (including me) were looking for here was some other side channel or instruction that did not have this mitigation in place (e.g. I had hoped a cmpxchg would leak whether the target memory address matches the register to compare with). The shl/retry loop makes a lot of sense if you instead assume that the mitigation was implemented improperly and can race subsequent uops. I really can't imagine why this data ever made it to the bypass network to be available to other uops.

alkonaut8y ago

I wonder if the whole thing with enormously complex CPUs requiring deep pipelines which in turn requires complex speculation etc was a design mistake? Is there an alternative history where mainstream CPUs are equally fast with a dumber/simpler design?

JdeBP8y ago

There may well be. One alternative design that could have become the mainstream is VLIW. One of the features of VLIW is the idea of predication. Instead of a branch instruction altering the instruction fetch stream based upon the value of a flag bit in some form of condition code register, instruction syllables are encoded with predicates which simply control whether or not the instruction syllable has any effect, based upon the value of a flag bit in some form of predicate register.

VLIW is not a panacaea, engineering being all about tradeoffs after all. But it was intended to not have the complex instruction dispatching logic, with things like speculative execution and branch prediction, in the processor. Instead, using a process called if-conversion the compiler combines the two possible results of a conditional branch into a single instruction stream where predicates control which instruction syllables are executed.

* http://web.eecs.umich.edu/~mahlke/papers/1996/schlansker_hpl...

* https://www.isi.edu/~youngcho/cse560m/vliw.pdf

* https://www.cse.umich.edu/awards/pdfs/p45-mahlke.pdf

* http://web.eecs.umich.edu/~mahlke/papers/1996/schlansker_hpl...

Observe, in considering this alternative history, that the Itanium had 64 predicate registers. People have, in the past few days in various discussions of this subject, criticized Intel for holding on to a processor design for decades and prioritizing backwards compatibility over cleaner architecture. They have forgotten that Intel actually produced a cleaner architecture, back in the 1990s.

cesarb8y ago

What requires the complex speculation is not the pipeline depth, it's the memory access latency.

Consider the following simple C code: "if (arr[idx]) { ... }". Without speculation, the core must stall until the condition has been read from memory, which can be hundreds of cycles if it's not in the cache. With speculation, these wasted cycles are instead used to do some of the work from most probable side of the branch, so when the condition finally arrives from memory, there's less work left to do.

The pipeline depth only affects what happens when the speculation predicted the wrong way: since the correct way is not on the pipeline, it has to fill the pipeline from scratch.

richardwhiuk8y ago

Not that we currently know about. RISC instead of CISC is better here as it shortens the pipeline, but even RISC processors do speculative predictions due to the cost of waiting till a branch is fully decided.

gpderetta8y ago

The vulnerability window is proportional to the size of the reorder buffer (hundreds of instructions); the pipeline length (tens of stages) is not important (except on strictly in order CPUs with no reorder buffer I guess).

Also modern OoO CISCs and RISCs have very similar pipeline depths for the same performance/power budget.

alkonaut8y ago

What about more radically different designs? E.g Mill or others?

1 more reply

Havoc8y ago

The speed gains are very real so I wouldn't call it a mistake.

Side-effects could obviously been mitigated better, but hindsight 20/20.

intsunny8y ago

Since no one has yet posted Amazon AWS security bulletin:

https://aws.amazon.com/security/security-bulletins/AWS-2018-...

kodablah8y ago

https://github.com/IAIK/meltdown 404's. I assume this is by intention? So full disclosure, but missing the code? Or is it somewhere else?

richardwhiuk8y ago

Due to early embargo lifting, I expect not everything's been publicized yet

bit_logic8y ago

According to the page, Project Zero only tested with AMD Bulldozer CPUs. Why didn't they use something based on Zen/Ryzen? It's not clear if the 3 issues affect Zen/Ryzen or not.

Havoc8y ago

Ryzen is affected by spectre but not meltdown by the looks of it

Unklejoe8y ago

Just an idea that I had:

If these exploits seem rely on taking precise timing measurements (on the order of nanoseconds), could we eliminate or restrict this functionality in user space?

The Spectre exploit uses the RDTSC instruction, and this can apparently be restricted to privilege level 0 by setting the TSD flag in CR4.

I know it would kind of suck, but it might be better than nothing.

I would think that most typical user applications wouldn't require that accurate of a time measurement. If they do, then maybe they can be white listed?

voidmain8y ago

Denying access to timers is kind of practical for browser JavaScript, and should and will happen. But it's not practical for native processes, because shared memory multithreading provides as high precision a timer as anyone could ask for: just increment a counter in a loop in a different thread.

In fact, the practical JavaScript attacks use this method (using SharedArrayBuffer) and the browsers are disabling this (new, little used) feature as a mitigation. But I'm afraid hell will freeze over before mainstream operating systems deny userspace access to clocks, threads, and memory mapped files, which is a lower bound on what it would take to make the attack much harder.

greenleafjacob8y ago

That is the approach Firefox is taking [1]:

> Since this new class of attacks involves measuring precise time intervals, as a partial, short-term, mitigation we are disabling or reducing the precision of several time sources in Firefox.

[1]: https://blog.mozilla.org/security/2018/01/03/mitigations-lan...

geertj8y ago

What is the reason that Intel would allow speculative instructions to bypass the supervisor bit and access arbitrary memory? That seems the root cause for Meltdown.

Is it that the current privilege level could be different between what it is now, and what it will be when the speculative instruction retires? If so then that seems a thin justification. CPL should not change often so it doesn't seem worth it to allow speculative execution for instructions where a higher CPL is required.

humanjvm8y ago

IIUC, these speculative instructions respect the current supervisor bit which was set by the previous faulting instruction.

cmurf8y ago

There are 3 known CVEs related to this issue in combination with Intel, AMD, and ARM architectures. Additional exploits for other architectures are also known to exist. These include IBM System Z, POWER8 (Big Endian and Little Endian), and POWER9 (Little Endian).

https://access.redhat.com/security/vulnerabilities/speculati...

anonu8y ago

How come this wasn't discovered sooner?

It would seem to me that all the really smart people who designed super-scalar processors and all the nifty tricks that CPUs do today - would have thought that these attacks would be in the realm of possibility. If that's the case - who's to say these attacks haven't been used in the wild by sophisticated players for years now?

Seems like the perfect attack. Undetectable. No log traces.

Darthy8y ago

Could somebody please coin a name for this? Wikipedia currently calls it "Intel KPTI flaw", but that is very vague. It's quite difficult to talk about something without a simple easy-to-remember name.

Edit: has been settled, it's https://en.wikipedia.org/wiki/Meltdown_(security_bug) .

frio8y ago

https://spectreattack.com/ :).

drvdevd8y ago

the MEMTHIEF bug

Pyxl1018y ago

Is there any information available about whether the Linux KPTI patch mitigates the ability to use eBPF to read kernel memory?

I'm asking because eBPF seems to execute within the kernel, and KPTI seemed to be about unmapping kernel page table when userspace processes execute.

Are there any mitigations to the eBPF attack vector?

brendangregg8y ago

sysctl -w kernel.unprivileged_bpf_disabled=1

I use eBPF all the time, but I never use it as non-root, so I haven't needed unprivileged bpf anyway.

update: that eBPF vector was already fixed, and another safety measure is already being considered https://lkml.org/lkml/2018/1/3/895

j_coder8y ago

Isn't possible for the kernel to patch all clflush instructions when the software is loaded to keep a circular list of all evicted addresses that would be evicted again on the interrupt that happens when the protected address is read? This way the the timing attack would not be possible.

koverstreet8y ago

self modifying code (which exists) would take a massive performance hit. any time a page is marked +X, the kernel would have to mark it -W, and then on page fault the kernel would have to check if userspace was changing something to a clflush instruction.

oh, and x86 has variable length instructions - the same byte stream can decode as different instructions depending on where you start - so i doubt it's possible at all on x86 without a massive performance hit (you'd have to keep track of every jump instruction in the entire address space...)

j_coder8y ago

You are right.

The best approach is to evict all user space pages from cache when an invalid page access happens if the page fault was caused by the software trying to read/write kernel space pages.

Massive performance hit but only to misbehaved software. Normal software will not have the performance hit of the current solution.

Kernel could even switch to unmapped kernel pages solution if too many read/write attempts.

quotemstr8y ago

clflush only makes the attack easier. There are other ways to flush the cache. Besides: code is mutable. You can just make a clflush instruction out of thin air without the loader's involvement.

j_coder8y ago

For software that requires self-modifying code to run the existing Linux kernel patch would apply (performance penalty). If there is other ways to flush the cache it is necessary to evict the entire software memory on the interrupt.

1 more reply

qaq8y ago

Are extensions like 1password vulnerable do they run in the same process as js from a page?

omeid28y ago

Process is irrelevant for both Meltdown and Spectre.

qaq8y ago

Process is relevant for what you can get to from V8 VM.

ionforce8y ago

Is this saying that AMD is affected? Is this the same as the Intel bug reported earlier?

txcwpalpha8y ago

Yes, AMD is affected. There are multiple vulnerabilities (Project Zero refers to 3 separate "variants", variant 1 and 2 being called "Spectre" and variant 3 being called "Meltdown"). The most serious variant only affects Intel, but can be patched. The other 2 variants affect AMD, ARM, and Intel, and cannot be patched.

See this excerpt from spectreattack.com:

>Which systems are affected by Meltdown?

>Desktop, Laptop, and Cloud computers may be affected by Meltdown. More technically, every Intel processor which implements out-of-order execution is potentially affected, which is effectively every processor since 1995 (except Intel Itanium and Intel Atom before 2013). We successfully tested Meltdown on Intel processor generations released as early as 2011. Currently, we have only verified Meltdown on Intel processors. At the moment, it is unclear whether ARM and AMD processors are also affected by Meltdown.

>Which systems are affected by Spectre?

>Almost every system is affected by Spectre: Desktops, Laptops, Cloud Servers, as well as Smartphones. More specifically, all modern processors capable of keeping many instructions in flight are potentially vulnerable. In particular, we have verified Spectre on Intel, AMD, and ARM processors.

JL20108y ago

Of the variants of the attack that can leak privileged memory, AMD is only impacted if a non-default kernel configuration is enabled: "BPF JIT"

DannyBee8y ago

This is not correct. That's just what the POC chose to implement.

acoye8y ago

Google security blog says it is.

> These vulnerabilities affect many CPUs, including those from AMD, ARM, and Intel, as well as the devices and operating systems running them.

https://security.googleblog.com/2018/01/todays-cpu-vulnerabi...

AndyNemmity8y ago

That's unclear, to the point of being factually wrong. Variant 2 and Variant 3 POCs only affect Intel, and those are the ones people are most talking about, and at least to me, the most concerning.

Treating them as a group, ignores the very real differences in effect.

https://googleprojectzero.blogspot.com/2018/01/reading-privi...

cthalupa8y ago

https://meltdownattack.com/meltdown.pdf

>6.4 Limitations on ARM and AMD We also tried to reproduce the Meltdown bug on several ARM and AMD CPUs. However, we did not manage to successfully leak kernel memory with the attack described in Section 5, neither on ARM nor on AMD. The reasons for this can be manifold. First of all, our implementation might simply be too slow and a more optimized version might succeed. For instance, a more shallow out-of-order execution pipeline could tip the race condition towards against the data leakage. Similarly, if the processor lacks certain features, e.g., no re-order buffer, our current implementation might not be able to leak data. However, for both ARM and AMD, the toy example as described in Section 3 works reliably, indicating that out-of-order execution generally occurs and instructions past illegal memory accesses are also performed.

Seems like the possibility exists that AMD/ARM could be affected, based on the behavior they saw, but they were not able to successfully verify.

2 more replies

muxator8y ago

From https://meltdownattack.com/meltdown.pdf, page 12:

> Thus, the isolation of containers sharing a kernel can be fully broken using Meltdown.

j_coder8y ago

Looks like the information was somewhat public available since middle of the last year on https://cyber.wtf/2017/07/28/negative-result-reading-kernel-... and http://www.cs.binghamton.edu/%7Edima/micro16.pdf. Also similar methods from 2013 paper http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf (timing side channel attacks).

Any reason for the panic now? Any know malware using it?

jpatokal8y ago

No. This was all scheduled to be released on January 9th, but the release was sped up after people started connecting dots.

We are posting before an originally coordinated disclosure date of January 9, 2018 because of existing public reports and growing speculation in the press and security research community about the issue, which raises the risk of exploitation.

https://security.googleblog.com/2018/01/todays-cpu-vulnerabi...

j_coder8y ago

I know it was scheduled but the information on the links are public and prior to the scheduled disclosure. A hacker could figure out the problem by reading the available information before the Google Project Zero.

jakozaur8y ago

Juicy PoC exists?

delaaxe8y ago

Can someone show me an example of JavaScript code running in a browser that would display a password stored in kernel space?

Websites like the Guardian report that this is now the case but I don't understand how that's possible.

mdavidn8y ago

The kernel maps itself into the address space of each process as an optimization to increase the performance of system calls. So yes, it is possible.

1 more reply

bung8y ago

Will patches for this eventually trickle down to things like LineageOS?

phoe-krk8y ago

LineageOS is based on official Android sources. The moment the official Android kernel is patched, LineageOS will use the patch.

londons_explore8y ago

I like the way you say moment as if the android kernel is some single thing not a hodgepodge of hundreds of different kernels across tens of companies.

DarronWyke8y ago

Thanks to incidents like these, I'm very happily employed. One of the perks of working in infosec.

I hereby nominate 2018's song to be Billy Joel's We Didn't Start the Fire.

trendia8y ago

Does this vulnerability affect Linux only, or any operating system?

saemil8y ago

The issue is with the chip. So, it should impact any OS running on the chip. This would include Windows as well as MacOS running on Intel chips.

Splendor8y ago

Do we know how news of this got out before the disclosure date?

NelsonMinar8y ago

See this blog post, which is some very informed speculation based on public Linux kernel patch activity. http://pythonsweetness.tumblr.com/post/169166980422/the-myst...

fishywang8y ago

I couldn't find it in the blog post or the Compute Engine Security Bulletin, does anyone know which version of Linux Kernel contains the mitigation?

blattimwind8y ago

4.14.11 is the only stable kernel as of this writing.

mark-r8y ago

Based on a link here yesterday, there was a patch to the Linux kernel and comments associated with it.

evibeefi8y ago

This sounds really bad. I wonder: Will this have major implications on consumers other than slowed down devices?

gruez8y ago

does this mean the embargo is lifted?

ipsin8y ago

https://security.googleblog.com/2018/01/todays-cpu-vulnerabi...

Yes, this explains why it was lifted.

acoye8y ago

ebonassi8y ago

Should we start to think seriously to adopt homomorphic encryption on virtualized environments?

Havoc8y ago

No wonder they were rushing this.

jasonlfunk8y ago

As a side topic, are we really in a place that even vulnerabilities need branding and websites?

simias8y ago

Why not? Those big security vulnerabilities are going to be discussed in years to come, might as well come up with something a little more catchy than CVE-2017-5753. I guess they could've gone with more descriptive names.

At least "spectre" and "meltdown" will be memorable even for non-technical people (who should probably be aware of the issue even if they don't understand the technical details). "Bounds check bypass" and "branch target injection" probably sound like random words stringed together for most people.

Havoc8y ago

Well these are essentially research papers - and people invested lots of time & it'll have an impact on their career.

So yeah making it nice & pretty seems appropriate just like a CV

hollerith8y ago

Thanks again to the geniuses who arranged things so that almost anyone can write code that I must run just so I can use the internet to find and to read public documents

(unless I undergo the tedious process of becoming a noscript user or something similar).

solotronics8y ago

best for now to get your crypto coins off the exchanges if you have them there

swampthinker8y ago

"Testing also showed that an attack running on one virtual machine was able to access the physical memory of the host machine, and through that, gain read-access to the memory of a different virtual machine on the same host."

Holy shit.

thaumaturgy8y ago

We should quote OpenBSD's Theo de Raadt here, all the way back from 2007:

"x86 virtualization is about basically placing another nearly full kernel, full of new bugs, on top of a nasty x86 architecture which barely has correct page protection. Then running your operating system on the other side of this brand new pile of shit."

https://marc.info/?l=openbsd-misc&m=119318909016582&w=2

djsumdog8y ago

Hmm. Is OpenBSD patched for Meltdown? I don't see anything on their main site.

oh-kumudo8y ago

> The infrastructure that runs Compute Engine and isolates customer workloads from each other is protected against known attacks. This also means that customer VMs are protected against known, infrastructure-based attacks from other malicious VMs.

Doesn't Google say that they are protected...?

thesandlord8y ago

This means that Customer A's VM cannot attack Customer B's VM.

However, if the OS inside the VM is unpatched, then code inside the VM can attack other code inside the VM. If for example you install some malware on your VM, it could use this attack.

(I am not a security expert, this is just my understanding and not a official Google statement)

oh-kumudo8y ago

Right. But it means that once the VM is fixed, assuming the customer does this, they are guarded from such attack right?

1 more reply

static_noise8y ago

This basically kills cloud computing for anything sensitive using shared hardware. In the short term this will actually be good for cloud providers because the demand for dedicated instances will shoot up as there is no short-term alternative.

thesandlord8y ago

untog8y ago

The short term answer is to patch the servers and swallow the 30% performance cut. Still likely cheaper than dedicated servers.

djsumdog8y ago

Which could mean huge sales for Intel, or even AMD, if Amazon, DigitalOcean, Linode and others want to rush to get that lost performance back.

Going to AMD would be incredibly expensive as you'd be replacing nearly everything, but if Intel gets new chips out in a reasonable amount of time, they might actually make a killing on this.

panarky8y ago

"Compute Engine customers must update their virtual machine operating systems and applications so that their virtual machines are protected from intra-guest attacks and inter-guest attacks that exploit application-level vulnerabilities."

"Compute Engine customers should work with their operating system provider(s) to download and install the necessary patches."

srcmap8y ago

Main/Big impacts are on the cloud computer.

For home computer, standard office use, there is no impact at this point, right?

static_noise8y ago

Until someone figures out how to exploit it using JavaScript. The speed this moves it could be any minute now.

kodablah8y ago

From spectre.pdf:

> In addition to violating process isolation boundaries using native code, Spectre attacks can also be used to violate browser sandboxing, by mounting them via portable JavaScript code. We wrote a JavaScript program that successfully reads data from the address space of the browser process running it.

(granted I think site isolation, if enabled, mitigates crossing domain boundaries)

It goes on to show a sample JS impl that JITs into the expected insns using V8.

2 more replies

baybal28y ago

Yet another argument against running any native or 1-to-1 bytecode in the browser like WASM

baq8y ago

the big if is whether javascript code that can exploit this can be written. (edit: that's a yes, from the pdf itself...) if yes, nobody's safe, as any webpage (any webpage, even that ad in an iframe) could in theory read your password if it's anywhere in RAM.

djsumdog8y ago

Firefox and Chrome have both started posting mitigation strategies. They're mentioned in other comments, some depending on making time functions less accurate since this is a timing attack.

mdellavo8y ago

Chrome is listed as impacted. People use chrome password managers.

devinl8y ago

Chrome is listed as impacted due to javascript being able to read memory from outside the browser sandbox.

"In addition to violating process isolation boundaries using native code, Spectre attacks can also be used to violate browser sandboxing, by mounting them via portable JavaScript code. We wrote a JavaScript program that successfully reads data from the address space of the browser process running it." - from the spectre paper

alexpersian8y ago

Was this quote removed from the article? I'm no longer able to find it.

baybal28y ago

>running on the host, can read host kernel memory at a rate of around 1500 bytes/second,

I kinda get how it works now. They force a speculative execution to do something with a protected memory address, and then measure the latency to guess the content. They did not found a way to continue execution after a page fault as rumors were.

The fact that speculative execution branch can access protected memory, but not to commit its own computation results to memory in ia32 was known since pentium 3 times.

It was dismissed as "theoretical only" vulnurability without possible practical application. Intel kept saying that for 20 years, but here it is, voila.

The ice broke in 2016 when Dmitry Ponomarev wrote about first practical exploit scenario for this well known ia32 branch prediction artifact. Since then, I believe, quite a few people were trying all and every possible instruction combination for use in timing attack until somebody finally got one that works that was shown behind closed doors.

Edit: google finally added reference to Ponomarev's paper. Here is his page with some other research on the topic http://www.cs.binghamton.edu/~dima/

azurezyq8y ago

link for details for that from Project Zero:

https://googleprojectzero.blogspot.com/2018/01/reading-privi...

sctb8y ago

Thanks! We've merged another thread and updated this link from https://security.googleblog.com/2018/01/todays-cpu-vulnerabi....

AnimalMuppet8y ago

Interesting. Quoting a fair-sized chunk for context:

> So far, there are three known variants of the issue:

> Variant 1: bounds check bypass (CVE-2017-5753) > Variant 2: branch target injection (CVE-2017-5715) > Variant 3: rogue data cache load (CVE-2017-5754)

> During the course of our research, we developed the following proofs of concept (PoCs):

> A PoC that demonstrates the basic principles behind variant 1 in userspace on the tested Intel Haswell Xeon CPU, the AMD FX CPU, the AMD PRO CPU and an ARM Cortex A57 [2]. This PoC only tests for the ability to read data inside mis-speculated execution within the same process, without crossing any privilege boundaries.

> A PoC for variant 1 that, when running with normal user privileges under a modern Linux kernel with a distro-standard config, can perform arbitrary reads in a 4GiB range [3] in kernel virtual memory on the Intel Haswell Xeon CPU. If the kernel's BPF JIT is enabled (non-default configuration), it also works on the AMD PRO CPU. On the Intel Haswell Xeon CPU, kernel virtual memory can be read at a rate of around 2000 bytes per second after around 4 seconds of startup time. [4]

> A PoC for variant 2 that, when running with root privileges inside a KVM guest created using virt-manager on the Intel Haswell Xeon CPU, with a specific (now outdated) version of Debian's distro kernel [5] running on the host, can read host kernel memory at a rate of around 1500 bytes/second, with room for optimization. Before the attack can be performed, some initialization has to be performed that takes roughly between 10 and 30 minutes for a machine with 64GiB of RAM; the needed time should scale roughly linearly with the amount of host RAM. (If 2MB hugepages are available to the guest, the initialization should be much faster, but that hasn't been tested.)

> A PoC for variant 3 that, when running with normal user privileges, can read kernel memory on the Intel Haswell Xeon CPU under some precondition. We believe that this precondition is that the targeted kernel memory is present in the L1D cache.

If I'm reading this right, then the only POC that works against ARM is the first one, which lets you read data within the same process. Not too impressive. (Yes, I know that I'm reading into this that they tried to run all the POCs against all the processors. But the "Tested Processors" section lower down leads me to believe that they did in fact do so.)

The third and fourth POC seem to be Intel-specific.

makomk8y ago

The paper from the other people who discovered this says the same thing: "We also tried to reproduce the Meltdown bug on several ARM and AMD CPUs. However, we did not manage to successfully leak kernel memory with the attack de- scribed in Section 5, neither on ARM nor on AMD." The general purpose attack that leaks kernel memory, the one that KAISER fixes, only seems to work on Intel CPUs. Intel's press release was misleading.

AnimalMuppet8y ago

Well... reading further, below the details of the third POC, they say "Our research was relatively Haswell-centric so far. It would be interesting to see details e.g. on how the branch prediction of other modern processors works and how well it can be attacked."

So it seems like they tried it on AMD and ARM, but they tried much harder on Intel. That's less reassuring than my initial reading.

nickysielicki8y ago

Way better link, thanks.

feelin_googley8y ago

In 1-2 words, IMO, the problem is "over-optimisation".

It is perhaps beneficial to be using an easily portable OS that can be run on older computers, and a variety of architectures.

Sometimes older computers are resilient against some of todays attacks to the extent those attacks make assumptions about the hardware and software in use. (Same is true for software.)

When optimization reaches a point where it exposes one to attacks like the ones being discussed here, then maybe the question arises whether the optimization is actually a "design defect".

What is the solution?

IMO, having choice is at least part of any solution.

If every user is effectively "forced" to use the same hardware and the same software, perhaps from a single source or small number of sources, then that is beneficial for those sources but, IMO, counter to a real solution for users. Lack of viable alternatives is not beneficial to users.

pjf8y ago

More details at https://googleprojectzero.blogspot.com/2018/01/reading-privi...

masterleep8y ago

I wonder what this sentence in the Google product status page (https://support.google.com/faqs/answer/7622138) means, particularly what the inter-guest attack refers to:

arianvanp8y ago

What I understand is that the hypervisor of GCE has been patched already and so some customer running on the same machine as you can't exploit you. However if you are running KVM or something yourself on a Cloud instance (vm in a VM) then you should patch that.

zzzcpan8y ago

Does anyone know what kind of isolation still can work after all the patches? Let's say we want to host users' processes or containers and some of them could be pwned. I see Google claiming that their VMs are isolated between the kernel and each other.

shaklee38y ago

Intel has released a statement for the codename Meltdown bug:

https://newsroom.intel.com/news/intel-responds-to-security-r...

AndyNemmity8y ago

Again conflating the issue to include AMD. This feels so disingenuous.

infinity08y ago

> We have some ideas on possible mitigations and provided some of those ideas to the processor vendors; however, we believe that the processor vendors are in a much better position than we are to design and evaluate mitigations, and we expect them to be the source of authoritative guidance.

Intel: "Recent reports that these exploits are caused by a “bug” or a “flaw” [..] are incorrect."

So much for "authoritative guidance", fuck these guys.

Someone12348y ago

Arm also claims it is working as intended:

> Arm recognises that the speculation functionality of many modern high-performance processors, despite working as intended, can be used in conjunction with the timing of cache operations to leak some information as described in this blog.

I personally don't agree, but I guess they're trying to avoid needing to issue a recall for over ten years worth of CPUs?

wahern8y ago

Then surely you must also argue that all data-dependent, side-channel attacks, such as key recovery attacks against some cryptographic algorithm implementations, are the fault of the hardware.

Unlike Intel, ARM and AMD are implicated only where the attacker can inject code or data (specifically data that is manipulated by pre-existing vulnerable code) into the target address space. The particular kernel exploits require injection of a JIT-compiled eBPF program, as they said they were unable to locate any suitable gadgets in existing compiled kernel code. I wouldn't rule out gadgets being found in the future, but much like cryptographic software timing attacks, the proper fix is to refactor sensitive software logic to be data independent. There's no way to implement an out-of-order, superscalar architecture and protect against this stuff simply because of the nature of memory hierarchies. All you can do is 1) ensure that privilege boundaries are obeyed (like AMD and ARM do, but Intel notable doesn't), and 2) provide guaranteed, constant-time instructions that programmers and compilers can reliably and conveniently leverage. Unfortunately, all the hardware vendors have sucked at providing #2 (much timing resilient cryptographic software relies on implicit, historical timing behavior, not architecturally guaranteed behavior), but it nonetheless still requires cooperation by software programers, making it a shared burden.

Also, FWIW, basically everybody outside the Linux echo chamber has known that eBPF JIT and especially unprivileged eBPF JIT was a disaster waiting to happen. This is only the latest exploit it's been at the center of, and the 2nd in as many months. The amount of attention and effort that has gone into securing eBPF is remarkable, but at the end of the day even if you could muster all the best programmers for as much time as you wanted it's still an exceptionally risky endeavor. Everything we know about the evolution of exploits screams that unprivileged eBPF JIT is an unrelenting nightmare. But it's convenient, flexible, and performant, and at the end of the day that's all people really care about, including most Linux kernel engineers. The nature of the Linux ecosystem is that even if Linus vetoed unprivileged eBPF JIT (optional or not), vendors would have likely shipped it anyhow. It's an indictment of the software industry. Blaming hardware vendors (except for the Intel issue) is just an excuse that perpetuates the abysmal state of software security.

cthalupa8y ago

>The particular kernel exploits require injection of a JIT-compiled eBPF program, as they said they were unable to locate any suitable gadgets in existing compiled kernel code

Did they say that?

I don't see anything saying they were unable to, just that they didn't bother to because it would take effort.

>But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets.

1 more reply

justincormack8y ago

They go into detail in the white paper https://developer.arm.com/-/media/Files/pdf/Cache_Speculatio...

They are adding a new instruction to control speculation...

erikb8y ago

someone should honestly do a press release like "Intel Bug not actually Intel only" or give this thing a neutral name to search for.

AndyNemmity8y ago

Variant 2 and Variant 3 are Intel only. They are the most concerning as they break VM space.

dgomesbr8y ago

Great, embargo was in and google went ahead disclosing and saying hear we're here disclosing this (because they've patched)

londons_explore8y ago

3 or 4 people had bits of demo code up on twitter earlier today.

I implemented it myself simply based on the clues in the press release from AMD explaining why they weren't vulnerable. I don't even have a computer security background.

contrarian_8y ago

So the vulnerability likely isn't something nobody thought of, it's just that nobody seriously expected the CPU vendors to make the mistake of speculating across multiple loads and actually leaving observable modifications in the caches.

Note that even speculating across multiple loads could lead to observable side-effects by measuring memory bandwidth to differentiate between loads of accessible and silent page fault addresses. [1]

An interesting question is whether the CPU would also speculate on loads from mapped PCI device regions, as that could be also detectable in many different ways.

[1] https://eprint.iacr.org/2016/613.pdf

> Both hardware thread systems (SMT and TMT) expose contention within the execution core. In SMT, the threads effectively compete in real time for access to functional units, the L1 cache, and speculation resources (such as the BTB). This is similar to the real-time sharing that occurs between separate cores, but includes all levels of the architecture. [...] SMT has been exploited in known attacks (Sections 4.2.1 and 4.3.1)

kbwt8y ago

The papers take a while to get to the point. I nearly fell asleep re-reading the same statements until they got to the point: speculative execution of buffer overflows.

Could have been said more concisely. Sadly, this seems to be the norm with academic texts.

pacavaca8y ago

It gives all the required context, much needed for an "average" engineer to understand it. Without that, most of the people, except the microchip engineers, would have to read about the related topics first anyways. I personally was surprised at how understandably everything was explained.

j / k navigate · click thread line to collapse

Reading privileged memory with a side-channel (opens in new tab)

593 comments