> I don't know how they go from that to reading memory though.
That was the second bit of the example source code:
> unsigned long index2 = ((value&1)*0x100)+0x200;
This produces one of two different offsets (0x200 or 0x300), depending upon the value of bit zero of the memory location being attacked. The two resulting addresses are farther apart than the size of a cache line, so they can never share one.
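To make the arithmetic concrete, here is a minimal sketch of that index computation on its own. The helper names (probe_index, probe_gap) and the 64-byte line size are my assumptions for illustration; they are not from the example source, and real line sizes vary by CPU.

```c
#include <assert.h>

/* Assumed cache-line size; 64 bytes is common but not universal. */
#define ASSUMED_CACHE_LINE 64ul

/* Same arithmetic as the quoted line: bit zero selects 0x200 or 0x300. */
static unsigned long probe_index(unsigned char value)
{
    return ((value & 1ul) * 0x100ul) + 0x200ul;
}

/* Distance between the two possible offsets. The trick only works if
 * this gap is larger than a cache line, so check that here. */
static unsigned long probe_gap(void)
{
    unsigned long gap = probe_index(1) - probe_index(0);
    assert(gap > ASSUMED_CACHE_LINE);
    return gap;
}
```

Only bit zero of the value matters: every even byte maps to 0x200 and every odd byte to 0x300, with 0x100 (256) bytes between them.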
> unsigned char value2 = arr2->data[index2];
This actually does the read from one of the two different addresses (which results in the value located at one of them becoming resident in cache). Note that the value returned here is a "don't care" item.
Then, after everything unwinds from the speculation, the follow-on code on the real path reads from both of the two possible addresses that "index2" could have selected, timing each read. The read that returns data faster must have been in cache. Knowing which one was in cache, you now know the value of bit zero of the target memory location.
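The inference step can be sketched with a toy model rather than real timing, since real measurements are noisy and CPU-specific. Everything here is an assumption for illustration: the one-line toy_cache, the made-up hit/miss costs, and the function names. It models the logic described above, not the actual attack code.

```c
#include <assert.h>
#include <stdbool.h>

#define LINE_SIZE 64ul   /* assumed cache-line size */
#define HIT_COST  1ul    /* made-up latency units, not measurements */
#define MISS_COST 100ul

/* Toy cache that remembers at most one resident line. */
struct toy_cache { bool valid; unsigned long line; };

/* Model a read: report the hit or miss cost, then leave that line resident. */
static unsigned long toy_read(struct toy_cache *c, unsigned long offset)
{
    unsigned long line = offset / LINE_SIZE;
    unsigned long cost = (c->valid && c->line == line) ? HIT_COST : MISS_COST;
    c->valid = true;
    c->line = line;
    return cost;
}

/* Speculative step: touch the offset chosen by bit zero of the secret.
 * The value read is a "don't care"; only the cache side effect matters. */
static void speculative_touch(struct toy_cache *c, unsigned char secret)
{
    (void)toy_read(c, ((secret & 1ul) * 0x100ul) + 0x200ul);
}

/* Real-path step: time both candidates; the faster one reveals the bit.
 * Each probe runs against a copy of the state so that one probe does not
 * evict the other in this one-line model. Call after speculative_touch(). */
static int infer_bit(const struct toy_cache *c)
{
    struct toy_cache a = *c, b = *c;
    unsigned long t0 = toy_read(&a, 0x200ul);
    unsigned long t1 = toy_read(&b, 0x300ul);
    assert(t0 != t1);   /* exactly one candidate should be resident */
    return t1 < t0;
}
```

Note that infer_bit never looks at the secret itself. It only sees which of the two reads was cheap, which is the whole point of the side channel.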
Repeat the same block of code for bits 1-7 and you'll have read a whole byte. Continue and you can read as much as you like. You just gather data very slowly (the article mentioned about 2000 bytes per second).
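As a rough sketch of the repeat-per-bit idea: one way to target bits 1-7 is to shift the value before masking. That shift is my generalization, not something shown in the example source, and demo_secret plus leak_cached_offset are hypothetical stand-ins for a full speculate-then-time round.

```c
#include <assert.h>

/* Hypothetical stand-in for the secret byte; a real attack cannot read
 * this directly and only learns it through the cache. */
static unsigned char demo_secret = 0xA5;

/* Offset for a given bit position: the quoted formula with a shift
 * added (an assumed generalization for bits 1-7). */
static unsigned long probe_index_for_bit(unsigned char value, unsigned bit)
{
    assert(bit < 8);
    return (((value >> bit) & 1ul) * 0x100ul) + 0x200ul;
}

/* Stand-in for one speculate-then-time round: reports which of the two
 * offsets would have ended up cached for this bit. */
static unsigned long leak_cached_offset(unsigned bit)
{
    return probe_index_for_bit(demo_secret, bit);
}

/* Run eight rounds, one per bit, and reassemble the byte. */
static unsigned char recover_byte(void)
{
    unsigned char out = 0;
    for (unsigned bit = 0; bit < 8; bit++) {
        if (leak_cached_offset(bit) == 0x300ul)
            out |= (unsigned char)(1u << bit);
    }
    return out;
}
```

Eight rounds per byte, each needing a fresh speculation and two timed reads, is part of why the data comes out so slowly.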