* Using reinforcement learning so the computer can figure out how to parallelize code and models on its own. In experiments, the machine beats human-designed parallelization.
* Replacing B-tree indices, hash maps, and Bloom filters with data-driven indices learned by deep learning models. In experiments, the learned indices outperform the usual stalwarts by a large margin in both computing cost and performance, and are auto-tuning.
* Using reinforcement learning to manage datacenter power. Machine intelligence outperforms human-designed energy-management policies.
* Using machine intelligence to replace user-tunable performance options in all software systems, eliminating the need to tweak them with command line parameters like --num-threads=16, --max-memory-use=104876, etc. Machine intelligence outperforms hand-tuning.
* Using machine intelligence for all tasks currently managed with heuristics. For example, in compilers: instruction scheduling, register allocation, loop nest parallelization strategies, etc.; in networking: TCP window size decisions, backoff for retransmits, data compression, etc.; in operating systems: process scheduling, buffer cache insertion/replacement, file system prefetching, etc.; in job scheduling systems: which tasks/VMs to co-locate on same machine, which tasks to pre-empt, etc.; in ASIC design: physical circuit layout, test case selection, etc. Machine intelligence outperforms human heuristics.
IN SHORT: machine intelligence (today, that means deep learning and reinforcement learning) is going to penetrate and ultimately control EVERY layer of the software stack, replacing human engineering with auto-tuning, self-improving, better-performing code.
Eye-opening.
Ah so it appears they're advocating using neural networks as index functions to sorted arrays (hashmaps are simply sorted by hash instead of by something in the data).
So what they do is they take a FIXED set of data that you want to quickly lookup in, already sorted, train a model (2 layer 32 width, relu activation is one architecture, but they also train sequences of models, HUGE changes to error (as the cost of max and min error are huge, you minimize max error rather than average error)).
They have the following brilliant insight : an index over a database (which gives the position of the data given the search key) is a CDF (cumulative distribution function) ! That's brilliant ! Of course it is !
And of course, this is Google. Once you have an index trained (which is a linear operation), you can translate the neural network model directly into C++, and compile it into machine instructions that don't depend on anything like tensorflow libraries. The resulting code can be pasted into anything you want. This may work fast, but seems less then entirely practical ... although I guess you could do the same in Java far easier and you could just include that code.
Paper here: http://learningsys.org/nips17/assets/slides/dean-nips17.pdf
Reading your cite, the practical issue seems to me to be that the optimizer's memory footprint costs may in fact negate any benefit (e.g. ~40% over LRU) obtained in reducing cache misses.
My gut feeling is that this approach (for online systems) may work best with a hardware component (a card hosting the 'experts' and their virtual model e.g. the "virtual cache"). The distributed variant also seems worth exploring.
> A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if statement,” he said
Good summary, but someone still has to write the machine intelligence!
That's a heck of a performance boost for a chip that's likely costing google way less than the nvidia flagship.
[1] http://www.tomshardware.com/news/nvidia-titan-v-110-teraflop...
Designing and taping out a new ASIC isn't cheap.
Presumably Google needs to use a fairly recent process (22nm or better?), which means GlobalFoundaries/TSMC or Samsung (do any of the Chinese native fabs have 22nm yet?). I wonder who us building them?
So many questions...
It's an accelerator to run Tensorflow graphs, and TF graphs essentially are converted to matrix operations and convolutions.
On a DL11 server, it will take about 60 hrs, and only cost you 15k upfront. The economics speak for themselves for fp32 training, at this moment in time.
In hardware, both digital and analog designers seem to use lots of heuristics in how they design things. Certainly could help there. Might be especially useful in analog due to small number of experienced engineers available.
In the case of the hash table, I assume it's using the model to compute the hash function.
"An ... approach to handling inserts is to build a delta-index. All inserts are kept in buffer and from time to time merged with a potential retraining of the model."
Another illuminating sentence from the paper was this:
>This leads to an interesting observation: a model which predicts the position given a key inside a sorted array effectively approximates the cumulative distribution function (CDF). We can model the CDF of the data to predict the position as: p = F(Key) ∗ N
Maybe it's just my very limited knowledge as an undergrad but I'm feeling that this can be the start of something big. Another idea that just came to me after is how much of this ML is applicable to the domain of cryptography. In my security class it seemed like much of the famous hash functions for example were somehow "found" in vast space of potential schemes.
For example:
The easiest and most abundant thing to learn on the web is unsurprisingly web development.
An ML engineer can use ML to optimize the data structures that he uses for his models.
I could not say the same about fields like biology or physics.