I think the `future of work` for machine learning practitioners will quickly separate into two groups: a very small, elite group that does research, and a much larger group that uses AutoML but whose jobs also lean more toward data preparation (which gets automated as well) and ML DevOps, supporting models in production.
In financial services in particular, there are tons of time-series and regression problems on small data, for which a neural network (beyond perhaps some very small MLP) would be a ridiculous thing to try.
I think the breakdown of workload you described will only happen in business departments where there is a need for large scale embedding models, enhanced multi-modal search indices, computer vision and natural language applications, and maybe a handful of things that eventually productize reinforcement learning. I could also see this happening in businesses that can benefit from synthetically generated content, like stock photography, essays / news summaries / some fiction, website generators, probably more.
What I described above is a tiny drop in the ocean of applied statistics problems that businesses have to solve.
Throw away all the BS. And, yes, it's obvious.
I suppose OP means there will be two groups: people who use AutoML and people who try to make AutoML better.
"Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS, a leading NAS method, on both benchmarks"
ENAS, the specific algorithm they find does no better than random search, is in this library. My understanding is that the results are pretty generic though, i.e. NAS is very far from a solved problem. (Hyperparameter tuning for "classical" models is another matter. That's commoditized and available as a service at this point; see tpot, DataRobot, etc.)
No Windows support in a Microsoft product. Curious.
This looks very useful for tuning hyper-parameters, and the fact that the tuned algorithm is treated as a black box makes this very flexible.
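To make the black-box framing concrete: all the tuner needs is a function that maps a hyperparameter dict to a score. A toy random-search loop over a made-up objective (this illustrates the interface, not NNI's actual API):

```python
import random

def objective(params):
    # Stand-in for "train a model with these hyperparameters, return a validation score".
    # The optimum of this toy function is at x=0.3, y=0.7.
    x, y = params["x"], params["y"]
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)

# A made-up search space: each hyperparameter gets a (low, high) range.
search_space = {"x": (0.0, 1.0), "y": (0.0, 1.0)}

random.seed(0)
best_params, best_score = None, float("-inf")
for _ in range(200):
    trial = {k: random.uniform(low, high) for k, (low, high) in search_space.items()}
    score = objective(trial)
    if score > best_score:
        best_params, best_score = trial, score

print(best_params, best_score)
```

The tuner never looks inside `objective`, which is exactly why the same loop works for any algorithm you can score.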
I think this does everything MLflow does and more (besides maybe helping with deployment?)
In good old fashioned statistics there's the idea of the jackknife: for each observation i, run the regression on all the data except observation i, and store the statistics of interest (coefficients, predictions, etc.). This gives you an empirical sampling distribution for those statistics.
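A minimal jackknife sketch on made-up regression data (plain NumPy; the data and the slope statistic are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)  # true slope is 2.0

# Leave observation i out, refit, and store the statistic of interest (here, the OLS slope).
slopes = []
for i in range(n):
    mask = np.arange(n) != i
    xi, yi = x[mask], y[mask]
    slopes.append(np.cov(xi, yi, bias=True)[0, 1] / np.var(xi))

slopes = np.array(slopes)  # the jackknife sampling distribution of the slope
print(slopes.mean(), slopes.std())
```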
Similar, and more common in econometrics, is the bootstrap: fit your model on, say, 1999 resamples of the data (drawn with replacement) and get sampling distributions that way.
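A bootstrap sketch on made-up regression data, ending in a percentile confidence interval for the slope (1999 is just a conventional odd number of resamples):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)  # true slope is 2.0

B = 1999  # number of bootstrap resamples
boot_slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
    xb, yb = x[idx], y[idx]
    boot_slopes[b] = np.cov(xb, yb, bias=True)[0, 1] / np.var(xb)

# Percentile 95% confidence interval for the slope
ci_lo, ci_hi = np.percentile(boot_slopes, [2.5, 97.5])
print("bootstrap 95% CI for slope:", (ci_lo, ci_hi))
```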
With said sampling distributions, whether from the jackknife or the bootstrap, you're able to test whether your model is valid -- what's the probability that it'll have significant coefficients, or an R2/MAE/MAPE score indicating predictive capacity.
Cross-validation (and even scikit-learn now defaults to five folds rather than three) is a "lazy" version of this. You don't get a sampling distribution, but at least you can tell when a model only appears good because it grips the training data with all its might and then doesn't work out-of-sample.
sklearn even offers the jackknife under an ML-y name: leave-one-out cross-validation.
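For example, leave-one-out scoring via scikit-learn's `LeaveOneOut` splitter, again on made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=40)

# One fold per observation: each score is the squared error on the single held-out point.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(len(scores))  # one score per observation
```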
Are people migrating from scikit-learn to TensorFlow in production for non-deep-learning use cases?
At least that's the behaviour of the platform[1] I am working on.
[1]: https://github.com/polyaxon/polyaxon#hyperparameters-tuning
BTW, I think all AutoML solutions forget about end users. They all require too much engineering knowledge from the user. It would be nice to have an AutoML solution that citizen data scientists can actually use.
[1]: https://nni.readthedocs.io/en/latest/sklearn_examples.html
In contrast, when we wrote bespoke GPU code for the graph, we saw a ~25x performance increase over relying on CPU plus MKL. I am being deliberately vague here and cannot give further detail.
> possibly the world's first or second (full-time) CUDA programmer, with 14 filed patents, and the world's fastest implementations of molecular Dynamics (CUDA ports of Folding@Home and AMBER).
DNNs require an architecture search, i.e. the building blocks are full layers, the depth of the network, the optimizer, etc.
scikit-learn searches a parameter space, i.e. the algorithm's knobs are much, much simpler and fewer.
So to sum up, DNN search involves big building blocks, while scikit-learn search (or, for that matter, any "classical ML" algorithm) is more of a parameter search.
[The actual scikit-learn search would also include preprocessing steps, which can be seen as a separate block.]
[Also, note that DNN search is much more expensive than scikit-learn search (100x).]
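On the classical side, that parameter search is typically just a small grid or random draw over a handful of scalar values; a quick `GridSearchCV` sketch (toy dataset, illustrative grid):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# A classical-ML search space: a few scalar knobs, not whole architectures.
grid = {"n_estimators": [50, 100], "max_depth": [2, 4, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Six candidate configurations times five folds is thirty cheap fits, which is why this kind of search is already commoditized.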
The tools included in the repository are very broadly applicable and only a few of them are specifically targeted at neural architecture search.
[1] https://www.kdnuggets.com/2016/08/winning-automl-challenge-a... [2] https://openreview.net/forum?id=ByfyHh05tQ