I think the `future of work` for machine learning practitioners will quickly separate into two groups: a very small, elite group that does research, and a much larger group that uses AutoML but whose jobs also lean more toward data preparation (which gets automated as well) and ML DevOps, supporting models in production.
In financial services in particular, there are tons of time-series and regression problems on small data, for which a neural network (beyond perhaps some very small MLP) would be a ridiculous thing to try.
I think the breakdown of workload you described will only happen in business departments where there is a need for large scale embedding models, enhanced multi-modal search indices, computer vision and natural language applications, and maybe a handful of things that eventually productize reinforcement learning. I could also see this happening in businesses that can benefit from synthetically generated content, like stock photography, essays / news summaries / some fiction, website generators, probably more.
What I described above is a tiny drop in the ocean of applied statistics problems that businesses have to solve.
Throw away all the BS. And, yes, it's obvious.
I suppose OP means there will be two groups: people who use AutoML and people who try to make AutoML better.
"Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS, a leading NAS method, on both benchmarks"
ENAS, the specific algorithm they find does no better than random search, is in this library. My understanding is that the results are pretty generic though, i.e. NAS is very far from a solved problem. (Hyperparameter tuning for "classical" models is another matter. That's commoditized and available as a service at this point; see tpot, DataRobot, etc.)
No Windows support in a Microsoft product. Curious.
This looks very useful for tuning hyper-parameters, and the fact that the tuned algorithm is treated as a black box makes this very flexible.
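To make the black-box framing concrete: all the tuner needs is a function that maps a hyperparameter dict to a score. A toy random-search loop over a made-up objective (this illustrates the interface, not NNI's actual API):

```python
import random

def objective(params):
    # Stand-in for "train a model with these hyperparameters, return a validation score".
    # The optimum of this toy function is at x=0.3, y=0.7.
    x, y = params["x"], params["y"]
    return -((x - 0.3) ** 2 + (y - 0.7) ** 2)

# A made-up search space: each hyperparameter gets a (low, high) range.
search_space = {"x": (0.0, 1.0), "y": (0.0, 1.0)}

random.seed(0)
best_params, best_score = None, float("-inf")
for _ in range(200):
    trial = {k: random.uniform(low, high) for k, (low, high) in search_space.items()}
    score = objective(trial)
    if score > best_score:
        best_params, best_score = trial, score

print(best_params, best_score)
```

The tuner never looks inside `objective`, which is exactly why the same loop works for any algorithm you can score.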
I think this does everything MLflow does and more (besides maybe helping with deployment?)
In good old fashioned statistics there's the idea of the jackknife: for each observation i, run the regression on all the data except observation i, and store the statistics of interest (coefficients, predictions, etc.). This gives you an empirical sampling distribution for those statistics.
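A minimal jackknife sketch on made-up regression data (plain NumPy; the data and the slope statistic are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)  # true slope is 2.0

# Leave observation i out, refit, and store the statistic of interest (here, the OLS slope).
slopes = []
for i in range(n):
    mask = np.arange(n) != i
    xi, yi = x[mask], y[mask]
    slopes.append(np.cov(xi, yi, bias=True)[0, 1] / np.var(xi))

slopes = np.array(slopes)  # the jackknife sampling distribution of the slope
print(slopes.mean(), slopes.std())
```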
Similar, and more common in econometrics, is the bootstrap: fit your model on, say, 1999 resamples of the data (drawn with replacement) and get sampling distributions that way.
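A bootstrap sketch on made-up regression data, ending in a percentile confidence interval for the slope (1999 is just a conventional odd number of resamples):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)  # true slope is 2.0

B = 1999  # number of bootstrap resamples
boot_slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
    xb, yb = x[idx], y[idx]
    boot_slopes[b] = np.cov(xb, yb, bias=True)[0, 1] / np.var(xb)

# Percentile 95% confidence interval for the slope
ci_lo, ci_hi = np.percentile(boot_slopes, [2.5, 97.5])
print("bootstrap 95% CI for slope:", (ci_lo, ci_hi))
```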
With said sampling distributions, whether from the jackknife or the bootstrap, you're able to test whether your model is valid -- what's the probability that it'll have significant coefficients, or an R2/MAE/MAPE score indicating predictive capacity.
Cross-validation (and even scikit-learn now defaults to five folds rather than three) is a "lazy" version of this. You don't get a sampling distribution, but at least you can tell when a model only appears good because it grips the training data with all its might and then doesn't work out-of-sample.
sklearn even offers the jackknife under an ML-y name: leave-one-out cross-validation.
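For example, leave-one-out scoring via scikit-learn's `LeaveOneOut` splitter, again on made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=40)

# One fold per observation: each score is the squared error on the single held-out point.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(len(scores))  # one score per observation
```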
Are people migrating from scikit-learn to TensorFlow in production for non-deep-learning use cases?
At least that's the behaviour of the platform[1] I am working on.
[1]: https://github.com/polyaxon/polyaxon#hyperparameters-tuning
BTW, I think all AutoML solutions forget about end users. They all require too much engineering knowledge from the user. It would be nice to have an AutoML solution that citizen data scientists can actually use.
[1]: https://nni.readthedocs.io/en/latest/sklearn_examples.html
In contrast, when we wrote bespoke GPU code for the graph, we saw a ~25x performance increase over relying on CPU plus MKL. I am being deliberately vague here and cannot give further detail.
> possibly the world's first or second (full-time) CUDA programmer, with 14 filed patents, and the world's fastest implementations of molecular Dynamics (CUDA ports of Folding@Home and AMBER).
DNNs require an architecture search, i.e. the building blocks are full layers, the depth of the network, the optimizer, etc.
scikit-learn searches a parameter space, i.e. the algorithm's knobs are much, much simpler and fewer.
So to sum up, DNN search involves big building blocks, while scikit-learn search (or, for that matter, any "classical ML" algorithm) is more of a parameter search.
[The actual scikit-learn search would also include preprocessing steps, which can be seen as a separate block.]
[Also, note that DNN search is much more expensive than scikit-learn search (100x).]
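On the classical side, that parameter search is typically just a small grid or random draw over a handful of scalar values; a quick `GridSearchCV` sketch (toy dataset, illustrative grid):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# A classical-ML search space: a few scalar knobs, not whole architectures.
grid = {"n_estimators": [50, 100], "max_depth": [2, 4, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Six candidate configurations times five folds is thirty cheap fits, which is why this kind of search is already commoditized.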
The tools included in the repository are very broadly applicable and only a few of them are specifically targeted at neural architecture search.
[1] https://www.kdnuggets.com/2016/08/winning-automl-challenge-a... [2] https://openreview.net/forum?id=ByfyHh05tQ