(Make no mistake, I can fully understand them: professors paid 80k per year, lacking resources, and fighting bureaucrats. It is a great thing that they are recognised and, at last, paid what they deserve for devoting their lives to science.)
- Vapnik is joining a number of people he previously worked with
- Getting huge computational resources and seeing your ideas applied to real data is rewarding
Edit: "No way" is inaccurate. I should have said it is much easier to do at these companies. Also it is inaccurate to imply this is the only reason these great minds have joined these companies.
I don't see many details here; are you sure that's the case?
There are other reasons a giant of the field might decide to work at Facebook. They might give him more freedom than his previous employer. Perhaps friends of his already work at Facebook. The location and compensation may also play into it.
I don't want to be skeptical for no reason, but you're championing a popular narrative for which I don't see direct support in this instance.
Vapnik is a big theory guy. Though I am not sure he has done anything of great practical importance recently, his immense contribution to ML (the SVM) was made at a time when machines were many orders of magnitude weaker than they are now.
Complex theories do not work, simple algorithms do.
"One of the goals of this book is to show that, at least in the problems of statistical inference, this is not true. I would like to demonstrate that in this area of science a good old principle is valid: Nothing is more practical than a good theory.
-- From Vapnik's preface to The Nature of Statistical Learning Theory*
Vapnik is not well-described as a "theory guy". That implies that he's not interested in connections between theory and practice, and this is most profoundly not the case. He has arguably been the most successful ML researcher ever as far as connecting abstract theory to real-world outcomes.
Besides the SVM: the VC dimension started out as a lemma regarding set counting, and he pushed it to the surprising (even shocking) conclusion of universal consistency for very general classes of estimators.
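For context (this is the textbook statement, not something from the thread): the kind of result being referred to is the VC generalization bound, which says that with probability at least 1 - delta over an i.i.d. sample of size n, every classifier f in a class of VC dimension h satisfies

    % Standard Vapnik-Chervonenkis generalization bound (constants vary by source)
    % R(f): true risk, R_emp(f): empirical risk on the n training samples, h: VC dimension
    R(f) \le R_{\mathrm{emp}}(f) + \sqrt{\frac{h\,\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\delta}}{n}}

The bound holds uniformly over the whole class and makes no distributional assumptions beyond i.i.d. sampling, which is what makes the consistency result so general.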
I haven't seen this paper before (thanks!!). How different is it to Word2Vec?
Clearly the pre-trained vectors at that scale (much bigger than the ones released with Word2Vec) are new and very exciting.
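In case anyone wants to poke at vectors like these, here is a minimal sketch of loading pre-trained word vectors in the standard word2vec text format and querying nearest neighbours. It assumes gensim is installed; the filename is a placeholder and this is not tied to the specific paper above.

    # Minimal sketch: load pre-trained vectors in word2vec text format
    # ("word v1 v2 ... vN" per line) and query cosine-similarity neighbours.
    # Assumes gensim is installed; "pretrained_vectors.txt" is hypothetical.
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format("pretrained_vectors.txt", binary=False)

    # Nearest neighbours by cosine similarity
    print(vectors.most_similar("king", topn=5))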
There's a massive dearth of data in academia. This is also why you see people like Kleinberg working directly with Facebook on network research.
http://blog.bitops.com/blog/2014/06/26/first-steps-for-vr-on...