The only thing I'd probably add is that there's a pretty significant gap going from learning linear algebra to more advanced topics such as LDA.
For people who are just getting started with machine learning, it's probably best to begin by implementing some of the more "intuitive" algorithms, such as decision trees, k-means, and naive Bayes, before moving on to more recent academic work.
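To show how approachable these are: Lloyd's k-means fits in a dozen lines of plain Python. A toy sketch, not a library-quality implementation (the 2-D point set and parameter choices are mine):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm on a list of (x, y) points; returns final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

# Two obvious blobs; k-means should land one centroid near each.
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
print(sorted(kmeans(pts, 2)))
```

Real implementations add smarter initialization (k-means++) and restarts, but the two alternating steps above are the whole idea.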
Other things are pretty useful but often forgotten, such as feature selection, data normalization, and even data visualization. Algorithms are usually just one part of machine learning, and even the best algorithm can't do anything until you've identified the best features of your data.
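A concrete case of why normalization matters: any distance-based learner mixing, say, ages (~30) with incomes (~50,000) will be completely dominated by income unless each column is rescaled first. A minimal z-score standardizer (the helper name and the toy data are mine):

```python
def zscore(column):
    """Rescale one feature column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    std = std or 1.0  # constant column: leave values at zero, don't divide by 0
    return [(x - mean) / std for x in column]

# After scaling, ages and incomes contribute comparably to a
# Euclidean distance instead of income swamping everything.
ages = zscore([23.0, 35.0, 58.0])
incomes = zscore([21000.0, 48000.0, 90000.0])
```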
Still, it's a great list of more advanced topics, and definitely something I'll keep bookmarked for future reference.
For LDA you'll need to understand Dirichlet processes; I find the introduction by Frigyik et al. [2] excellent for that. You may need to read A Measure Theory Tutorial (Measure Theory for Dummies) by Gupta [3] first. Finally, here are the two LDA papers that influenced me most: [4] and [5].
[1] http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
[2] http://www.ee.washington.edu/research/guptalab/publications/...
[3] https://www.ee.washington.edu/techsite/papers/documents/UWEE...
[4] http://www.psychology.adelaide.edu.au/personalpages/staff/si...
[5] http://videolectures.net/site/normal_dl/tag=83534/nips2010_1...
For some of the non-parametric variants like hierarchical Dirichlet process LDA you need DPs, but that stuff is pretty hardcore -- don't run before you can walk.
Another route to LDA (assumes some Bayesian stats basics):
* Learn a bit about Markov chains if you don't know them already
* Read up on sampling-based approximate inference methods and find a proof that a Gibbs sampler converges (or just take it on trust...)
* Read the classic Griffiths and Steyvers paper deriving a collapsed Gibbs sampler for LDA [1]
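The collapsed Gibbs sampler from that Griffiths and Steyvers paper is short enough to sketch in plain Python. A toy version under my own naming (docs are lists of integer word ids), not an efficient or battle-tested implementation:

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns topic assignments and
    the topic-word count table."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]          # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                            # tokens per topic
    z = []                                         # per-token topic labels
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)            # random initial topic
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's counts, then resample its topic from
                # the collapsed conditional p(z = t | everything else).
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                probs = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                         / (nk[t] + vocab_size * beta)
                         for t in range(n_topics)]
                r = rng.random() * sum(probs)
                acc = 0.0
                for t, p in enumerate(probs):
                    acc += p
                    if r <= acc:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, nkw
```

On a toy corpus where one set of documents uses words 0-2 and another uses words 3-5, the topic-word counts in nkw should split along those lines after a few dozen sweeps, which is a nice sanity check before reading the derivation in the paper.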
I was wondering, could you list some of the more recent academic works? I've brushed up on the basics and feel pretty comfortable with them, so I want to try something a bit more advanced.
best machine learning site:stackoverflow.com "closed as not"
It's important to understand individual algorithms, but in many ways it's more important to have a broad overview of the field and its more modern methods, so that, given a problem, it's possible to think about the best way to solve it and to share a common language with others who may have ideas. Beyond this list and various online courses, I've found that talking to people about their work and having them explain the high-level concepts of every black-box classifier or similarity metric they use has been quite educational.
I did note the absence of the oft-quoted Andrew Ng Coursera course on ML. I assume the author has put it under "disruptive educational sites".
But I genuinely want to know: how does Ng's course measure up to the other resources mentioned in this post?