My one complaint is that the programming assignments weren't interesting at all. The results were interesting, but the setups were mostly given to us, and we just had to code an algorithm that was in our notes. For someone who understands the basics of linear algebra and programming, it was just a syntax challenge, and that got irritating after a bit so I stopped doing them.
I won't get the certificate for completing the course, but I have a few extra hours of free time each week to add this second course, so I'm happy. I doubt that the actual homework that Stanford students taking this course get is so easy and repetitive, though, and I'm positive they wouldn't complain about not getting to retake quizzes after getting poor grades.
Not to knock the course. I've learned a lot and the professor (Andrew Ng) does a good job.
Firstly, think about how much more difficult the assignments would be if, for example, the steps weren't broken out and we didn't get any advice on how to vectorize. Of course, it would still be short work for anyone who (a) knows Matlab/Octave and/or (b) understands the material well, but it would also be an order of magnitude harder.
Secondly - and this is by far the larger point - the original CS 229 was really about math; the programming assignments were more of an afterthought. The lectures and homework mainly focused on the theoretical derivations and corollaries of the math that led to the algorithms. Once you'd done your bit on the math and cried to your classmates and the TA about it, you could go and implement the beautiful and extremely succinct result in Matlab.
As for my perspective on the difference, I believe it is a deliberate choice made with full knowledge of the difficulty drop. For starters, there are (with regards to homework help) no TAs in this course, so the absolute difficulty would have to decline to create an equivalent experience. More significantly, the enrollment has increased by a factor of about 700. If Stanford students had trouble with the original, you can bet that the median student in the course doesn't find it as easy as either of us does. If the goal is to generate the greatest benefit for the most people, and delivering the algorithms with a good intuition on their proper use will do so, then this course has succeeded marvelously. Of course, the smartest and most dedicated students will want more, which remains available through textbooks as well as the original course handouts (http://cs229.stanford.edu/materials.html). However, I would argue that the goal of most MOOCs (massive open online courses) should be to kindle interest and foster basic understanding, both of which the Coursera version achieves.
I am also taking Andrew Ng's course and understand your complaint that the programming assignments aren't as interesting (from your perspective). Being quite comfortable with linear algebra, I was able to complete the assignments easily.
But when I browse the course forums, I find that for many people taking the course, the intuition behind the use of linear algebra in ML doesn't come as easily as it does for us. When Andrew Ng designed this online course, he must have had those people in mind too. He mentions at the start that it's more about understanding the concepts, and that the implementation details should come later. I think the programming exercises are designed with that in mind.
I tried to make the programming exercises interesting for myself by first thoroughly understanding the provided code and tweaking it here and there. Once you've done that, you can apply what you've learnt to real-world datasets from sources like Kaggle and see how you fare :)
I agree with this. The programming assignments I've done so far in the Machine Learning class are usually 5-7 Matlab functions, many of which are about 2 lines of code (the longer ones might be ~10 lines). If you've ever done Matlab/Octave programming, the assignments will take about 20-30 minutes and be completely unenlightening, as you're literally just translating mathematical notation into Matlab (which is, by design, already a lot like mathematical notation anyway). They provide entirely too much skeleton code to learn anything from unless you're actively trying to learn. If I weren't already mostly familiar with the material presented in the class, I imagine I would never retain knowledge of how the machine learning "pipeline" worked or gain any high-level understanding of the algorithms, because the assignments just require you to implement the mathematical pieces of each step, without ever asking you to, for example, actually call any optimization routines or put the pipeline together.
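To give a sense of just how mechanical that translation is, here is a rough NumPy sketch (function names and shapes are my own, not the course's) of the kind of short vectorized function the assignments ask for:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Vectorized logistic-regression cost and gradient.

    X is (m, n), y is (m,), theta is (n,). The two lines computing
    cost and grad are essentially the whole body of one graded function.
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return cost, grad
```

If you already read the math, writing this is dictation, not problem solving.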
The problem, I think, is that it would be too difficult to do automatic grading in a way that is reasonably possible to pass unless they turn most of the work into skeleton code. Since the automatic grading requires nearly exactly matching results, one minor implementation difference in a perfectly good implementation of the algorithm itself (e.g., picking a single parameter differently, choosing different optimization termination conditions, or using a different train/dev split) would mark the entire solution wrong.
I agree the programming assignments in the Finance class tend to be too simple. Most of the code is literally handed to you, you just have to understand it well enough to change it. I also understand that even that can be a major challenge if you don't have the background for it.
But I'm choosing to see the class itself as a starting point. It's a framework for my own explorations into the topics. I can do the minimum and get the minimum out of it. Or I can use what's provided as a base and go further.
Take the Coursera Algorithms class, for example: writing code that got the answer was relatively easy, so once that step was done it became about optimizing the code for my own learning benefit.
It's like any educational process: you get out what you put in.
First, the Stanford CS229 version is definitely much more difficult than what you guys had online. The focus in the actual class was on the math: derivations and proofs. The homeworks sometimes got quite tricky and usually took a group of us PhD students about 2 days to complete. There was some programming in the class, but it was not auto-graded, so usually we produced plots, printed them out, attached the code, and had it all graded by TAs for correctness. The code we wrote was largely written without starter code, and I do believe you learn more this way.
An online version of the class comes with several challenges. First, you have to largely resort to quizzes to test students (instead of marking proofs, derivations, math). There is also no trivial way to autograde resulting plots, so everything has to be more controlled, standardized and therefore include more skeleton code. But even having said all that, Andrew was tightly involved with the entire course design and he had a specific level of difficulty in mind. He wanted us to babysit the students a little and he explicitly approved every assignment before we pushed it out. In short, the intent was to reach as many people as possible (after all, scaling up education is the goal here) while giving a good flavor of applied Machine Learning.
I guess what I mean is that you have more experience than the target audience that the class was intended for and I hope they can put up more advanced classes once some basics are covered (Daphne Koller's PGM class is a step in this direction). But there are still challenges with the online classes model. Do you have ideas on how one can go beyond quizzes, or how one can scale down on the skeleton code while retaining (or indeed, increasing) the scale at which the course is taught?
If there were peer-graded assignments in the machine learning course, I would definitely have tried them out.
It's the same version as the course given at Caltech and is more in-depth than Andrew Ng's. There is no skeleton code for the programming assignments; answers are submitted through quizzes. I took the summer session and learned a lot from it.
Right; I agree. I'm not sure how they would go about making it more challenging, though. They can't expect us to go out and collect data ourselves, after all. I suppose they could give us the data, then expect us to code the setup and the algorithms ourselves, but that, too, would become repetitive after a few assignments.
> Not to knock the course. I've learned a lot and the professor (Andrew Ng) does a good job.
Agreed once again. I knew nothing about machine learning before starting; now I know about neural networks, SVMs, and PCA. It's really cool how much I've learned already, for free too!
I've also signed up for this course, but the quizzes really aren't up to par. As an example: the first quiz question was about training a neural network with too much data, and whether or not said network would be able to generalize to new test cases. Overfitting neural networks wasn't even mentioned in the lectures; I had to rely on material from Andrew's class to answer the question correctly. This chasm between the lectures and the quizzes is likely because Geoffrey is creating the video lectures but not the quiz questions; he is having TAs do those [1].
Nevertheless, it looks like they're responding to feedback, so hopefully it'll get better with time.
1. https://class.coursera.org/neuralnets-2012-001/wiki/view?pag...
My experience is that students everywhere complain about grading. I've never been to Stanford, but I've attended and worked at several other top tier universities.
The syllabus, draft though it is, indicates the second half of the class will focus on deep learning, a field of machine learning that has demonstrated huge potential.
As has been said by many already, of course, the remaining nuts to crack are high quality interaction with other students, professors, and TAs; and accreditation.
But the disintermediation of large universities may be nearer than we think.
If you are taking a class voluntarily over the Internet, what benefit would be gained by cheating? I presume that a large fraction of people doing volunteer coursework are doing it to learn, not to keep a GPA up for some other reason (sports eligibility, scholarship requirements, parental expectations, etc.), so looking at other solutions on GitHub might actually enhance the experience for you. If you find a way to do it better than the other solutions, that could be a goal in itself.
This is one of the things I find most intriguing about 'free' classes on the Internet: the value equation is shifted around.
"Many of you are unhappy with only being allowed to attempt a quiz once. Starting in week two, we have therefore decided to make up twice as many questions and to allow you to do each quiz twice if you want to. The second time you try it the questions will all be different. Your score will be the maximum of your two scores. For week one, the quizzes will remain as they are now.
Many of you would like the names of the videos to be more informative. We will change the names to indicate the content and the duration.
Some of you thought that some of the quiz questions were too vague. We will try to make future questions less vague.
Some of you are unhappy that we do not have the resources to support Python for the programming assignments. We sympathize with you and would do it if we could. You are still welcome to use Python (or any other language) if you can port the octave starter code to your preferred language. We have no objection to people sharing the ported versions of the starter code (but only the starter code!). However, if you get starter code in another language from someone else, you are responsible for making sure it does not contain bugs."
I thought that was pretty funny!
Oh well, to be fair I would donate quite a lot for each course that I enjoyed.
As far as "dumbing down" goes, I've found that the Coursera classes I've taken (Compilers, Automata Theory, Algorithms 1, SaaS, and Machine Learning) have varied quite widely in difficulty. Compilers and Automata were both challenging and enjoyable, Algorithms 1 was about what I'd expect from a freshman/sophomore algorithms class, and SaaS and Machine Learning were easy enough to be approachable to anyone with basic programming experience.
I don't feel that the difficulty in the classes that I've taken had any particular correlation with teaching effectiveness. I found Andrew Ng's ML class to be simple, but still interesting and informative - you come out of it with enough of a basic understanding to implement simple ML techniques as well as a place to start if you wish to learn more. I think that while a theory-centric class would be a nice thing to have, he's done an amazing job of making a class that can appeal to a wide range of potential students and introduce them to a field that's usually very difficult to approach.
And haven't SVMs and such gradually taken over from Neural Networks?
In seriousness, when you look around at what's happening in both practice and academia, I would say Random Forests, SVMs, and Neural Networks all stand pretty equally and have different strengths. If you've just got rows and rows of data with numeric, categorical, and missing values, it's hard to beat the speed and quality of shoving it into a Random Forest. However, to my knowledge SVMs are still better at NLP categorization tasks and at handling sparse, high-dimensional data. And Neural Networks always seem to be popping up solving very weird and/or hard problems.
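For anyone who wants to sanity-check that kind of claim on their own data, here is a rough sketch (my own, using modern scikit-learn defaults, not anything from the course) of comparing the three families side by side:

```python
# Compare a Random Forest, an RBF-kernel SVM, and a small neural net
# on a synthetic tabular dataset using 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM (RBF)": SVC(),
    "neural net": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy {scores.mean():.3f}")
```

On real data the ranking flips around depending on exactly the factors listed above, which is rather the point.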
A few reasons SVMs are popular:
1) it's a convex problem, which means a unique solution, and a lot of already existing technology can be used
2) the "kernel trick", which enables us to learn in complicated spaces without computing the transformations
3) they can be trained online, which makes them great for huge datasets (point 2) might not apply here, but there are ways around it - if someone's interested I can point out some papers)
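To make point 2) concrete, here is a small sketch of mine (using scikit-learn, not anything course-specific) of the kernel trick in action: concentric circles are hopeless for a linear SVM but easy once an RBF kernel implicitly maps the data into a richer space:

```python
# Concentric circles: not linearly separable in 2D, but separable in the
# feature space the RBF kernel implicitly works in.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)
print("linear kernel accuracy:", linear.score(X, y))
print("RBF kernel accuracy:", rbf.score(X, y))
```

No transformed coordinates are ever computed explicitly; the kernel only evaluates inner products in the implicit space.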
There is an ongoing craze about deep belief networks, developed by Hinton (who is teaching this course), who came up with an algorithm that can train them reasonably well (local optima and such still exist, so it's far from ideal). Some of the reasons they're popular:
1) They seem to be the winning algorithm for many competitions and datasets, ranging from classification in computer vision to speech recognition and, if I'm not mistaken, even parsing. They are, for example, used in the newer Android phones.
2) They can be used in an unsupervised mode to _automatically_ learn different representations (features) of the data, which can then be used in subsequent stages of the classification pipeline. This makes them very interesting because while labelled data might be hard to come by, we have a lot of unlabelled datasets thanks to the Internet. As for what they can do, see the work by Andrew Ng, whose system automatically learned a cat detector.
3) They're "similar" to biological neural networks, so one might think they have the necessary richness for many interesting AI applications.
Also, I found this paper [1] on unsupervised feature detection; if you have some additional material, I'd really appreciate it if you could post it!
[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44....
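For the curious, point 2) can be sketched with scikit-learn's BernoulliRBM, a single RBM layer of the kind DBNs are stacked from (the hyperparameters here are my own guesses, not tuned): learn features from the data without using labels, then hand them to a plain classifier:

```python
# Unsupervised feature learning: an RBM learns 64 hidden features from
# digit images without seeing labels; logistic regression then classifies
# using those learned features instead of raw pixels.
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1] for the Bernoulli RBM

model = Pipeline([
    ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05,
                         n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print("train accuracy:", model.score(X, y))
```

A real DBN stacks several such layers, pretraining each one unsupervised before fine-tuning the whole thing.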
- How much data do you have wrt dimensionality?
- How "easy" do you suspect your problem to be? Is it likely linearly separable? Equivalently, how good are your features?
- Do you have mixed data types? Missing data? Categorical/binary features mixed in? (Better use a Forest, perhaps!)
- Do you need training to be very fast?
- Do you need testing to be very fast on new out of sample data?
- Do you need a space-efficient implementation?
- Would you prefer a fixed-size (parametric) model?
- Do you want to train the algorithm online as the data "streams" in?
- Do you want confidences or probabilities about your final predictions?
- How interpretable do you want your final model to be?
etc. etc. etc. Therefore, it doesn't make any sense to talk about one method being better than another.
One thing I will say is that, as far as I am aware, Neural Nets have a fair amount of success in academia (which should be taken with a grain of salt!), but I haven't seen them win too many Kaggle competitions, or other similar real-world problems. SVMs or Random Forests have largely become the weapon of choice here.
Neural Nets do happen to be very good when you have a LOT of data in relatively low-dimensional spaces. Many tasks, such as word recognition in audio or aspects of vision fall into this category and Google/Microsoft and others have incorporated them into their pipelines (which is much more revealing than a few papers showing higher bars for Neural Networks). In these scenarios, Neural nets will parametrically "memorize" the right answers for all inputs, so you don't have to keep the original data around, only the weighted connections.
Anyway, I wrote a smaller (and related) rant on this topic on G+: https://plus.google.com/100209651993563042175/posts/4FtyNBN5...
That said, there is a rather unhelpful herd mentality in the field, with people moving from one Next Big Thing to another, disparaging the previous Big Thing along the way.
Please do. I want to read up on SVMs since I haven't heard that much about them.
1. They are only for classification, and not every problem is classification. The other big category is regression - for example, predicting the sale price of a home rather than the binary "will it sell?"
2. They don't have a natural probabilistic interpretation for classification. Neural networks for classification (with a logistic activation function) are trained to predict a probability, not make a simple binary decision. In practice this probability is usually very useful, although I believe SVMs have been modified to give some kind of probability.
3. I have had a tough time getting them to run quickly. Linear kernel SVMs are fast, but aren't powerful. More complex kernels are more powerful but can be very slow on moderately large datasets.
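On point 2, the usual modification is Platt scaling, and scikit-learn exposes it directly (this sketch uses a synthetic dataset, nothing from the thread): passing probability=True makes SVC fit a sigmoid on the decision values via internal cross-validation:

```python
# SVC with probability=True fits Platt scaling internally, so the trained
# model gains a predict_proba method on top of the usual hard decisions.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
clf = SVC(probability=True, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:5])
print(proba)  # one column per class; each row sums to 1
```

So the probabilities exist, but they're bolted on after the fact rather than falling out of the model the way they do with logistic activations.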
http://scikit-learn.org/stable/modules/svm.html#regression
Note: the scikit-learn implementation of SVMs is based on libsvm:
2. There are actually ways you can modify the output of an SVM to give a probabilistic interpretation[1]. But I'll agree with the not having a 'natural' probabilistic interpretation.
3. Is definitely correct, but I'm not sure NNs are that much better.
[0] http://www.svms.org/regression/
[1] http://www.cs.colorado.edu/~mozer/Teaching/syllabi/6622/pape...
Also, I took an NN class in college - do you think I would get much more out of this?
There have been some huge developments in neural networks in the last few years, particularly with respect to deep learning. If you missed out on that you might want to try this class. Hinton has been involved in many of these advances.
The second half of the course appears to focus on deep learning topics so you might want to start there if you already know the basics.
My 0.02 chf.