Wednesday, June 4, 2014

To MOOC or not to MOOC?

MOOCs are, of course, Massive Open Online Courses. So the question is whether one should use this resource on the way to becoming a data scientist. The focus of this post is mostly on machine learning (ML), as it's the newest and messiest of the areas that make up data science (remember the Venn diagram). It's also the area I'm actively learning and experimenting in.

MOOCs are a new thing. The movement really got started with AI Class by Sebastian Thrun and Peter Norvig, which led to the founding of Udacity. I took the first and only offering of that course back in the fall of 2011 and enjoyed it very much. My Sunday mornings, and later on other days and evenings too, were spent watching the videos and working the problems. Fun times :)

A few more players have entered the MOOC market since then, and all of the major ones offer courses on machine learning (I'm only listing those that are upcoming or have materials available):

And so on... If you need a refresher on some concepts, Khan Academy is a great resource on pretty much everything. Beyond general math, they cover statistics topics from basic probability through to confidence intervals.

Want to talk to others about machine learning and ask questions? Sure, all of the courses have message boards, but those are course-specific. Reddit has an active ML-specific subreddit with over 24K subscribers. That said, the MOOC forums are actually good as well: you will see approaches and explanations that you would never come up with yourself!

Now that we've established that there are a great many ways to actively learn machine learning and data science, should you use them? Well, the online courses vary in length and quality, and I've only taken a few of them so far. The original AI Class is no longer available, so that's that. The Coursera Data Science specialization has some interesting courses, but the one on statistical inference was a letdown for me. It was a bird's-eye view of probability, hypothesis testing, some Bayesian inference and power calculations. If you already know most of that material and want a refresher, it's a good course to take, especially since all the quizzes are available from day 1. I knew only about half of it, so that part was fine, but the other half went over my head; really digging into it with help from other sources would have taken time I didn't have.

I'm currently taking Regression Models and Practical Machine Learning on Coursera, and I'll also be taking Machine Learning, which starts in a couple of weeks. I have some experience with natural language processing and topic modeling, and these courses should give me a better understanding of a few more areas.

So yeah, go ahead and check out these courses, identify your skill level, and start learning! Take the free courses first :)

Next post should be about jobs in data science.

