Monday, June 2, 2014

The beginning

I've been following the Big Data and Data Science hype for a few years now. So far it's been something that happens elsewhere, mostly at big companies like Google, Microsoft and Amazon and at top universities like MIT, Stanford and Cambridge. It's pretty intimidating trying to be a data science enthusiast from Estonia :)

Just this morning I found the original data science Venn diagram, and it puts things into perspective for me. I've been trying to figure out how a classification expert (classification being a subfield of machine learning) could be useful in psychology if he or she lacks the domain knowledge to make sense of the results. I just completed my Master's-level studies in psychology, and it's apparent that psychology is not something a machine learning expert can just jump into without extensive study or very close collaboration. The Venn diagram sums it up pretty well: to get to data science, you need domain knowledge, math & stats skills, and the ability to work with data and algorithms.

Why would one aim for data science? Well, as the Wikipedia article explains, it's "extraction of knowledge from data". In my studies I've seen up close the process of gathering data on how children learn: they fill out tests and take part in experiments, and the same is asked of their teachers and sometimes their parents. All of it takes years of effort by teams of people and produces a large amount of data, which is then subjected to only a tiny sample of traditional research questions: "let's see if these two things correlate". This does result in articles, but mostly ones that confirm ideas held beforehand. That is actually well and good, as the people here are truly masters of their trade in developmental and educational psychology. On the other hand, it feels like the work of all those children and researchers deserves another chance in the hands of a data scientist. Or rather, similar work done in the future does, since the current data sets are of a quality that would send even the best data wrangler to an early grave.

To sum up, I have some math and stats background, have been writing software for more than 20 years, did some sentiment analysis at Cambridge Uni last summer and consider myself a data science enthusiast. Future posts will explore the MOOCs currently available in this area, give an overview of data science job offers and share other musings on the topic.