Friday, August 1, 2014

Taking job interviews is good

I had a real job interview yesterday for a data scientist position in a (very well funded) startup. It was a short lunch meeting with the objective of "getting to know you". It was awkward and uncomfortable and I thought long and hard about writing this blog post.

So here's a list of things that were "off" and what I learned from it:

  • a meeting needs an objective that is predefined or at least specified at the start of the meeting. I learned that if there isn't one, I need to ask the other party to define one. It's like a friend of mine just wrote: "If you don't know the question how will you know if/when you get the answer?"
  • meeting for lunch is OK, but you need a place with limited noise. I learned that if I either organize or agree to a meeting in a public space, I need to scout it out beforehand or be vocal about being bothered by the noise.
  • when you meet for lunch, organize the questions and answers so that everyone has a chance to enjoy or at least finish their meal. Once again it was a new experience for me and I'll bring this up next time.
  • have the courage to say "I don't think it's a good fit". Leaving with a "we might get back to you either before or after vacations or something, maybe" is just insulting. In future I will make a concluding remark myself if the other side is avoidant.
Interviewing for a position is a skill, from both sides of the table. The only way to get better is to practice. The experience I had yesterday was a good learning experience and I'll continue the journey :)

Wednesday, June 18, 2014

Jobs in data science or data analytics

I kept putting off writing this blog post hoping to get some more info or that something would happen, but neither did. The job market seems bleak.

I've been checking the local (Estonian) job offers for a few months now, looking for signs of data science/analytics jobs. There are some, but the wording is weird and it shows that the companies are hiring their first "data people".

Out of interest I went and applied to a few of these positions. One company actually knew a little something about modeling and they were looking for someone to work on classification. Others were completely random. Most didn't reply and the few that did expressed that they didn't really know what they were looking for. Reminded me a lot of the "if carpenters were hired like programmers" joke.

Anyways, so that's the current status of data science jobs in Estonia. Which suits me fine since I'm just an enthusiast.

The Machine Learning course by Andrew Ng started yesterday on Coursera. It's supposedly very good, but I haven't had the time to check it out yet. I'm still finishing up the Practical Machine Learning course by Jeff Leek and it's pretty good. The materials are understandable and the quizzes are mostly clear, with multiple choice answers, so there are no problems with number formats and stuff.

The first Meetup of a group called Startup Founders 101, run by the Development Fund (Arengufond), took place. The topic for the first occasion was "From Employee to Entrepreneur". It got me thinking that I could probably start doing some consulting in data science in a few months. Real practice with real problems! Still just an idea though... we'll see what happens :)

Wednesday, June 4, 2014

To MOOC or not to MOOC?

MOOCs are Massive Open Online Courses of course. So the question is whether one should use this resource on the way to becoming a data scientist. The focus of this post is mostly on machine learning (ML) as it's the newest and messiest of the areas that make up data science (remember the Venn diagram). It's also the area that I'm actively learning and experimenting in.

MOOCs are a new thing. They really got started with AI Class by Sebastian Thrun and Peter Norvig, which led to the founding of Udacity. I took the first and only offering of that course back in the fall of 2011 and enjoyed it very much. My Sunday mornings and later days and evenings were spent watching the videos and working the problems. Fun times :)

A few more players have entered the MOOC market since then and all of the major players offer courses on machine learning (only listing those that are upcoming or have materials available):

And so on... If you need a refresher on some concepts then Khan Academy is a great resource on pretty much everything. In addition to math they cover stuff from probability through to confidence intervals.

Want to talk to others about machine learning and ask questions? Sure, all of the courses have message boards, but those are course-specific. Reddit has an active ML-specific subreddit with over 24K subscribers. Forums at the MOOCs are actually good as well - you will see approaches and explanations that you would never come up with yourself!

Now that we've established that there are a great many ways to actively learn machine learning and data science - should you? Well, the online courses are of variable length and quality and I've only taken a few of them so far. The original AI Class is no longer available, so that's that. The Coursera specialization has some interesting courses, but the one on statistical inference was a letdown for me. It was a bird's-eye view of probability, hypothesis testing, some Bayesian inference and power calculations. If you know most of that stuff and want a refresher then it's a good course to take, especially since all the quizzes are available from day 1. I only knew about half, so that part was OK, but the other half went over my head and would've required time I didn't have to really dig into it with help from other sources.

I'm taking the Regression Models and Practical Machine Learning courses on Coursera right now, as well as Machine Learning, which starts in a couple of weeks. While I have some experience with natural language processing and topic modeling, these should help me get a better understanding of a few more areas.

So yeah, go on and check out these courses, identify your skill level and start learning! Take the free courses first :)

Next post should be about jobs in data science.

Monday, June 2, 2014

The beginning

I've been following the Big Data and Data Science hype for a few years now. So far it's been something that's done elsewhere, mostly in top companies like Google, Microsoft and Amazon, along with top universities like MIT, Stanford and Cambridge. It's pretty intimidating trying to be a data science enthusiast from Estonia :)

Just this morning I found the (original) data science Venn diagram that puts things into perspective for me. I've been trying to figure out how a classification expert (classification being a subset of machine learning) could be useful in psychology if he or she lacks the knowledge to make sense of the results. I just completed my Master's-level studies in psychology and it's apparent that it's not something that a machine learning expert can just jump into without extensive study or very close collaboration. The Venn diagram sums it up pretty well - you need domain knowledge, math & stats skills and the ability to work with data and algorithms to get to data science.

Why would one aim for data science? Well, as the Wikipedia article explains, it's "extraction of knowledge from data". In my studies I've seen up close the process of gathering data on how children study - they fill out tests and partake in experiments, and the same is asked of their teachers and sometimes parents. All of it takes years of effort by teams of people and results in a large amount of data which is then subjected to a tiny sample of traditional research: "let's see if these two things correlate". It does result in articles, but mostly only to confirm previously held ideas. This is actually well and good as the people here are truly masters of their trade in the realms of developmental and educational psychology. On the other hand, it feels like the work of all the children and researchers deserves another chance in the hands of a data scientist. Or rather, similar work should be done in the future, as the current data sets are of a quality that would take even the best data wrangler to an early grave.

To sum up, I have some math and stats background, have been writing software for more than 20 years, did some sentiment analysis at Cambridge Uni last summer and consider myself a data science enthusiast. Future posts will explore the MOOCs currently available in this direction, give an overview of job offers in data science and include other musings on this topic.