You’ve just spent the last year honing your skills through the Data Analysis program. You dutifully spent your evenings, sometimes late into the night, on homework and projects, ignoring friends, family, and the ever-growing mountain of laundry in the corner. You coded hobby projects for your local municipality and wrote up an entire epic (at least, you thought it was epic) series of blog posts on your findings. And when a friend mentions her department’s struggling marketing efforts, your mind spins away on ways you might capture meaningful data and process the results.
Now you want to get a job. Which, despite your best efforts as a data analyst, depends almost entirely on acing the interview.
It should come as no surprise that careful preparation and a clear understanding of what to expect are what it takes not only to survive an interview for a data analyst position, but to set yourself apart as the best-qualified candidate.
And even if you’re not actively looking for a position — if you’re still learning your craft and working your way through your projects — you can get started right now practicing interview questions, so that in six or twelve months, you’ll have done all the legwork necessary to wow your potential employers and win that coveted dream job offer.
Know Your Field
Katie Malone, a physics PhD and former Udacity instructor, has interviewed with Groupon and others in the Bay Area and Chicago. Some deceptively basic questions she’s heard come up again and again:
- What do you think a data scientist is/does?
- What do you think are the most important skills for a data scientist to have?
Thoughtfully crafted answers demonstrate not only your interest in and commitment to a career in data, but your communication skills as well. Keep in mind that you may be interviewed by a team lead or HR director without a technical background, in which case you want to be able to explain concepts in the most general terms.
Nick Gustafson, a Udacity data scientist, notes that you need to be prepared to talk in depth about the skills and tools of data analysis. He suggests being prepared to discuss topics such as these:
- Which machine learning model (classification vs. regression, for example) to use given a particular problem.
- The tradeoffs between different types of classification models, and between different types of regression models.
- How to go about training, testing, and validating results, and different ways of controlling for model complexity.
- How to model a quantity that you can’t directly observe (using Bayesian approaches, for example, and when doing so, how to choose prior distributions).
- The various estimation techniques based on numerical optimization (maximum likelihood, maximum a posteriori).
- What types of data are important for a particular set of business needs, and how you would go about collecting that data.
- How to deal with correlated features in your data set, and how to reduce the dimensionality of data.
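To make the maximum likelihood and maximum a posteriori items above concrete, here is a minimal sketch for about the simplest case there is: estimating a success rate from yes/no outcomes. The Beta(2, 2) prior, the function names, and the conversion counts are all assumptions chosen for illustration, not something taken from the interviews described here.

```python
def mle_rate(successes, trials):
    """Maximum likelihood estimate of a Bernoulli success rate."""
    return successes / trials


def map_rate(successes, trials, alpha=2.0, beta=2.0):
    """Maximum a posteriori estimate under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + successes, beta + trials - successes),
    and its mode is the MAP estimate (valid when both posterior
    parameters are greater than 1).
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


# Hypothetical data: 3 conversions out of 4 visitors.
small_mle = mle_rate(3, 4)
small_map = map_rate(3, 4)

# Same observed rate with far more data: 300 out of 400.
large_mle = mle_rate(300, 400)
large_map = map_rate(300, 400)

print(small_mle, small_map)
print(large_mle, large_map)
```

With only four observations the prior pulls the MAP estimate noticeably toward 0.5; with four hundred it barely moves it. That contrast is a handy way to explain why the choice of prior matters most when data is scarce.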
If you find yourself stumped on a question, don’t panic. It’s OK to ask for more context or a relevant example. But be prepared to talk theory as well. You need to know the field inside out to advance in it.
Brush Up Beforehand
Being able to talk fluently and confidently across the range of tools and methods of data analysis means a fair amount of study beforehand. You might find it useful to review your coursework and notes, and to go over the latest tech blogs and industry newsletters.
Udacity data engineer Krasnoshtan Dmytro prepared for his interview by making sure he had a firm grasp on:
- Linear/polynomial regression
- Decision trees
- Dimensionality reduction
Reviewing your past work, and continuing to hone and use those skills, can only help ground you more thoroughly in the material.
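One way to check your grasp of a topic like decision trees is to write the core idea from scratch. The sketch below is a depth-one regression tree (often called a stump) that picks the single threshold minimizing squared error. The function name and the toy data are hypothetical, and a real tree library does considerably more than this.

```python
def best_split(xs, ys):
    """Find the threshold on x that minimizes total squared error.

    Each side of the split predicts the mean of its own y values,
    which is how a depth-one regression tree behaves.
    """
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best_threshold, best_error = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        if error < best_error:
            # Split halfway between the two neighboring x values.
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_error = error
    return best_threshold, best_error


# Hypothetical data with a jump between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, error = best_split(xs, ys)
print(threshold, round(error, 2))
```

Being able to say why the split lands where it does (it is the cut that leaves each side closest to its own mean) is usually more convincing than naming the library call.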
Talk about Yourself
Undoubtedly, you’ll be asked to go into some detail about a project you’ve worked on. As Katie Malone says, prospective employers always ask these questions.
This is your opportunity to demonstrate how you approach a data problem and how well you can report and share your results. Pick a project you really loved working on — your passion will underscore your presentation. Make sure you can explain:
- Why you chose the model you did, given the problem you were trying to solve.
- What the features of your data were.
- How you tested and validated the results.
- What you got out of the project.
Be ready to generalize as well, talking about your skills more broadly and answering questions such as:
- When you get a new data set, what do you do with it to see if it will suit your needs for a given project?
- How do you handle big data sets — how would you start work on a project with an associated data set that was many tens of GB or larger?
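For the big data set question, one reasonable starting point (an assumption here, not an answer given by the interviewers quoted) is to stream the file and keep only running totals, so memory use stays flat no matter how large the file is. The sketch below uses Python's standard `csv` module; the function name, column names, and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def streaming_group_means(lines, key_field, value_field):
    """Compute per-group means in one pass, holding only running totals."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[row[key_field]] += float(row[value_field])
        counts[row[key_field]] += 1
    return {key: totals[key] / counts[key] for key in totals}


# A small in-memory stand-in for a file too large to load at once.
# With a real file you would pass open(path, newline="") instead.
sample = io.StringIO(
    "city,fare\n"
    "Chicago,10.0\n"
    "Oakland,4.0\n"
    "Chicago,14.0\n"
    "Oakland,6.0\n"
)
means = streaming_group_means(sample, "city", "fare")
print(means)
```

In an interview, the point to get across is the habit this shows: look at a sample or a summary first, and only reach for heavier tooling once you know what the data actually contains.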
Know the Company
In addition to knowledge and skill, employers are looking for individuals who will be a good fit with the company and its culture. It goes without saying that you need to research the company you’re interviewing with, not only looking at its products but also finding out what you can about its office culture. Think about a few reasons (other than a steady paycheck!) you’d like to work there.
Be able to answer:
- What’s a project you would want to work on at our company?
- What data would you go after to start working on it?
- What unique skills do you think you’d bring to the team?
If you’re able to provide a relevant sample or example, even better. According to Malone, you might be asked to do that on the fly anyway. “Not every place does this, and they usually tell you in advance if there’s a coding portion of the interview. But if there is, having a simple framework/methodology that you’re very comfortable with is essential.”
Beyond the Basics
Working alone through lists of typical data science interview questions won’t be as effective as talking through a few of these problems with a friend or fellow student. Mock interviews give you practice not only in organizing and verbalizing your thoughts, but in doing so under some degree of pressure (though you should still prepare yourself for the possibility of an anxiety-ridden interview!).
Reach out to your connections in the field and ask them how their own interview processes went and what they would ask if they were looking for a right-fit data analyst with your particular skill set.
Lewis Kaneshiro is a former Udacity instructor and current data scientist at Shazam. Not only has he endured grueling interviews, he is also interviewing KPCB Fellows for summer internships.
When looking for stand-out candidates, Lewis asks, “What are the assumptions required for linear regression?”
“Surprisingly this question has come up in multiple interviews throughout the years, and it tends to separate those who know linear models as ‘a function in R/Python’ or worse ‘a function in Excel,’ and those who can apply the models to actual data.”
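Being able to confidently and capably verbalize and demonstrate (via a whiteboard) those assumptions has been a large part of Lewis’s interview experience. The assumptions usually listed are a linear relationship between predictors and response, independent errors, constant error variance, and approximately normal errors. He also hints at the importance of including graphical demonstrations of data that will violate each assumption.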
“It is simple, but students who ignore these conditions will tend to blindly apply models without understanding the underlying use cases, and fail to recognize the need for normalization, skew adjustment, outlier detection, or other real-world issues. They also tend to need far too much oversight to be useful in an actual job. Students sometimes think they are being hired to apply a bunch of cool models to data, when in reality 90%+ of work is done with linear models and data normalization/validation.”
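If you want to rehearse showing a violated assumption, here is a minimal sketch for one of them, linearity. It fits a straight line to deliberately curved data and prints the sign of each residual in order. The helper names and the data are hypothetical, and this covers only one assumption rather than the full set Lewis has in mind.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Observed minus fitted values for a straight-line fit."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical data: one series is truly linear, the other is quadratic,
# so a straight line is the wrong shape for it.
xs = list(range(-5, 6))
linear = [2 * x + 1 for x in xs]
curved = [x * x for x in xs]

linear_res = residuals(xs, linear)
curved_res = residuals(xs, curved)

# Signs of the residuals in x order. One sign at both ends and the
# opposite sign through the middle is the U-shaped pattern that signals
# a violated linearity assumption.
signs = "".join("+" if r > 0 else "-" for r in curved_res)
print(signs)
print(max(abs(r) for r in linear_res))
```

On a whiteboard the same idea is a sketch of residuals against x: a shapeless band around zero is what you hope for, while a curve or a funnel tells you an assumption has failed.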
Lewis emphasizes that a keen interest in data, curiosity, drive, and tenacity are critical to convey during an interview.
“I’ve found that interest and passion coupled with underlying curiosity and intellectual grit trump simple ‘brilliance’ or ‘intelligence’ every time. We’ve passed on PhD candidates that are pure intellectuals in favor of passionate masters or undergrads willing to dive into data and get dirty.
“For example, we are considering a summer intern who may not appear to be the strongest candidate, but his Kaggle approaches and his personal project (scraping bus arrival times to predict actual arrivals) demonstrate that he is willing to jump into messy situations and power through. This is far more useful than a brilliant candidate who wants to invent a new algorithm. He or she is probably better off in a PhD program, in that case, and they tend to be poor employees.”
And finally, Lewis asks those he interviews, “How do you detect outliers?”
“Some of the best interviews I’ve had begin and continue with this question. It separates academics who have only applied algorithms to cleaned data sets from those with real-world experience — or real-world curiosities. It is always better to spend the majority of one’s time understanding the raw data and proceeding with the correct approach and model, than to frantically apply every R model known and check the accuracy. I think it is one of the questions asked by the most experienced interviewers.”
To answer this one, Lewis recommends reviewing the practical method described in Sebastian Thrun and Katie Malone’s Intro to Machine Learning “to use LR [linear regression] to fit [a predictive model to a data set], rank errors, and throw out the top 10% and refit iteratively as a test for stability.”
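The fit, rank, trim, and refit loop described above can be sketched in a few lines of plain Python. This is one reading of the quoted method, not the course's own code: ranking by absolute error, the number of rounds, the helper names, and the toy data are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trim_and_refit(xs, ys, drop_fraction=0.10, rounds=3):
    """Fit, drop the points with the largest absolute errors, and refit.

    Returns one (slope, intercept) pair per fit, starting with the fit
    on all the data, so you can see whether the coefficients settle down.
    """
    points = list(zip(xs, ys))
    history = [fit_line(*zip(*points))]
    for _ in range(rounds):
        slope, intercept = history[-1]
        # Rank points by absolute residual, smallest first.
        points.sort(key=lambda p: abs(p[1] - (slope * p[0] + intercept)))
        n_drop = max(1, int(len(points) * drop_fraction))
        if len(points) - n_drop < 2:
            break  # too few points left to fit a line
        points = points[:-n_drop]
        history.append(fit_line(*zip(*points)))
    return history


# Hypothetical data: y = 2x + 1 with two corrupted measurements.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
ys[3] = 40
ys[15] = -10

history = trim_and_refit(xs, ys)
for slope, intercept in history:
    print(round(slope, 3), round(intercept, 3))
```

If the coefficients jump after the first trim and then stop moving, that suggests a handful of points were driving the original fit. If they keep drifting round after round, the problem is more likely the model than a few bad points.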
The Bottom Line
If you were to ask a hundred different data professionals what they were asked during their interviews, you’d likely get a hundred different answers. Luckily, you don’t have to collate that information yourself. There are tons of lists and resources full of sample questions that you can use to practice and prepare yourself for the big day.
Knowing your field inside and out, reviewing your projects, and rekindling your passion for data will go a long way toward a successful interview. Stretch and exercise now, by quizzing yourself, challenging yourself, and talking through the problems you’ve tackled — and hopefully you’ll come out the other side of the interview process with the offer you’ve been waiting on.