How To Build A Data Analysis Portfolio That Will Get You Hired

Getting your data projects online to get hired

At Udacity, we strive to be as responsive as possible to student queries of all kinds, and virtually every member of every team gets the opportunity to speak directly with students at one time or another. This direct connection to our students is in fact one of the most gratifying things about working at Udacity.

When certain subjects and topics start to come up with more frequency, we often turn to a particular Udacian for insight. One subject that has definitely come up a great deal lately is the question of how to get data projects online. To speak to this matter, our own Mat Leonard—a Udacity course developer—is here to offer some thoughts and experience!

First, a bit of “official” background on Mat:

Mat Leonard earned a PhD in Physics from UC Berkeley, where he wrote his dissertation on neural activity related to short-term memory. When it came time to make sense of his data, he turned to Python and its scientific stack, including NumPy, scikit-learn, and pandas. He created his personal blog, Matatat.org, to publish small data projects online. For example, he explored linear regression models for predicting body fat percentage and a Bayesian approach to A/B testing.

Hottest Jobs in 2016 #3 Data Scientist

Few jobs have been surrounded by as much hyperbole as Data Scientist. Most famously, the Harvard Business Review referred to it as “The Sexiest Job of the 21st Century.” With hype like that, a backlash is inevitable, and there certainly was one, with some of the more apocalyptic voices even stating that the role would be replaced completely by automation within a decade.

That’s not going to happen.

Putting Deep Learning To Work

This is a guest post from Vincent Vanhoucke, a Principal Scientist at Google. He is a technical lead and manager in Google’s deep learning infrastructure team.

Deep learning is a modern take on the old idea of teaching computers instead of programming them. It has taken the world of machine learning by storm in recent years, and for good reason! Deep learning provides state-of-the-art results in many of the thorniest problems in computing, from machine perception and forecasting to analytics and natural language processing. Our brand-new Deep Learning Course, a collaboration between Google and Udacity, will have you learning and mastering these techniques in an interactive, hands-on fashion, and give you the tools and best practices you need to apply deep learning to solve your own problems.

Improving With Experience: Machine Learning in the Modern World

In the elevators and the stairwells, at desks and in conference rooms, by the coffee machine and in the library, everyone at Udacity is talking about machine learning. Why? Because we’re launching a brand-new Machine Learning Engineer Nanodegree program, and everyone is very excited!

Machine learning is a truly unique field in that it can seem both very complicated and very simple. For example, compare the following two descriptions:

“Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.”

and

“Machine learning is the science of getting computers to act without being explicitly programmed.”

The first is from Wikipedia; the second is from a Stanford course description. Somewhat different flavor, no? So how can machine learning be both so complicated and so simple? The answer lies in its omnipresence. Machine learning is literally everywhere.

But what IS Machine Learning?

Where did it come from, what does it mean, and why is it important?

At its core, machine learning is about making sense of large quantities of data. And note: by ‘large’, we mean LARGE—literally millions of just about everything you can count, quantify, and analyze: millions of patients, millions of students, millions of trades, millions of tweets. The sheer volume of data the modern world now produces is what makes machine learning both necessary, and possible.

Of course fields like statistics and algorithms have long aimed to summarize data for making decisions and predictions, and many of the formulas and techniques used in machine learning were developed by mathematicians centuries ago. What is new is the quantity. Increases in computational power allow us to perform analyses in hours that would have taken centuries by hand.

The result: a billion times more data than we’ve ever had before, and a billion times more power to make sense of it. How is this all made possible? Machine learning! Literally, a machine “learning” concepts from data. It learns like we do every day: it looks at experiences and observations and discerns useful information. But while we can do that based on a couple dozen experiences, machine learning can do it based on millions of experiences, all rigorously and numerically defined.

So what do machine learning engineers actually do?

Simple! Machine learning engineers build programs that dynamically perform the analyses that data scientists used to perform manually. And why is this important? Think for a moment of all the fields where data is very important. Healthcare, education, astronomy, finance, robotics, and more. Machine learning is already impacting them all, and in fact, there is virtually no field that machine learning won’t impact!

This is one of the key reasons machine learning is so fascinating: it’s everywhere. Often, it’s operating when we don’t even realize it. Ever used Google Translate? How about Siri? Your Facebook News Feed? All made possible through machine learning! If you know a bit about Udacity, you’ll know that our founder and CEO Sebastian Thrun himself has a long and remarkable history in the field, from founding a Master’s program at Carnegie Mellon that evolved into a Machine Learning PhD program, to being director of the Artificial Intelligence Laboratory at Stanford University, to leading the development of the Google driverless car.

Google Translate may in fact be one of the most famous (and most utilized!) examples of machine learning in action, and Google’s description of how it works makes for a pretty classic illustration of the concepts at play:

Machine Translation is a great example of how cutting edge research and world class infrastructure come together at Google. We focus our research efforts towards developing statistical translation techniques that improve with more data and generalize well to new languages. Our large scale computing infrastructure allows us to rapidly experiment with new models trained on web-scale data to significantly improve translation quality.

The key sentence here is “techniques that improve with more data.” This is really the essence of machine learning.

In 2006, Tom Mitchell published The Discipline of Machine Learning. In it he posed the following question:

“How can we build computer systems that automatically improve with experience?”

Machine learning is the answer to this question, and it’s why we’re launching our new Machine Learning Engineer Nanodegree Program!

Moneyball Your Career And Become A Data Analyst!

There is virtually no field in the modern employment landscape that does not rely on data. The Oakland Athletics made data so famous it became a Brad Pitt movie!

But when I ask you what you want to be when you grow up, are you likely to say “Data Analyst?” Probably not. Why is that?

Could it be a holdover sentiment from another era, when data really wasn’t very exciting? Say “data” to some people and it may conjure in their minds images of anonymous automatons squinting through bifocals at reams of seemingly unintelligible numbers as they sit hunched over drab desks in drab offices producing drab reports for drab enterprises that do drab things.

Or maybe it’s the idea that data only ever sits in the backseat? Data provides the numbers, but someone else goes out and gets the glory? Data cast as the perennial Cyrano de Bergerac?

Maybe data just seems too hard?

Whatever the reasons Data Analyst may not be on the tip of your tongue when it comes to career choices, it may be time to revise any prevailing assumptions about the field, because data has never been hotter as a career. Why? Because EVERYONE needs to know how to collect it, analyze it, contextualize it, report on it, and act on it.

Questions From Data Science Interviews

You’ve just spent the last year working on honing your skills through the Data Analysis program. You dutifully spent your evenings — sometimes late into the night — on homework and projects, ignoring friends, family, and the ever-growing mountain of laundry in the corner. You coded hobby projects for your local municipality and wrote up an entire epic (at least, you thought it was epic) series of blog posts on your findings. And when a friend mentions her department’s struggling marketing efforts, your mind spins away on ways you might capture meaningful data and process the results.

Now you want to get a job. Which, despite your best efforts as a data analyst, depends almost entirely on acing the interview.

It should come as no surprise that careful preparation, and understanding the expectations going into the process, is what it takes not only to survive an interview for a data analyst position, but to set yourself apart as the best and most qualified candidate.

And even if you’re not actively looking for a position — if you’re still learning your craft and working your way through your projects — you can get started right now practicing interview questions, so that in six or twelve months, you’ll have done all the legwork necessary to wow your potential employers and win that coveted dream job offer.

Know Your Field

Katie Malone, Physics PhD and former Udacity instructor, has interviewed with Groupon and others in the Bay Area and Chicago. Some deceptively basic questions she’s heard come up again and again:

  • What do you think a data scientist is/does?
  • What do you think are the most important skills for a data scientist to have?

Thoughtfully crafted answers demonstrate not only your interest and commitment to a career in data, but your communication skills as well. Keep in mind that you may be interviewed by a team lead or HR director without a technical background, in which case you want to be able to explain concepts in the most general terms.

Nick Gustafson, a Udacity data scientist, notes that you need to be prepared to talk in depth about the skills and tools of data analysis. He suggests being prepared to discuss topics such as these:

  • Which machine learning model (classification vs. regression, for example) to use given a particular problem.
  • The tradeoffs between different types of classification models, and between different types of regression models.
  • How to go about training, testing, and validating results. Different ways of controlling for model complexity.
  • How to model a quantity that you can’t directly observe (using Bayesian approaches, for example, and when doing so, how to choose prior distributions).
  • The various estimation approaches (maximum likelihood, maximum a posteriori) and the numerical optimization techniques used to compute them.
  • What types of data are important for a particular set of business needs, and how you would go about collecting that data.
  • Dealing with correlated features in your data set, and how to reduce the dimensionality of data.
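
To make the "training, testing, and validating" item above concrete, here is a minimal sketch of k-fold cross-validation in plain Python. It compares a baseline that always predicts the mean against a simple least-squares line. The data, the noise pattern, and all function names are hypothetical and chosen only for illustration; in practice you would more likely reach for a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean of y."""
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


def k_fold_mse(xs, ys, k, fit, predict):
    """Mean squared error on held-out points, accumulated over k folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th point, offset by fold
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)  # the model never sees held-out points
        total += sum((predict(model, xs[i]) - ys[i]) ** 2 for i in held_out)
    return total / n


# Hypothetical data: y = 0.5x + 3 plus a small repeating disturbance.
noise = [1.0, -1.0, 0.5, -0.5]
xs = list(range(40))
ys = [0.5 * x + 3 + noise[i % 4] for i, x in enumerate(xs)]

mean_mse = k_fold_mse(xs, ys, 5, fit_mean, predict_mean)
line_mse = k_fold_mse(xs, ys, 5, fit_line, predict_line)
print(f"baseline MSE: {mean_mse:.2f}, line MSE: {line_mse:.2f}")
```

Because every score comes from points the model did not train on, the same function can be used to compare models of different complexity on equal footing.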

If you find yourself stumped on a question, don’t panic. It’s OK to ask for more context or a relevant example. But be prepared to talk theory as well. You need to know the field inside out to advance in it.

Brush Up Beforehand

Being able to talk fluently and confidently across the range of tools and methods of data analysis means a fair amount of study beforehand. You might find it useful to review your coursework and notes, and to go over the latest tech blogs and industry newsletters.

Udacity data engineer Krasnoshtan Dmytro prepared for his interview by making sure he had a firm grasp on:

  • Linear/polynomial regression
  • Decision trees
  • Dimensionality reduction
  • Clustering

and keeping up with Data Science Weekly and Machine Learning Mastery, as well as sharpening his skills through HackerRank and Kaggle competitions.

Reviewing your past work, and continuing to hone and use those skills, can only help ground you more thoroughly in the material.

Talk about Yourself

Undoubtedly, you’ll be asked to go into some detail about a project you’ve worked on. As Katie Malone says, prospective employers always ask these questions.

This is your opportunity to demonstrate how you approach a data problem and how well you can report and share your results. Pick a project you really loved working on — your passion will underscore your presentation. Make sure you can explain:

  • Why you chose the model you did, given the problem you were trying to solve.
  • What the features of your data were.
  • How you tested and validated the results.
  • What you got out of the project.

And be able to extrapolate, talking about your skills in general, answering such questions as:

  • When you get a new data set, what do you do with it to see if it will suit your needs for a given project?
  • How do you handle big data sets — how would you start work on a project with an associated data set that was many tens of GB or larger?

Know the Company

In addition to knowledge and skill, employers are looking for individuals who will be a good fit with the company and its culture. It goes without saying that you need to do what you can to research the company you’re interviewing with, looking not only at their products, but finding out what you can about their office culture as well. Think about a few reasons (other than a steady paycheck!) you’d like to work there.

Be able to answer:

  • What’s a project you would want to work on at our company?
  • What data would you go after to start working on it?
  • What unique skills do you think you’d bring to the team?

If you’re able to provide a relevant sample or example, even better. According to Malone, you might be asked to do that on the fly anyway. “Not every place does this, and they usually tell you in advance if there’s a coding portion of the interview. But if there is, having a simple framework/methodology that you’re very comfortable with is essential.”

Beyond the Basics

Working through lists of typical data science interview questions by yourself won’t be as effective as talking through a few of these problems with a friend or fellow student. Mock interviews give you practice not only in organizing and verbalizing your thoughts, but in doing so under some degree of pressure (though prepare yourself for the possibility of an anxiety-ridden interview!).

Reach out to your connections in the field and ask them how their own interview processes went and what they would ask if they were looking for a right-fit data analyst with your particular skill set.

Lewis Kaneshiro is a former Udacity instructor and current data scientist at Shazam. Not only has he endured grueling interviews, he is also interviewing KPCB Fellows for summer internships.

When looking for stand-out candidates, Lewis asks: “What are the assumptions required for linear regression?”

“Surprisingly this question has come up in multiple interviews throughout the years, and it tends to separate those who know linear models as ‘a function in R/Python’ or worse ‘a function in Excel,’ and those who can apply the models to actual data.”

Being able to confidently and capably verbalize and demonstrate (via a whiteboard) those assumptions has been a large chunk of Lewis’s interview experiences. He also hints at the importance of including graphical demonstrations of data that will violate each assumption.

“It is simple, but students who ignore these conditions will tend to blindly apply models without understanding the underlying use cases, and fail to recognize the need for normalization, skew adjustment, outlier detection, or other real-world issues. They also tend to need far too much oversight to be useful in an actual job. Students sometimes think they are being hired to apply a bunch of cool models to data, when in reality 90%+ of work is done with linear models and data normalization/validation.”
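
As one way to practice this, the sketch below checks a single commonly cited assumption, constant error variance, by comparing the size of the residuals at low and high values of x. The list of assumptions usually given (linearity, independent errors, constant variance, roughly normal errors) is the standard textbook one rather than something Lewis spells out, and the data and the `spread_ratio` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def spread_ratio(xs, ys):
    """Fit a line, then divide the mean squared residual in the upper half
    of x by the one in the lower half. A value near 1 is consistent with
    constant error variance; a large value suggests errors grow with x."""
    slope, intercept = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in pairs]
    half = len(squared) // 2
    lower = sum(squared[:half]) / half
    upper = sum(squared[half:]) / (len(squared) - half)
    return upper / lower


# Hypothetical data sets built around the same line, y = 3x + 2.
noise = [1.0, -1.0, 0.5, -0.5]
xs = list(range(1, 41))
steady = [3 * x + 2 + noise[i % 4] for i, x in enumerate(xs)]            # constant spread
fanning = [3 * x + 2 + 0.2 * x * noise[i % 4] for i, x in enumerate(xs)]  # spread grows with x

steady_ratio = spread_ratio(xs, steady)
fanning_ratio = spread_ratio(xs, fanning)
print(f"steady: {steady_ratio:.2f}, fanning: {fanning_ratio:.2f}")
```

Plotting residuals against x shows the same thing graphically: a level band for the first data set and a widening fan for the second.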

Figure: Data science interview questions and topics (via Udacity).

Lewis emphasizes that a keen interest in data, curiosity, drive, and tenacity are critical to convey during an interview.

“I’ve found that interest and passion coupled with underlying curiosity and intellectual grit trump simple ‘brilliance’ or ‘intelligence’ every time. We’ve passed on PhD candidates that are pure intellectuals in favor of passionate masters or undergrads willing to dive into data and get dirty.

“For example, we are considering a summer intern who may not appear to be the strongest candidate, but his Kaggle approaches and his personal project (scraping bus arrival times to predict actual arrivals) demonstrate that he is willing to jump into messy situations and power through. This is far more useful than a brilliant candidate who wants to invent a new algorithm. He or she is probably better off in a PhD program, in that case, and they tend to be poor employees.”

And finally, Lewis wants to know of those he interviews: “How do you detect outliers?”

“Some of the best interviews I’ve had begin and continue with this question. It separates academics who have only applied algorithms to cleaned data sets from those with real-world experience — or real-world curiosities. It is always better to spend the majority of one’s time understanding the raw data and proceeding with the correct approach and model, than to frantically apply every R model known and check the accuracy. I think it is one of the questions asked by the most experienced interviewers.”

To answer this one, Lewis recommends reviewing the practical method described in Sebastian Thrun and Katie Malone’s Intro to Machine Learning “to use LR to fit [a predictive model to a data set], rank errors, and throw out the top 10% and refit iteratively as a test for stability.”
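
The procedure Lewis describes can be sketched in a few lines of plain Python. This is one reading of the quote, not the course's own code: the 10% drop fraction comes from the quote, while the number of rounds, the synthetic data, and the `trim_and_refit` name are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trim_and_refit(xs, ys, drop_fraction=0.10, rounds=3):
    """Fit, rank points by absolute error, drop the worst fraction, refit.
    Returns every (slope, intercept) fit along the way and the kept points."""
    points = list(zip(xs, ys))
    fits = [fit_line(xs, ys)]
    for _ in range(rounds):
        slope, intercept = fits[-1]
        # Smallest errors first, so the worst points end up at the tail.
        points.sort(key=lambda p: abs(p[1] - (slope * p[0] + intercept)))
        n_drop = int(len(points) * drop_fraction)
        points = points[:len(points) - n_drop]
        fits.append(fit_line([p[0] for p in points], [p[1] for p in points]))
    return fits, points


# Hypothetical data: y = 2x + 1 with mild noise and four gross outliers.
xs = list(range(50))
ys = [2 * x + 1 + 0.3 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
for i in (5, 20, 35, 45):
    ys[i] += 60

fits, kept = trim_and_refit(xs, ys)
for round_number, (slope, intercept) in enumerate(fits):
    print(f"round {round_number}: slope={slope:.3f}, intercept={intercept:.3f}")
```

If the fitted coefficients settle after the first round or two, the fit is stable and the dropped points are reasonable outlier candidates; if they keep drifting, the model itself may be a poor match for the data.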

The Bottom Line

If you were to ask a hundred different data professionals what they were asked during their interviews, you’d likely get a hundred different answers. Luckily, you don’t have to collate that information yourself. There are tons of lists and resources full of sample questions that you can use to practice and prepare yourself for the big day.

Knowing your field inside and out, reviewing your projects, and rekindling your passion for data will go a long way toward a successful interview. Stretch and exercise now, by quizzing yourself, challenging yourself, and talking through the problems you’ve tackled — and hopefully you’ll come out the other side of the interview process with the offer you’ve been waiting on.

Why One Udacity Student Dropped A Career in Law to Take On Data Science

Today’s post comes from Data Analyst student Christian Strobl, a former lawyer turned startup cofounder, currently living in Munich and learning data science within the Data Analyst Nanodegree. Read on to see how Christian made the big career shift outside of the world of law.

In 2007, when I started law school, it felt like a different time. I was looking for career security and was always interested in social science, literature, and ultimately doing something with words. The world was so insecure with the impending financial crisis, and I figured I would choose a path that guaranteed the most financial stability. I figured that if I worked hard in law school, I could be successful, have a stable path, and earn a good salary that would lead to a safe career path. That was the main inspiration and thinking behind attending law school.

In 2013 after my first law exam, I was working in a big firm in Frankfurt. I loved my boss and worked with a great team, but I couldn’t help feeling unfulfilled by my job. At about the same time, a friend of mine knew that I was very interested in technology and always reading up on tech blogs, so he invited me to attend a hackathon in Berlin. So I did, and even though I didn’t know how to code, it was still such a great experience. While I was there, I met someone that pointed me in the direction of online courses to start learning how to program. Later that year, I started taking courses online and quit law shortly after. I haven’t looked back.
