Every time you send a text message, type a tweet, post a Facebook photo, click a link, or buy something online, you’re generating data. And considering there are more than 3 billion Internet users in the world (a quantity that’s tripled in the last 12 years) and 4.3 billion cell phone users, that’s a heck of a lot of data.
Fortunately, as data has multiplied, so has the ability to collect, organize, and analyze it. Data storage is cheaper than ever, processing power is greater than ever, and tools for mining the zettabytes of available data for business intelligence are more accessible than ever. In recent years, data analysis has done everything from predicting stock prices to preventing house fires.
All that data crunching requires an army of data masters. Translation: there’s never been a better time to pursue a career in data. The McKinsey Global Institute predicted that by 2018 the U.S. could face a shortage of 1.5 million people who know how to leverage data analysis to make effective decisions. Enter: you.
The first step on your path to professional data whiz? Taking stock of your three main career options: data analyst, data scientist, and data engineer.
Data Analyst
A data analyst is essentially a junior data scientist. It’s the perfect place to start if you’re new to a career in data and eager to cut your teeth.
Data analysts don’t have the mathematical or research background to invent new algorithms, but they have a strong understanding of how to use existing tools to solve problems.
Skills and tools
Data analysts need to have a baseline understanding of five core competencies: programming, statistics, machine learning, data munging, and data visualization.
Beyond technical skill, attention to detail and the ability to present results effectively are equally important to success as a data analyst.
How it translates
Data analysts take direction from more experienced data professionals in their organization. Based on that guidance, they acquire, process, and summarize data. Data analysts manage the quality assurance of data scraping, regularly query databases to answer stakeholder requests, and triage data issues so they are resolved promptly. They then package the data into digestible insights in narrative or visual form.
An enduring curiosity about data and close examination of evolving best practices and tools serve all data professionals well, no matter the level of seniority.
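To make "querying databases for stakeholder requests" a little more concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The `orders` table, its columns, and the sample rows are all hypothetical, made up purely for illustration; a real request would run against whatever database your organization uses.

```python
import sqlite3

# Hypothetical sample data: an in-memory table standing in for a company database.
ORDERS = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", None),  # a missing amount, the kind of data issue an analyst triages
]


def summarize_orders(rows):
    """Load rows into SQLite and return (region, order_count, total) per region."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # Skip rows with missing amounts, then count and total the rest by region.
    query = """
        SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
        FROM orders
        WHERE amount IS NOT NULL
        GROUP BY region
        ORDER BY region
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


summary = summarize_orders(ORDERS)
for region, order_count, total in summary:
    print(f"{region}: {order_count} orders, {total:.2f} total")
```

The query does the "process and summarize" part; turning the resulting rows into a chart or a short written takeaway is the packaging step.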
Data Scientist
Some companies treat the titles of “data scientist” and “data analyst” as synonymous. But there’s really a distinction between the two in terms of skill set and experience.
Though data scientists and data analysts have the same mission in an organization—to glean insight from the massive pool of data available—a data scientist’s work requires more sophisticated skills to tackle a higher volume and velocity of data.
As such, a data scientist is someone who can do undirected research and tackle open-ended problems and questions. Data scientists typically have advanced degrees in a quantitative field, like computer science, physics, statistics, or applied mathematics, and they have the knowledge to invent new algorithms to solve data problems.
Data scientists are extremely valuable to their companies, as their work can uncover new business opportunities or save the organization money by identifying hidden patterns in data (for example, highlighting surprising customer behavior or finding potential storage cluster failures).
Skills and tools
Whereas a data analyst might look at data from only a single source, a data scientist explores data from many different sources. Data scientists use tools like Hadoop (a widely used open-source framework for distributed storage and processing of large data sets), program in languages like Python and R, and apply advanced math and statistics.
The exact set of skills differs by organization and project, but this example from Data Science London gives a sense of how complex the data scientist’s toolkit can be:
Image via Data Science London
The most valuable nontechnical skill a data scientist brings to the table is an intense inquisitiveness. Data scientists have to be driven to pose questions and hunt down solutions, and in so doing to unearth information that could transform a business.
As data scientist Gaëlle Recourcé, CSO at Evercontact, said, “I love the power of metrics and tracking user behaviors, because it gives me the opportunity to test personal intuitions and then have real empirical results that allow our team to make data-driven decisions and continually improve our product.”
How it translates
Data scientists essentially leverage data to solve business problems. They interpret, extrapolate from, and prescribe from data to deliver actionable recommendations. A data analyst summarizes the past; a data scientist strategizes for the future.
Data scientists could identify precisely how to optimize websites for better customer retention, how to market products for stronger customer lifecycle value, or how to fine-tune a delivery process for speed and minimal waste.
Data Engineer
A data engineer builds a robust, fault-tolerant data pipeline that cleans, transforms, and aggregates unorganized and messy data into databases or data sources. Data engineers are typically software engineers by trade. Rather than analyzing data, data engineers are responsible for compiling and installing database systems, writing complex queries, scaling to multiple machines, and putting disaster recovery systems into place.
Data engineers essentially lay the groundwork for a data analyst or data scientist to easily retrieve the needed data for their evaluations and experiments.
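As a rough illustration of the clean, transform, and aggregate stages described above, here is a toy pipeline in plain Python. The input format ("user,amount" lines), the sample records, and the function names are all assumptions invented for this sketch; production pipelines typically run on distributed systems and handle far more failure modes.

```python
from collections import defaultdict

# Hypothetical raw input: messy "user,amount" lines as they might arrive from a log.
RAW_LINES = [
    " Alice , 10.5 ",
    "bob,3",
    "ALICE,4.5",
    "carol,not-a-number",  # malformed amount
    "",                    # blank line
]


def clean(lines):
    """Yield (user, amount_text) pairs, skipping lines that lack two non-empty fields."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and all(parts):
            yield parts[0].lower(), parts[1]


def transform(pairs):
    """Convert amounts to floats, dropping records that fail to parse."""
    for user, amount_text in pairs:
        try:
            yield user, float(amount_text)
        except ValueError:
            continue  # tolerate bad records rather than crash the pipeline


def aggregate(records):
    """Total the amounts per user."""
    totals = defaultdict(float)
    for user, amount in records:
        totals[user] += amount
    return dict(totals)


totals = aggregate(transform(clean(RAW_LINES)))
print(totals)
```

Each stage is a small function that passes records to the next, so a bad record is dropped at the stage that detects it instead of stopping the whole run. That is the fault-tolerance idea in miniature.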
Skills and tools
Whereas data scientists extract value from data, data engineers are responsible for making sure that data flows smoothly from source to destination so that it can be processed.
As such, data engineers have deep knowledge of and expertise in:
- Hadoop-based technologies like MapReduce, Hive, and Pig
- SQL-based technologies like PostgreSQL and MySQL
- NoSQL technologies like Cassandra and MongoDB
- Data warehousing solutions
How it translates
“My responsibilities are quite various,” said Social Searcher Data Engineer Dmitry Novikov. “They range from designing the system architecture and separate modules, to algorithm implementation and infrastructure requirements.”
Data engineers do the behind-the-scenes work that enables data analysts and data scientists to do their jobs more effectively. Here’s a visual look at the specific differences between data engineers and data scientists:
Image via Data Science 101
Chris Beland, who leads the data engineering team at Allclasses, describes what his team does, why it matters, and why he loves it:
“In my work right now, I do a lot of natural language processing, turning semi-structured, human-readable web content into highly structured machine-readable databases. My favorite thing to do is to teach the computer something concrete about the real world, like how humans write calendar dates and what they mean, or how the universe of class topics breaks down into categories and subcategories. Then I come up with some algorithms so my machine can exploit that new knowledge to parse and sort text and make sense of it just a little bit like a human would. I feel a bit like a proud parent when I can check the resulting database, give the program a virtual pat on the head for getting all the right answers, despite getting a lot of inputs I never anticipated, and with a satisfying click ship the data out to people who need it.”
The Bottom Line
You have many options when it comes to a career working with data. The three major ones are data analyst, data scientist, and data engineer.
Sanjay Venkateswarulu, co-founder of big data analytics and visualization startup Datavore Labs, crystallizes why and how this subdividing has occurred: “Data analysts have morphed into these three or more specialized disciplines. I believe it is the same specialization that doctors went through at the birth of modern medicine. First there was your village leader or elder who played the main role, but as tools of the trade have become more and more specialized, we now have GPs, surgeons, and neurosurgeons.”
If you’re new to the field of data science, you’ll want to start by aiming for the GP in Venkateswarulu’s analogy: an analyst job. As you develop your skills and gain experience, you’ll be able to progress to data scientist or data engineer.
Ready to get started?