Andrew

When I tell people that I’m a data science intern at Udacity, I’m often greeted with an intrigued smile or a quizzical look. “How exactly does Udacity use data?” I’m often asked. After all, the words “data-driven” and “education” until recently rarely appeared in the same sentence; now Udacity, despite being only 50 people, already has a data team?

Data are so important at Udacity because being data-driven is the only way we can keep learning and getting better for our students. Like any other company, we learn a lot from our data, but for us it’s not just about optimizing visitors and conversion numbers. Analyzing data is how we learn about what differentiates our most driven students from the rest, which aspects of our teaching and tutoring increase comprehension and retention, and why some students think about Udacity 24/7 and others have only brief infatuations with it.

One particularly exciting part of my work is taking findings from four decades of pedagogical research, implementing them in Udacity’s classrooms, and using online activity data to analyze the impact of these interventions on students’ learning. At Udacity, we now have the opportunity to take findings that originated from studies of tens of students in physical classrooms, such as Carol Dweck’s concept of growth mindset, and apply them to improve teaching for hundreds of thousands of students. But even more powerful is Udacity’s ability to conduct our own pedagogical research at scale, in a rapidly growing worldwide classroom, something that was not even possible a year ago. Pedagogical areas we’re exploring include the importance of metacognition, expectation setting around formative assessment, and even new online challenges such as which characteristics of video keep students most engaged.

Our team learns how to be data-driven about pedagogy not only from pedagogical research, but also from the rest of Silicon Valley. For instance, when we first needed to categorize our students in order to determine who was struggling and needed attention versus who was motivated and just needed resources, we ran into a lot of trouble defining “inactive” versus “active” users. How should we categorize, for example, a college student who finished the Introduction to Computer Science course over the summer, never visited Udacity during the fall semester, and finished our Web Development course over winter break? The only meaningful thing we could say is that the student was “active” over the summer, “inactive” during the fall, and “active” again during the winter, but it took inspiration from Twitter’s user model before we finally came up with a time-dependent user categorization that made sense.

Thanks in part to the example of Twitter and data-driven Silicon Valley, we now have a user model that allows us to meaningfully segment our students, just as teachers group their students in order to teach them differently.
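
To make the idea of time-dependent categorization concrete, here is a minimal sketch in Python. It is not Udacity’s actual user model, nor Twitter’s: the function name `activity_by_period`, the period boundaries, and the visit dates are all hypothetical, and the rule used (a student is “active” in a period if they visited at least once during it) is just one simple assumption.

```python
from datetime import date


def activity_by_period(visit_dates, periods):
    """Label each named period 'active' if at least one visit falls inside it.

    Assumption for illustration only: a single visit is enough to count
    as 'active'. A real user model would likely use a richer rule.
    """
    labels = {}
    for name, (start, end) in periods.items():
        has_visit = any(start <= d <= end for d in visit_dates)
        labels[name] = "active" if has_visit else "inactive"
    return labels


# Hypothetical visit log for a student like the one described above.
visits = [date(2012, 6, 15), date(2012, 7, 20), date(2012, 12, 27)]

# Hypothetical period boundaries (inclusive on both ends).
periods = {
    "summer": (date(2012, 6, 1), date(2012, 8, 31)),
    "fall": (date(2012, 9, 1), date(2012, 12, 14)),
    "winter": (date(2012, 12, 15), date(2013, 1, 15)),
}

labels = activity_by_period(visits, periods)
print(labels)
```

The point of the sketch is that the label belongs to a student and a time window together, not to the student alone, which is what lets the same person be “active” in one season and “inactive” in the next.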

My data work has been a pleasantly refreshing flavor of intellectual stimulation. Looking at data to devise better ways to teach diverse students is a different (and often more difficult) challenge than my math studies back at Harvard. Data-driven pedagogy is the kind of problem that I can see captivating me for life, just as it has already captivated some of the brightest minds in cognitive science, developmental psychology, and education. Just as a student seeks a teacher’s feedback to advance her understanding, Udacity endlessly seeks feedback in its data to improve its teaching; for both the student and Udacity, the challenge to make progress is never-ending.

Want to pioneer data-driven pedagogy with us? Think you can do better than we are? I bet you can! The good news is that we’re hiring smart people like you. Join us in changing the world of education.

Andrew Liu
Data Science Intern