Tuesday, June 3, 2014

Facebook Meme Hunting with Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an approach to data analysis for summarizing and visualizing the important characteristics of a data set. It can be used to get a quick, basic understanding of a data set.

With EDA, we can explore and visualize interesting questions such as, “When do memes pop up in social networks?” Below, Facebook data scientist Lada Adamic explains how she uses EDA techniques to follow a meme’s Facebook presence over time.


As Lada explains, a meme is an idea that replicates itself. In a social network, you may see a meme suggesting that you repost it to all of your friends. 

In this example, Lada is interested in the Moneybags meme, which has popped up on Facebook regularly over the years. It is specifically adapted to Facebook: it asks readers to copy and paste it to share with their friends.

Lada wants to know how this Facebook-specific meme keeps recurring over time. At a quick glance, a plot of the meme’s occurrences over time suggests that it disappears between spikes of activity.

To see what is really happening, Lada applies some Exploratory Data Analysis techniques. Plotting the meme’s occurrences on a log scale instead of a linear scale shows that the meme does not disappear entirely between spikes. Instead, it persists in low numbers, is replicated in variations, and eventually becomes popular once again.
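The effect of switching from a linear to a log scale can be sketched in a few lines. This is only an illustration: it is written in Python rather than the R used in the course, the `counts` list is made up (it is not Facebook data), and `bar_lengths` is a hypothetical helper that draws crude text bars in place of a real plot.

```python
import math

# Hypothetical daily counts of a meme (made-up numbers, not Facebook data):
# two large spikes separated by a stretch of very low activity.
counts = [3, 5, 40000, 90000, 1200, 12, 4, 2, 6, 3, 8, 25000, 70000, 900]


def bar_lengths(values, width=40, log_scale=False):
    """Scale values to bar lengths of at most `width` characters."""
    if log_scale:
        # log10(v + 1) stays defined for zero counts and compresses the spikes.
        values = [math.log10(v + 1) for v in values]
    top = max(values)
    return [round(width * v / top) for v in values]


linear = bar_lengths(counts)
logged = bar_lengths(counts, log_scale=True)

# Left column of bars: linear scale. Right column: log scale.
for count, lin, log in zip(counts, linear, logged):
    print(f"{count:>6}  {'#' * lin:<40} | {'#' * log}")
```

On the linear scale, the days with only a handful of occurrences round down to empty bars, so the meme appears to vanish between spikes. On the log scale, those same days still get visible bars, which is the pattern described above.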

For more on data investigation, check out Exploratory Data Analysis, a self-paced course where you’ll learn with the Facebook data science team how to investigate, visualize and summarize data using R.