Wednesday, May 7, 2014

Social Computing Data Analysis - Will Facebook die out by 2017?

The paper "Epidemiological modeling of online social network dynamics" by John Cannarella & Joshua A. Spechler (2014) uses an epidemiological model, irSIR (infectious recovery SIR, where S = number susceptible, I = number infectious, R = number recovered), to predict the rise and fall of online social networks (OSNs), using Google Trends search data in place of actual usage reports. It first fits the model to the rise and fall of MySpace usage, then applies the adapted model to Facebook data to test whether the same effects apply and, if so, when Facebook would see a similar decline.
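To make the model concrete, here is a minimal sketch of an irSIR-style system in Python. The equations are not quoted in this post; the sketch assumes the usual irSIR form, in which recovery is itself "infectious" (people leave at a rate proportional to contact with those who already left). The function name `simulate_irsir`, the parameters `beta` and `nu`, and all starting values are illustrative assumptions, not the values fitted in the paper.

```python
def simulate_irsir(beta, nu, s0, i0, r0, dt=0.01, steps=5000):
    """Forward-Euler integration of an irSIR-style system (assumed form):

        dS/dt = -beta * I * S / N
        dI/dt =  beta * I * S / N - nu * I * R / N
        dR/dt =  nu * I * R / N

    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, assumed constant
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * i * s / n * dt  # users joining the network
        new_recoveries = nu * i * r / n * dt    # users leaving the network
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative parameters only: fractions of a population normalised to N = 1.
history = simulate_irsir(beta=1.0, nu=0.5, s0=0.98, i0=0.01, r0=0.01)

# Index of the time step at which the "infected" (active user) count peaks.
peak_step = max(range(len(history)), key=lambda k: history[k][1])
print("active users peak at step", peak_step)
```

With these made-up parameters the active-user curve rises, peaks, and then declines, which is the shape the paper fits to MySpace. Note that R must start above zero here, otherwise nobody ever leaves.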



Using search terms to predict the spread of disease has been demonstrated before and is the basis for Google Flu Trends, which attempts to predict when and where flu epidemics will hit next. But the model does not exactly fit OSN adoption, and the paper discusses some of the model's shortcomings, though not all. The first shortcoming the paper discusses is that, unlike with a disease, people do not join an OSN expecting or making a conscious effort to leave it. People remain members for as long as the network is interesting to them, meaning that as long as enough friends use an OSN to justify being a member, they'll stay.


Another shortcoming (one that is not discussed satisfactorily in the paper) is that R, the number of people who recovered from a disease, is replaced by the people opposed to joining an OSN: both those who never joined in the first place and those who joined, left, and decided never to come back. This adaptation is necessary because the model assumes S+I+R = N, meaning that the population remains constant during the study and that the susceptible, infected and recovered members add up to the full population. This carries two shortcomings:

First, you cannot consciously decide to resist or accept infection by a disease, but people make conscious decisions about joining or not joining an OSN all the time, and those decisions can change over time (natural immunity can fluctuate, but it cannot be switched on and off at will);
Second, the number of internet users did not remain constant over the period; it grew rapidly. The assumption of S+I+R = N is feeble at best.



The third (or fourth?) shortcoming is that the data includes a highly skewed input: the Google Trends data shows a jump of about 20% around October 2012, after which it never returned to its earlier level. The authors 'corrected' this by multiplying all data after that date by a correction factor derived from their own projections of where the data points should be, without feedback from Google on the exact nature of the change. The turning point in the search data appears only after the correction factor is applied, which calls into question how much of the observed decline is actually bias introduced by the researchers' own 'correction' of the input data.
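The effect of such a correction can be illustrated with a small Python sketch. The numbers below are synthetic and chosen only to show the mechanism; they are not the paper's data, and the helper `apply_correction`, the break position and the factor of 1/1.2 are hypothetical stand-ins for whatever the authors actually used.

```python
def apply_correction(series, break_index, factor):
    """Multiply every value from break_index onward by factor."""
    return [
        value * factor if index >= break_index else value
        for index, value in enumerate(series)
    ]


def peak_index(series):
    """Position of the largest value in the series."""
    return max(range(len(series)), key=lambda k: series[k])


# Synthetic search-interest series with a step jump at index 4.
raw = [70, 80, 90, 100, 115, 116, 117, 118]

# Hypothetical correction: undo an assumed 20% jump after the break.
corrected = apply_correction(raw, break_index=4, factor=1 / 1.2)

raw_peak = peak_index(raw)
corrected_peak = peak_index(corrected)
print("peak position before correction:", raw_peak)
print("peak position after correction:", corrected_peak)
```

In this toy series the raw data is still rising at the end, while the corrected data peaks just before the break. Whether a curve has turned can therefore depend entirely on the chosen factor, which is why the paper's correction deserves scrutiny.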

All in all, the paper draws an interesting parallel between the decline of MySpace and historical Facebook usage data, but the predictions derived from that parallel must be taken with a big grain of salt.

Reference

John Cannarella & Joshua A. Spechler, Epidemiological modeling of online social network dynamics (2014)

Images: John Cannarella & Joshua A. Spechler, Epidemiological modeling of online social network dynamics (2014)
