2012 Volume 15 Issue 3

Facebook: Data Mining the World’s Largest Focus Group

With the data at Facebook’s disposal, could it predict outcomes within the typically volatile financial markets?

Facebook is an online social networking platform where users interact in real time. Though Facebook advertises itself as a free service, trade-offs exist. Subscribers must surrender personal information upon accepting Facebook’s Terms and Conditions. The personal information users relinquish before joining the Facebook community includes work history, level of education, date of birth, interests, contact information, relationship status, geographic location, and “Likes” of consumer brands, musical interests, and favorite celebrities. Users often agree to privacy terms without thoroughly reading them or understanding their implications, which allows Facebook to capture, manage, and sell user data. This data is formatted into a historical timeline, transforming Facebook’s more than 900 million active users into the largest focus group in the world. Facebook’s ability to obtain and organize data on its massive user base raises two questions:

With the seemingly limitless amount of data at Facebook’s disposal, can the company accurately predict outcomes within the typically volatile financial markets?
If Facebook is able to predict such outcomes, can it go a step further and manipulate or influence such predicted events?

This article will investigate the validity of these questions, along with the implications of the answers for Facebook users, the financial markets, and Facebook itself.

Monetizing You

With 526 million users posting 3.2 billion “Likes,” comments, picture uploads, and shares on a daily basis, the task of managing data on Facebook’s nearly 1 billion users once seemed insurmountable. Facebook stores daily content on an estimated 60,000 or more servers, a footprint that has led to the construction of a supplementary 300,000-square-foot facility in Prineville, Oregon, and another center in Rutherford County, North Carolina.[1] To organize this data, in 2012 Facebook launched its Timeline feature, which organizes, tracks, and displays a user’s activity in the form of milestones ranging from sign-up through the present day. Facebook’s ability to organize a user’s information and activity opens a new revenue stream built on data mining.

Data mining is the process of discovering patterns in a consumer’s online behavior, habits, and preferences. This information is then categorized into fields where relationships have been identified.[2] Facebook has been able to collect and organize user data and use it to deliver targeted advertisements. User data is highly coveted by private businesses, public companies, and non-profits, so much so that companies pay Facebook for advertising space to reach a targeted consumer base. A business seeking to advertise on Facebook completes fields specifying its desired demographics, such as country, gender, age, and interests. Facebook then returns the estimated impressions an advertisement will receive based on this target information.[3]
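The targeting workflow described above can be illustrated with a minimal sketch. It is not Facebook’s actual advertising system: the `User` record, the `estimate_reach` function, and the sample data are all hypothetical, and the count of matching users is only a stand-in for the "estimated impressions" the advertiser sees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical profile record; field names are assumptions for illustration.
    country: str
    gender: str
    age: int
    interests: frozenset


def estimate_reach(users, country=None, gender=None, age_range=None, interests=None):
    """Count the users who match every targeting field that was supplied.

    A field left as None is treated as "no restriction". For interests,
    a user matches if they share at least one interest with the advertiser.
    """
    wanted = frozenset(interests or ())
    count = 0
    for user in users:
        if country is not None and user.country != country:
            continue
        if gender is not None and user.gender != gender:
            continue
        if age_range is not None and not (age_range[0] <= user.age <= age_range[1]):
            continue
        if wanted and not (wanted & user.interests):
            continue
        count += 1
    return count


# Invented sample data, not real user information.
SAMPLE_USERS = [
    User("US", "F", 29, frozenset({"running", "coffee"})),
    User("US", "M", 41, frozenset({"golf"})),
    User("CA", "F", 33, frozenset({"coffee"})),
    User("US", "F", 52, frozenset({"coffee", "travel"})),
]
```

For example, `estimate_reach(SAMPLE_USERS, country="US", gender="F", age_range=(25, 40), interests={"coffee"})` returns 1 on this sample, since only one record satisfies all four fields. Narrowing the fields shrinks the audience, which is the trade-off an advertiser weighs when completing the form.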

During Facebook’s infancy, data management strategies were nebulous, focusing primarily on creating bandwidth to support the website’s growing customer base.[4] Now, with its highly active user base and Timeline data management tool, Facebook has laid the foundation to crowd source, that is, to derive insights from its large user base and technology platform. The acquisition of user data remains an important revenue generator for Facebook. However, to remain competitive, Facebook must continue to diversify its monetization strategy.

Facebook’s potential revenue streams are threatened by user backlash regarding both the company’s privacy policy and its use of subscriber data. Users may not want their information shared, and are sometimes unaware when it is shared. Meanwhile, Facebook is profiting from user information through targeted advertising. To mitigate privacy concerns, Facebook might consider more transparent policies that clearly state that the information a user provides on the Facebook platform may be shared with third parties for profit. In addition, Facebook could state that, in exchange, it provides a free service through which users can promote themselves and share their information with others.

Predictability in the Financial Industry

Current forecasting tools in the financial industry include the Index of Leading Economic Indicators, the Consumer Sentiment Index (CSI), the Conference Board’s Consumer Confidence Index (CCI), and the American Association of Individual Investors (AAII) Sentiment Survey. These indices are used to measure the health of the U.S. economy based on consumer and investor opinion. However, they provide only marginal indications of market movement.

What do other forms of media imply?

Regarding media, Tetlock states that the “high values of media pessimism provoke downward pressure on market prices, which leads to high market trading volume.”[5] Internet chat rooms concentrating on stocks, however, did not reveal any material correlation between positive messages and market returns.[6] More accurate market forecasts require a platform with “real-time data sets, from thousands of data points, from historical archives of data collection agencies” according to Ludvigson.[7]

We suggest that the real-time platform of Facebook’s millions of users be used to create an algorithm, based on user sentiment, that produces more accurate short-term forecasts of market movement. Previous and current market resources use data from much smaller populations, and that data is solicited, not voluntarily given. These resources are often stale, as they are aggregated and released for the previous quarter. Past studies also relied on the interpretation of chat room feedback, which introduces a degree of subjective error.

Subscribers voluntarily submit positive responses via Facebook’s “Like” function for businesses and products. Though “Likes” provide only positive feedback, the information is captured immediately. By gathering subscriber feedback and data mining for trends, Facebook can combine real-time data with traditional resources to produce more accurate forecasts of the financial market. These forecasts would draw on more frequent data points, which could reveal market correlations and lead to more precise predictions of market movement.
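One simple way to test whether “Like” activity carries any forecasting signal is to check whether today’s growth in “Likes” for a company correlates with the next period’s price return. The sketch below is an illustration under stated assumptions, not a method proposed in the studies cited here: the function names are hypothetical, the inputs are assumed to be equally spaced daily series of the same length, and Pearson correlation is chosen only because it is the most basic measure of linear association.

```python
from math import sqrt


def pct_changes(series):
    """Period-over-period fractional change of a numeric series."""
    return [(later - earlier) / earlier for earlier, later in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def lagged_correlation(likes, prices, lag=1):
    """Correlate growth in Likes on day t with the price return on day t + lag."""
    like_growth = pct_changes(likes)
    returns = pct_changes(prices)
    if lag > 0:
        # Drop the last `lag` Like observations and the first `lag` returns
        # so that each Like change is paired with a later return.
        like_growth = like_growth[:-lag]
        returns = returns[lag:]
    return pearson(like_growth, returns)
```

A coefficient near zero would suggest that “Likes” alone add little, echoing the chat room findings above, while a persistently positive value on out-of-sample data would justify combining the signal with traditional indices. Correlation on a short history is easy to overfit, so any such result would need validation before use.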

However, the positive feedback communicated by the “Like” function does not convey the extent to which the subscriber feels positive toward another person, company, or product. Integrating a user’s comments on a particular company, brand, or the economy as a whole may provide better insights. Whether the information Facebook has acquired, and continues to acquire, is both genuine and robust enough to provide psychographic insight is debatable. Nevertheless, this information should be used in conjunction with the other indices and market information, such as the views of industry experts and analysts, to improve accuracy.

User-generated content from social media or other media may present a suitable method to gauge sentiment.[8] However, the same study determined that inaccuracies arose from using sentiment from an entire social media population; therefore, experts within the masses should be identified. Aggregating data from Facebook users, a practice otherwise known as crowd sourcing, and using it in conjunction with more traditional resources could produce more accurate market predictions.
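The idea of identifying experts within the crowd can be sketched as a weighted vote. This is a simplified illustration and not the method used in the cited study: the function name, the +1/-1 vote encoding, and the use of each user’s past hit rate as a weight are all assumptions made for the example.

```python
def expert_weighted_sentiment(votes, hit_rates, min_hit_rate=0.5):
    """Aggregate bullish (+1) and bearish (-1) votes, weighting by track record.

    votes:      mapping of user id -> +1 or -1
    hit_rates:  mapping of user id -> fraction of that user's past calls
                that proved correct (hypothetical input)
    Users whose hit rate is at or below min_hit_rate, or unknown, are ignored.
    Returns a score in [-1, 1], or 0.0 if no user qualifies.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for user, vote in votes.items():
        weight = hit_rates.get(user, 0.0)
        if weight <= min_hit_rate:
            continue
        weighted_sum += weight * vote
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0
```

With votes of `{"a": 1, "b": -1, "c": -1}` and hit rates of `{"a": 0.9, "b": 0.4, "c": 0.45}`, the unweighted crowd leans bearish, but only user "a" clears the default threshold, so the function returns 1.0. The example shows how filtering for demonstrated accuracy can reverse the signal taken from the whole population.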

Practitioner Take-Aways & Implications

The analysis in the previous sections presupposes that Facebook can predict activity in sentiment-based markets. If Facebook is able to predict an outcome, a working model or theory could be built around that outcome. With a working model, Facebook could conceivably influence an outcome to reflect an individual’s preference, especially if the predicted outcome is contrary to the individual’s desire. However, Facebook is not acting contrary to its customers’ desires; the company is leveraging proprietary customer information to exploit a potential revenue stream. The constant updating of customer information is the feedstock for predictive algorithms and stock market forecasting, which in turn form the potential revenue stream. This assumption recognizes that Facebook could not only predict financial markets, but also influence and ultimately control them.

Supporting this assumption requires a critical mass of users, which Facebook currently has: by offering free accounts, it has built a user base with diverse demographics. Free accounts provided by Internet services seem to follow the adage that if you cannot quite figure out how a company is making money, it is you (specifically, your information) that is being sold.[9] In Facebook’s business model, users are not only customers, but the inventory as well. User information is sold every time users engage free services. Facebook users are thus engaged in a tacit agreement with Facebook through which their information will be used and possibly sold.

How is Facebook relevant to the reader and the reader’s business?

One area of opportunity Facebook offers businesses and entire industries is to better understand their customers through the rich set of data and analysis provided by its users. However, this opportunity relies on the presumption that Facebook would be willing to share or sell its data and analysis. These companies would then use the information to their own advantage, and possibly sell the insights derived to the highest bidder(s).

Validation & Proof of Concept

While much of this discussion could be considered conjecture, the business of data mining the past to predict the future is neither a pipe dream nor a future technology. In fact, several third-party platforms have used social media to predict a movie’s success upon release, while other platforms use sentiment analytics to measure the popularity of a company’s brand among the general population. When Facebook is able to properly mine and analyze the data it collects from its users, it will be able to predict its users’ wants and needs. Facebook will also be able to predict trends in different markets (e.g., consumer products and financial markets).

In January 2010, Recorded Future, a small start-up company based in Massachusetts, predicted Yemen’s impending famine and conflict a full year ahead of the actual conflict. Recorded Future made this prediction from public information gathered across the Internet, using algorithms built “in-house.”[10] With Facebook’s enormous repository of information and the already established practice of mining data to predict the future, Facebook could soon adjust its business model to not only sell predictions of the future, but also influence the future based on those very same predictions.


Facebook is in a prime position to transition from a social media platform into a revenue-generating predictor of sentiment-based markets. Facebook started out as a platform for users to connect with friends, and quickly grew into one of the most visited websites in the world. Given Facebook’s growing number of users and the massive warehouse of data that can be analyzed, the implications are potentially endless. A continual loop can be created in which user-generated content feeds Facebook’s user data repository and prediction algorithms, which in turn further influence outcomes. Facebook stands at the precipice of a major decision. If Facebook utilizes the data it is collecting, the firm could create a market disruption not only in the information technology sector, but also in sentiment-based markets.

[1] Miller, Rich, “Facebook Building 2nd Data Center in Oregon,” Data Center Knowledge. July 26, 2011. Accessed April 8, 2012, http://www.datacenterknowledge.com/archives/2011/07/26/facebook-building-2nd-data-center-in-oregon/.

[2] Palace, Bill, Data Mining, June 1996. Accessed February 25, 2012, http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm.

[3] “Facebook for Business: Creating Advertisements,” Facebook.com, July 29, 2012. Accessed July 29, 2012, http://www.facebook.com/business/connect.

[4] “Facebook’s Latest News, Announcements and Media Resources,” Facebook.com. April 8, 2012. Accessed April 8, 2012, http://newsroom.fb.com/.

[5] Tetlock, P. C., “Giving Content to Investor Sentiment: The Role of Media in the Stock Market,” Journal of Finance, 62:3 (2007): 1139-1168.

[6] Antweiler, Werner, and Murray Z. Frank, “Is All That Talk Just Noise? The Information Content of Internet Stock Message Boards,” Journal of Finance, 59:3 (2004): 1259-1294.

[7] Ludvigson, S. C., “Consumer Confidence and Consumer Spending,” Journal of Economic Perspectives, 18:2 (2004): 29-50.

[8] Hill, S., and N. Ready-Campbell, “Expert Stock Picker: The Wisdom of (Experts in) Crowds,” International Journal of Electronic Commerce, 15:3 (2011): 73-101.

[9] Warner, Gregory, “Should Facebook Continue to Be Free?,” Podcast audio. American Public Media: Marketplace Morning Report for Friday, May 18, 2012. 2:18. Accessed July 22, 2012. http://www.marketplace.org/topics/tech/should-facebook-continue-be-free.

[10] “Yemen Heading for Disaster in 2010?” RecordedFuture.com, January 12, 2010. Accessed April 8, 2012, https://www.recordedfuture.com/2010/01/12/yemen-heading-for-disaster-in-2010/.

Authors of the article
Brian M. Kwong
Brian M. Kwong is an Associate for a financial institution in Los Angeles focused on private investments. He is pursuing an MBA from the Graziadio School of Business and Management at Pepperdine University with an expected completion in December 2012. He earned his BA from the University of California, Santa Barbara. In his free time, he enjoys spending time with his friends and family.
Sean M. McPherson
Sean M. McPherson is a VP of Business Development for a software development firm that specializes in start-up incubation. He is pursuing an MBA from the Graziadio School of Business and Management at Pepperdine University, with a concentration in Entrepreneurship and an expected completion in December 2012. Sean currently lives in Los Angeles, CA, where he enjoys spending time with his family, friends, and fiancée, and volunteering with local start-ups.
Jonathan F. A. Shibata
Jonathan F. A. Shibata is a Vice President in the financial services industry, focused on business lending and consulting. He is currently pursuing an MBA from the Graziadio School of Business and Management at Pepperdine University with an expected completion in April 2013. Jonathan earned his BS in Business Administration from the University of Iowa. In his free time, he enjoys spending time with his family and friends and devoting time to community service.
Oliver T. Zee
Oliver T. Zee is a Technical Project Manager for a business solutions firm in Los Angeles that specializes in delivering technological solutions to business problems. He is pursuing an MBA from the Graziadio School of Business and Management at Pepperdine University, with a concentration in Entrepreneurship and an expected completion in December 2012. He earned his BS in Computer Engineering and Computer Science from the University of Southern California. Oliver enjoys spending his free time with his wife, barbecuing with friends, and snowboarding when conditions permit.