Social Monitoring for Public Health. Michael J. Paul
With social media adoption expanding,11 and with new advancements in technology, social media is likely to have an increasing impact on public health.
1. Flu has such a low mortality rate that many people are surprised to learn that influenza is a leading cause of death in the United States. Since it infects so many people, even a low mortality rate leads to thousands of deaths.
2. http://www.who.int/about/role/en/
3. http://www.cdc.gov/nphpsp/essentialservices.html
4. http://www.who.int/topics/public_health_surveillance/en/
5. http://www.cdc.gov/flu/weekly/
6. https://www.cdc.gov/mmwr/mmwr_nd/
7. http://www.well-beingindex.com/
8. http://www.cdc.gov/ehrmeaningfuluse/syndromic.html
9. http://www.nowtrendingchallenge.com/
10. https://nowtrending.hhs.gov/
11. http://www.pewinternet.org/fact-sheets/social-networking-fact-sheet/
CHAPTER 3
Social Data
What constitutes “social data” and how can this type of data be used for public health monitoring? This chapter describes different types of social media, including well-known platforms like Twitter and Facebook, as well as other online platforms that may be less known but still valuable. We take “social data” as a broad term that covers many types of online data.
Before embarking on any social monitoring project, it is important to understand the social media landscape and the options for data sources. For example, it may surprise you to learn that Facebook, despite being the world’s largest social network, is rarely used for social monitoring. This is due to a variety of factors, including how the platform is used by people and the tools available for data collection. In contrast, Twitter dominates the social monitoring community, for reasons that are scientifically motivated (it provides a large and relatively representative sample) and reasons that are not (the data is free and convenient). We’ll compare different platforms, describing the affordances of different data sources and data types, their strengths and weaknesses, and their appropriateness for different health applications.
Finally, we briefly describe how to obtain data from a few popular platforms, with pointers to tools and tutorials.
3.1 WHAT IS SOCIAL DATA?
Social data refers to data that is created by people with the goal of sharing the data with others. For example, when people post messages or photos online to share with others, the text and images of the messages and photos are considered social data. Social media websites are the platforms through which social data is created. Examples of popular social media platforms include Twitter and Facebook. In general, social data is created by ordinary people, rather than professional writers or domain experts (e.g., clinicians).
This book will also use “social data” to refer to data that is created by people on the Web but not necessarily intended for social sharing, including search query data, because this data is prominently used for public health monitoring in addition to standard social media data. We also include data generated by people’s online activities other than intentionally posted messages, like location information—the “digital traces” left behind by people’s online behavior [Welser et al., 2008]. What we don’t generally include is data created specifically for researchers (like survey responses), although we do discuss how new technologies can facilitate collection of that kind of data.
3.2 MONITORING OF SOCIAL DATA
Social monitoring refers to the act of analyzing social data—either by manually reading the data, or automatically using computational tools—to learn about the world. Many people use social media platforms to publicly share information about what they are currently doing and thinking. By analyzing social data, it is possible to infer what is happening around the world and within populations.
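As a rough illustration of what analyzing social data "automatically using computational tools" can look like, the sketch below counts how many posts per day mention a health-related term. It is a minimal example under stated assumptions, not a method taken from this book: the keyword list `FLU_TERMS`, the function name `daily_mention_counts`, and the sample posts are all invented for illustration, and a real study would need validated terms and far more careful text processing.

```python
from collections import Counter
from datetime import date
import re

# Hypothetical keyword list (an assumption for this sketch, not a validated lexicon).
FLU_TERMS = {"flu", "influenza", "fever"}


def daily_mention_counts(posts, terms=FLU_TERMS):
    """Count, for each day, the posts that mention at least one term.

    posts: iterable of (date, text) pairs.
    Returns a Counter mapping each date to a number of matching posts.
    """
    counts = Counter()
    for day, text in posts:
        # Lowercase and split into simple word tokens.
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if tokens & terms:
            counts[day] += 1
    return counts


# Invented example posts, standing in for data collected from a platform.
posts = [
    (date(2017, 1, 2), "Home sick with the flu today"),
    (date(2017, 1, 2), "Great game last night!"),
    (date(2017, 1, 3), "Fever and chills, ugh"),
    (date(2017, 1, 3), "I think I caught influenza"),
]
counts = daily_mention_counts(posts)
```

A time series of such counts is only a starting point. Keyword matches include many messages that are not about the poster's own health, so raw counts should not be read as case counts.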
Social monitoring is also a form of infodemiology, or information epidemiology, a term introduced by Eysenbach [2002] to describe the study of health determinants and the sharing of health information on the internet. Some of the earliest studies of health on the internet looked at the quality of health information on available websites [Davison, 1996, Impicciatore et al., 1997]. Social monitoring focuses on studying user-generated content to learn about a population.
It is possible to measure and understand all sorts of population opinions and behaviors through social monitoring. For example, social media can be monitored to measure consumer sentiment [Bian et al., 2016, Chamlertwat et al., 2012] and political sentiment [O’Connor et al., 2010]. Social monitoring has been used for forecasting sales [Asur and Huberman, 2010], predicting financial markets [Bollen et al., 2011], forecasting elections [Digrazia et al., 2013, Tumasjan et al., 2010], and estimating crowd sizes [Sinnott and Chen, 2016] and traffic congestion [Tse et al., 2017]. It is also a rich resource for interdisciplinary work, such as combining health and economics [Althouse et al., 2014, Ayers et al., 2012b] or health and politics [Dredze et al., 2017].
Social monitoring can be used to answer scientific questions, often in social science [Cioffi-Revilla, 2010, Lazer et al., 2009], including learning regional dialects [Eisenstein et al., 2010] and learning associations with personality traits [Schwartz et al., 2013].
We mention all these examples from different areas to give a taste of the enormous potential of social data. Of course, this book will focus on applications in public health, which we’ll survey throughout.
3.2.1 ACTIVE VS. PASSIVE MONITORING
Social monitoring can take an active or passive approach. Active monitoring requires explicit participation from users, while passive monitoring makes use of data already published by users, without requiring user interaction. An example of active monitoring is asking a sample of Twitter users which presidential candidate they favor, while a passive monitoring approach might analyze what Twitter users are writing about the candidates and infer sentiment toward candidates from the messages alone. Passive monitoring represents the bulk of research into social monitoring due to its relative ease and low cost.
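The passive approach in the example above can be sketched in a few lines. The code below scores already-published messages against a tiny word list and attributes the score to any candidate named in the message. Everything here is an assumption made for illustration: the lexicons `POSITIVE` and `NEGATIVE`, the candidate names, and the function `passive_sentiment` are hypothetical, and published sentiment studies use much richer models than this word-counting heuristic.

```python
import re

# Toy sentiment lexicons (assumptions for this sketch only).
POSITIVE = {"love", "great", "support"}
NEGATIVE = {"hate", "terrible", "oppose"}


def passive_sentiment(messages, candidates):
    """Return a net lexicon score per candidate from existing messages.

    messages: iterable of message strings that users have already posted.
    candidates: list of lowercase candidate names to look for.
    """
    scores = {c: 0 for c in candidates}
    for text in messages:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Net polarity: positive word count minus negative word count.
        polarity = sum(t in POSITIVE for t in tokens) - sum(
            t in NEGATIVE for t in tokens
        )
        for c in candidates:
            if c in tokens:
                scores[c] += polarity
    return scores


# Invented messages and candidate names; no user is contacted or surveyed.
messages = [
    "I love Smith, great speech",
    "Jones was terrible in the debate",
    "I support Jones",
    "Nice weather today",
]
scores = passive_sentiment(messages, ["smith", "jones"])
```

An active approach would instead ask users directly, for example by sending a survey question, which is why it requires their participation and typically costs more.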
We will focus on passive monitoring in this book, but will mention active approaches when relevant. See Hill et al. [2013] for a discussion on the utility of active approaches to public health surveillance compared to passive monitoring.
3.2.2 TYPES OF USERS
This book focuses on monitoring people in a population, and we therefore focus on messages written by individuals. However, large swaths of social data are produced by organizations, bots, and spammers. These messages also have value in public health analyses. Heldman et al. [2013] considered how public health agencies can use social media, and others discuss how the medical profession can use social media to communicate with the population [Moorhead et al., 2013, Thackeray et al., 2008]. McCorriston et al. [2015] introduce automated methods for distinguishing Twitter accounts that belong to individuals from those that belong to organizations. Whether to detect and remove spammers when analyzing health messages in social media is the subject of debate [Allem and Ferrara, 2016, Kim et al., 2016], but the presence of such messages should certainly be considered when designing research studies.
3.3 TYPES OF PLATFORMS
Social data comes in many forms. Different online platforms and websites exist for different audiences and different purposes, and different platforms may be better suited for particular