Utility and potential of rapid epidemic intelligence from internet-based sources

  • S.J. Yan
    Affiliations
    School of Public Health and Community Medicine, University of New South Wales, Australia
  • A.A. Chughtai
    Affiliations
    School of Public Health and Community Medicine, University of New South Wales, Australia
  • C.R. Macintyre
    Correspondence
    Corresponding author at: School of Public Health and Community Medicine, Samuels Building, Room 325, Faculty of Medicine, University of New South Wales, Sydney, 2052, NSW, Australia. Tel: +61 2 9385 3811; Fax: +61 2 9313 6185.
    Affiliations
    School of Public Health and Community Medicine, University of New South Wales, Australia

    College of Public Service and Community Solutions, Arizona State University, Phoenix, USA
Open Access. Published: July 29, 2017. DOI: https://doi.org/10.1016/j.ijid.2017.07.020

      Abstract

      Objectives

      Rapid epidemic detection is an important objective of surveillance to enable timely intervention, but traditional validated surveillance data may not be available in the required timeframe for acute epidemic control. Increasing volumes of data on the Internet have prompted interest in methods that could use unstructured sources to enhance traditional disease surveillance and gain rapid epidemic intelligence. We aimed to summarise Internet-based methods that use freely accessible, unstructured data for epidemic surveillance and explore their timeliness and accuracy outcomes.

      Methods

      Steps outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist were used to guide a systematic review of research related to the use of informal or unstructured data by Internet-based intelligence methods for surveillance.

      Results

      We identified 84 articles published between 2006 and 2016 relating to Internet-based public health surveillance methods. Studies used search queries, social media posts and approaches derived from existing Internet-based systems for early epidemic alerts and real-time monitoring. Most studies noted improved timeliness compared to official reporting, such as in the 2014 Ebola epidemic, where epidemic alerts were generated first from ProMED-mail. Internet-based methods showed variable correlation strength with official datasets, with some methods showing reasonable accuracy.

      Conclusion

      The proliferation of publicly available information on the Internet provided a new avenue for epidemic intelligence. Methodologies have been developed to collect Internet data and some systems are already used to enhance the timeliness of traditional surveillance systems. To improve the utility of Internet-based systems, the key attributes of timeliness and data accuracy should be included in future evaluations of surveillance systems.

      Background

      Broadly, ‘intelligence’ is defined as information collected, analysed and converted to gain insights. Intelligence can be described as a process or product (
      • Hughbank R.
      Intelligence and Its Role in Protecting Against Terrorism.
      ). The application of intelligence principles in public health gave rise to the discipline of ‘epidemic intelligence’ (
      • Bowsher G.
      • Milner C.
      • Sullivan R.
      Medical intelligence, security and global health: the foundations of a new health agenda.
      ), denoting all activities related to the early detection of potential health-related hazards (
      • Paquet C.
      • et al.
      Epidemic intelligence: A new framework for strengthening disease surveillance in Europe.
      ). Intelligence is generally formed through distinct steps such as data collection, processing, analysis, dissemination, feedback and tasking. Firstly, a specific request for intelligence is issued. Information-gathering methods are used to collect unstructured data. Following conversion to a manageable format, data can be interpreted and a final report produced. Feedback can inform a subsequent task (
      • Hughbank R.
      Intelligence and Its Role in Protecting Against Terrorism.
      ). Though criticism exists in the literature regarding this simplified model of intelligence, public health surveillance follows similar steps in case detection, reporting, analysis and confirmation of cases (
      • Hulnick A.
      What’s wrong with the Intelligence Cycle.
      ).
      Automated intelligence methods are of growing interest due to improvements in information technology (
      • Hughbank R.
      Intelligence and Its Role in Protecting Against Terrorism.
      ). Open-source intelligence (OSINT) can utilise user-generated data found on the Internet or social media (
      • Li E.Y.
      • Tung C.Y.
      • Chang S.H.
      The wisdom of crowds in action: Forecasting epidemic diseases with a web-based prediction market system.
      ). The application of information technology for electronic data collection and interpretation has encouraged Internet-based methods that can inform rapid epidemic intelligence on public health events (
      • Collier N.
      Uncovering text mining: A survey of current work on web-based epidemic intelligence.
      ). The International Health Regulations issued by the World Health Organization (WHO) for health threat detection emphasized the importance of both indicator-based and event-based components of epidemic intelligence for the early detection of events (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ). Informal information sources are important, with the WHO reporting that more than 60% of initial disease epidemic reports come from unofficial sources (
      • World Health Organization
      Epidemic intelligence – systematic event detection.
      ). For the purposes of this report, a disease epidemic is “The occurrence of cases of disease in excess of what would normally be expected in a defined community, geographical area or season” (
      • World Health Organization
      Disease outbreaks.
      ). Public health surveillance, hereafter referred to as ‘surveillance’, is defined by the WHO as: “The continuous, systematic collection, analysis and interpretation of health-related data needed for the planning, implementation, and evaluation of public health practice.” (
      • World Health Organization
      Public health surveillance.
      )

      Overview of public health surveillance

      Rapid epidemic detection and real-time monitoring are important objectives of syndromic surveillance to minimise the morbidity and mortality caused by infectious diseases. Different types of surveillance can be conducted according to desired objectives. Sociocultural or ethical issues are also considered, with cost-effectiveness having implications on system feasibility (
      • McNabb S.
      • et al.
      Conceptual framework of public health surveillance and action and its application in health sector reform.
      ). Active surveillance involves regular monitoring of sources, providing the most complete information but ideally requiring trained epidemiologists. In contrast, passive surveillance is less resource intensive, involving regular reports from a wide range of sources and chance detection of cases. Integrated or enhanced surveillance makes use of both active and passive systems (
      • Nsubuga P.
      • et al.
      ). Active syndromic surveillance, or reporting based on clinical case definitions and pre-diagnostic data, can be conducted in settings lacking laboratory confirmation to support diagnoses (
      • Chaudet H.
      • et al.
      Web Services Based Syndromic Surveillance for Early Warning within French Forces.
      ). Syndromic surveillance systems have therefore been established for early epidemic detection to minimize mortality and morbidity associated with emerging disease threats (
      • Flamand C.
      • et al.
      The Epidemiologic Surveillance of Dengue-Fever in French Guiana: When Achievements Trigger Higher Goals.
      ).
      During infectious disease epidemics, validated data collected through traditional or indicator-based surveillance methods may not be available for timely use. Surveillance has traditionally involved the monitoring of public health indicators from a range of sources. Key indicators outlined by the WHO in 1968 initially included mortality, morbidity, clinical data, laboratory reports, relevant field investigations, surveys, animal or vector studies, demographic and environmental data (
      • Declich S.
      • Carter A.
      Public health surveillance: historical origins, methods and evaluation.
      ). Other data have since been included, such as hospital statistics, disease registries, over-the-counter drug sales data, school or work absenteeism, telephone triage calls and news reports (
      • Chan E.
      • et al.
      Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance.
      ,
      • Lee L.
      • Thacker S.
      Public health surveillance and knowing about health in the context of growing sources of health data.
      ). Traditional surveillance methods have disadvantages in timeliness and sensitivity, attributable to factors such as a lengthy data validation process, bureaucratic barriers, higher costs and resource requirements (
      • Yang Y.T.
      • Horneffer M.
      • DiLisio N.
      Mining social media and web searches for disease detection.
      ). The time lag between the onset of epidemics and released official reports may render surveillance data redundant for the purpose of early detection and response (
      • Zhou X.
      • Ye J.
      • Feng Y.
      Tuberculosis surveillance by analyzing Google trends.
      ).
      To address these disadvantages, researchers have proposed methods utilising newer technologies like the Internet, mobile phones, improved point-of-care diagnostic tools and other event-based surveillance methods (
      • Chunara R.
      • Freifeld C.
      • Brownstein J.
      New technologies for reporting real-time emergent infections.
      ). In conjunction with developments in computing or automated technology, Internet-based methods can handle and utilise ‘big data’ collected from informal sources. Big datasets are terabytes to petabytes in size, requiring higher-level software tools to capture, store, manage and analyse effectively (
      • Bello-Orgaz G.
      • Jung J.
      • Camacho D.
      Social big data: Recent achievements and new challenges.
      ).
      Internet-based sources have potential to provide more timely information for detecting infectious disease epidemics, such as by identifying events or clusters of disease-related keywords in local news reports or social media (
      • Chunara R.
      • Freifeld C.
      • Brownstein J.
      New technologies for reporting real-time emergent infections.
      ). Common informal sources found on the Internet include search queries, online news, blogs and social media (
      • Salathe M.
      • et al.
      Influenza A (H7N9) and the importance of digital epidemiology.
      ,
      • Bernardo T.M.
      • et al.
      Scoping review on search queries and social media for disease surveillance: a chronology of innovation.
      ). Through a scoping review, Bernardo et al. traced the use of Internet-based sources for disease surveillance back to 2006, with early work focusing on influenza. Pioneering studies introduced ‘infodemiology’, or the study of the determinants and distribution of health information (
      • Eysenbach G.
      Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance.
      ) in electronic media, specifically the Internet or in a population with the aim to inform public health policy (
      • Bernardo T.M.
      • et al.
      Scoping review on search queries and social media for disease surveillance: a chronology of innovation.
      ). ‘Infoveillance’ was also introduced to refer to the secondary analysis of Internet, social media or cell phone activity (
      • Pagliari C.
      • Vijaykumar S.
      Digital Participatory Surveillance and the Zika Crisis: Opportunities and Caveats.
      ), as well as the use of infodemiology data for surveillance or digital disease detection (
      • Bernardo T.M.
      • et al.
      Scoping review on search queries and social media for disease surveillance: a chronology of innovation.
      ). Subsequent methods have shown promising results for monitoring diseases such as influenza and foodborne illness. Through participatory surveillance, FluNearYou uses voluntary online surveys asking for demographic data and symptoms to form weekly trends of influenza-like illnesses (ILI) in the United States (
      • Santillana M.
      • et al.
      Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance.
      ,
      • Christaki E.
      New technologies in predicting, preventing and controlling emerging infectious diseases.
      ,
      • FluNearYou
      How it works.
      ). Online media from local sources have also been utilised for early disease surveillance in recent epidemics (
      • Bernardo T.M.
      • et al.
      Scoping review on search queries and social media for disease surveillance: a chronology of innovation.
      ). However, FluNearYou and other online systems are not widely used by the community and their overall utility is questionable.
      Despite available evidence supporting the use of Internet-based sources to complement traditional surveillance, informal sources are inherently more prone to biases. Systems using social media are affected by several factors, including the algorithms employed (
      • Al-garadi M.
      • et al.
      Using online social networks to track a pandemic: A systematic review.
      ), unstructured or disorganised data (
      • Hossain L.
      • et al.
      Social media in Ebola outbreak.
      ) and expected false positives and negatives due to background noise. Several sources maintain that Internet-based surveillance should be used to supplement traditional methods of surveillance (
      • Gittelman S.
      • et al.
      A New Source of Data for Public Health Surveillance: Facebook Likes.
      ). Brownstein et al. have suggested integrating unstructured online information with other health indicator data (
      • Brownstein J.
      Surveillance Sans Frontieres: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project.
      ).
      Previous reviews defined epidemic intelligence, differentiating indicator-based from event-based surveillance and evaluating existing event-based surveillance systems (
      • Velasco E.
      • et al.
      Social Media and Internet-Based Data in Global Systems for Public Health Surveillance: A Systematic Review.
      ,
      • Choi J.
      • et al.
      Web-based infectious disease surveillance systems and public health perspectives: a systematic review.
      ). Sensitivity and specificity analyses have also been conducted on some Internet-based surveillance systems (
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      ). The Early Alerting and Reporting (EAR) project was developed in 2008 to incorporate several Internet-based surveillance systems and intelligence specialists for optimising the detection of biological threats (
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ). Wilson and Brownstein emphasised the value of ‘search-term surveillance’ involving the monitoring of aggregated keyword searches or queries performed by Internet users (
      • Wilson K.
      • Brownstein J.
      Early detection of disease outbreaks using the Internet.
      ). However, few reviews have summarised the core outcomes of timeliness and accuracy in relation to Internet-based surveillance methods studied in the literature. Thus, we aimed to summarise recent studies of existing Internet-based surveillance methods and to explore the timeliness and accuracy measures reported by each study.

      Objectives

      The primary aim of this study was to identify and summarise the types of Internet-based surveillance methods studied in recent literature. A secondary aim was to identify and summarise the timeliness and accuracy outcomes of Internet-based methods described in the literature.

      Method

      An initial exploratory review was conducted to gain familiarity with the discipline of epidemic intelligence and to generate possible questions or keywords for a review of related studies. Steps outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 checklist (
      • Anon
      Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).
      ,
      • Moher D.
      • et al.
      Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement.
      ) were then used to guide a review of research related to the use of informal data by Internet-based intelligence methods for rapid infectious disease epidemic surveillance. We systematically searched and reviewed published studies which used Internet-based methods.

      Search strategy

      Based on the primary aim to explore web-based intelligence methods for early disease surveillance and an exploratory literature review, the keywords of ‘web-based’, ‘surveillance’ and ‘epidemic intelligence’ were identified. Combinations of these words and their synonyms were entered into online databases including MEDLINE, EMBASE, SCOPUS, Web of Science and Global Health. Database searches used strings and Boolean combinations of relevant Medical Subject Heading (MeSH) terms such as ‘disease outbreaks’, ‘epidemics’, ‘Internet’, ‘population surveillance’ and ‘epidemiological monitoring’. Electronic records of all references were imported into EndNote.
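      As a rough illustration of how Boolean combinations of keywords and synonyms can be assembled into a database search string, the sketch below ORs the synonyms within each concept and ANDs the concepts together. The synonym groupings and the helper names (concept_groups, quote, build_query) are assumptions made for illustration; the review does not publish its exact search strings, and each database has its own field tags and syntax that are not modelled here.

```python
# Hypothetical synonym groups. The three dictionary keys are the core keywords
# named in the review; the groupings of the other terms are assumptions.
concept_groups = {
    "web-based": ["web-based", "internet-based", "online"],
    "surveillance": ["surveillance", "population surveillance",
                     "epidemiological monitoring"],
    "epidemic intelligence": ["epidemic intelligence", "disease outbreaks",
                              "epidemics"],
}


def quote(term):
    """Wrap terms containing a space or hyphen in double quotes (phrase search)."""
    return f'"{term}"' if (" " in term or "-" in term) else term


def build_query(groups):
    """OR the synonyms inside each concept, then AND the concepts together."""
    clauses = []
    for synonyms in groups.values():
        clauses.append("(" + " OR ".join(quote(s) for s in synonyms) + ")")
    return " AND ".join(clauses)


query = build_query(concept_groups)
print(query)
```

      A string built this way would still need adapting to each database, for example by mapping terms to MeSH headings where the database supports them.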

      Eligibility criteria

      Studies included in this review were limited to publicly available, peer-reviewed articles published between 2006 and 2016 and written in the English language. Titles and abstracts were screened. We excluded any articles which did not refer to surveillance or intelligence methods using unstructured or open-source data for detecting new or emerging outbreaks of infectious disease. All surveillance types were considered relevant, and surveillance for symptoms or complications of infectious disease was included. Articles were excluded if (i) the full article was not publicly available or (ii) the abstract and article were not related to infectious disease epidemics or surveillance methods using unstructured data. Reviews, letters, editorials, perspectives and conference papers that already had journal articles published were also excluded.
      For each included study, we collected quantitative and qualitative information pertaining to data collection or analysis methods. We grouped our assessment under three key streams of Internet-based disease surveillance, based on the data sources used: (i) existing surveillance systems and news aggregators, (ii) search query surveillance and (iii) social media surveillance. For each stream, we explored any outcomes describing the timeliness (temporal difference between alerts issued by the surveillance system and the onset of an outbreak or official alerts) or accuracy (correlation strength between collected informal data and officially released disease data) of the methods used to collect data from the Internet.
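      The two outcome measures defined above can be made concrete with a minimal sketch. Timeliness is computed as the number of days by which an Internet-based alert precedes an official alert, and accuracy as a correlation coefficient between an informal signal and official counts. Pearson's r is used here as an assumption; the included studies may report other correlation measures. The function names and all dates and counts are made up purely for illustration.

```python
from datetime import date
from math import sqrt


def lead_time_days(informal_alert, official_alert):
    """Days by which the informal alert precedes the official one.

    Positive values mean the Internet-based source alerted first.
    """
    return (official_alert - informal_alert).days


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up example values, not taken from any study in the review.
lead = lead_time_days(date(2020, 1, 1), date(2020, 1, 10))
weekly_search_volume = [10, 20, 30, 40, 50]
weekly_official_cases = [12, 24, 33, 41, 55]
r = pearson_r(weekly_search_volume, weekly_official_cases)

print(f"lead time: {lead} days, correlation: {r:.3f}")
```

      A high correlation in such a comparison shows only that the two series move together over the period examined; it does not by itself indicate how many false alerts a method would generate.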

      Results

      A total of 84 studies were identified (Figure 1) as relevant in describing Internet-based methods using open-access, unstructured data for rapid epidemic detection.
      Figure 1. Adapted PRISMA diagram showing the progression of article inclusion and exclusion.

      Existing surveillance systems and news aggregators

      Twenty-nine studies used existing Internet-based surveillance systems developed by researchers and public health authorities (Table 1). These systems focused on event-based reporting and syndromic surveillance. Studies used and evaluated the operation or technical development of these systems. HealthMap and the former EpiSPIDER system were the only systems that collected data from a social media platform (
      • Ahmed S.S.
      • et al.
      Surveillance for Neisseria meningitidis disease activity and transmission using information technology.
      ,
      • Lyon A.
      • et al.
      Comparison of Web-Based Biosecurity Intelligence Systems: BioCaster, EpiSPIDER and HealthMap.
      ). Other systems used unstructured data from news reports and official data, but not from social media platforms. Most systems collected data using automated tools and involved a hierarchical process to analyse the data (see Table 2). BioCaster, ProMED-mail and HealthMap required human judgement in the form of curators and analysts to correct occasional mistakes (
      • Brownstein J.
      Surveillance Sans Frontieres: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project.
      ,
      • Collier N.
      • et al.
      BioCaster: Detecting public health rumors with a Web-based text mining system.
      ).
      Table 1. Summary of study characteristics.
      Internet-based surveillance stream | Number of studies | Data source | Disease(s) or context* | Comments
      Existing surveillance systems and news aggregators | 29 | 17 HealthMap (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Brownstein J.
      Surveillance Sans Frontieres: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project.
      ,
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      ,
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ,
      • Ahmed S.S.
      • et al.
      Surveillance for Neisseria meningitidis disease activity and transmission using information technology.
      ,
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ,
      • Keller M.
      • Freifeld C.
      • Brownstein J.
      Automated vocabulary discovery for geo-parsing online epidemic intelligence.
      ,
      • Keller M.
      • et al.
      Use of unstructured event-based reports for global infectious disease surveillance.
      ,
      • Lyon A.
      • et al.
      Comparison of Web-Based Biosecurity Intelligence Systems: BioCaster, EpiSPIDER and HealthMap.
      ,
      • Khan K.
      • et al.
      Preparing for infectious disease threats at mass gatherings: the case of the Vancouver 2010 Olympic Winter Games.
      ,
      • Schwind J.
      • et al.
      Evaluation of Local Media Surveillance for Improved Disease Recognition and Monitoring in Global Hotspot Regions.
      ,
      • Hoen A.
      • et al.
      Electronic Event-based Surveillance for Monitoring Dengue, Latin America.
      ,
      • Brownstein J.
      • et al.
      Information Technology and Global Surveillance of Cases of 2009 H1N1 Influenza.
      ,
      • Brownstein J.S.
      • Freifeld C.C.
      HealthMap: the development of automated real-time internet surveillance for epidemic intelligence.
      ,
      • Freifeld C.
      • et al.
      HealthMap: Global Infectious Disease Monitoring through Automated Classification and Visualization of Internet Media Reports.
      ,
      • Scales D.
      • Zelenev A.
      • Brownstein J.
      Quantifying the effect of media limitations on outbreak data in a global online web-crawling epidemic intelligence system, 2008-2011.
      ,
      • Zhang Y.
      • et al.
      Characterizing Influenza surveillance systems performance: application of a Bayesian hierarchical statistical model to Hong Kong surveillance data.
      )

      7 ProMED-mail (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Hossain L.
      • et al.
      Social media in Ebola outbreak.
      ,
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      ,
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ,
      • Zeldenrust M.
      • et al.
      The value of ProMED-mail for the Early Warning Committee in the Netherlands: more specific approach recommended.
      ,
      • Rotureau B.
      • et al.
      International Epidemic Intelligence at the Institut de Veille Sanitaire, France.
      ,
      • Mondor L.
      • et al.
      Timeliness of Nongovernmental versus Governmental Global Outbreak Communications.
      ) 6 Argus (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      ,
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ,
      • Torii M.
      • et al.
      An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics.
      ,
      • Nelson N.
      • et al.
      Event-based internet biosurveillance: relation to epidemiological observation.
      ,
      • Thomas C.
      • et al.
      Use of media and public-domain Internet sources for detection and assessment of plant health threats.
      ) 6 BioCaster (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      ,
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ,
      • Lyon A.
      • et al.
      Comparison of Web-Based Biosecurity Intelligence Systems: BioCaster, EpiSPIDER and HealthMap.
      ,
      • Collier N.
      • et al.
      BioCaster: Detecting public health rumors with a Web-based text mining system.
      ,
      • Chanlekha H.
      • Kawazoe A.
      • Collier N.
      A framework for enhancing spatial and temporal granularity in report-based health surveillance systems.
      ) 7 GPHIN (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      ,
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ,
      • Keller M.
      • et al.
      Use of unstructured event-based reports for global infectious disease surveillance.
      ,
      • Rotureau B.
      • et al.
      International Epidemic Intelligence at the Institut de Veille Sanitaire, France.
      ,
      • Dion M.
      • AbdelMalik P.
      • Mawudeku A.
      Big Data and the Global Public Health Intelligence Network (GPHIN).
      ,
      • Mykhalovskiy E.
      • Weir L.
      The Global Public Health Intelligence Network and Early Warning Outbreak Detection: a Canadian contribution to global public health.
      ) 4 MedISys (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      ,
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ,
      • Mantero J.
      • et al.
      Enhanced epidemic intelligence using a web-based screening system during the 2010 FIFA World Cup in South Africa.
      ) 2 EpiSPIDER (
      • Keller M.
      • et al.
      Use of unstructured event-based reports for global infectious disease surveillance.
      ,
      • Lyon A.
      • et al.
      Comparison of Web-Based Biosecurity Intelligence Systems: BioCaster, EpiSPIDER and HealthMap.
      )

      2 PULS (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      )
      1 Ebola (
      • Hossain L.
      • et al.
      Social media in Ebola outbreak.
      )

      1 Dengue (
      • Hoen A.
      • et al.
      Electronic Event-based Surveillance for Monitoring Dengue, Latin America.
      ) 4 ILI (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Brownstein J.
      • et al.
      Information Technology and Global Surveillance of Cases of 2009 H1N1 Influenza.
      ,
      • Zhang Y.
      • et al.
      Characterizing Influenza surveillance systems performance: application of a Bayesian hierarchical statistical model to Hong Kong surveillance data.
      ,
      • Nelson N.
      • et al.
      Event-based internet biosurveillance: relation to epidemiological observation.
      ) 2 Mass gatherings (
      • Khan K.
      • et al.
      Preparing for infectious disease threats at mass gatherings: the case of the Vancouver 2010 Olympic Winter Games.
      ,
      • Mantero J.
      • et al.
      Enhanced epidemic intelligence using a web-based screening system during the 2010 FIFA World Cup in South Africa.
      ) 4 Comparative studies (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      ,
      • Keller M.
      • et al.
      Use of unstructured event-based reports for global infectious disease surveillance.
      ,
      • Lyon A.
      • et al.
      Comparison of Web-Based Biosecurity Intelligence Systems: BioCaster, EpiSPIDER and HealthMap.
      ) 9 Development studies (
      • Brownstein J.
      Surveillance Sans Frontieres: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project.
      ,
      • Keller M.
      • Freifeld C.
      • Brownstein J.
      Automated vocabulary discovery for geo-parsing online epidemic intelligence.
      ,
      • Schwind J.
      • et al.
      Evaluation of Local Media Surveillance for Improved Disease Recognition and Monitoring in Global Hotspot Regions.
      ,
      • Brownstein J.S.
      • Freifeld C.C.
      HealthMap: the development of automated real-time internet surveillance for epidemic intelligence.
      ,
      • Freifeld C.
      • et al.
      HealthMap: Global Infectious Disease Monitoring through Automated Classification and Visualization of Internet Media Reports.
      ,
      • Scales D.
      • Zelenev A.
      • Brownstein J.
      Quantifying the effect of media limitations on outbreak data in a global online web-crawling epidemic intelligence system, 2008-2011.
      ,
      • Zeldenrust M.
      • et al.
      The value of ProMED-mail for the Early Warning Committee in the Netherlands: more specific approach recommended.
      ,
      • Rotureau B.
      • et al.
      International Epidemic Intelligence at the Institut de Veille Sanitaire, France.
      ,
      • Collier N.
      • et al.
      BioCaster: Detecting public health rumors with a Web-based text mining system.
      ,
      • Chanlekha H.
      • Kawazoe A.
      • Collier N.
      A framework for enhancing spatial and temporal granularity in report-based health surveillance systems.
      ,
      • Dion M.
      • AbdelMalik P.
      • Mawudeku A.
      Big Data and the Global Public Health Intelligence Network (GPHIN).
      ,
      • Mykhalovskiy E.
      • Weir L.
      The Global Public Health Intelligence Network and Early Warning Outbreak Detection: a Canadian contribution to global public health.
      ) 1 Meningococcus (
      • Ahmed S.S.
      • et al.
      Surveillance for Neisseria meningitidis disease activity and transmission using information technology.
      ) 1 Cholera (
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      )

      4 Other or multiple infectious diseases (
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ,
      • Mondor L.
      • et al.
      Timeliness of Nongovernmental versus Governmental Global Outbreak Communications.
      ,
      • Torii M.
      • et al.
      An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics.
      ,
      • Thomas C.
      • et al.
      Use of media and public-domain Internet sources for detection and assessment of plant health threats.
      )
      Timeliness of epidemic detection: 1.9 to 12 days before official reports (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Brownstein J.
      • et al.
      Information Technology and Global Surveillance of Cases of 2009 H1N1 Influenza.
      ). No significant difference between systems (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ).

      One-day lag (
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ) or same day as the WHO (
      • Nelson N.
      • et al.
      Event-based internet biosurveillance: relation to epidemiological observation.
      ).

      Accuracy: variable (
      • Zeldenrust M.
      • et al.
      The value of ProMED-mail for the Early Warning Committee in the Netherlands: more specific approach recommended.
      ) to strong positive correlations (
      • Ahmed S.S.
      • et al.
      Surveillance for Neisseria meningitidis disease activity and transmission using information technology.
      ). Correlations weakened over time (
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ), with weekly pattern and media effects (
      • Scales D.
      • Zelenev A.
      • Brownstein J.
      Quantifying the effect of media limitations on outbreak data in a global online web-crawling epidemic intelligence system, 2008-2011.
      )
      Search query surveillance | 42 | 19 Google Flu Trends (GFT) (
      • Chan E.
      • et al.
      Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance.
      ,
      • Araz O.
      • Bentley D.
      • Muelleman R.
      Using Google Flu Trends data in forecasting influenza-like-illness related ED visits in Omaha, Nebraska.
      ,
      • Santillana M.
      • et al.
      What can digital disease detection learn from (an external revision to) Google flu trends?
      ,
      • Davidson M.
      • Haim D.
      • Radin J.
      Using Networks to Combine “Big Data” and Traditional Surveillance to Improve Influenza Predictions.
      ,
      • Malik M.
      • et al.
      “Google Flu Trends” and Emergency Department Triage Data Predicted the 2009 Pandemic H1N1 Waves in Manitoba.
      ,
      • Moss R.
      • et al.
      Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data.
      ,
      • Olson D.
      • et al.
      Reassessing Google Flu Trends Data for Detection of Seasonal and Pandemic Influenza: A Comparative Epidemiological Study at Three Geographic Scales.
      ,
      • Nsoesie E.
      • Buckeridge D.
      • Brownstein J.
      Guess Who’s Not Coming to Dinner? Evaluating Online Restaurant Reservations for Disease Surveillance.
      ,
      • Patwardhan A.
      • Bilkovski R.
      Comparison: Flu Prescription Sales Data from a Retail Pharmacy in the US with Google Flu Trends and US ILINet (CDC) Data as Flu Activity Indicator.
      ,
      • Pervaiz F.
      • et al.
      FluBreaks: Early Epidemic Detection from Google Flu Trends.
      ,
      • Scarpino S.
      • Dimitrov N.
      • Meyers L.
      Optimizing Provider Recruitment for Influenza Surveillance Networks.
      ,
      • Thompson L.
      • et al.
      Emergency department and ‘Google flu trends’ data as syndromic surveillance indicators for seasonal influenza.
      ,
      • Timpka T.
      • et al.
      Performance of eHealth Data Sources in Local Influenza Surveillance: A 5-Year Open Cohort Study.
      ,
      • Valdivia A.
      • M.C.S
      Diseases tracked by using Google trends, Spain.
      ,
      • Velardi P.
      • et al.
      Twitter mining for fine-grained syndromic surveillance.
      ,
      • Wilson N.
      • et al.
      Interpreting “Google Flu Trends” data for pandemic H1N1 influenza: the New Zealand experience.
      ,
      • Hulth A.
      • Rydevik G.
      GET WELL: an automated surveillance system for gaining new epidemiological knowledge.
      ,
      • Boyle J.
      • et al.
      Prediction and surveillance of influenza epidemics.
      ,
      • Lange M.M.A.D.
      • et al.
      Comparison of five influenza surveillance systems during the 2009 pandemic and their association with media attention.
      )

      10 Google Trends (
      • Zhou X.
      • Ye J.
      • Feng Y.
      Tuberculosis surveillance by analyzing Google trends.
      ,
      • Hossain L.
      • et al.
      Social media in Ebola outbreak.
      ,
      • Kang M.
      • et al.
      Using Google Trends for Influenza Surveillance in South China.
      ,
      • Alicino C.
      • et al.
      Assessing Ebola-related web search behaviour: insights and implications.
      ,
      • Bakker K.
      • et al.
      Digital epidemiology reveals global childhood disease seasonality and the effects of immunization.
      ,
      • Carneiro H.
      • Mylonakis E.
      Google Trends: A Web-Based Tool for Real-Time Surveillance of Disease Outbreaks.
      ,
      • Pollett S.
      • et al.
      Validating the Use of Google Trends to Enhance Pertussis Surveillance in California.
      ,
      • Zhou X.
      • et al.
      Monitoring Epidemic Alert Levels by Analyzing Internet Search Volume.
      ,
      • Desai R.
      • et al.
      Norovirus Disease Surveillance Using Google Internet Query Share Data.
      ,
      • Milinovich G.
      • et al.
      Using internet search queries for infectious disease surveillance: screening diseases for suitability.
      )

      1 Google insights for search (
      • Samaras L.
      • Garcia-Barriocanal E.
      • Sicilia M.
      Syndromic surveillance models using Web data: The case of scarlet fever in the UK.
      ) 1 Google AdSense (
      • Eysenbach G.
      Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance.
      ) 1 UpToDate (
      • Santillana M.
      • et al.
      Using Clinicians’ Search Query Data to Monitor Influenza Epidemics.
      ) 1 Microsoft Bing (
      • Yom-Tov E.
      • et al.
      Detecting disease outbreaks in mass gatherings using Internet data.
      ) 3 Baidu (
      • Yuan Q.
      • et al.
      Monitoring Influenza Epidemics in China with Search Query from Baidu.
      ,
      • Xie T.
      • et al.
      Correlation between reported human infection with avian influenza A H7N9 virus and cyber user awareness: What can we learn from digital epidemiology?
      ,
      • Gu Y.
      • et al.
      Early detection of an epidemic erythromelalgia outbreak using Baidu search data.
      ) 1 Yandex (
      • Domnich A.
      • et al.
      Demand-based web surveillance of sexually transmitted infections in Russia.
      ) 1 Websok (
      • Edelstein M.
      • et al.
      Detecting the norovirus season in Sweden using search engine data – Meeting the needs of hospital infection control teams.
      )

      1 Naver Trends (
      • Shin S.
      • et al.
      Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea.
      )
      26 ILI (
      • Eysenbach G.
      Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance.
      ,
      • Araz O.
      • Bentley D.
      • Muelleman R.
      Using Google Flu Trends data in forecasting influenza-like-illness related ED visits in Omaha, Nebraska.
      ,
      • Santillana M.
      • et al.
      What can digital disease detection learn from (an external revision to) Google flu trends?
      ,
      • Davidson M.
      • Haim D.
      • Radin J.
      Using Networks to Combine “Big Data” and Traditional Surveillance to Improve Influenza Predictions.
      ,
      • Malik M.
      • et al.
      “Google Flu Trends” and Emergency Department Triage Data Predicted the 2009 Pandemic H1N1 Waves in Manitoba.
      ,
      • Moss R.
      • et al.
      Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data.
      ,
      • Olson D.
      • et al.
      Reassessing Google Flu Trends Data for Detection of Seasonal and Pandemic Influenza: A Comparative Epidemiological Study at Three Geographic Scales.
      ,
      • Nsoesie E.
      • Buckeridge D.
      • Brownstein J.
      Guess Who’s Not Coming to Dinner? Evaluating Online Restaurant Reservations for Disease Surveillance.
      ,
      • Patwardhan A.
      • Bilkovski R.
      Comparison: Flu Prescription Sales Data from a Retail Pharmacy in the US with Google Flu Trends and US ILINet (CDC) Data as Flu Activity Indicator.
      ,
      • Pervaiz F.
      • et al.
      FluBreaks: Early Epidemic Detection from Google Flu Trends.
      ,
      • Scarpino S.
      • Dimitrov N.
      • Meyers L.
      Optimizing Provider Recruitment for Influenza Surveillance Networks.
      ,
      • Thompson L.
      • et al.
      Emergency department and ‘Google flu trends’ data as syndromic surveillance indicators for seasonal influenza.
      ,
      • Timpka T.
      • et al.
      Performance of eHealth Data Sources in Local Influenza Surveillance: A 5-Year Open Cohort Study.
      ,
      • Valdivia A.
      • M.C.S
      Diseases tracked by using Google trends, Spain.
      ,
      • Velardi P.
      • et al.
      Twitter mining for fine-grained syndromic surveillance.
      ,
      • Wilson N.
      • et al.
      Interpreting “Google Flu Trends” data for pandemic H1N1 influenza: the New Zealand experience.
      ,
      • Hulth A.
      • Rydevik G.
      GET WELL: an automated surveillance system for gaining new epidemiological knowledge.
      ,
      • Boyle J.
      • et al.
      Prediction and surveillance of influenza epidemics.
      ,
      • Lange M.M.A.D.
      • et al.
      Comparison of five influenza surveillance systems during the 2009 pandemic and their association with media attention.
      ,
      • Kang M.
      • et al.
      Using Google Trends for Influenza Surveillance in South China.
      ,
      • Carneiro H.
      • Mylonakis E.
      Google Trends: A Web-Based Tool for Real-Time Surveillance of Disease Outbreaks.
      ,
      • Santillana M.
      • et al.
      Using Clinicians’ Search Query Data to Monitor Influenza Epidemics.
      ,
      • Yuan Q.
      • et al.
      Monitoring Influenza Epidemics in China with Search Query from Baidu.
      ,
      • Xie T.
      • et al.
      Correlation between reported human infection with avian influenza A H7N9 virus and cyber user awareness: What can we learn from digital epidemiology?
      ,
      • Shin S.
      • et al.
      Correlation between National Influenza Surveillance Data and Search Queries from Mobile Devices and Desktops in South Korea.
      ,
      • Hulth A.
      • Rydevik G.
      Web query-based surveillance in Sweden during the influenza A(H1N1)2009 pandemic, April 2009 to February 2010.
      )

      2 Ebola (
      • Hossain L.
      • et al.
      Social media in Ebola outbreak.
      ,
      • Alicino C.
      • et al.
      Assessing Ebola-related web search behaviour: insights and implications.
      ) 5 Foodborne illness (
      • Hulth A.
      • Rydevik G.
      GET WELL: an automated surveillance system for gaining new epidemiological knowledge.
      ,
      • Desai R.
      • et al.
      Norovirus Disease Surveillance Using Google Internet Query Share Data.
      ,
      • Edelstein M.
      • et al.
      Detecting the norovirus season in Sweden using search engine data – Meeting the needs of hospital infection control teams.
      ,
      • Andersson T.
      • et al.
      Syndromic surveillance for local outbreak detection and awareness: evaluating outbreak signals of acute gastroenteritis in telephone triage, web-based queries and over-the-counter pharmacy sales.
      ,
      • Bahk G.
      • Kim Y.
      • Park M.
      Use of internet search queries to enhance surveillance of foodborne illness.
      ) 1 Mass gatherings (
      • Yom-Tov E.
      • et al.
      Detecting disease outbreaks in mass gatherings using Internet data.
      ) 1 Scarlet fever (
      • Samaras L.
      • Garcia-Barriocanal E.
      • Sicilia M.
      Syndromic surveillance models using Web data: The case of scarlet fever in the UK.
      ) 1 Epidemic erythromelalgia (
      • Gu Y.
      • et al.
      Early detection of an epidemic erythromelalgia outbreak using Baidu search data.
      ) 1 Chickenpox (
      • Bakker K.
      • et al.
      Digital epidemiology reveals global childhood disease seasonality and the effects of immunization.
      ) 1 Dengue (
      • Chan E.
      • et al.
      Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance.
      ) 1 Pertussis (
      • Pollett S.
      • et al.
      Validating the Use of Google Trends to Enhance Pertussis Surveillance in California.
      ) 1 Hepatitis (
      • Zhou X.
      • et al.
      Monitoring Epidemic Alert Levels by Analyzing Internet Search Volume.
      ) 1 Tuberculosis (
      • Zhou X.
      • Ye J.
      • Feng Y.
      Tuberculosis surveillance by analyzing Google trends.
      )

      2 Other or multiple infectious diseases (
      • Milinovich G.
      • et al.
      Using internet search queries for infectious disease surveillance: screening diseases for suitability.
      ,
      • Domnich A.
      • et al.
      Demand-based web surveillance of sexually transmitted infections in Russia.
      )
      Timeliness: 1-12 weeks before official reporting (
      • Zhou X.
      • Ye J.
      • Feng Y.
      Tuberculosis surveillance by analyzing Google trends.
      ).

      Accuracy: weak to strong, variable correlations [93,94] and variable interpretations of findings (
      • Samaras L.
      • Garcia-Barriocanal E.
      • Sicilia M.
      Syndromic surveillance models using Web data: The case of scarlet fever in the UK.
      ).

      Increased correlation with increased availability of weekly search data (
      • Boyle J.
      • et al.
      Prediction and surveillance of influenza epidemics.
      ).
      Social media surveillance | 17 | 12 Twitter (
      • Santillana M.
      • et al.
      Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance.
      ,
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ,
      • Velardi P.
      • et al.
      Twitter mining for fine-grained syndromic surveillance.
      ,
      • Yom-Tov E.
      • et al.
      Detecting disease outbreaks in mass gatherings using Internet data.
      ,
      • Odlum M.
      • Yoon S.
      What can we learn about the Ebola outbreak from tweets?
      ,
      • Denecke K.
      • et al.
      How to exploit twitter for public health monitoring?
      ,
      • Mollema L.
      • et al.
      Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in The Netherlands in 2013.
      ,
      • Nagel A.
      • et al.
      The Complex Relationship of Realspace Events and Messages in Cyberspace: Case Study of Influenza and Pertussis Using Tweets.
      ,
      • Sofean M.
      • Smith M.
      A real-time disease surveillance architecture using social networks.
      ,
      • Towers S.
      • et al.
      Mass Media and the Contagion of Fear: The Case of Ebola in America.
      ,
      • Young S.
      • Rivers C.
      • Lewis B.
      Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes.
      ,
      • Woo H.
      • et al.
      Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.
      )

      1 Weibo (
      • Zhang E.
      • et al.
      Leveraging social networking sites for disease surveillance and public sensing: the case of the 2013 avian influenza A(H7N9) outbreak in China.
      )

      5 Online restaurant websites, blogs or other media (
      • Nsoesie E.
      • Buckeridge D.
      • Brownstein J.
      Guess Who’s Not Coming to Dinner? Evaluating Online Restaurant Reservations for Disease Surveillance.
      ,
      • Lange M.M.A.D.
      • et al.
      Comparison of five influenza surveillance systems during the 2009 pandemic and their association with media attention.
      ,
      • Woo H.
      • et al.
      Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.
      ,
      • Nsoesie E.O.
      • Kluberg S.A.
      • Brownstein J.S.
      Online reports of foodborne illness capture foods implicated in official foodborne outbreak reports.
      ,
      • Corley C.D.
      • et al.
      Using Web and social media for influenza surveillance.
      )
      8 ILI (
      • Santillana M.
      • et al.
      Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance.
      ,
      • Nsoesie E.
      • Buckeridge D.
      • Brownstein J.
      Guess Who’s Not Coming to Dinner? Evaluating Online Restaurant Reservations for Disease Surveillance.
      ,
      • Velardi P.
      • et al.
      Twitter mining for fine-grained syndromic surveillance.
      ,
      • Lange M.M.A.D.
      • et al.
      Comparison of five influenza surveillance systems during the 2009 pandemic and their association with media attention.
      ,
      • Nagel A.
      • et al.
      The Complex Relationship of Realspace Events and Messages in Cyberspace: Case Study of Influenza and Pertussis Using Tweets.
      ,
      • Woo H.
      • et al.
      Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.
      ,
      • Zhang E.
      • et al.
      Leveraging social networking sites for disease surveillance and public sensing: the case of the 2013 avian influenza A(H7N9) outbreak in China.
      ,
      • Corley C.D.
      • et al.
      Using Web and social media for influenza surveillance.
      )

      2 Ebola (
      • Odlum M.
      • Yoon S.
      What can we learn about the Ebola outbreak from tweets?
      ,
      • Towers S.
      • et al.
      Mass Media and the Contagion of Fear: The Case of Ebola in America.
      ) 1 Cholera (
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ) 1 Measles (
      • Mollema L.
      • et al.
      Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in The Netherlands in 2013.
      ) 1 Pertussis (
      • Nagel A.
      • et al.
      The Complex Relationship of Realspace Events and Messages in Cyberspace: Case Study of Influenza and Pertussis Using Tweets.
      ) 1 HIV (
      • Young S.
      • Rivers C.
      • Lewis B.
      Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes.
      ) 1 Mass gatherings (
      • Yom-Tov E.
      • et al.
      Detecting disease outbreaks in mass gatherings using Internet data.
      )

      2 Other or multiple infectious diseases (
      • Denecke K.
      • et al.
      How to exploit twitter for public health monitoring?
      ,
      • Sofean M.
      • Smith M.
      A real-time disease surveillance architecture using social networks.
      )
      Timeliness: up to one week before official announcements (
      • Odlum M.
      • Yoon S.
      What can we learn about the Ebola outbreak from tweets?
      ) or GFT (
      • Santillana M.
      • et al.
      Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance.
      ).

      Accuracy: moderate to strong correlations with official data (
      • Woo H.
      • et al.
      Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.
      ).

      Tweet volumes influenced by disease-related news videos (
      • Towers S.
      • et al.
      Mass Media and the Contagion of Fear: The Case of Ebola in America.
      ) and time (
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ).
      Other | 2 | 2 Combination of various media (
      • Denecke K.
      • et al.
      Event-Driven Architecture for Health Event Detection from Multiple Sources.
      ,
      • Khan S.
      • Patel C.
      • Kukafka R.
      GODSN: Global News Driven Disease Outbreak and Surveillance.
      )
      2 Development study (
      • Denecke K.
      • et al.
      Event-Driven Architecture for Health Event Detection from Multiple Sources.
      ,
      • Khan S.
      • Patel C.
      • Kukafka R.
      GODSN: Global News Driven Disease Outbreak and Surveillance.
      )
      Proposed new avenues for Internet-based methods in areas such as system architecture and technical requirements (
      • Denecke K.
      • et al.
      Event-Driven Architecture for Health Event Detection from Multiple Sources.
      ,
      • Khan S.
      • Patel C.
      • Kukafka R.
      GODSN: Global News Driven Disease Outbreak and Surveillance.
      ).
      Table 2. Comparing existing Internet-based surveillance systems by attributes described in studies.
      HealthMap | GPHIN | BioCaster | Argus | MedISys | ProMED-mail (
      • Cowen P.
      • et al.
      Evaluation of ProMED-mail as an electronic early warning system for emerging animal diseases: 1996 to 2004.
      ,
      • Madoff L.
      ProMED-mail: An Early Warning System for Emerging Diseases.
      ,
      • Madoff L.
      • Woodall J.
      The Internet and the Global Monitoring of Emerging Diseases: Lessons from the First 10 Years of ProMED-mail.
      ,
      • Morse S.
      • Rosenberg B.
      • Woodall J.
      ProMED Global monitoring of emerging diseases: design for a demonstration program.
      ,
      • Stewart A.
      • Denecke K.
      Using ProMED-Mail and MedWorm Blogs for Cross-Domain Pattern Analysis in Epidemic Intelligence.
      ,
      • Woodall J.
      Global surveillance of emerging diseases: the ProMED-mail perspective.
      )
      Sources used (
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      )
      49% media, 39% other systems, 12% official | 93% media, 1% other systems, 6% official | 44% media, 49% other systems, 6% official | 67% media, 1% other systems, 32% official | 52% media, 41% other systems, 7% official | 73% media, 1% other systems, 26% official
      Data collection and processing | Collects data from websites and news aggregators through steps of acquiring, categorizing (by pathogen and location), clustering and filtering (
      • Brownstein J.
      Surveillance Sans Frontieres: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project.
      ). Duplicates are removed. Relevant articles are integrated. Final report filtered into one of 5 alert levels (
      • Brownstein J.
      Surveillance Sans Frontieres: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project.
      ).
      Uses a web-based application to scan and extract various news sources and has processes to filter out noise (
      • Dion M.
      • AbdelMalik P.
      • Mawudeku A.
      Big Data and the Global Public Health Intelligence Network (GPHIN).
      ). Duplicates and irrelevant articles are filtered out, while relevant articles are categorized into seven subject areas (
      • Keller M.
      • et al.
      Use of unstructured event-based reports for global infectious disease surveillance.
      ).
      Collects news articles from medical sites, news media and selected blogs (
      • Riccardo F.
      • et al.
      Event-based surveillance during EXPO Milan 2015: rationale, tools, procedures, and initial results.
      ). Information is categorized according to predefined multilingual disease categories. Known names and events are identified and statistics calculated to detect emerging threats (
      • Riccardo F.
      • et al.
      Event-based surveillance during EXPO Milan 2015: rationale, tools, procedures, and initial results.
      ).
      Collects media articles daily. Bayesian tools are used to identify relevant articles. Human experts review these articles manually and determine relevance based on indicators. Experts then write reports (
      • Nelson N.
      • et al.
      Event-based internet biosurveillance: relation to epidemiological observation.
      ).
      Collects public health news articles. Extracted data are aggregated and categorized. Identifies entities and uses the Pattern-based Understanding and Learning System (PULS) to cluster information (
      • Denecke K.
      • et al.
      Event-Driven Architecture for Health Event Detection from Multiple Sources.
      ). Events filtered into one of 5 categories (
      • Linge J.
      • et al.
      MedISys: Medical Information System.
      ).
      Receives information through emails or reports from professionals or anyone around the world. Staff members search the Internet and traditional media. Filtered by one editor, who may reject entries based on relevance and accuracy. Most reports sent to expert moderators.
      Number of languages (
      • Collier N.
      Uncovering text mining: A survey of current work on web-based epidemic intelligence.
      ,
      • Hartley D.
      • et al.
      Landscape of international event-based biosurveillance.
      )
      7 | 9 | 12 | 40 | 26 | 9
      Target audience | Health professionals, community | Health professionals | Not applicable | Health professionals | Health professionals, community | Health professionals, community
      Goal of service | Real-time intelligence of infectious diseases | Outbreak detection | Not applicable (no longer available) | Detection of biological events | Detection of infectious disease outbreaks | Communication among international community regarding health events
      Translation (
      • Thomas C.
      • et al.
      Use of media and public-domain Internet sources for detection and assessment of plant health threats.
      ,
      • Denecke K.
      • et al.
      Event-Driven Architecture for Health Event Detection from Multiple Sources.
      ,
      • Hartley D.
      • et al.
      An overview of Internet biosurveillance.
      )
      Uses language-specific search terms. | Uses both language-specific keywords and text extraction algorithms. | Uses full text translation first, then applies English-language selection algorithms only. | Employs regional specialists fluent in a language. | Classifies news articles according to multilingual categories. | Employs multilingual personnel.
      Moderation | Partial | Partial | Automated | Partial | Automated | Manual
      Mode of dissemination | Allows users to specify parameters for pushed information. Only reports classified as ‘breaking news’ (the top priority alert level) are added as markers to the map (
      • Brownstein J.
      Surveillance Sans Frontieres: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project.
      ).
      Pushing function to send alerts. The most relevant articles are published on a website (
      • Keller M.
      • et al.
      Use of unstructured event-based reports for global infectious disease surveillance.
      ).
      In several formats, including graphs, geographic and email alerts (
      • Collier N.
      Uncovering text mining: A survey of current work on web-based epidemic intelligence.
      ).
      Following internal review, reports are posted to a secure Internet portal (
      • Thomas C.
      • et al.
      Use of media and public-domain Internet sources for detection and assessment of plant health threats.
      ).
      Publishes reports and alerts on website. | Final reports (following colour code ranking) shared on website. Listserv software distributes reports to at least 1 of 11 mailing lists (
      • Madoff L.
      ProMED-mail: An Early Warning System for Emerging Diseases.
      ).
      Accessibility | Freely accessible website | Restricted, password-protected access | No longer available, had freely available and password-protected modes (
      • Collier N.
      Uncovering text mining: A survey of current work on web-based epidemic intelligence.
      )
      Restricted access | Freely accessible | Freely accessible
      Developer (
      • Barboza P.
      • et al.
      Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
      )
      Harvard University, USA | Public Health Agency of Canada | National Institute of Informatics, Japan | Georgetown University, USA | Joint Research Centre, EU | International Society of Infectious Diseases, USA

      Timeliness

      In a study analysing seven surveillance systems, Barboza et al. found that the systems issued alerts for human cases on average between 1.9 days (95% CI −0.4 to 4.1) and 6.1 days (95% CI 3.1 to 9.1) before WHO reports, but together did not detect 7% of public health events before official reports (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ). Differences in timeliness between systems were generally not significant (
      • Barboza P.
      • et al.
      Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
      ,
      • Lyon A.
      • et al.
      Comparison of Web-Based Biosecurity Intelligence Systems: BioCaster, EpiSPIDER and HealthMap.
      ). In another study, nongovernmental data sources were found to communicate initial reports a median of 10 days earlier than governmental sources, but this difference was not significant (
      • Mondor L.
      • et al.
      Timeliness of Nongovernmental versus Governmental Global Outbreak Communications.
      ). Though based on a small sample, a study assessing epidemic alerts issued by the French surveillance tool Bulletin Hebdomadaire International found that GPHIN, a source used by the tool, exhibited a mean delay of 1 month and 19 days in providing an outbreak signal after the first retrospective occurrence of the disease, while ProMED-mail exhibited a mean delay of 2 months and 4 days (
      • Rotureau B.
      • et al.
      International Epidemic Intelligence at the Institut de Veille Sanitaire, France.
      ). ProMED-mail provided the first relevant indication of an outbreak in the 2014 Ebola epidemic around three months after the first retrospectively suspected occurrence of Ebola (
      • Hossain L.
      • et al.
      Social media in Ebola outbreak.
      ). In one study, HealthMap was found to provide alerts at a median of 12 days (95%CI 9-18) before official reports of confirmed H1N1 influenza cases (
      • Brownstein J.
      • et al.
      Information Technology and Global Surveillance of Cases of 2009 H1N1 Influenza.
      ). However, in another study HealthMap data exhibited a one day lag behind officially reported cases of cholera (
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ). Argus provided reports of the first pandemic H1N1 2009 case 1 to 16 days before WHO for 42 countries, but provided reports on the same day as WHO for 21 countries (
      • Nelson N.
      • et al.
      Event-based internet biosurveillance: relation to epidemiological observation.
      ).
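
      The timeliness comparisons above reduce to a simple calculation: the number of days between an Internet-based alert and the corresponding official report, summarised across events. The following is a minimal sketch of that calculation, not the method of any cited study. The event dates are hypothetical and the function name `lead_days` is an assumption introduced for illustration.

```python
from datetime import date
from statistics import median

# Hypothetical (alert date, official report date) pairs, for illustration only.
events = [
    (date(2009, 4, 20), date(2009, 5, 2)),
    (date(2009, 5, 1), date(2009, 5, 1)),
    (date(2009, 5, 10), date(2009, 5, 7)),
    (date(2009, 6, 1), date(2009, 6, 15)),
]


def lead_days(alert: date, official: date) -> int:
    """Days by which the alert preceded the official report (negative means a lag)."""
    return (official - alert).days


leads = [lead_days(alert, official) for alert, official in events]
median_lead = median(leads)
# Share of events for which the informal alert came strictly first.
share_earlier = sum(1 for d in leads if d > 0) / len(leads)

print(leads, median_lead, share_earlier)
```

      Reporting the share of events detected earlier alongside the median keeps same-day and late detections visible, which a single average lead time would hide.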

      Accuracy

      Reported correlations between Internet-based systems and official data have generally been positive but variable. HealthMap was found to correlate well with traditional surveillance (
      • Ahmed S.S.
      • et al.
      Surveillance for Neisseria meningitidis disease activity and transmission using information technology.
      ), though this correlation weakened over time (
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ). Argus showed a moderate to strong correlation (r = 0.81) with WHO case data for pandemic H1N1 2009 (
      • Nelson N.
      • et al.
      Event-based internet biosurveillance: relation to epidemiological observation.
      ). One study found that ProMED-mail reports showed variable correlations, but that 20% of events detected by ProMED-mail were not detected by a national system (
      • Zeldenrust M.
      • et al.
      The value of ProMED-mail for the Early Warning Committee in the Netherlands: more specific approach recommended.
      ). An assessment of two data sources used by HealthMap found a weekly pattern and ‘crowding out’ phenomenon caused by unbalanced or heightened media attention (
      • Scales D.
      • Zelenev A.
      • Brownstein J.
      Quantifying the effect of media limitations on outbreak data in a global online web-crawling epidemic intelligence system, 2008-2011.
      ).

      Search query surveillance

      Search queries, in which Internet users request specific information from web-based search engines, have been used for early epidemic detection. Google Flu Trends (GFT) was the most common data source, used by 19 studies for comparisons with official case reports of influenza or influenza-like illness (ILI). Not all sources were open-access and some were obtained directly from the search engine provider (
      • Woo H.
      • et al.
      Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.
      ). Methods most commonly employed a syndromic surveillance approach, comparing search frequencies of disease-related keywords with a pre-existing official dataset and using statistical methods to validate the informal data.
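
      The comparison of search frequencies with an official dataset can be sketched as a lagged correlation: the search series is shifted by a number of weeks and correlated with the official series at each shift. The example below is an illustrative sketch under stated assumptions rather than the procedure of any reviewed study. Both weekly series are hypothetical, and `best_lead` is a name introduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_lead(search, official, max_lead):
    """Pair search at week t with official data at week t + lead; return the best (lead, r)."""
    results = {}
    for lead in range(max_lead + 1):
        shifted_search = search[: len(search) - lead]
        shifted_official = official[lead:]
        results[lead] = pearson(shifted_search, shifted_official)
    best = max(results, key=results.get)
    return best, results[best]


# Hypothetical weekly series: the official curve is the search curve delayed by two weeks.
search = [5, 9, 20, 42, 60, 48, 30, 15, 8, 5, 4, 4]
official = [4, 4, 5, 9, 20, 42, 60, 48, 30, 15, 8, 5]

lead, r = best_lead(search, official, max_lead=4)
print(lead, round(r, 3))
```

      The lead with the highest coefficient gives an estimate of how far the search signal runs ahead of official reporting, although a high coefficient at some lag does not by itself validate the informal data.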

      Timeliness

      Generally, search queries provided earlier epidemic alerts than official data sources. GFT preceded official data by 1-2 weeks (
      • Malik M.
      • et al.
      “Google Flu Trends” and Emergency Department Triage Data Predicted the 2009 Pandemic H1N1 Waves in Manitoba.
      ,
      • Timpka T.
      • et al.
      Performance of eHealth Data Sources in Local Influenza Surveillance: A 5-Year Open Cohort Study.
      ) and up to 4-6 weeks when an infection model was applied (
      • Moss R.
      • et al.
      Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data.
      ). Similarly, Google Trends data was found to precede CDC data by 1-12 weeks for Tuberculosis reporting (
      • Zhou X.
      • Ye J.
      • Feng Y.
      Tuberculosis surveillance by analyzing Google trends.
      ). Local search query data produced a signal for the onset of the norovirus season 2-3 weeks earlier than laboratory notification data (
      • Edelstein M.
      • et al.
      Detecting the norovirus season in sweden using search engine data – Meeting the needs of hospital infection control teams.
      ). A study using data from Chinese search engine Baidu also showed upward trends one week before official reports (
      • Gu Y.
      • et al.
      Early detection of an epidemic erythromelalgia outbreak using Baidu search data.
      ).

      Accuracy

      Correlations with validated surveillance varied in strength, and limitations were mentioned. Moderate correlations were reported for Google Trends data (
      • Carneiro H.
      • Mylonakis E.
      Google Trends: A Web-Based Tool for Real-Time Surveillance of Disease Outbreaks.
      ,
      • Pollett S.
      • et al.
      Validating the Use of Google Trends to Enhance Pertussis Surveillance in California.
      ). Local search query data had moderate (r = 0.70) to strong correlations (0.88-0.89) with official data (
      • Edelstein M.
      • et al.
      Detecting the norovirus season in sweden using search engine data – Meeting the needs of hospital infection control teams.
      ,
      • Bahk G.
      • Kim Y.
      • Park M.
      Use of internet search queries to enhance surveillance of foodborne illness.
      ). Studies using Baidu reported both strong (r = 0.98, r = 0.96) and weak (r = 0.43) correlations and noted the coherence of curves produced using a search index compared to official cases (
      • Yuan Q.
      • et al.
      Monitoring Influenza Epidemics in China with Search Query from Baidu.
      ,
      • Xie T.
      • et al.
      Correlation between reported human infection with avian influenza A H7N9 virus and cyber user awareness: What can we learn from digital epidemiology?.
      ). GFT showed variable correlations from r = 0.69 (95%CI 0.22-0.90) to r = 0.96 (95%CI 0.88-0.99) (
      • Timpka T.
      • et al.
      Performance of eHealth Data Sources in Local Influenza Surveillance: A 5-Year Open Cohort Study.
      ). GFT data had moderate to strong positive correlations with official data after implementing additional statistical regression models (
      • Araz O.
      • Bentley D.
      • Muelleman R.
      Using Google Flu Trends data in forecasting influenza-like-illness related ED visits in Omaha, Nebraska.
      ,
      • Scarpino S.
      • Dimitrov N.
      • Meyers L.
      Optimizing Provider Recruitment for Influenza Surveillance Networks.
      ,
      • Thompson L.
      • et al.
      Emergency department and ‘Google flu trends’ data as syndromic surveillance indicators for seasonal influenza.
      ,
      • Timpka T.
      • et al.
      Performance of eHealth Data Sources in Local Influenza Surveillance: A 5-Year Open Cohort Study.
      ), as well as reduced errors (
      • Santillana M.
      • et al.
      What can digital disease detection learn from (an external revision to) Google flu trends?.
      ). GFT exhibited high correlations (r = 0.92, 95%CI 0.90-0.94) with prescription sales data (
      • Patwardhan A.
      • Bilkovski R.
      Comparison: Flu Prescription Sales Data from a Retail Pharmacy in the US with Google Flu Trends and US ILINet (CDC) Data as Flu Activity Indicator.
      ) and a higher correlation with OpenTable restaurant availability (an online restaurant table reservation site) than official estimates of ILI (
      • Nsoesie E.O.
      • Kluberg S.A.
      • Brownstein J.S.
      Online reports of foodborne illness capture foods implicated in official foodborne outbreak reports.
      ). GFT models provided underestimates and overestimates of influenza epidemics and seasons (
      • Olson D.
      • et al.
      Reassessing Google Flu Trends Data for Detection of Seasonal and Pandemic Influenza: A Comparative Epidemiological Study at Three Geographic Scales.
      ,
      • Valdivia A.
      • Monge-Corella S.
      Diseases tracked by using Google trends, Spain.
      ). Boyle et al. observed an increase in correlation with increased availability of weekly search data (
      • Boyle J.
      • et al.
      Prediction and surveillance of influenza epidemics.
      ). Samaras et al. challenged the use of correlations to validate approaches to surveillance, suggesting that any correlations under r = 0.90 could be prone to variable predictions and distribution spread (
      • Samaras L.
      • Garcia-Barriocanal E.
      • Sicilia M.
      Syndromic surveillance models using Web data: The case of scarlet fever in the UK.
      ).

      Surveillance using intelligence from social media data

      Social media, or Internet-based environments that support the production and sharing of user content, have been increasingly discussed in the literature. Twelve studies using social media used Twitter. Other platforms included Chinese social media site Weibo, online restaurant websites and blogs (Table 1). Studies looked at the content of tweets in a similar way to news reports, utilizing computing approaches such as support vector machines (SVM), natural language processing and parsing. Other details can be obtained in addition to disease data, such as time stamps, user names, message types, the number of followers and user Internet Protocol addresses (
      • Odlum M.
      • Yoon S.
      What can we learn about the Ebola outbreak from tweets?.
      ). Restaurant review forums can contain otherwise unknown information about the setting of foodborne illnesses (
      • Nsoesie E.O.
      • Kluberg S.A.
      • Brownstein J.S.
      Online reports of foodborne illness capture foods implicated in official foodborne outbreak reports.
      ). Studies commonly evaluated their systems using Pearson’s correlation and Spearman’s rank coefficients. Processes similar to those used for existing surveillance systems were used to construct systems for social media.

      Timeliness

      Some studies reported that social media trends ‘corresponded’ to official case counts (
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ,
      • Mollema L.
      • et al.
      Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in The Netherlands in 2013.
      ). Most of the studies reporting timeliness or time difference outcomes found that social media data provided more timely alerts than official sources. Odlum & Yoon reported an increase in Tweets starting 3–7 days before the first Nigerian case of Ebola was announced by the Nigerian Ministry of Health and the CDC. Weibo data allowed reporting at an average of one hour before official websites (
      • Odlum M.
      • Yoon S.
      What can we learn about the Ebola outbreak from tweets?.
      ). Another study also used a combination of Tweet volumes, search queries and traditional sources to produce timely estimates one week ahead of GFT (
      • Santillana M.
      • et al.
      Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance.
      ).

      Accuracy

      Nine studies used correlation coefficients to evaluate the utility of social media. Studies using Twitter found moderate to strong correlations. One study found a stronger correlation between Twitter and official data than that of GFT (
      • Velardi P.
      • et al.
      Twitter mining for fine-grained syndromic surveillance.
      ). A proposed SVM model using a combination of Twitter and blog posts obtained high correlations with official data (
      • Woo H.
      • et al.
      Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.
      ). However, Tweet volumes were shown to be influenced by disease-related news videos (
      • Towers S.
      • et al.
      Mass Media and the Contagion of Fear: The Case of Ebola in America.
      ), while correlations also decreased over time in another study (
      • Chunara R.
      • Andrews J.R.
      • Brownstein J.S.
      Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
      ). Differences in significance were observed when assessing different types of Tweets and depended on the disease-related keywords used (
      • Nagel A.
      • et al.
      The Complex Relationship of Realspace Events and Messages in Cyberspace: Case Study of Influenza and Pertussis Using Tweets.
      ). Spinn3r, a blog post aggregator, was found to have a correlation of r = 0.626 with CDC data (
      • Corley C.D.
      • et al.
      Using Web and social media for influenza surveillance.
      ). Use of the OpenTable restaurant website gave moderate-strength correlations with official data (
      • Nsoesie E.
      • Buckeridge D.
      • Brownstein J.
      Guess Who’s Not Coming to Dinner? Evaluating Online Restaurant Reservations for Disease Surveillance.
      ,
      • Nsoesie E.O.
      • Kluberg S.A.
      • Brownstein J.S.
      Online reports of foodborne illness capture foods implicated in official foodborne outbreak reports.
      ).

      Computing methods and automated processes

      In all three main surveillance methods described, elements of computing and automated methodology were implemented. Seven studies combined these methods with statistical modelling. The most common methods involved machine-learning algorithms, such as Naïve Bayes and SVM, used to classify text (
      • Torii M.
      • et al.
      An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics.
      ,
      • Denecke K.
      • et al.
      How to exploit twitter for public health monitoring?.
      ,
      • Sofean M.
      • Smith M.
      A real-time disease surveillance architecture using social networks.
      ). Recent studies have focussed on improving the extraction and classification of geographical metadata (
      • Keller M.
      • et al.
      Use of unstructured event-based reports for global infectious disease surveillance.
      ,
      • Chanlekha H.
      • Collier N.
      A methodology to enhance spatial understanding of disease outbreak events reported in news articles.
      ). A study explored the use of a text classification framework to train machine-learning classifiers for use in surveillance systems (
      • Torii M.
      • et al.
      An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics.
      ). For signal generation, statistical methods such as regression, cumulative sum control chart and Farrington’s method have been used (
      • Andersson T.
      • et al.
      Syndromic surveillance for local outbreak detection and awareness: evaluating outbreak signals of acute gastroenteritis in telephone triage, web-based queries and over-the-counter pharmacy sales.
      ,
      • Denecke K.
      • et al.
      How to exploit twitter for public health monitoring?.
      ).
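
      As an illustration of the text classification step, the sketch below implements a small multinomial Naïve Bayes classifier with add-one smoothing that labels short texts as relevant or irrelevant to disease surveillance. It is a toy example under stated assumptions, not a reconstruction of any reviewed system. The training sentences, the labels and the class name `NaiveBayes` are hypothetical, and tokenisation is reduced to lower-casing and splitting on whitespace.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += log((self.word_counts[label][tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training examples, for illustration only.
docs = [
    "cholera outbreak reported in the northern district",
    "hospital confirms new cases of influenza",
    "health ministry reports measles outbreak among children",
    "football team wins the championship final",
    "new phone released with larger screen",
    "stock market closes higher after rally",
]
labels = ["relevant"] * 3 + ["irrelevant"] * 3

clf = NaiveBayes()
clf.fit(docs, labels)
print(clf.predict("new measles cases reported by hospital"))
print(clf.predict("team wins final match"))
```

      A production classifier would need a much larger annotated corpus and proper tokenisation, and its output would normally feed a separate signal generation step.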

      Statistical analysis methods for evaluation

      In 49 studies, statistical analysis was used to evaluate the performance of surveillance methods. Pearson’s correlation and Spearman’s rank coefficients were commonly used to determine the strength of correlations between unstructured data and official case data.
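
      The two coefficients answer slightly different questions: Pearson’s measures linear association, whereas Spearman’s measures monotonic association by correlating ranks. A minimal sketch of both follows, using hypothetical weekly counts. The helper names `ranks` and `spearman` are introduced here for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: posts rise much faster than cases, but always in the same direction.
posts = [3, 5, 8, 13, 40, 90]
cases = [10, 12, 15, 20, 26, 30]

print(round(pearson(posts, cases), 2), round(spearman(posts, cases), 2))
```

      In this example the relationship is monotonic but not linear, so Spearman’s coefficient equals 1 while Pearson’s is lower. This is one reason coefficients from different studies are not directly comparable unless the method is reported.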

      Discussion

      We reviewed published studies on Internet-based methods for rapid epidemic intelligence to gain an overview of the available tools, their timeliness and accuracy outcomes (Figure 2). More than 20 different Internet-based data sources were identified across all three streams of surveillance, indicating the existing wide breadth of research in this area.
      Figure 2. Internet-based syndromic surveillance methods.
      HealthMap was the most commonly studied existing Internet-based surveillance system with a freely-accessible platform and user-friendly visual interface. Most existing systems were reported through various measures as being more timely than official sources. While Internet sources were known to exhibit intrinsic biases due to noise, some Internet-based surveillance systems were prone to biases due to their overreliance on a few sources (
      • Collier N.
      Uncovering text mining: A survey of current work on web-based epidemic intelligence.
      ,
      • Scales D.
      • Zelenev A.
      • Brownstein J.
      Quantifying the effect of media limitations on outbreak data in a global online web-crawling epidemic intelligence system, 2008-2011.
      ). However, different event-based systems can complement each other through differing characteristics in data acquisition, language, moderation or dissemination (
      • Christaki E.
      New technologies in predicting: preventing and controlling emerging infectious diseases.
      ). Thus, existing surveillance systems may be combined to improve individual attributes, as seen in the EAR project (
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ).
      GFT was the most commonly analysed surveillance method using search queries. Mixed results for GFT were expected given ongoing criticism and previous overestimates and underestimates of Influenza outbreaks (
      • Lazer D.
      • et al.
      The Parable of Google Flu: Traps in Big Data Analysis.
      ). Though improvements were shown in studies that modified GFT, the use of Relative Search Volumes (RSVs) by GFT and Google Trends presented issues for statistical analysis and validity. RSVs are index measures derived from the density of search interest from a specific region. Thus, statistical methods may struggle to compare RSVs to absolute or numerical figures (
      • Hulth A.
      • Rydevik G.
      Web query-based surveillance in Sweden during the influenza A(H1N1)2009 pandemic, April 2009 to February 2010.
      ). These factors impair the validity of the multitude of studies analysing GFT for search query surveillance. Given the consistent timeliness of search query surveillance, queries from other search engines could be considered. However, a large volume of queries is required for valid analyses. As a result, search query surveillance is more effective for diseases with moderate to high prevalence in countries with high numbers of Internet users (
      • Christaki E.
      New technologies in predicting: preventing and controlling emerging infectious diseases.
      ).
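
      The difficulty of comparing RSVs with absolute figures can be shown with a simplified index. The sketch below assumes, for illustration only, that an RSV is the share of all searches that contain a term, rescaled so that the peak equals 100. It is not the search provider's exact procedure, and the counts and the function name `relative_search_volume` are hypothetical.

```python
def relative_search_volume(term_counts, total_counts):
    """Share of searches containing the term, rescaled so that the peak equals 100."""
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    return [round(100 * share / peak) for share in shares]


# Two hypothetical regions whose absolute query volumes differ tenfold.
small_region = relative_search_volume([10, 20, 40, 20], [1000] * 4)
large_region = relative_search_volume([100, 200, 400, 200], [10000] * 4)

print(small_region)
print(large_region)
```

      Both regions produce the same index despite a tenfold difference in absolute queries, so an RSV series conveys the shape of a trend but not its magnitude.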
      Existing studies on social media platforms mainly used Twitter, which is a rich source of information for epidemic detection and has an accessible developer interface (
      • Al-garadi M.
      • et al.
      Using online social networks to track a pandemic: A systematic review.
      ). In addition to disease detection and syndromic surveillance, social media analysis has a wide application including evaluation of public sentiments or behaviours during a public health event (
      • Mollema L.
      • et al.
      Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in The Netherlands in 2013.
      ). However, methods using social media have yet to show consistent timeliness or accuracy relative to official data. In addition to the socio-political influences, geographic bias, sensationalism or falsified data affecting most media sources (
      • Yang Y.T.
      • Horneffer M.
      • DiLisio N.
      Mining social media and web searches for disease detection.
      ), social media data may be incomplete and inadequate for statistical analyses (
      • Nagel A.
      • et al.
      The Complex Relationship of Realspace Events and Messages in Cyberspace: Case Study of Influenza and Pertussis Using Tweets.
      ). Ethical issues concerning online confidentiality may also limit future research use of social media (
      • Bernardo T.M.
      • et al.
      Scoping review on search queries and social media for disease surveillance: a chronology of innovation.
      ).
      With reduced geographical, political and systematic barriers, Internet-based syndromic surveillance has demonstrated improved timeliness, sensitivity, accessibility and cost-effectiveness over traditional indicator-based surveillance. Internet-based surveillance can also be applied to health behaviours in addition to changes in disease-trends. Risk of geographical or cultural bias, low specificity or higher false positives are the main disadvantages affecting the accuracy of Internet-based surveillance systems. Therefore, the predominant view remains that Internet-based surveillance should complement but not replace traditional surveillance.
      For Internet-based surveillance methods that use machine learning algorithms, resources are required to construct annotated training datasets (
      • Lo S.
      • Chiong R.
      • Cornforth D.
      Using Support Vector Machine Ensembles for Target Audience Classification on Twitter.
      ). Having labelled training data available is also a preferred requirement for supervised machine learning (
      • Al-garadi M.
      • et al.
      Using online social networks to track a pandemic: A systematic review.
      ). Internet-based sources alone require resources for upkeep and limitations could preclude access to these sources. The value of using Internet-based surveillance systems has also been challenged due to weaknesses in data management capabilities and a lack of user confidence (
      • Christaki E.
      New technologies in predicting: preventing and controlling emerging infectious diseases.
      ). Given these resource requirements and depending on user preference, a ‘trade-off’ between timeliness and accuracy may be considered. In one survey, timeliness had higher priority than the completeness of data (
      • Riccardo F.
      • et al.
      Interfacing a Biosurveillance Portal and an International Network of Institutional Analysts to Detect Biological Threats.
      ), while 50% of participants in another survey preferred having a moderator so that health events could be reviewed for completeness (
      • Schwind J.
      • et al.
      Evaluation of Local Media Surveillance for Improved Disease Recognition and Monitoring in Global Hotspot Regions.
      ).
      A variety of technical approaches were described in studies, emphasising the challenges associated with managing unstructured big data. These systems collect data from various sources and sometimes feed off one another, resulting in considerable overlap (
      • O’Shea J.
      Digital disease detection: A systematic review of event-based internet biosurveillance systems.
      ). For example, ProMED-mail uses HealthMap alerts for a large portion of its postings. In addition to many other data sources, HealthMap collects data from ProMED-mail as well (
      • Healthmap
      Alert Sources.
      ). Furthermore, many systems use common data sources such as Google News, social media and WHO reports. Constructing a new system requires consideration of complex automated processes and resources. The first challenge is to select suitable keywords for disease-related news articles, search queries or social media posts. Once keywords are selected, web crawlers or news aggregators are often implemented to collect relevant articles from a range of media. This step can be achieved with publicly-available or commercially-operated automatic data collection services. Following extraction of relevant text, the next step involves detecting unusual trends or aberrations. Approaches to this are diverse and include machine-learning algorithms like SVM for entity recognition, ontologies for classifying text and statistical analyses to determine an appropriate baseline for assessing real-time disease trends (
      • Collier N.
      Uncovering text mining: A survey of current work on web-based epidemic intelligence.
      ). As data collection services cannot always provide complete information, including the geographic and demographic data required to construct a baseline, the difficulty is compounded for social media data (
      • Harris J.
      • et al.
      Health Department Use of Social Media to Identify Foodborne Illness - Chicago: Illinois, 2013-2014.
      ). Variations in human language and multiple spellings also predispose text mining and natural language processing (NLP) to error and bias (
      • Anholt R.
      • et al.
      Mining free-text medical records for companion animal enteric syndrome surveillance.
      ). Despite automated search and machine-learning algorithms, most systems rely on human moderators or curators to post news aggregates, which might result in delay or selection bias. While one study has used a combination of NLP and transformation of text sources to reduce background noise in Twitter data (
      • Odlum M.
      • Yoon S.
      What can we learn about the Ebola outbreak from tweets?.
      ), future research should aim to continue optimizing methods that can improve the accuracy of data collected in Internet-based surveillance.
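
      The aberration detection step described in this paragraph can be sketched with a one-sided cumulative sum (CUSUM) control chart, one of the statistical methods reported for signal generation. The example below is a minimal sketch under stated assumptions: the weekly counts are hypothetical, the baseline is simply the first few weeks of the series, and the reference value `k` and threshold `h` (both in standard deviation units) are conventional illustrative choices rather than values taken from any reviewed study.

```python
from statistics import mean, stdev


def cusum_alarms(counts, baseline_weeks, k=0.5, h=4.0):
    """Return the indices of weeks in which the upper CUSUM statistic exceeds h."""
    baseline = counts[:baseline_weeks]
    mu, sigma = mean(baseline), stdev(baseline)  # assumes the baseline is not constant
    cumulative, alarms = 0.0, []
    for week, count in enumerate(counts[baseline_weeks:], start=baseline_weeks):
        z = (count - mu) / sigma
        # Accumulate only deviations larger than k standard deviations above baseline.
        cumulative = max(0.0, cumulative + z - k)
        if cumulative > h:
            alarms.append(week)
            cumulative = 0.0  # restart the chart after a signal
    return alarms


# Hypothetical weekly counts of disease-related posts: stable, then rising.
counts = [10, 12, 9, 11, 10, 12, 11, 9, 10, 11, 13, 18, 25, 31]
alarms = cusum_alarms(counts, baseline_weeks=8)
print(alarms)
```

      Lowering `h` gives earlier signals at the cost of more false alarms, which mirrors the trade-off between timeliness and accuracy discussed above.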
      This study has several limitations. Some systems identified, such as Google Flu Trends, BioCaster and EpiSPIDER, were no longer available or publicly-accessible at the time this review was conducted, suggesting potential challenges in maintaining these systems. Due to the large variety of approaches and overlapping content discussed in the literature, it was difficult to classify each article into distinct categories. We categorised each article by the data source used, but were aware that some articles used multiple types of sources or had other secondary aims relevant for other methods. The lack of streamlined methodology and definitions across all articles also had ramifications. While each article was reviewed carefully to ensure all outcomes relating to the timeliness and accuracy of systems were considered, definitions were susceptible to misinterpretation, introducing a potential reporting bias. Lack of consistency in methodologies across articles also precluded the possibility of conducting quantitative or meta-analyses of these articles. We acknowledge relationships between these Internet-based systems, and that other data relationships might exist which are difficult to ascertain due to lack of information and transparency around these data sources.
      Though cross-disciplinary concepts such as ‘infoveillance’ and Internet-based ‘biosurveillance’ are frequently used in this research area, there is a lack of standardization or protocol to define these concepts. In addition to terms like timeliness and accuracy, qualitative attributes such as accessibility, flexibility, and acceptability require subjective interpretation. To encourage consistency in future studies on Internet-based syndromic surveillance, a rubric based on the CDC ‘Guidelines for Evaluating Public Health Surveillance Systems’ could be developed to analyse computational or statistical approaches used in Internet-based systems. Additionally, following the issues emphasised by Samaras et al. in interpreting correlation coefficients, the rubric should encompass ways to define, evaluate and report outcome measures such as data accuracy (
      • Samaras L.
      • Garcia-Barriocanal E.
      • Sicilia M.
      Syndromic surveillance models using Web data: The case of scarlet fever in the UK.
      ).

      Conclusions

      To address the need for more rapid methods to detect emerging infectious disease threats, several Internet-based surveillance methods have been studied in the literature for use in early epidemic detection. With the development of advanced computing and statistical methods, several systems have incorporated online sources with promising results. Though the current literature recognises several disadvantages of using Internet-based sources, the advantages of timeliness are still valuable. There may be a necessary trade-off between timeliness and accuracy when choosing different surveillance tools. Surveillance needs vary depending on the objective, and when the objective is timely epidemic detection, Internet-based sources can complement and enhance traditional approaches to surveillance. To improve the utility of novel surveillance methods, future research should continue to assess key attributes such as timeliness and accuracy of these systems for enhancing rapid epidemic intelligence.

      Funding statement

      The authors received support from the NHMRC Centre for Research Excellence, Integrated Systems for Epidemic Response (ISER), grant number APP1107393.

      Conflict of interest statement

      Raina MacIntyre is director of an NHMRC-funded Centre for Research Excellence, “Integrated Systems for Epidemic Response”.

      Ethics approval

      Ethics approval was not required for this study.

      References

        • Ahmed S.S.
        • et al.
        Surveillance for Neisseria meningitidis disease activity and transmission using information technology.
        PLoS One. 2015; 10: e0127406
        • Al-garadi M.
        • et al.
        Using online social networks to track a pandemic: A systematic review.
        J Biomed Inform. 2016; 62: 1-11
        • Alicino C.
        • et al.
        Assessing Ebola-related web search behaviour: insights and implications.
        Infect Dis Poverty. 2015; 4
        • Andersson T.
        • et al.
        Syndromic surveillance for local outbreak detection and awareness: evaluating outbreak signals of acute gastroenteritis in telephone triage, web-based queries and over-the-counter pharmacy sales.
        Epidemiol Infect. 2014; 142: 303-313
        • Anholt R.
        • et al.
        Mining free-text medical records for companion animal enteric syndrome surveillance.
        Prev Vet Med. 2014; 113: 417-422
        • Anon
        Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).
        Checklist. 2017; ([cited 2017 22 June]; Available from: http://prisma-statement.org/Default.aspx)
        • Araz O.
        • Bentley D.
        • Muelleman R.
        Using Google Flu Trends data in forecasting influenza-like-illness related ED visits in Omaha, Nebraska.
        Am J Emerg Med. 2014; 32: 1016-1023
        • Bahk G.
        • Kim Y.
        • Park M.
        Use of internet search queries to enhance surveillance of foodborne illness.
        Emerg Infect Dis. 2015; 21: 1906-1912
        • Bakker K.
        • et al.
        Digital epidemiology reveals global childhood disease seasonality and the effects of immunization.
        Proc Natl Acad Sci U S A. 2016; 113: 6689-6694
        • Barboza P.
        • et al.
        Evaluation of Epidemic Intelligence Systems Integrated in the Early Alerting and Reporting Project for the Detection of A/H5N1 Influenza Events.
        PLoS One. 2013; 8: e57252
        • Barboza P.
        • et al.
        Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.
        PLoS One. 2014; 9: e90536
        • Bello-Orgaz G.
        • Jung J.
        • Camacho D.
        Social big data: Recent achievements and new challenges.
        Inf Fusion. 2016; 28: 45-59
        • Bernardo T.M.
        • et al.
        Scoping review on search queries and social media for disease surveillance: a chronology of innovation.
        J Med Internet Res. 2013; 15: e147
        • Bowsher G.
        • Milner C.
        • Sullivan R.
        Medical intelligence, security and global health: the foundations of a new health agenda.
        J R Soc Med. 2016; 109: 269-273
        • Boyle J.
        • et al.
        Prediction and surveillance of influenza epidemics.
        Med J Aust. 2011; 194: S28-S33
        • Brownstein J.S.
        • Freifeld C.C.
        HealthMap: the development of automated real-time internet surveillance for epidemic intelligence.
        Euro Surveill. 2007; 12
        • Brownstein J.
        • et al.
        Information Technology and Global Surveillance of Cases of 2009 H1N1 Influenza.
        N Engl J Med. 2010; 362: 1731-1735
        • Brownstein J.
        Surveillance Sans Frontieres: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project.
        PLoS Med. 2008; 5: e151
        • Carneiro H.
        • Mylonakis E.
        Google Trends: A Web-Based Tool for Real-Time Surveillance of Disease Outbreaks.
        Clin Infect Dis. 2009; 49: 1557-1564
        • Chan E.
        • et al.
        Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance.
        PLoS Neglected Trop Dis. 2011; 5: e1206
        • Chanlekha H.
        • Collier N.
        A methodology to enhance spatial understanding of disease outbreak events reported in news articles.
        Int J Med Inform. 2010; 79: 284-296
        • Chanlekha H.
        • Kawazoe A.
        • Collier N.
        A framework for enhancing spatial and temporal granularity in report-based health surveillance systems.
        BMC Med Inform Decis Mak. 2010; 10
        • Chaudet H.
        • et al.
        Web Services Based Syndromic Surveillance for Early Warning within French Forces.
        MIE2006. 2006
        • Choi J.
        • et al.
        Web-based infectious disease surveillance systems and public health perspectives: a systematic review.
        BMC Public Health. 2016; 16
        • Christaki E.
        New technologies in predicting, preventing and controlling emerging infectious diseases.
        Virulence. 2015; 6: 558-565
        • Chunara R.
        • Freifeld C.
        • Brownstein J.
        New technologies for reporting real-time emergent infections.
        Parasitology. 2012; 139: 1843-1851
        • Chunara R.
        • Andrews J.R.
        • Brownstein J.S.
        Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
        Am J Trop Med Hygiene. 2012; 86: 39-45
        • Collier N.
        • et al.
        BioCaster: Detecting public health rumors with a Web-based text mining system.
        Bioinformatics. 2008; 24: 2940-2941
        • Collier N.
        Uncovering text mining: A survey of current work on web-based epidemic intelligence.
        Global Public Health. 2012; 7: 731-749
        • Corley C.D.
        • et al.
        Using Web and social media for influenza surveillance.
        Adv Exp Med Biol. 2010; 680: 559-564
        • Cowen P.
        • et al.
        Evaluation of ProMED-mail as an electronic early warning system for emerging animal diseases: 1996 to 2004.
        J Am Vet Med Assoc. 2006; 229: 1090-1099
        • Davidson M.
        • Haim D.
        • Radin J.
        Using Networks to Combine “Big Data” and Traditional Surveillance to Improve Influenza Predictions.
        Sci Rep. 2015; 5: 1-5
        • Declich S.
        • Carter A.
        Public health surveillance: historical origins, methods and evaluation.
        Bull World Health Organ. 1994; 72: 285-304
        • Denecke K.
        • et al.
        Event-Driven Architecture for Health Event Detection from Multiple Sources.
        Stud Health Technol Inform. 2011; 169: 160-164
        • Denecke K.
        • et al.
        How to exploit Twitter for public health monitoring?
        Methods Inf Med. 2013; 52: 326-339
        • Desai R.
        • et al.
        Norovirus Disease Surveillance Using Google Internet Query Share Data.
        Clin Infect Dis. 2012; 55: e75-78
        • Dion M.
        • AbdelMalik P.
        • Mawudeku A.
        Big Data and the Global Public Health Intelligence Network (GPHIN).
        Can Commun Dis Rep. 2015; 41: 209-214
        • Domnich A.
        • et al.
        Demand-based web surveillance of sexually transmitted infections in Russia.
        Int J Public Health. 2014; 59: 841-849
        • Edelstein M.
        • et al.
        Detecting the norovirus season in Sweden using search engine data – Meeting the needs of hospital infection control teams.
        PLoS One. 2014; 9: e100309
        • Eysenbach G.
        Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance.
        American Medical Informatics Association Annual Symposium. 2006;