Corresponding author at: School of Public Health and Community Medicine, Samuels Building, Room 325, Faculty of Medicine, University of New South Wales, Sydney, 2052, NSW, Australia. Tel: +61 2 9385 3811; Fax: +61 2 9313 6185.
School of Public Health and Community Medicine, University of New South Wales, Australia
College of Public Service and Community Solutions, Arizona State University, Phoenix, USA
Background
Rapid epidemic detection is an important objective of surveillance to enable timely intervention, but traditional validated surveillance data may not be available in the required timeframe for acute epidemic control. Increasing volumes of data on the Internet have prompted interest in methods that could use unstructured sources to enhance traditional disease surveillance and gain rapid epidemic intelligence. We aimed to summarise Internet-based methods that use freely-accessible, unstructured data for epidemic surveillance and explore their timeliness and accuracy outcomes.
Methods
Steps outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist were used to guide a systematic review of research related to the use of informal or unstructured data by Internet-based intelligence methods for surveillance.
Results
We identified 84 articles published between 2006 and 2016 relating to Internet-based public health surveillance methods. Studies used search queries, social media posts and approaches derived from existing Internet-based systems for early epidemic alerts and real-time monitoring. Most studies noted improved timeliness compared with official reporting; in the 2014 Ebola epidemic, for example, the first epidemic alerts were generated by ProMED-mail. Internet-based methods showed variable correlation strength with official datasets, with some methods achieving reasonable accuracy.
Conclusion
The proliferation of publicly available information on the Internet provided a new avenue for epidemic intelligence. Methodologies have been developed to collect Internet data and some systems are already used to enhance the timeliness of traditional surveillance systems. To improve the utility of Internet-based systems, the key attributes of timeliness and data accuracy should be included in future evaluations of surveillance systems.
Background
Broadly, ‘intelligence’ is defined as information collected, analysed and converted to gain insights. Intelligence can be described as a process or a product. It is generally formed through distinct steps such as data collection, processing, analysis, dissemination, feedback and tasking. First, a specific request for intelligence is issued. Information-gathering methods are used to collect unstructured data. Following conversion to a manageable format, the data can be interpreted and a final report produced. Feedback can then inform a subsequent task. Though this simplified model of intelligence has been criticised in the literature, public health surveillance follows similar steps in case detection, reporting, analysis and confirmation of cases. The application of information technology for electronic data collection and interpretation has encouraged Internet-based methods that can inform rapid epidemic intelligence on public health events. The International Health Regulations issued by the World Health Organization (WHO) for health threat detection emphasized the importance of both indicator-based and event-based components of epidemic intelligence for the early detection of events. Informal information sources are important: the WHO reports that more than 60% of initial disease epidemic reports come from unofficial sources. For the purposes of this report, a disease epidemic is “the occurrence of cases of disease in excess of what would normally be expected in a defined community, geographical area or season”. Public health surveillance, hereafter referred to as ‘surveillance’, is defined by the WHO as “the continuous, systematic collection, analysis and interpretation of health-related data needed for the planning, implementation, and evaluation of public health practice”.
Rapid epidemic detection and real-time monitoring are important objectives of syndromic surveillance to minimise the morbidity and mortality caused by infectious diseases. Different types of surveillance can be conducted according to the desired objectives. Sociocultural and ethical issues are also considered, and cost-effectiveness has implications for system feasibility. Active surveillance involves regular monitoring of sources, providing the most complete information but ideally requiring trained epidemiologists. In contrast, passive surveillance is less resource intensive, involving regular reports from a wide range of sources and chance detection of cases. Integrated or enhanced surveillance makes use of both active and passive systems (Jamison D. Public health surveillance: a tool for targeting and monitoring interventions. In: Disease Control Priorities in Developing Countries. New York, NY: Oxford University Press; 2006: 997-1016). Active syndromic surveillance, or reporting based on clinical case definitions and pre-diagnostic data, can be conducted in settings lacking the laboratory capacity to confirm diagnoses. Syndromic surveillance systems have therefore been established for early epidemic detection to minimize the mortality and morbidity associated with emerging disease threats.
During infectious disease epidemics, validated data collected through traditional or indicator-based surveillance methods may not be available for timely use. Surveillance has traditionally involved the monitoring of public health indicators from a range of sources. Key indicators outlined by the WHO in 1968 initially included mortality, morbidity, clinical data, laboratory reports, relevant field investigations, surveys, animal or vector studies, and demographic and environmental data. Other data have since been included, such as hospital statistics, disease registries, over-the-counter drug sales, school or work absenteeism, telephone triage calls and news reports. Traditional surveillance methods have disadvantages in timeliness and sensitivity, attributable to factors such as lengthy data validation processes, bureaucratic barriers, higher costs and resource requirements. The time lag between the onset of an epidemic and the release of official reports may render surveillance data redundant for the purpose of early detection and response.
To address these disadvantages, researchers have proposed methods utilising newer technologies such as the Internet, mobile phones, improved point-of-care diagnostic tools and other event-based surveillance methods. In conjunction with developments in computing and automated technology, Internet-based methods can handle and utilise ‘big data’ collected from informal sources. Big datasets are terabytes to petabytes in size, requiring higher-level software tools to capture, store, manage and analyse effectively.
Internet-based sources have the potential to provide more timely information for detecting infectious disease epidemics, for example by identifying events or clusters of disease-related keywords in local news reports or social media. Through a scoping review, Bernardo et al. traced the use of Internet-based sources for disease surveillance back to 2006, with early work focusing on influenza. Pioneering studies introduced ‘infodemiology’, the study of the determinants and distribution of health information. Subsequent methods have shown promising results for monitoring diseases such as influenza and foodborne illness. Through participatory surveillance, FluNearYou uses voluntary online surveys collecting demographic data and symptoms to form weekly trends of influenza-like illness (ILI) in the United States. However, FluNearYou and other online systems are not widely used by the community, and their overall utility is questionable.
However, despite available evidence supporting the use of Internet-based sources to complement traditional surveillance, informal sources are inherently more prone to biases. Systems using social media are affected by several factors, including the algorithms employed and the false positives and negatives expected from background noise. Several sources maintain that Internet-based surveillance should be used to supplement traditional methods of surveillance.
Previous reviews have defined epidemic intelligence, differentiating indicator-based from event-based surveillance and evaluating existing event-based surveillance systems. The Early Alerting and Reporting (EAR) project was developed in 2008 to incorporate several Internet-based surveillance systems and intelligence specialists for optimising the detection of biological threats. Wilson and Brownstein emphasised the value of ‘search-term surveillance’, the monitoring of aggregated keyword searches or queries performed by Internet users. However, few reviews have summarised the core outcomes of timeliness and accuracy in relation to the Internet-based surveillance methods studied in the literature. Thus, we aimed to summarise recent studies of existing Internet-based surveillance methods and explore the timeliness and accuracy measures reported by each study.
Objectives
The primary aim of this study was to identify and summarise the types of Internet-based surveillance methods studied in recent literature. A secondary aim was to identify and summarise the timeliness and accuracy outcomes of Internet-based methods described in the literature.
Method
An initial exploratory review was conducted to gain familiarity with the discipline of epidemic intelligence and to generate possible questions and keywords for a review of related studies. Steps outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 checklist were then used to guide a review of research related to the use of informal data by Internet-based intelligence methods for rapid infectious disease epidemic surveillance. We systematically searched and reviewed published studies that used Internet-based methods.
Search strategy
Based on the primary aim of exploring web-based intelligence methods for early disease surveillance, and informed by the exploratory literature review, the keywords ‘web-based’, ‘surveillance’ and ‘epidemic intelligence’ were identified. Combinations of these words and their synonyms were entered into online databases including MEDLINE, EMBASE, SCOPUS, Web of Science and Global Health. Database searches used strings and Boolean combinations of relevant Medical Subject Heading (MeSH) terms such as ‘disease outbreaks’, ‘epidemics’, ‘Internet’, ‘population surveillance’ and ‘epidemiological monitoring’. Electronic records of all references were imported into EndNote.
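For illustration, a combined string in this spirit might look like the following (a hypothetical example assembled from the keywords above, not the exact string used in the searches):

```
("disease outbreaks" OR epidemics) AND (Internet OR "web-based")
AND ("population surveillance" OR "epidemiological monitoring" OR "epidemic intelligence")
```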
Eligibility criteria
Studies included in this review were limited to publicly available, peer-reviewed articles published between 2006 and 2016 and written in the English language. Titles and abstracts were screened. We excluded any articles that did not refer to surveillance or intelligence methods using unstructured or open-source data for detecting new or emerging outbreaks of infectious disease. All surveillance types were considered relevant, and surveillance for symptoms or complications of infectious disease was included. Articles were excluded if (i) the full article was not publicly available or (ii) the abstract and article were not related to infectious disease epidemics or to surveillance methods using unstructured data. Reviews, letters, editorials, perspectives and conference papers that already had corresponding journal articles published were also excluded.
For each included study, we collected quantitative and qualitative information pertaining to data collection and analysis methods. Based on the data sources used, we grouped our assessment under three key streams of Internet-based disease surveillance: (i) existing surveillance systems and news aggregators, (ii) search query surveillance and (iii) social media surveillance. For each stream, we explored any outcomes describing the timeliness (the temporal difference between alerts issued by the surveillance system and the onset of an outbreak or official alerts) or accuracy (the strength of correlation between collected informal data and officially released disease data) of the methods used to collect data from the Internet.
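As a minimal sketch of how these two outcome measures can be computed (all dates and counts below are invented for illustration; the reviewed studies applied their own definitions and datasets):

```python
from datetime import date

from scipy.stats import pearsonr

# Timeliness: days between an Internet-based alert and the corresponding
# official alert (positive = informal source was earlier). Dates are invented.
informal_alert = date(2014, 3, 14)   # e.g. a ProMED-mail style post
official_alert = date(2014, 3, 23)   # e.g. an official announcement
timeliness_days = (official_alert - informal_alert).days
print(f"Informal alert preceded the official alert by {timeliness_days} days")

# Accuracy: correlation between weekly informal counts and official counts.
informal_counts = [12, 30, 55, 80, 66, 41, 20, 9]   # e.g. keyword-matched posts
official_counts = [10, 25, 60, 85, 70, 38, 18, 7]   # e.g. notified cases
r, p = pearsonr(informal_counts, official_counts)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```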
Results
A total of 84 studies were identified (Figure 1) as relevant in describing Internet-based methods using open-access, unstructured data for rapid epidemic detection.
Figure 1. Adapted PRISMA diagram showing the progression of article inclusion and exclusion.
Existing surveillance systems and news aggregators
Twenty-nine studies used existing Internet-based surveillance systems developed by researchers and public health authorities (Table 1). These systems focused on event-based reporting and syndromic surveillance. Studies used these systems and evaluated their operation or technical development. HealthMap and the former EpiSPIDER system were the only systems that collected data from a social media platform. Other systems used unstructured data from news reports and official data, but not from social media platforms. Most systems collected data using automated tools and involved a hierarchical process to analyse the data (see Table 2). BioCaster, ProMED-mail and HealthMap required human judgement, in the form of curators and analysts, to correct occasional mistakes.
Table 2. Operational characteristics of existing Internet-based surveillance systems.
Data collection and analysis: systems variously categorize information according to predefined multilingual disease categories, identifying known names and events and calculating statistics to detect emerging threats; collect media articles daily, with Bayesian tools identifying relevant articles that human experts review manually, assess against indicators and write up as reports; collect public health news articles, aggregate and categorize the extracted data, and identify entities using the Pattern-based Understanding and Learning System (PULS) to cluster information; or receive information through emails or reports from professionals and the public worldwide, with staff members searching the Internet and traditional media, a single editor filtering entries for relevance and accuracy, and most reports sent to expert moderators.
Language handling: approaches include language-specific keywords combined with text extraction algorithms; full-text translation followed by English-only selection algorithms; regional specialists fluent in the relevant language; classification of news articles into multilingual categories; and multilingual personnel.
Moderation: partial for three systems, automated for two and manual for one.
Mode of dissemination: approaches include allowing users to specify parameters for pushed information, with only reports classified as ‘breaking news’ (the top priority alert level) added as markers to a map; and sharing final reports (following colour-code ranking) on a website, with listserv software distributing reports to at least 1 of 11 mailing lists.
In a study analysing seven surveillance systems, Barboza et al. found that the systems issued alerts for human cases on average between 1.9 days (95% CI −0.4 to 4.1) and 6.1 days (95% CI 3.1 to 9.1) before WHO reports, but that together they failed to detect 7% of public health events before official reports. In another study, nongovernmental data sources were found to communicate initial reports a median of 10 days earlier than governmental sources, but this difference was not significant. Though based on a small sample, a study assessing epidemic alerts issued by the French surveillance tool Bulletin Hebdomadaire International found that GPHIN, a source used by the tool, exhibited a mean delay of 1 month and 19 days in providing an outbreak signal after the first retrospective occurrence of the disease, while ProMED-mail exhibited a mean delay of 2 months and 4 days. ProMED-mail provided the first relevant indication of an outbreak in the 2014 Ebola epidemic around three months after the first retrospectively suspected occurrence of Ebola. Argus provided reports of the first pandemic H1N1 2009 case 1 to 16 days before WHO for 42 countries, but on the same day as WHO for 21 countries.
Reported correlations between Internet-based systems and official data have generally been positive but variable. HealthMap was found to correlate well with traditional surveillance. One study found that ProMED-mail reports showed variable correlations, but that 20% of events detected by ProMED-mail were not detected by a national system. An assessment of two data sources used by HealthMap found a weekly pattern and a ‘crowding out’ phenomenon caused by unbalanced or heightened media attention.
Search query surveillance
Search queries, in which Internet users request specific information from web-based search engines, have been implemented for early epidemic detection. Google Flu Trends (GFT) was the most common data source, used by 19 studies for comparison with official case reports of influenza or influenza-like illness (ILI). Not all sources were open-access, and some were obtained directly from the search engine provider. Methods most commonly employed a syndromic surveillance approach, comparing search frequencies of disease-related keywords with a pre-existing official dataset and using statistical methods to validate the informal data.
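The sketch below illustrates this kind of validation on simulated series (real studies compared exports such as GFT data against official ILI reports); it shifts a query series against an official series to find the lag at which their correlation peaks:

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = 52
# Simulated official weekly case counts, smoothed to mimic an epidemic curve.
official = np.convolve(rng.poisson(50, weeks), np.ones(5) / 5, mode="same")
# Simulated query volumes constructed to lead the official series by two weeks.
queries = np.roll(official, -2) + rng.normal(0, 2, weeks)

def lagged_r(x, y, lag):
    """Pearson r between x and y after shifting x forward by `lag` weeks."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]

best = max(range(-4, 5), key=lambda lag: lagged_r(queries, official, lag))
print(f"Correlation peaks when queries lead official data by {best} weeks")
```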
Timeliness
Generally, search queries provided earlier epidemic alerts than official data sources. GFT preceded official data by 1-2 weeks.
Correlations with validated surveillance varied in strength, and studies noted limitations. Moderate correlations were reported for Google Trends data. Studies using Baidu reported both strong (r = 0.98, r = 0.96) and weak (r = 0.43) correlations and noted the coherence of curves produced using a search index compared with official cases. Samaras et al. challenged the use of correlations to validate approaches to surveillance, suggesting that any correlation under r = 0.90 could be prone to variable predictions and distribution spread.
Surveillance using intelligence from social media data
Social media, or Internet-based environments that support the production and sharing of user content, have been progressively discussed in the literature. Twelve of the studies using social media used Twitter. Other platforms included the Chinese social media site Weibo, online restaurant websites and blogs (Table 1). Studies examined the content of tweets in a similar way to news reports, utilizing computing approaches such as support vector machines (SVMs), natural language processing and parsing. Details other than disease data can also be obtained, such as time stamps, user names, message types, numbers of followers and user Internet Protocol addresses. Studies commonly evaluated their systems using Pearson’s and Spearman’s rank coefficients. Processes similar to those used for existing surveillance systems were used to construct systems for social media.
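As a minimal sketch of the text-classification step such studies describe (the labelled tweets are invented toy data; published systems were trained on large annotated corpora and tuned models):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: 1 = reports illness, 0 = unrelated chatter.
tweets = [
    "running a fever and coughing all night, must be the flu",
    "half my office is out sick with the flu this week",
    "stomach cramps and vomiting since dinner yesterday",
    "flu shots are 50% off at the pharmacy today",
    "watching the football game tonight",
    "new phone arrived, loving it so far",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features feeding a linear support vector machine (SVM).
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(tweets, labels)

print(model.predict(["my whole family has a fever and sore throat"]))  # expect [1]
```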
Timeliness
Some studies reported that social media trends ‘corresponded’ to official case counts. Most of the studies reporting timeliness or time-difference outcomes found that social media data provided more timely alerts than official sources. Odlum and Yoon reported an increase in tweets starting 3-7 days before the first Nigerian case of Ebola was announced by the Nigerian Ministry of Health and the CDC. Weibo data allowed reporting an average of one hour before official websites.
Nine studies used correlation coefficients to evaluate the utility of social media. Studies using Twitter found moderate to strong correlations. One study found a stronger correlation with official data for Twitter than for GFT.
In all three main surveillance methods described, elements of computing and automated methodology were implemented. Seven studies combined these methods with statistical modelling. The most common methods involved machine-learning algorithms, such as Naïve Bayes and SVM, used to classify text.
In 49 studies, statistical analysis was used to evaluate the performance of surveillance methods. Pearson’s correlation and Spearman’s rank coefficients were commonly used to determine the strength of correlation between unstructured data and official case data.
Discussion
We reviewed published studies on Internet-based methods for rapid epidemic intelligence to gain an overview of the available tools and their timeliness and accuracy outcomes (Figure 2). More than 20 different Internet-based data sources were identified across the three streams of surveillance, indicating the wide breadth of existing research in this area.
HealthMap, with a freely accessible platform and user-friendly visual interface, was the most commonly studied existing Internet-based surveillance system. Most existing systems were reported, through various measures, as being more timely than official sources. While Internet sources are known to exhibit intrinsic biases due to noise, some Internet-based surveillance systems were prone to bias due to their overreliance on a few sources. However, different event-based systems can complement each other through differing characteristics in data acquisition, language, moderation or dissemination.
GFT was the most commonly analysed surveillance method using search queries. Mixed results for GFT were expected given ongoing criticism and its previous overestimates and underestimates of influenza outbreaks. Though improvements were shown in studies that modified GFT, the use of relative search volumes (RSVs) by GFT and Google Trends presented issues for statistical analysis and validity. RSVs are index measures derived from the density of search interest in a specific region, so statistical methods may struggle to compare RSVs with absolute or numerical figures. These factors impair the validity of the multitude of studies analysing GFT for search query surveillance. Given the consistent timeliness of search query surveillance, queries from other search engines could be considered. However, a large volume of queries is required for valid analyses. As a result, search query surveillance is more effective for diseases with moderate to high prevalence in countries with high numbers of Internet users.
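One common workaround, sketched below on simulated data, is to fit a simple linear mapping from RSVs to absolute counts over a historical window before comparing the two series; this is an illustrative assumption rather than a method prescribed by the studies reviewed:

```python
import numpy as np

rng = np.random.default_rng(1)
cases = rng.poisson(200, 40).astype(float)               # weekly official counts
rsv = 100 * cases / cases.max() + rng.normal(0, 3, 40)   # 0-100 relative index

# Fit counts ≈ a * RSV + b on the first 30 weeks, predict the remaining 10.
a, b = np.polyfit(rsv[:30], cases[:30], 1)
predicted = a * rsv[30:] + b
mae = np.abs(predicted - cases[30:]).mean()
print(f"Mean absolute error on held-out weeks: {mae:.1f} cases")
```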
Existing studies on social media platforms mainly used Twitter, which is a rich source of information for epidemic detection and has an accessible developer interface. In addition to disease detection and syndromic surveillance, social media analysis has wide application, including the evaluation of public sentiment or behaviour during a public health event. However, methods using social media have yet to show consistent timeliness or accuracy relative to official data, and they remain subject to the socio-political influences, geographic bias, sensationalism and falsified data affecting most media sources.
With reduced geographical, political and systematic barriers, Internet-based syndromic surveillance has demonstrated improved timeliness, sensitivity, accessibility and cost-effectiveness over traditional indicator-based surveillance. Internet-based surveillance can also be applied to health behaviours in addition to changes in disease trends. The risks of geographical or cultural bias, low specificity and higher false-positive rates are the main disadvantages affecting the accuracy of Internet-based surveillance systems. Therefore, the predominant view remains that Internet-based surveillance should complement, but not replace, traditional surveillance.
For Internet-based surveillance methods that use machine-learning algorithms, resources are required to construct annotated training datasets. Internet-based sources themselves require resources for upkeep, and access limitations could preclude their use. The value of using Internet-based surveillance systems has also been challenged due to weaknesses in data management capabilities and a lack of user confidence. Given these resource requirements, and depending on user preference, a ‘trade-off’ between timeliness and accuracy may be considered. In one survey, timeliness was given higher priority than completeness of data.
A variety of technical approaches were described in the studies, emphasising the challenges associated with managing unstructured big data. These systems collect data from various sources and sometimes feed off one another, resulting in considerable overlap. For example, ProMED-mail uses HealthMap alerts for a large portion of its postings, while HealthMap, in addition to many other data sources, collects data from ProMED-mail. Furthermore, many systems use common data sources such as Google News, social media and WHO reports. Constructing a new system requires consideration of complex automated processes and resources. The first challenge is to select suitable keywords for disease-related news articles, search queries or social media posts. Once keywords are selected, web crawlers or news aggregators are often implemented to collect relevant articles from a range of media. This step can be achieved with publicly available or commercially operated automatic data collection services. Following extraction of relevant text, the next step involves detecting unusual trends or aberrations. Approaches to this are diverse and include machine-learning algorithms such as SVM for entity recognition, ontologies for classifying text and statistical analyses to determine an appropriate baseline for assessing real-time disease trends. As data collection services cannot always provide complete information, including the geographic and demographic data required to construct a baseline, the difficulty is compounded for social media data. Despite automated search and machine-learning algorithms, most systems rely on human moderators or curators to post news aggregates, which might introduce delay or selection bias. While one study used a combination of natural language processing (NLP) and transformation of text sources to reduce background noise in Twitter data, future research should aim to continue optimizing methods that can improve the accuracy of data collected in Internet-based surveillance.
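As a sketch of one simple family of aberration-detection rules (a moving-baseline z-score in the spirit of the CDC EARS algorithms; the window length and threshold are illustrative choices):

```python
import numpy as np

def detect_aberrations(counts, baseline=7, threshold=3.0):
    """Flag time points where the count exceeds the sliding baseline
    mean by more than `threshold` standard deviations."""
    counts = np.asarray(counts, dtype=float)
    alerts = []
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]
        mu, sigma = window.mean(), window.std(ddof=1)
        if sigma == 0:
            sigma = 1.0  # guard against a perfectly flat baseline
        if counts[t] > mu + threshold * sigma:
            alerts.append(t)
    return alerts

daily_mentions = [4, 5, 3, 6, 4, 5, 4, 5, 21, 30, 28]  # invented keyword counts
print(detect_aberrations(daily_mentions))  # -> [8, 9]: the start of the spike
```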
This study has several limitations. Some systems identified, such as Google Flu Trends, BioCaster and EpiSPIDER, were no longer available or publicly accessible at the time this review was conducted, suggesting potential challenges in maintaining these systems. Due to the large variety of approaches and overlapping content discussed in the literature, it was difficult to classify each article into distinct categories. We categorised each article by the data source used, but were aware that some articles used multiple types of sources or had secondary aims relevant to other methods. The lack of streamlined methodology and definitions across articles also had ramifications. While each article was reviewed carefully to ensure that all outcomes relating to the timeliness and accuracy of systems were considered, definitions were susceptible to misinterpretation, introducing a potential reporting bias. The lack of consistency in methodologies across articles also precluded quantitative analyses or meta-analyses of these articles. We acknowledge the relationships between these Internet-based systems, and that other data relationships might exist which are difficult to ascertain due to the lack of information and transparency around these data sources.
Though cross-disciplinary concepts such as ‘infoveillance’ and Internet-based ‘biosurveillance’ are frequently used in this research area, there is a lack of standardization or protocol to define these concepts. In addition to terms like timeliness and accuracy, qualitative attributes such as accessibility, flexibility and acceptability require subjective interpretation. To encourage consistency in future studies on Internet-based syndromic surveillance, a rubric based on the CDC ‘Guidelines for Evaluating Public Health Surveillance Systems’ could be developed to analyse the computational or statistical approaches used in Internet-based systems. Additionally, following the issues emphasised by Samaras et al. in interpreting correlation coefficients, the rubric should encompass ways to define, evaluate and report outcome measures such as data accuracy.
To address the need for more rapid methods to detect emerging infectious disease threats, several Internet-based surveillance methods have been studied in the literature for use in early epidemic detection. With the development of advanced computing and statistical methods, several systems have incorporated online sources with promising results. Though there are several recognised disadvantages of using Internet-based sources in the current literature, the advantages in timeliness remain valuable. There may be a necessary trade-off between timeliness and accuracy when choosing between surveillance tools. Surveillance needs vary depending on the objective, and when the objective is timely epidemic detection, Internet-based sources can complement and enhance traditional approaches to surveillance. To improve the utility of novel surveillance methods, future research should continue to assess key attributes such as the timeliness and accuracy of these systems for enhancing rapid epidemic intelligence.
Funding statement
The authors received support from the NHMRC Centre for Research Excellence, Integrated Systems for Epidemic Response (ISER), grant number APP1107393.
Conflict of interest statement
Raina MacIntyre is director of an NHMRC-funded Centre for Research Excellence, “Integrated Systems for Epidemic Response”.
Ethics approval
Ethics approval was not required for this study.
References
Ahmed SS, et al. Surveillance for Neisseria meningitidis disease activity and transmission using information technology.
Syndromic surveillance for local outbreak detection and awareness: evaluating outbreak signals of acute gastroenteritis in telephone triage, web-based queries and over-the-counter pharmacy sales.
Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks.