Database of epidemic trends and control measures during the first wave of COVID-19 in mainland China

Highlights
• COVID-19 measures were applied on similar dates in provinces throughout China.
• Disease severity was much greater in Hubei compared with other provinces.
• Provincial data on epidemics and interventions is available for further research.


Introduction
The COVID-19 outbreak was first reported in Wuhan City of Hubei Province, China in late December 2019 (Xinhua, 2020b). From late January 2020, many provinces in China began to report confirmed COVID-19 cases. To control the epidemic, stringent social distancing, travel restrictions, contact tracing, environmental disinfection and other strategies were implemented. While other countries reported rising numbers of infections, a declining epidemic trend was observed in China from late February 2020. Considering the global spread of the pathogen, the World Health Organization (2020) declared COVID-19 a pandemic on 11 March 2020. Although an increase has been observed in the number of confirmed cases since June 2020, the overall epidemic size remains small in China.
The Imperial College London COVID-19 Response Team began collecting data in mid-January 2020 to understand the epidemic in China. The Imperial Team, together with volunteers, made considerable efforts to collate aggregated data as well as individual patient information from publicly available national and local situation reports published by health authorities in China.
While individual case or death reports are crucial for informing the determinants of disease severity and fatality as an epidemic emerges (Verity et al., 2020), such reporting was scattered and became unfeasible when new cases increased exponentially. By contrast, aggregated notifications of cases and contacts were more accessible and mostly recorded in a standardised format across provinces in China. In addition to these indicators derived from the surveillance system, empirical experience from the implementation of control measures is essential for interpreting variation in these epidemic trends in context.
Building on other existing data collection activities (Xu et al., 2020, Zhang et al., 2020b), the data we extracted from the Chinese official reports can also be useful for the wider research community. In this article, we aim to publish the collated data and present an overview of the epidemic trends and control measures in China based on a descriptive analysis. These exploratory findings highlight the potential applications of the database and provide insights for epidemic response in other countries.

Data collation
Situation reports of the COVID-19 epidemic from mid-January up to 31 March 2020 in 31 provinces/municipalities (with equivalent levels of administration) of mainland China were extracted. We downloaded these reports from websites of local health commissions and used Google Translate to obtain English versions for each province/municipality. In addition, reports from the National Health Commission website and Wuhan City Health Commission website were included.
We extracted aggregated numbers of cases, deaths, recoveries, contacts, and details on disease severity and case importation from the official reports released each day (Table 1). These quantitative results of each province/municipality were extracted into a spreadsheet. Each record entry was independently checked and compared with the original situation reports by a second researcher. Both the spreadsheet and original situation reports are available at GitHub: https://github.com/mrc-ide/covid19_mainland_China_report.
We reviewed the timing of implementation and subsequent lifting of the following control measures: i) cancellation of cross-province public transportation; ii) temperature checks for inbound travellers at provincial borders; and iii) community-level lockdown (so-called 'closed-off management', including measures such as shop closure and ban of non-resident entry (Zhu, 2020)). We searched official notices and announcements published by the national and provincial/municipal governments as well as local news for information on these non-pharmaceutical interventions. Closure and reopening dates of primary, middle, and high schools, and universities were also extracted.
Additionally, we monitored the progress of economic activity resumption through the reopening of 'designated enterprises', which comprise registered companies with an annual revenue exceeding 2.8 million United States dollars (20 million Chinese yuan) (National Bureau of Statistics of China, 2018).

Descriptive analysis of epidemic trends
Based on the aggregated data collated for each province/municipality, we conducted a descriptive analysis to understand the epidemic trends and their possible association with the interventions implemented. We focused on the "six provinces" (Hubei, Guangdong, Henan, Zhejiang, Hunan, and Anhui) reporting the highest numbers of confirmed cases up to the end of March 2020. These provinces together accounted for 90% of the total COVID-19 cases in mainland China. Hubei alone accounted for 80% of the total number of reported cases (Figure 1).
We first calculated the proportion of recoveries from 15 January to 31 March by:
\[ \text{proportion of recoveries} = \frac{\text{cumulative number of recoveries}}{\text{cumulative number of confirmed cases}} \times 100\%. \]
Note that almost all confirmed cases were hospitalised for isolation and medical care in mainland China, and hospitals were responsible for reporting cases to the surveillance system. Recoveries in such a setting were defined as hospitalised cases who met the criteria for discharge, including symptom [...]
To investigate the disease severity across provinces, we obtained the crude case-fatality ratio (cCFR):
\[ \text{cCFR} = \frac{\text{cumulative number of deaths}}{\text{cumulative number of confirmed cases}} \times 100\%. \]
Confidence intervals (CI) of the cCFRs were calculated based on binomial distributions, with an underlying assumption that all cases with unresolved outcomes would eventually recover. Although the cCFR may be biased in reflecting disease severity due to under-ascertainment and delays in death reporting (Garske et al., 2009), it can be an approximate estimate near the end of an epidemic, when the capacity for case detection improves and the outcomes of cases are mostly known. We further captured the varying need for critical care over time using the distribution of case severity among currently hospitalised cases:
\[ \frac{\text{number of hospitalised cases in a given severity category}}{\text{number of currently hospitalised cases}} \times 100\%. \]
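As an illustration of the three proportions described above, the following is a minimal Python sketch rather than the code used for the analysis. The counts are hypothetical and are not taken from the database. The text states only that the confidence intervals were based on binomial distributions, so the choice of the Wilson score interval here is an assumption; an exact (Clopper-Pearson) interval would be an equally plausible reading.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def percentage(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def wilson_interval(events: int, total: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion, in percent.

    Assumption: the source does not name the interval method; Wilson is
    used here only because it needs nothing beyond the standard library.
    """
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need total > 0 and 0 <= events <= total")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    p = events / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return 100.0 * max(0.0, centre - half), 100.0 * min(1.0, centre + half)


# Hypothetical cumulative counts for one province on one date.
cum_cases, cum_deaths, cum_recoveries = 1000, 50, 900
# Hypothetical snapshot of currently hospitalised cases on the same date.
hospitalised, severe_or_critical = 50, 10

recovery_pct = percentage(cum_recoveries, cum_cases)
# Using all confirmed cases as the denominator treats unresolved cases as
# eventual recoveries, matching the assumption stated in the text.
ccfr_pct = percentage(cum_deaths, cum_cases)
ccfr_low, ccfr_high = wilson_interval(cum_deaths, cum_cases)
severity_pct = percentage(severe_or_critical, hospitalised)

print(f"recovered: {recovery_pct:.1f}%")
print(f"cCFR: {ccfr_pct:.2f}% (95% CI {ccfr_low:.2f}%-{ccfr_high:.2f}%)")
print(f"severe or critical among hospitalised: {severity_pct:.1f}%")
```

Because the denominator is the cumulative number of confirmed cases, the cCFR computed this way will tend to understate fatality early in an epidemic, when many outcomes are still unresolved.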
According to the guideline for contact investigation published by the Chinese Center for Disease Control and Prevention (2020), those who have close contact with a confirmed case of COVID-19, during the period from two days prior to their symptom onset to isolation, should be quarantined at home or a specific facility for 14 days. To demonstrate the scale and effort involved in contact tracing across provinces, we calculated the contact-to-case ratio by:
\[ \text{contact-to-case ratio} = \frac{\text{number of close contacts traced}}{\text{number of confirmed cases}}. \]
In this analysis, the contact-to-case ratio was first calculated from cumulative numbers up to 31 March for each of the six provinces. We then derived the same ratio from the newly reported numbers at the national level over the observation period. As the guideline also recommends that epidemiological surveys into the contact history of new cases should be completed within 24 hours of case confirmation, we conducted an alternative analysis assuming a 1-day lag between the reporting of cases and contacts.
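The daily contact-to-case ratio, with and without the 1-day lag, can be sketched as follows. This is a hypothetical illustration, not the authors' code: the function name, the toy series, and the handling of days with zero new cases are all assumptions. In particular, the direction of the lag (contacts reported one day after the cases they relate to) is our reading of the text.

```python
from typing import List, Optional, Sequence


def contact_to_case_ratio(
    new_contacts: Sequence[int],
    new_cases: Sequence[int],
    lag_days: int = 0,
) -> List[Optional[float]]:
    """Daily ratio of newly reported contacts to newly reported cases.

    With lag_days = 1, cases reported on day t are paired with contacts
    reported on day t + 1 (an assumed reading of the 1-day lag).
    Days with zero new cases yield None instead of a division error.
    """
    if lag_days < 0:
        raise ValueError("lag_days must be non-negative")
    n_days = max(0, min(len(new_cases), len(new_contacts) - lag_days))
    ratios: List[Optional[float]] = []
    for day in range(n_days):
        cases = new_cases[day]
        contacts = new_contacts[day + lag_days]
        ratios.append(contacts / cases if cases > 0 else None)
    return ratios


# Hypothetical daily series (not taken from the database).
daily_contacts = [10, 40, 60, 90]
daily_cases = [2, 3, 0, 5]

no_lag = contact_to_case_ratio(daily_contacts, daily_cases)
one_day_lag = contact_to_case_ratio(daily_contacts, daily_cases, lag_days=1)
print(no_lag)
print(one_day_lag)
```

The lagged series is one element shorter than the unlagged one, because the last day of cases has no following day of contacts to pair with.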

Overview of COVID-19 control measures
On 26 [...] Juxtaposed with key dates for initiating and lifting the three most common control measures, Figure 3 shows daily confirmed cases over time in the six provinces in China. Measures related to provincial border control, namely cancellation of cross-province public transportation (CC) and temperature checks for inbound travellers at provincial borders (TC), were consistently imposed in late January across provinces. Nevertheless, at the time these measures were introduced, there were more than 700 cumulative cases reported in Hubei while fewer than 150 cases were observed in the other provinces, where the local epidemic was at an earlier stage. Community-level closed-off management (CM) was generally introduced during the peak of the local epidemic. Relaxation of these measures varied by province but was related to the decline of daily confirmed cases. Except for Hubei, the local epidemic in the other five provinces was mostly suppressed in late February.
Zhejiang and Guangdong provinces, where international airports are located, reported a second wave of COVID-19 driven by incoming travellers from the beginning of March. However, the caseload caused by this second wave was much smaller than the first one, as measures to stop secondary transmission were put in place at the border for inbound passengers (Zhang, 2020).

Descriptive analysis of COVID-19 epidemics in the six provinces with the highest total caseload
We explored the association between the epidemic trend and healthcare burden using the time-varying proportions of total confirmed cases who recovered (Figure 4). Most provinces reported 50% recovery by mid-February, 2-3 weeks after the peak of daily confirmed cases seen in late January or early February. The national trend was delayed by approximately 10 days due to the severe epidemic in Hubei, where the peak of daily cases and 50% recovery occurred on 12 and 29 February, respectively. The time taken to go from 50% to 90% recovery was longer in Guangdong and Hubei than in the other provinces, as newly reported cases continued to increase in late February and early March.
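The milestone dates discussed above (the peak of daily confirmed cases and the first dates on which 50%, 70%, and 90% of cumulative cases had recovered) could be derived from a provincial time series along the following lines. This is a hedged sketch with hypothetical function names and toy numbers, not the authors' implementation.

```python
from typing import Dict, Optional, Sequence


def first_date_reaching(
    dates: Sequence[str],
    cum_recoveries: Sequence[int],
    cum_cases: Sequence[int],
    threshold_pct: float,
) -> Optional[str]:
    """First date on which recoveries / cases reaches threshold_pct, or None."""
    for date, recovered, cases in zip(dates, cum_recoveries, cum_cases):
        if cases > 0 and 100.0 * recovered / cases >= threshold_pct:
            return date
    return None


def peak_date(dates: Sequence[str], daily_cases: Sequence[int]) -> str:
    """Date of the highest daily count (the earliest one if there are ties)."""
    peak_index = max(range(len(daily_cases)), key=lambda i: daily_cases[i])
    return dates[peak_index]


# Hypothetical series for one province (not taken from the database).
dates = ["2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05"]
daily_cases = [5, 12, 8, 3, 1]
cum_cases = [5, 17, 25, 28, 29]
cum_recoveries = [0, 2, 13, 20, 27]

milestones: Dict[int, Optional[str]] = {
    pct: first_date_reaching(dates, cum_recoveries, cum_cases, pct)
    for pct in (50, 70, 90)
}
peak = peak_date(dates, daily_cases)
print(peak, milestones)
```

Returning None when a threshold is never reached keeps provinces with incomplete recovery within the observation window distinguishable from those that crossed the threshold on the last day.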
A wide variation in cCFRs by 31 March 2020 was observed across provinces (Table 2). Whereas most other provinces had a cCFR of less than 1%, Hubei had a cCFR of 4.71% (95% CI 4.55%-4.87%).
In addition, Henan province had the second-highest cCFR (1.73%, 95% CI 1.09%-2.61%) among the six provinces analysed. The most affected areas in Henan province (Xinyang City, Nanyang City, and Zhumadian City) are adjacent to Hubei province, and many workers returned from Wuhan before the lockdown due to the Chinese New Year holiday (Su and Song, 2020). Both the geographical and social connections with Hubei may thus explain a stronger impact of the COVID-19 epidemic in Henan.
The provincial difference in disease severity was also found in the analysis of hospitalised cases (Figure 5), with Hubei reporting a particularly high proportion of critical and severe cases (20-30%).
Regarding the temporal trend, a tendency to capture cases with more serious symptoms was consistently shown across the six provinces with the highest caseload in early February (Figure 5A). In March, the proportion of critical or severe cases increased again while the total number of hospitalised cases declined, reflecting a longer hospitalisation period for severe cases compared to mild cases. However, a distinct trend was seen in Guangdong from mid-March, showing a sharp decline in the proportion of critical or severe cases. This decline coincided with the increase in cases imported from foreign countries, who tended to have mild symptoms (Figure 5B).
We finally investigated the scale of contact tracing involved in infection control at national and provincial levels, using the ratio of total contacts to total cases by the end of March (Table 3). On average, 20-40 close contacts were traced per confirmed case. Hubei province reported a particularly low contact-to-case ratio compared to other provinces. To further explore the change in the number of contacts traced over the epidemic, we calculated the contact-to-case ratio again with the daily numbers of reported contacts and confirmed cases (Figure 6). There were fewer than 20 contacts traced for each new case over most of January and February; however, the contact-to-case ratio increased in March. This increase in the ratio was caused by an increase in the number of total contacts reported in provinces outside Hubei, likely due to imported cases. In the exploratory scenario addressing the 1-day delay of contact tracing following case confirmation, the general trend of the contact-to-case ratio over time was consistent with the scenario that did not consider the delay.

Discussion
We

While the timing of school reopening in China depended on the local epidemic situation (China Central Television, 2020b), the general strategy was shared across multiple provinces (Figure 3).
Staged reopening was widely observed: senior students in middle and high schools were advised to return first, while junior students and elementary schools were to follow a week later. In terms of returning to work, most provinces demonstrated rapid resumption of business activity after constraints on travel and commuting were relaxed (Figure 4). However, this rapid resumption was found in the reopening of 'designated enterprises', which excluded enterprises not in key industries and smaller-scale enterprises. Additional surveys on detailed indicators of resumption, such as production capacity and employee attendance, and on resumption in other aspects of economic activity will be useful in fully understanding the progress of restoration and inequality in the impact of COVID-19.
A particularly high cCFR was reported in Hubei compared to the other provinces in the analysis.
Such a difference in disease severity was also consistently observed in the proportion of critical and severe cases over the past few months. Another study that accounted for right-censoring in estimating the true case fatality demonstrated a discrepancy across provinces similar to the one we observed in the cCFRs (Deng et al., 2020). Such similarity also supports the use of cCFRs extracted near the end of the epidemic, as these crude proportions become more stable (Figure S1) and closer to the true burden (Garske et al., 2009). This severe burden is potentially driven by the explosive increase of cases that overwhelmed local healthcare services during the peak of the epidemic, or the [...]
Contact tracing has been implemented nationally since the beginning of the epidemic, as a key strategy coupled with case management in China (Li et al., 2020). We found that, for every confirmed case, an average of 4 contacts were traced in Hubei, whereas over 20 contacts per case were traced in other provinces (Table 3). These numbers of contacts are consistent with the average number of daily contacts from diary-based contact surveys in Wuhan City and Shanghai. Approximately 2 and 17 daily contacts for each citizen were reported during the lockdown and before the COVID-19 epidemic, respectively (Zhang et al., 2020a), suggesting that stringent social distancing policies could modify contact patterns and reduce the number of contacts. The overall case burden in each province may also affect the number of contacts that can be traced by the local public health authority. From early March, there was an increase in the number of contacts traced per case, which may be due to large clusters of contacts who shared the same flights and trains with imported cases travelling from foreign countries.
However, we cannot exclude the possibility that the increase in the contact-to-case ratio was due to increased investment in both personnel training and establishment of proper management systems for contact tracing. Such resources could be gradually released from other control measures with the relief of epidemic burden. It is uncertain how many contacts were eventually confirmed as cases in most provinces. Further data collation and investigation will enable the assessment of the effectiveness of contact tracing in reducing COVID-19 transmission.
A major limitation of our descriptive analysis is the use of aggregate data on cases, deaths, recoveries, and contacts. Whilst these indicators are convenient for monitoring and comparing the epidemic trends by province, further inference about risk factors and transmission dynamics is not possible. Patient characteristics such as age and comorbidities are essential in understanding the heterogeneity in disease severity. Estimating the setting-specific incubation period, reporting delay, and disease progression also relies on the date of symptom onset and care-seeking pathways of individual cases (Zhang et al., 2020b). Another limitation lies in validating, quantifying, and distinguishing the impact of different control measures. Through comparisons across provinces, we could only explore the temporal relationships between interventions and epidemic trends. Applying dynamic modelling techniques and incorporating additional data sources may advance our understanding of the contributions of different interventions over the epidemic course (Flaxman et al., 2020, Lai et al., 2020).
[...] implementation (Imai et al., 2020). These measures of social distancing and contact tracing are likely to contribute to the reduction of COVID-19 transmission, as the reported epidemic size was relatively small in these countries, and as indicated by a modelling study on counterfactual scenarios in the Chinese provinces (Lai et al., 2020). As the driving force of the COVID-19 epidemic in mainland China has shifted from local transmission to importation from other affected countries, there have been modifications in the focus of the response, such as compulsory testing and quarantine for all incoming travellers (Cui, 2020).
Bars represent the cumulative numbers of cases (grey), recoveries (pink), and deaths (blue). Black vertical dashed lines show the dates when 50%, 70%, and 90% of all cases had recovered. Green vertical solid lines show the dates when the number of daily confirmed cases peaked. The six provinces are ordered from top to bottom by the date on which 50% recovery was reached. Note that the range of the y-axis differs by province, to fit the magnitude of cases.