Epidemiology of COVID-19 in Mexico: Symptomatic profiles and presymptomatic people

Objectives The COVID-19 diagnosis is difficult and ambiguous due to nonspecific symptoms. Further, data from Mexico are hospital population-based and lack information on signs and symptoms. Thus, this work aims to provide epidemiological information about the burden of COVID-19 in Mexican outpatients and to identify symptomatic COVID-19 profiles that could help in the early diagnosis of the disease. Methods From June to September, epidemiological, clinical, and demographic data of 482,413 individuals tested by RT-PCR for SARS-CoV-2 in Salud Digna clinics were collected. Results We observed a 41% incidence of SARS-CoV-2 infections with a mean age of 36 years and with young adults (20–40 years) being the most affected. Among occupations, delivery persons (OR 1.38) and informal traders (OR 1.33) had a higher risk of COVID-19. Moreover, 13% of SARS-CoV-2 infections were in presymptomatic patients. Finally, we identified three different symptomatic profiles (common, respiratory, and gastrointestinal) associated with COVID-19. Conclusion The incidence of SARS-CoV-2 was high among outpatients, with a significant proportion of presymptomatic carriers; thus, it is necessary to increase testing and continue SARS-CoV-2 surveillance with a better description of signs and symptoms. In this regard, we identified three symptomatic profiles that could help in the diagnosis of COVID-19.


Introduction
At the start of the COVID-19 pandemic, initial reports showed that SARS-CoV-2 infection leads to pneumonia mainly in older adults, who were the group at greatest risk of hospitalization and mechanical ventilation due to severe COVID-19 disease (Berumen et al., 2020; Giannouchos et al., 2020; Zheng et al., 2020). However, other age groups had similar risks due to different factors. Moreover, with regard to control of the pandemic, asymptomatic people have attracted attention because they can transmit the virus without manifesting symptoms of infection, which makes them identifiable only through mass testing of the general population, which is difficult to carry out.
'Asymptomatic' and 'presymptomatic' carriers are terms commonly used interchangeably; however, they are not equivalent. By definition, asymptomatic carriers are individuals without symptoms who are confirmed as SARS-CoV-2 positive by a laboratory test. Presymptomatic carriers, on the other hand, are cases without symptoms at the time of testing who could develop them later. Thus, presymptomatic carriers can only be identified by retrospective studies (Nikolai et al., 2020; Savvides and Siegel, 2020).
Several studies on this issue have shown that the proportion of asymptomatic carriers ranges from 15% to 45% (Nikolai et al., 2020). These variations are due to age, presymptomatic status, virus incubation time, and diagnostic errors (Buitrago-Garcia et al., 2020; Byambasuren et al., 2020; He et al., 2020; Oran and Topol, 2020). Likewise, similar virus shedding was demonstrated in symptomatic and asymptomatic patients (Cevik et al., 2020), possibly translating into a high ability of presymptomatic and asymptomatic patients to spread the virus, which highlights the need to study them in detail (Buitrago-Garcia et al., 2020; Byambasuren et al., 2020).
Moreover, the COVID-19 pandemic showed different risks according to population structure and the association of different factors, including chronic diseases. In this regard, the situation in Mexico could be riskier due to the high prevalence in the adult population of chronic diseases such as hypertension (26%), obesity (36%), and diabetes (14%), which have been related to severe COVID-19 disease and mortality (Basto-Abreu et al., 2020; Campos-Nonato et al., 2018; OECD, 2020; SSA, 2020).
On February 27th, 2020, the first case of COVID-19 in Mexico was detected (Forster et al., 2020). Since that date, 1,255,974 confirmed cases of COVID-19 have been detected, with an overall mortality of 9%. Hypertension, obesity, and diabetes are the main comorbidities that increase the risk of death from COVID-19 (SSA, 2020). Information for Mexico comes primarily from the Health Ministry's hospital-based registries and shows a high incidence in men and in people around 40 years of age; meanwhile, people with any comorbidity are at higher risk of death from COVID-19 (Suárez et al., 2020).
A detailed epidemiological profile of COVID-19 in the country is not readily available; further, the current COVID-19 diagnosis is ambiguous due to the lack of specificity of the symptoms (Struyf et al., 2020). Moreover, the incidence of COVID-19 in outpatients, including those who are presymptomatic and symptomatic, and their epidemiological profile remain unknown. Therefore, we analyzed epidemiological, clinical, and demographic data from 482,413 outpatients tested by RT-PCR for SARS-CoV-2 from 26 states of Mexico. Thus, we investigated the relevance of different factors and symptoms for SARS-CoV-2 infection to contribute to the epidemiology of COVID-19 in the country.

Study design and population selection
The present work was a retrospective, cross-sectional study. The collected data correspond to 482,413 outpatients of any sex and age evaluated for SARS-CoV-2 infection by PCR test in Salud Digna clinics in 26 Mexican states from June 1st to September 30th. Only patients who had clinical and demographic information, had a confirmed positive or negative SARS-CoV-2 test result, and accepted the use of their data by signing an informed consent were considered. For the geographical analysis of COVID-19 incidence, we included states with 382 or more subjects. This threshold was derived from sample size calculations for cross-sectional studies (Pourhoseingholi et al., 2013); in this case, we used the general COVID-19 incidence (47%) reported by the Ministry of Health of Mexico for the same period as this study (Direccion General de Epidemiologia, 2020). Finally, the incidence rate by state was standardized by age and sex with the direct method, using the standard world population as reference. The confidence interval for the age-standardized rate (ASR) was obtained assuming a Poisson distribution, according to Boyle and Parkin (1991).
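The state-level threshold can be checked with the usual sample size formula for estimating a proportion in a cross-sectional study, n = z² p (1 − p) / d². The sketch below is illustrative only: the 95% confidence level (z = 1.96) and the 5% margin of error (d = 0.05) are assumptions, since the text states only the expected incidence (47%) and the resulting threshold (382). The function name is hypothetical.

```python
import math


def cross_sectional_sample_size(p, margin=0.05, z=1.96):
    """Sample size for estimating a proportion: n = z^2 * p * (1 - p) / d^2.

    p      -- expected proportion (0.47 is the incidence quoted in the text)
    margin -- absolute margin of error d (assumed here, not stated in the text)
    z      -- standard normal quantile (1.96 for 95% confidence, assumed)
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return z ** 2 * p * (1 - p) / margin ** 2


n = cross_sectional_sample_size(0.47)
print(f"required subjects per state: about {n:.1f}")
```

Under these assumed inputs the formula gives a value between 382 and 383, which is consistent with the threshold of 382 subjects used for the geographical analysis.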

Procedures
Nasopharyngeal swabs were collected in a viral transport medium to preserve samples until processing at the molecular biology laboratories at the National Reference Center located in Mexico State in Culiacan, Sinaloa. Molecular biologists processed samples in an automated and semiautomated workflow. Viral RNA extraction was automated with the MagNA Pure™ 24 System (Roche Diagnostics, US) and the KingFisher™ Flex instrument (Thermo Fisher Scientific, US). PCR tests were run on Roche's fully automated cobas® 6800 and on the QuantStudio 7 Flex (Thermo Fisher). For SARS-CoV-2 detection, we used the cobas® SARS-CoV-2 Test (Roche Diagnostics), the TaqMan 2019-nCoV Assay Kit v1 (Thermo Scientific), and the VIASURE SARS-CoV-2 Real-Time PCR Detection Kit (CerTest Biotec). The TaqMan assay detects the viral genes orf1ab, S, and N; the cobas® test targets the orf1ab and E viral genes; and VIASURE targets the orf1ab and N genes. Human RNase P is used as an internal control.
For the cobas® SARS-CoV-2 Test (Roche Diagnostics) and the VIASURE SARS-CoV-2 Real-Time PCR Detection Kit (CerTest Biotec), a positive result corresponds to amplification of the target genes at Ct < 38, whereas Ct < 37 was used for the TaqMan 2019-nCoV Assay Kit v1 (Thermo Scientific), according to the manufacturers' recommendations.
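The kit-specific Ct cut-offs above can be expressed as a small lookup. This is a simplified sketch, not the manufacturers' interpretation algorithms: it assumes that every reported target must amplify below the kit's cut-off for a positive call, and it ignores inconclusive categories and the RNase P internal control. The names `CT_CUTOFF`, `is_amplified`, and `call_sample` are hypothetical.

```python
from typing import Dict, Optional

# Ct cut-offs quoted in the text, keyed by a short kit label (labels are hypothetical).
CT_CUTOFF = {"cobas": 38.0, "viasure": 38.0, "taqman": 37.0}


def is_amplified(kit: str, ct: Optional[float]) -> bool:
    """A target counts as amplified when a Ct exists and lies below the kit cut-off."""
    return ct is not None and ct < CT_CUTOFF[kit]


def call_sample(kit: str, target_cts: Dict[str, Optional[float]]) -> str:
    """Simplified call: positive only if all listed targets amplify (an assumption)."""
    if target_cts and all(is_amplified(kit, ct) for ct in target_cts.values()):
        return "positive"
    return "not positive"


print(call_sample("taqman", {"orf1ab": 36.5, "S": 30.0, "N": 31.2}))
print(call_sample("taqman", {"orf1ab": 37.5, "S": 30.0, "N": 31.2}))
```

The same Ct value can therefore lead to different calls depending on the kit: a Ct of 37.5 is below the cobas® and VIASURE cut-off but above the TaqMan one.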

Data collection
We applied a standardized survey prior to the SARS-CoV-2 PCR test to collect information regarding symptoms, clinical history, occupation, and lifestyle risk factors. PCR test results were interpreted according to the manufacturers' recommendations.

Consent for the use of information and privacy protection
The information analyzed in this work belongs to patients who signed the informed consent for this study, which was provided at the time of testing. A unique ID code was assigned to anonymize records for data privacy protection and to prevent data duplication. Further, we aggregated information to enhance data protection. All procedures were carried out in adherence to the Mexican Federal Law on Personal Data Protection (LFPDPPP) and the privacy policy of Salud Digna.

Ethical statement
This study followed the approved guidelines for clinical information management, the Declaration of Helsinki, and Mexico's national regulations. This study was approved by the Ethical Review and Research Board of Salud Digna (SDI-2020-2).

Statistical analysis
Descriptive statistics and incidence rate
According to the data type, we performed a χ² test for categorical variables, which are expressed as frequencies or percentages. Continuous variables are shown as mean or as median and interquartile range (IQR). Standardized rates were calculated with the direct method using the standard world population (Ahmad et al., 2000); standardization was done by sex and by 10-year age groups, and rates are expressed per 100 persons.
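As an illustration of the direct method, the standardized rate is the weighted average of the stratum-specific rates, with weights taken from the reference population. The sketch below uses invented toy counts and weights for two strata; they are not the study data and not the actual world standard population weights, and the function name is hypothetical.

```python
def direct_standardized_rate(cases, population, std_weights, per=100):
    """Directly standardized rate: sum(w_i * cases_i / pop_i) / sum(w_i), scaled by `per`.

    cases, population -- counts per stratum (e.g. age-sex group) in the study population
    std_weights       -- size or share of each stratum in the reference population
    """
    if not (len(cases) == len(population) == len(std_weights)):
        raise ValueError("all inputs must have one entry per stratum")
    weighted = sum(w * c / n for c, n, w in zip(cases, population, std_weights))
    return per * weighted / sum(std_weights)


# Toy example (invented numbers): two strata with rates of 10% and 30%.
toy_cases = [10, 30]
toy_population = [100, 100]
toy_weights = [3, 1]  # the reference population has three times more of stratum 1

crude = 100 * sum(toy_cases) / sum(toy_population)
standardized = direct_standardized_rate(toy_cases, toy_population, toy_weights)
print(f"crude: {crude:.1f} per 100, standardized: {standardized:.1f} per 100")
```

In this toy case the crude rate (20 per 100) exceeds the standardized rate (15 per 100) because the reference population gives more weight to the low-rate stratum, which is the kind of difference in age and sex structure that standardization removes when comparing states.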

Multivariate analysis
Age- and sex-adjusted stepwise multivariate logistic regression was performed to evaluate the association between SARS-CoV-2 infection and potential risk factors (e.g., lifestyle).
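To make the odds ratio (OR) scale used in the results concrete, the sketch below computes a crude, unadjusted OR with a Wald 95% confidence interval from a 2×2 table. It is a deliberately simpler technique than the age- and sex-adjusted stepwise logistic regression used in this study, and the counts are invented for illustration.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    a -- exposed and positive      b -- exposed and negative
    c -- unexposed and positive    d -- unexposed and negative
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Invented counts: 60/100 positives in an exposed group vs 50/100 in a reference group.
or_est, or_low, or_high = odds_ratio(60, 40, 50, 50)
print(f"OR = {or_est:.2f} (95% CI {or_low:.2f} to {or_high:.2f})")
```

An OR above 1 with a confidence interval that excludes 1 indicates higher odds of infection in the exposed group; in this toy table the interval includes 1, so the difference would not be considered significant.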

Tandem clustering
To date, 11 symptoms are considered to be predictors of COVID-19 positivity; however, some appear only weakly related to one another and may confuse clinicians. To determine the relevance of symptoms to SARS-CoV-2 infection, tandem clustering by Multiple Correspondence Analysis (MCA) followed by the k-means method was run in R 4.0.2 (CRAN project) with the FactoMineR and factoextra libraries. The number of clusters was defined by the elbow method through hierarchical clustering on principal components, performed on 45,000 cases. Subsequently, we calculated the agreement between these groups with the kappa index (Ki = 0.97). We tested the Hartigan-Wong, Lloyd, Forgy, and MacQueen methods with 1000 repetitions and 100,000 iterations each. Finally, we chose the MacQueen method for k-means due to its better interclass inertia.
A p value of 0.05 was considered the threshold for statistical significance in all tests. Analyses were performed with SPSS 23 (SPSS Inc., Chicago, Ill., USA) or R 4.0.2 (CRAN project), and graphics were designed with GraphPad Prism 8 (GraphPad Software Inc., San Diego, CA).
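The agreement between two cluster assignments, summarized in the tandem clustering step by the kappa index, can be illustrated with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below assumes that the cluster labels of both assignments have already been matched to each other (k-means labels are arbitrary), and the example labels are invented; it does not reproduce the reported value.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the marginal proportions, summed over labels.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if expected == 1:
        return 1.0  # both assignments use a single identical label
    return (observed - expected) / (1 - expected)


# Invented assignments for ten cases; they differ for exactly one case.
run_1 = ["common"] * 5 + ["respiratory"] * 3 + ["gastrointestinal"] * 2
run_2 = ["common"] * 4 + ["respiratory"] * 4 + ["gastrointestinal"] * 2
kappa = cohen_kappa(run_1, run_2)
print(f"kappa = {kappa:.3f}")
```

Values close to 1 indicate that the two assignments place nearly all cases in the same clusters beyond what chance alone would produce.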

Geographical distribution of SARS-CoV-2 infections
This work includes data from 26 of the 32 states (81%) of Mexico. We found a heterogeneous incidence of SARS-CoV-2, with at least one state per region showing a significantly higher incidence than the national incidence (Figure 1, Table 1). Moreover, the temporal course of the SARS-CoV-2 pandemic shows a maximum incidence in the second week of July (50%) and a decrease thereafter (36% in September 2020) (Figure 1B).

Demographic characteristics of the population studied
From June 1st to September 30th, 482,413 nasopharyngeal swabs were collected in Salud Digna clinics; 244,171 (51%) were from males and 238,242 (49%) were from females. The median age of all people studied was 36 years (IQR: 16–56 years) (Table 1). The child population is under-represented due to the government confinement measures (including school closures) implemented since March 27th (Figure 1C).

Symptoms related to COVID-19 in outpatients
To date, around 11 symptoms have been related to COVID-19, which makes its diagnosis very difficult because they are not specific to the disease. The main symptoms among people tested were headache (50%), arthralgia or myalgia (38%), and sore throat (36%) (Table 2). No differences were found in symptom incidence by sex or age. Therefore, we performed a tandem clustering analysis to identify the most closely related symptoms in confirmed COVID-19 cases and thereby help healthcare workers diagnose COVID-19 early. For this, we took the remaining SARS-CoV-2-positive cases (159,598) after excluding presymptomatic people. This allowed us to identify three symptomatic groups associated with the incidence of COVID-19, which share headache, sore throat, and myalgia/arthralgia as frequent symptoms but differ from one another in other symptoms. We found a common cluster, which was the most frequent (87,253 = 56%), in which no symptom was particularly prominent but which included part of the core symptoms documented by the WHO (2020a); a respiratory cluster (52,827 = 34%), in which symptoms like fever (60%), cough (56%), chills (47%), runny nose (38%), and breathing difficulty (32%) are prominent; and a gastrointestinal cluster (16,518 = 10%), with a high frequency of diarrhea (74%), abdominal pain (63%), and vomiting (49%) relative to the other two profiles (Figure 3A and B).

Presymptomatic and symptomatic patients
Around 13% (25,520 cases) of SARS-CoV-2-positive patients had no symptoms on the test day or on the previous days. They were considered presymptomatic because they were not followed up to determine whether they remained asymptomatic. The sex distribution of presymptomatic and symptomatic carriers showed significant differences (presymptomatic males vs. females p < 0.0001; symptomatic males vs. females p < 0.0001). In addition, presymptomatic carriers were more common among older adults than among children, young people, and young adults (age <10 = 26%, age 10–20 = 27%, age 21–40 = 29%, and age >65 = 31%), and these differences were statistically significant (p < 0.0001). Further, a similar trend was observed in symptomatic cases. This age-specific trend in presymptomatic and symptomatic cases was statistically significant (p = 0.0273 and p < 0.0001, respectively) (Table 3).

Discussion
At present, the COVID-19 pandemic constitutes the most significant health challenge since the H1N1 and MERS-CoV epidemics. Eleven months after the onset of COVID-19 in Wuhan, over 71 million confirmed cases have been reported, with a general mortality rate of 2% (WHO, 2020b). To date, the Americas lead in confirmed cases and death rates, with the US at the top (WHO, 2020b). Mexico is one of the countries with a high SARS-CoV-2 incidence (43%), behind only Argentina (65%); meanwhile, other American countries with many confirmed cases, like the US (~16 million cases), have a lower incidence (7%) (WHO, 2020b). In this regard, Mexico's test rate for SARS-CoV-2 is 0.08% per thousand people versus 3% in the US or 0.43% in Argentina.
As a consequence, Mexico has a high reported mortality among confirmed cases (9%; 114,298 confirmed deaths from COVID-19), which is higher than in other American countries (the US, 2%; Argentina, 3%; Brazil, 3%) (WHO, 2020b). In addition, the temporal analysis from July 1st to September 30th showed that, after reaching a peak incidence in the second week of July (50%), the trend reversed, with a lower incidence (36%) by the end of September; however, the country-level incidence remains higher (43%).
At the state level, Suárez et al. reported that after the first month of the COVID-19 pandemic in Mexico, the states with the highest incidence were Mexico City, Mexico State, Baja California, Sinaloa, and Tabasco. This work shows that main metropolises like Mexico City have a lower COVID-19 incidence, while other states have ~18% more. Thus, these data on outpatients are similar to the hospital-based database of the Mexican Health Ministry and indicate that current pandemic management needs some refinement to reduce the incidence and mortality of COVID-19.
Differences in the pandemic's onset in each state, population density, adherence to mitigation measures to reduce virus spread, and disparities between rural and urban populations could help explain the regional incidence of COVID-19 in the country (Suárez et al., 2020). Similarly, in the US, people who live in rural areas have less access to SARS-CoV-2 testing than people who live in big cities like New York, which shows how social, economic, and health disparities can affect the burden of COVID-19 (Souch and Cossman, 2020). Likewise, in Mexico, the southern and western regions have a higher vulnerability index than other regions, which limits their access to health services (Santana-Castañeda, 2020).
The first reports of COVID-19 proposed that the risks of developing severe COVID-19 disease, and consequently of death, differ between sexes (Englmeier, 2020; de Groot and Bontrop, 2020). Our results show only a slight difference in the incidence of COVID-19 between females and males, with a male-to-female incidence ratio of 1.02:1. The sex difference is more pronounced in the mortality rate than in the incidence of COVID-19 (Peckham et al., 2020). Thus, even with the evidence of higher amounts of ACE2 receptors in men and the lower number of receptors involved in innate immunity (Englmeier, 2020; de Groot and Bontrop, 2020), more epidemiological information is needed to clarify the relevance of sex to COVID-19 etiology.
Similarly, an increase in COVID-19 severity has been associated with older age (Lusignan et al., 2020), while the risk of SARS-CoV-2 infection in this study was high in young adults. In this study, around 77% of subjects were under 50 years of age (Figure 1B); this may be due to the age distribution of the population in Mexico compared to other countries. De Souza et al. showed that other Latin American countries like Brazil have a similar age-group distribution and reached similar conclusions (Souza et al., 2020). Additionally, infection in young adults could be explained by the fact that people in their thirties and forties are economically active, which means a higher exposure rate due to mobility (Amorim et al., 2020).
Accordingly, we found that logistics workers, informal traders, and delivery persons have a higher risk of SARS-CoV-2 infection, which could be explained by the high rate of contact in their professions. In occupations like office work, customer service, and public service, the risk is lower than in other jobs, possibly because of reduced mobility; however, we cannot confirm this because we lack information about the working arrangements of this group. Furthermore, an increase in mobility due to the relaxation of confinement measures is related to a higher risk of being infected, as seen in other countries.
Moreover, it is difficult to infer the presence of the viral infection from symptoms because they are not specific to COVID-19, which could confuse clinicians and delay the diagnosis. Therefore, we performed a tandem clustering analysis to identify possible relationships among the symptoms reported. We found three symptomatic profiles: patients without any predominant symptom, patients with mainly respiratory symptoms (fever, cough, chills, runny nose, and breathing difficulty), and patients with gastrointestinal features (diarrhea, abdominal pain, and vomiting); this classification could help healthcare workers improve the diagnosis and treatment of COVID-19. Unfortunately, because of the nature of our study, we could not determine the relationship between symptomatic profiles and clinical outcomes. Therefore, more studies are needed to assess whether these clusters could predict clinical outcomes of COVID-19.
Finally, one issue that is still controversial is the contribution of presymptomatic/asymptomatic patients to the spread of SARS-CoV-2. In this regard, the Centers for Disease Control and Prevention (CDC) determined that asymptomatic people are one of the major groups of SARS-CoV-2-infected people. In the best-predicted scenario, these people transmit the virus to others with 75% efficiency (CDC, 2020). A meta-analysis reported a proportion of asymptomatic people from 4% to 50% (Oran and Topol, 2020). In this study, patients were not followed up for symptoms arising later, and people with positive SARS-CoV-2 tests who showed no symptoms on the test day or on previous days were considered to be presymptomatic; they accounted for about 13% of cases. Recent evidence from Korea showed that 36% of the people analyzed were asymptomatic at the time of the PCR test, but 19% of these patients developed symptoms soon after the test and were virus transmitters even after three weeks (Lee et al., 2020), which emphasizes that the lack of symptoms does not mean less transmission capacity.
In addition, we found that the proportion of SARS-CoV-2-positive, presymptomatic people is higher in older people (31%) than in children (26%). Current evidence is controversial; some studies indicated higher presymptomatic proportions in children than in older people, arguing that lower ACE2 receptor amounts in young people and a less aggressive immune response explain the high rates of presymptomatic children (He et al., 2020; Ma et al., 2020); however, other studies showed no significant differences in presymptomatic proportions between age groups (Jung et al., 2020; Yu et al., 2020). Thus, more studies are necessary to better characterize presymptomatic and asymptomatic carriers, and increased testing could help identify them to reduce the spread of the virus.

Strengths and limitations
This work provides a general insight into the burden of COVID-19 in Mexican outpatients. It is important to note that we studied a self-selected population tested for SARS-CoV-2 infection, which involves a potential under-representation of subjects with specific features such as age (e.g., children) and occupation (e.g., healthcare workers). In addition, because patients were not followed up in this study, we could not accurately distinguish presymptomatic from asymptomatic people. Further, the information in this study does not represent the Mexican population as a whole; however, this work and the hospital-based information could help us understand the behavior of the pandemic.
Figure 3. Tandem clustering analysis for COVID-19 related symptoms. Panel A shows the dispersion plot of symptoms obtained by multiple correspondence analysis (MCA); red indicates the gastrointestinal cluster, gold the respiratory cluster, and blue the common cluster (no symptoms stand out). Dimension 1 (Dim 1) and Dimension 2 (Dim 2) show the variance explained by each one (22.2% and 9.4%, respectively). Panel B shows the frequency and distribution of COVID-19 related symptoms in each cluster identified. Symptoms are indicated as follows: 1 = Conjunctivitis, 2 = Abdominal pain, 3 = Headache, 4 = Sore throat, 5 = Myalgia/Arthralgia, 6 = Breath difficulty, 7 = Diarrhea, 8 = Chills, 9 = Fever, 10 = Runny nose, 11 = Constant dry cough, and 12 = Vomiting.
Moreover, symptoms were self-reported by patients, so they could have been misidentified in some cases, which could affect our tandem clustering analysis. Thus, more studies in other populations are needed to improve knowledge of the clinical manifestations of COVID-19.
In conclusion, we found a high SARS-CoV-2 incidence (41%), with young adults (26-40 years) being the most affected. Further, we found that the proportion of presymptomatic carriers was 13% and that it increases with age; this could help implement triage in target populations to identify these carriers and reduce virus spread. Furthermore, we identified three symptomatic COVID-19 profiles, common (56%), respiratory (34%), and gastrointestinal (10%), which we expect could help clinicians diagnose COVID-19 early. Further studies are needed to determine whether these profiles could predict a patient's clinical outcomes.

Conflict of interests
None declared.

Funding source
None declared.

Ethical approval
This study was approved by the Ethical Review and Research Board of Salud Digna (SDI-2020-2).