An epidemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and initially reported in Wuhan, China, has rapidly grown into a global pandemic affecting millions of people worldwide. Molecular detection of SARS-CoV-2 using reverse transcription polymerase chain reaction (RT-PCR) forms the mainstay of screening, diagnosis and epidemiology of the disease. Since the virus evolves by accumulating base substitutions, mutations in the viral genome could affect the accuracy of RT-PCR-based detection assays. The recent availability of genomes of SARS-CoV-2 isolates motivated us to assess the presence and potential impact of variations in the target sites of the oligonucleotide primers and probes used in molecular diagnosis. We catalogued a total of 132 primer or probe sequences from the literature and data available in the public domain. Our analysis revealed that a total of 5862 unique genetic variants mapped to at least one of the 132 primer or probe binding sites in the genome. A total of 29 unique variants, mapping to 36 unique primer or probe binding sites, were present in ≥ 1% of genomes from at least one of the continents (Asia, Africa, Australia, Europe, North America, and South America). Similarly, a total of 27 primer or probe binding sites had a cumulative variant frequency of ≥ 1% in the global SARS-CoV-2 genomes. These included primer or probe sites that are used worldwide for molecular diagnosis and are approved by national and international agencies. We also found 286 SARS-CoV-2 genomic regions with low variability over a continuous stretch of ≥ 20 bp that could potentially be used for primer design. This highlights the need for sequencing genomes of emerging pathogens to enable evidence-based policies for the development and approval of diagnostics.
- SARS-CoV-2 variants impact RT-PCR efficiency in detection.
- A total of 29 global SARS-CoV-2 genetic variants had a frequency ≥ 1%.
- The thermodynamic stability of the virus-primers complex gets perturbed.
- A number of recommended primer or probe sequences had high variant frequency.
Initially reported from a city in China, the coronavirus disease 2019 (COVID-19) has now rapidly emerged as a global pandemic. Reverse transcription polymerase chain reaction (RT-PCR) based assays have been the mainstay for the diagnosis and screening of COVID-19 due to their high sensitivity and specificity (Shen et al., 2020). These assays utilize oligonucleotide primers and probes specific to the viral nucleic acid. SARS-CoV-2 has been continuously evolving and has an estimated substitution rate of 1.19–1.31 × 10⁻³ per site per year (Li et al., 2020). Recent reports suggest that genetic variation in viruses at the primer or probe binding sites could decrease the sensitivity of RT-PCR based assays (Yang et al., 2014). Motivated by the availability of a large number of genomes of SARS-CoV-2 isolates globally, we attempted to understand the genomic variants and their potential impact on molecular diagnostic assays.
We analysed the genome sequences of SARS-CoV-2 isolates deposited in GISAID (Shu and McCauley, 2017) as of 26 September 2020. Only complete genome sequences that had ≥99% alignment with the Wuhan-Hu-1 reference genome (NC_045512.2) (Wu et al., 2020) and <5% degenerate bases were considered for the analysis. Genome sequences having clustered mutations and higher than expected divergence were also excluded from the analysis. The individual genomes were re-aligned to the Wuhan-Hu-1 reference genome using EMBOSS needle (Rice et al., 2000) and the pairwise alignments were parsed to identify variants using bespoke scripts. The primer/probe sequences were compiled using extensive literature searches as well as from public databases and were mapped to the reference genome using BLAST (Altschul et al., 1990). The SARS-CoV-2 genomic variant coordinates were overlapped with the primer/probe binding sites. Melting temperature (Tm) and Gibbs free energy (ΔG°37) at standard conditions were calculated for the primer or probe sequences. We also evaluated the internal and terminal mismatches that could have an impact on the thermodynamic stability of the nucleic acid secondary structure as well as on the Tm (Supplementary Method 1). We also identified regions in the SARS-CoV-2 genome with low variability: we considered variants below the 95th percentile of frequency and identified continuous stretches of ≥ 20 bp suitable for designing primers. The choice of 20 bp is guided by the standard length of primer/probe sequences.
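The overlap of variant coordinates with primer/probe binding sites can be illustrated with a minimal Python sketch. It is not the bespoke scripts used in the study. The class and function names, the toy coordinates and frequencies, and the 3-base window used to call a mismatch "terminal" are all assumptions made for illustration. The cumulative frequency is assumed here to be the simple sum of per-variant frequencies, which counts a genome carrying several variants in the same site more than once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingSite:
    name: str
    start: int  # 1-based, inclusive reference coordinate
    end: int    # 1-based, inclusive reference coordinate


@dataclass(frozen=True)
class Variant:
    pos: int     # 1-based reference coordinate of the substituted base
    ref: str
    alt: str
    freq: float  # fraction of genomes carrying the alternate allele


def variants_in_site(site, variants):
    """Return the variants whose position falls inside the binding site."""
    return [v for v in variants if site.start <= v.pos <= site.end]


def cumulative_frequency(site, variants):
    """Sum of per-variant frequencies within the site (assumed definition)."""
    return sum(v.freq for v in variants_in_site(site, variants))


def mismatch_location(site, variant, terminal_window=3):
    """Label a mismatch 'terminal' if it lies within `terminal_window` bases
    of either end of the binding site, otherwise 'internal'.
    The window size is an illustrative assumption."""
    if not site.start <= variant.pos <= site.end:
        raise ValueError("variant lies outside the binding site")
    distance_to_nearest_end = min(variant.pos - site.start,
                                  site.end - variant.pos)
    return "terminal" if distance_to_nearest_end < terminal_window else "internal"


# Toy data only: coordinates and frequencies are made up for illustration.
site = BindingSite("demo_primer", start=101, end=122)
variants = [
    Variant(pos=101, ref="G", alt="A", freq=0.30),
    Variant(pos=110, ref="C", alt="T", freq=0.02),
    Variant(pos=200, ref="A", alt="G", freq=0.50),  # outside the site
]

hits = variants_in_site(site, variants)
labels = {v.pos: mismatch_location(site, v) for v in hits}
total = cumulative_frequency(site, variants)
```

In this toy case two of the three variants fall inside the site, one at the first base of the site and one in its interior. A real analysis would also need to account for primer orientation, since a mismatch near the 3' end of a primer is generally considered more disruptive to extension than one near the 5' end.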
A total of 45,830 high-quality genome sequences were used in the analysis, comprising 4779 sequences from Asia, 25,091 from Europe, 859 from Africa, 12,949 from North America, 791 from South America and 1361 from Australia. Our analysis revealed a total of 88,880 unique single nucleotide variants (SNVs) across the genome. We compiled a total of 132 primer or probe sequences (Supplementary Data 1). A total of 5862 unique genetic variants mapped to the 132 primer or probe binding sites in the SARS-CoV-2 genome. Of these, 29 unique variants had an allele frequency ≥ 1% in at least one of the six continents from which the SARS-CoV-2 genomes were isolated (Table 1 and Figure 1). We also observed potential differences in ΔG°37 and Tm, which affect the thermodynamic stability of the secondary structure and the annealing of the primers and probes to the viral cDNA/RNA, respectively (Table 1). Of significant interest, three variants, each with over 30% frequency in the genomes, mapped to the primer 2019-nCoV-NFP (GGGGAACTTCTCCTGCTAGAAT), which targets the N gene and is part of the China Centers for Disease Control and Prevention (CDC) protocol (WHO in house assay, 2020). A cumulative variant frequency of 93.5% was found in the 2019-nCoV-NFP binding site. Variants with >1% frequency were also found in primers/probes encompassing the S, M, ORF1ab, and ORF3a genes (Table 1). A total of 27 primer and probe sequences had a cumulative variant frequency >1%, of which 11 have been approved by national regulatory bodies, mainly by the Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO), and have been widely used across the globe (Supplementary Data 2). Our analysis also suggests 286 genomic regions/sites with variant frequency below the 95th percentile (corresponding to a variant frequency of 1.7 × 10⁻⁴) (Supplementary Figure 1 and Supplementary Data 3).
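The search for low-variability regions can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' implementation: a position is treated as low-variability when its variant frequency is below a threshold (for example the 95th percentile of observed frequencies), and runs of at least 20 consecutive such positions are reported. The function names, the nearest-rank percentile definition and the toy data are assumptions.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty sequence (q between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def low_variability_stretches(site_freq, genome_length, threshold, min_len=20):
    """Return 1-based inclusive (start, end) runs of at least `min_len`
    consecutive positions whose variant frequency is below `threshold`.

    `site_freq` maps position -> variant frequency; positions that are
    absent are treated as having no observed variant (frequency 0.0).
    """
    stretches = []
    run_start = None
    for pos in range(1, genome_length + 1):
        if site_freq.get(pos, 0.0) < threshold:
            if run_start is None:
                run_start = pos
        else:
            if run_start is not None and pos - run_start >= min_len:
                stretches.append((run_start, pos - 1))
            run_start = None
    # Close a run that extends to the end of the genome.
    if run_start is not None and genome_length - run_start + 1 >= min_len:
        stretches.append((run_start, genome_length))
    return stretches


# Toy data only: a 50-base "genome" with two highly variable positions.
toy_freq = {10: 0.5, 35: 0.5}
regions = low_variability_stretches(toy_freq, genome_length=50, threshold=0.1)
```

With the toy data, only the run between the two variable positions is long enough to be reported. In practice the threshold would be derived from the observed frequency distribution, for instance `percentile(list(site_freq.values()), 95)`.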
Table 1. Summary of primer and probe sequences and genomic variant analysis.
The variant frequency, primers/probes frequency, Gibbs free energy (ΔG), and melting temperature (Tm) for reference and alternate allele in the Indian SARS-CoV-2 isolates. Only primers/probes with a frequency of more than 1% in any of the six continents are included in this Table.
Tm, melting temperature; ΔG, Gibbs free energy; Ref, reference; Alt, alternate; No., number; Afr, Africa; Aus, Australia; Eur, Europe; NA, North America; SA, South America.
Our analysis suggests that genome sequencing of isolates in an epidemic could provide useful insights for assessing diagnostic efficacy, as also suggested by previous authors (Khan and Cheung, 2020). We surmise that this could possibly drive policies on the evaluation and approval of assays for screening and diagnosis. The study also highlights the need for rapid and widespread sharing of genomic data of pathogens, as well as molecular probe information, through public archives during pandemics.
MR performed the genome analysis and variant calls. BJ performed the quality assessment of the genome dataset. AJ and SM1 co-ordinated the compendium of primers and probes with the help of Bhavya Balaji Krishnan, Manasa Sharma, Sreya Mandal, Teresa Fernandez and Sumayra Sultanji. SM2 contributed to mapping the primers to the genomic loci. AJ performed the analysis of variants mapping to the probe-target sites and was assisted by MR. VS and SS provided the conceptual overview of the analysis. VS, MR and AJ wrote the manuscript. All authors read and agreed upon the content and analysis.
The study does not require any ethical approval.
Declaration of interests
We acknowledge the researchers who have made the SARS-CoV-2 genomes available in the public domain. A comprehensive list of genomes, contributing laboratories, and acknowledgement is available in Supplementary Data 4. Authors acknowledge Paras Sehgal for constructive comments which enriched the manuscript.
The authors acknowledge funding from CSIR India through grant MLP2005. AJ, BJ and SM acknowledge a research fellowship from CSIR India. The funders had no role in the preparation of the manuscript or decision to publish.
Appendix A. Supplementary data
The following is Supplementary data to this article:
References
- Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J Mol Biol. 1990; 215: 403-410
- Khan K.A., Cheung P. Presence of mismatches between diagnostic PCR assays and coronavirus SARS-CoV-2 genome. R Soc open sci. 2020; 7: 200636
- Li X., Zai J., Zhao Q., Nie Q., Li Y., Foley B.T., et al. Evolutionary history, potential intermediate animal host, and cross-species analyses of SARS-CoV-2. J Med Virol. 2020; 92: 602-611
- Rice P., Longden I., Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000; 16: 276-277
- Shen M., Zhou Y., Ye J., Abdullah Al-Maskri A.A., Kang Y., Zeng S., et al. Recent advances and perspectives of nucleic acid detection for coronavirus. J Pharm Anal. 2020; https://doi.org/10.1016/j.jpha.2020.02.010
- Shu Y., McCauley J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 2017; 22. https://doi.org/10.2807/1560-7917.ES.2017.22.13.30494
- Wu F., Zhao S., Yu B., Chen Y.-M., Wang W., Song Z.-G., et al. A new coronavirus associated with human respiratory disease in China. Nature. 2020; 579: 265-269
- Yang J.-R., Kuo C.-Y., Huang H.-Y., Wu F.-T., Huang H.-Y., Cheng C.-Y., et al. Newly emerging mutations in the matrix genes of the human influenza A(H1N1)pdm09 and A(H3N2) viruses reduce the detection sensitivity of real-time reverse transcription-PCR. J Clin Microbiol. 2014; 52: 76-82
Published online: November 09, 2020
Accepted: October 26, 2020
Received in revised form: October 25, 2020
Received: August 5, 2020
© 2020 The Authors. Published by Elsevier Ltd on behalf of International Society for Infectious Diseases.
User license: Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0)