Short Communication| Volume 102, P460-462, January 2021

Analysis of the potential impact of genomic variants in global SARS-CoV-2 genomes on molecular diagnostic assays

  • Abhinav Jain
    Footnotes
    1 Contributed equally and would like to be known as joint first authors.
    Affiliations
    CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi 110025, India

    Academy of Scientific and Innovative Research (AcSIR), CSIR-HRDC Campus, Sector 19, Kamla Nehru Nagar, Ghaziabad, Uttar Pradesh 201002, India
  • Mercy Rophina
    Footnotes
    1 Contributed equally and would like to be known as joint first authors.
    Affiliations
    CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi 110025, India

    Academy of Scientific and Innovative Research (AcSIR), CSIR-HRDC Campus, Sector 19, Kamla Nehru Nagar, Ghaziabad, Uttar Pradesh 201002, India
  • Saurabh Mahajan
    Affiliations
    St. Joseph’s College, Langford Gardens, Bengaluru, Karnataka 560027 India
  • Bhavya Balaji Krishnan
    Affiliations
    Imperial College London, South Kensington, London SW7 2BU, United Kingdom
  • Manasa Sharma
    Affiliations
    Ramaiah University of Applied Sciences, Bengaluru, Karnataka 560054, India
  • Sreya Mandal
    Affiliations
    St. Joseph’s College, Langford Gardens, Bengaluru, Karnataka 560027 India
  • Teresa Fernandez
    Affiliations
    St. Joseph’s College, Langford Gardens, Bengaluru, Karnataka 560027 India
  • Sumayra Sultanji
    Affiliations
    St. Joseph’s College, Langford Gardens, Bengaluru, Karnataka 560027 India
  • Bani Jolly
    Affiliations
    CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi 110025, India

    Academy of Scientific and Innovative Research (AcSIR), CSIR-HRDC Campus, Sector 19, Kamla Nehru Nagar, Ghaziabad, Uttar Pradesh 201002, India
  • Samatha Mathew
    Affiliations
    CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi 110025, India

    Academy of Scientific and Innovative Research (AcSIR), CSIR-HRDC Campus, Sector 19, Kamla Nehru Nagar, Ghaziabad, Uttar Pradesh 201002, India
  • Sridhar Sivasubbu
    Affiliations
    CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi 110025, India

    Academy of Scientific and Innovative Research (AcSIR), CSIR-HRDC Campus, Sector 19, Kamla Nehru Nagar, Ghaziabad, Uttar Pradesh 201002, India
  • Vinod Scaria
    Correspondence
    Corresponding author at: CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi 110025, India.
    Affiliations
    CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mathura Road, Delhi 110025, India

    Academy of Scientific and Innovative Research (AcSIR), CSIR-HRDC Campus, Sector 19, Kamla Nehru Nagar, Ghaziabad, Uttar Pradesh 201002, India
Open Access | Published: November 09, 2020 | DOI: https://doi.org/10.1016/j.ijid.2020.10.086

      Abstract

      An epidemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causing coronavirus disease (COVID-19), initially reported in Wuhan, China, has rapidly emerged into a global pandemic affecting millions of people worldwide. Molecular detection of SARS-CoV-2 using reverse transcription polymerase chain reaction (RT-PCR) forms the mainstay in screening, diagnosis and epidemiology of the disease. Since the virus evolves by accumulating base substitutions, mutations in the viral genome could possibly affect the accuracy of RT-PCR-based detection assays. The recent availability of genomes of SARS-CoV-2 isolates motivated us to assess the presence and potential impact of variations in target sites of the oligonucleotide primers and probes used in molecular diagnosis. We catalogued a total of 132 primer or probe sequences from literature and data available in the public domain. Our analysis revealed that a total of 5862 unique genetic variants mapped to at least one of the 132 primer or probe binding sites in the genome. A total of 29 unique variants were present in ≥ 1% of genomes from at least one of the continents (Asia, Africa, Australia, Europe, North America, and South America) and mapped to 36 unique primer or probe binding sites. Similarly, a total of 27 primer or probe binding sites had a cumulative variant frequency of ≥ 1% in the global SARS-CoV-2 genomes. These included primer or probe sites that are used worldwide for molecular diagnosis and are approved by national and international agencies. We also found 286 SARS-CoV-2 genomic regions with low variability over a continuous stretch of ≥ 20 bp that could potentially be used for primer design. This highlights the need for sequencing genomes of emerging pathogens to enable evidence-based policies for development and approval of diagnostics.

      Highlights

      • SARS-CoV-2 variants impact RT-PCR efficiency in detection.
      • A total of 29 global SARS-CoV-2 genetic variants had a frequency ≥ 1%.
      • The thermodynamic stability of the virus-primers complex gets perturbed.
      • A number of recommended primer or probe sequences had high variant frequency.

      Initially reported from a city in China, coronavirus disease 2019 (COVID-19) has rapidly emerged as a global pandemic. Reverse transcription polymerase chain reaction (RT-PCR) based assays have been the mainstay for the diagnosis and screening of COVID-19 due to their high sensitivity and specificity (Shen et al., 2020). These assays utilize oligonucleotide primers and probes specific to the viral nucleic acid. SARS-CoV-2 has been continuously evolving and has an estimated substitution rate of 1.19–1.31 × 10⁻³ per site per year (Li et al., 2020). Recent reports suggest that genetic variation in viruses at primer or probe binding sites could decrease the sensitivity of RT-PCR based assays (Yang et al., 2014). Motivated by the availability of a large number of genomes of SARS-CoV-2 isolates globally, we attempted to understand the genomic variants and their potential impact on molecular diagnostic assays.
      We analysed the genome sequences of SARS-CoV-2 isolates deposited in GISAID (Shu and McCauley, 2017) as of 26 September 2020. Only complete genome sequences that had ≥99% alignment with the Wuhan-Hu-1 reference genome (NC_045512.2) (Wu et al., 2020) and <5% degenerate bases were considered for the analysis. Genome sequences having clustered mutations and higher than expected divergence were also excluded. The individual genomes were re-aligned to the Wuhan-Hu-1 reference genome using EMBOSS needle (Rice et al., 2000), and the pairwise alignments were parsed to identify variants using bespoke scripts. The primer/probe sequences were compiled through extensive literature searches as well as from public databases and were mapped to the reference genome using BLAST (Altschul et al., 1990). The SARS-CoV-2 genomic variant coordinates were then overlapped with the primer/probe binding sites. Melting temperature (Tm) and Gibbs free energy (ΔG°37) at standard conditions were calculated for the primer or probe sequences. We also evaluated the internal and terminal mismatches that could affect the thermodynamic stability of the nucleic acid secondary structure as well as the Tm (Supplementary Method 1). Finally, we identified regions in the SARS-CoV-2 genome with low variability: we considered variants with frequency below the 95th percentile and identified continuous stretches of ≥ 20 bp for designing primers. The choice of 20 bp is guided by the standard length of primer/probe sequences.
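      The search for low-variability stretches described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' bespoke script: the function name `low_variability_stretches`, the per-position dictionary `site_freq`, and the toy numbers are all hypothetical, and the frequency cut-off is passed in as a parameter rather than derived from a percentile.

```python
from typing import Dict, List, Tuple


def low_variability_stretches(
    site_freq: Dict[int, float],
    genome_length: int,
    threshold: float,
    min_len: int = 20,
) -> List[Tuple[int, int]]:
    """Find runs of consecutive low-variability positions.

    site_freq maps a 1-based genome position to its variant frequency;
    positions missing from the dictionary are treated as invariant (0.0).
    Returns 1-based inclusive (start, end) pairs for every run in which
    each position has frequency < threshold and the run spans >= min_len.
    """
    stretches: List[Tuple[int, int]] = []
    run_start = None  # start of the current low-variability run, if any
    for pos in range(1, genome_length + 1):
        if site_freq.get(pos, 0.0) < threshold:
            if run_start is None:
                run_start = pos
        else:
            # A variable position closes the current run.
            if run_start is not None and pos - run_start >= min_len:
                stretches.append((run_start, pos - 1))
            run_start = None
    # Close a run that extends to the end of the genome.
    if run_start is not None and genome_length - run_start + 1 >= min_len:
        stretches.append((run_start, genome_length))
    return stretches


# Toy example (hypothetical data): a 60-position "genome" with three
# variable positions and an arbitrary cut-off of 0.01.
toy_freq = {10: 0.5, 35: 0.2, 50: 0.3}
print(low_variability_stretches(toy_freq, genome_length=60, threshold=0.01))
# prints [(11, 34)]
```

      Only the run from position 11 to 34 is long enough in the toy data; the other runs are shorter than 20 positions and are dropped.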
      A total of 45,830 high-quality genome sequences, comprising 4779 sequences from Asia, 25,091 from Europe, 859 from Africa, 12,949 from North America, 791 from South America and 1361 from Australia, were used in the analysis. Our analysis revealed a total of 88,880 unique single nucleotide variants (SNVs) across the genome. We compiled a total of 132 primer or probe sequences (Supplementary Data 1). A total of 5862 unique genetic variants mapped to the 132 primer or probe binding sites in the SARS-CoV-2 genome. Of these, 29 unique variants had an allele frequency ≥ 1% in at least one of the six continents from which the SARS-CoV-2 genomes were isolated (Table 1 and Figure 1). We also observed potential differences in ΔG°37 and Tm, which affect the thermodynamic stability of the secondary structure and the annealing of the primers and probes to the viral cDNA/RNA, respectively (Table 1). Of significant interest, three variants, each with over 30% frequency in genomes, mapped to the primer 2019-nCoV-NFP (GGGGAACTTCTCCTGCTAGAAT), which targets the N gene and is part of the China Centers for Disease Control and Prevention (CDC) protocol (WHO in house assay, 2020). A cumulative variant frequency of 93.5% was found in the 2019-nCoV-NFP binding site. Variants with >1% frequency were also found in primers/probes encompassing the S, M, ORF1ab, and ORF3a genes (Table 1). A total of 27 primer and probe sequences had a cumulative variant frequency >1%, of which 11 were approved by national regulatory bodies, mainly the Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO), and have been widely used across the globe (Supplementary Data 2). Our analysis also suggests 286 genomic regions/sites with variant frequency below the 95th percentile (corresponding to a variant frequency of 1.7 × 10⁻⁴) (Supplementary Figure 1 and Supplementary Data 3).
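      The overlap of variant coordinates with primer or probe binding sites, and a per-site cumulative frequency, might be computed along the following lines. This is a hedged sketch rather than the authors' code: the class names `BindingSite` and `Variant`, the function `cumulative_site_frequency`, and the coordinates and counts are hypothetical. It also assumes that the cumulative frequency of a site is the sum of the frequencies of the individual variants falling inside it, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BindingSite:
    name: str
    start: int  # 1-based, inclusive, on the reference genome
    end: int    # 1-based, inclusive


@dataclass(frozen=True)
class Variant:
    pos: int    # 1-based position on the reference genome
    ref: str
    alt: str
    count: int  # number of genomes carrying this variant


def cumulative_site_frequency(
    sites: List[BindingSite],
    variants: List[Variant],
    n_genomes: int,
) -> Dict[str, float]:
    """Sum variant counts inside each binding site and divide by n_genomes.

    Assumption: a genome carrying two variants in the same site contributes
    twice, i.e. the result is a sum of per-variant frequencies.
    """
    totals = {site.name: 0 for site in sites}
    for variant in variants:
        for site in sites:
            if site.start <= variant.pos <= site.end:
                totals[site.name] += variant.count
    return {name: count / n_genomes for name, count in totals.items()}


# Toy example with made-up coordinates and counts over 100 genomes.
sites = [BindingSite("primer_A", 100, 121), BindingSite("probe_B", 200, 219)]
variants = [
    Variant(100, "G", "A", 30),
    Variant(105, "C", "T", 5),
    Variant(210, "A", "G", 2),
    Variant(300, "T", "C", 50),  # outside every binding site, ignored
]
for name, freq in cumulative_site_frequency(sites, variants, 100).items():
    print(f"{name}: {freq:.2%}")
```

      The nested loop is adequate for a few hundred binding sites; for much larger site lists an interval index would be a more natural choice.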
      Table 1. Summary of primer and probe sequences and genomic variant analysis.
      The variant frequency, primers/probes frequency, Gibbs free energy (ΔG), and melting temperature (Tm) for reference and alternate allele in the Indian SARS-CoV-2 isolates. Only primers/probes with a frequency of more than 1% in any of the six continents are included in this table.
      Tm - Melting Temperature, ΔG - Gibbs Free Energy, Ref - Reference, Alt - Alternate, No. - Number, Afr - Africa, Aus - Australia, Eur - Europe, NA - North America, SA - South America.
      Figure 1. Genome-wide distribution (A) and frequency (B) of the 29 genetic variants with allele frequency ≥ 1% in at least one of the six continents.
      Our analysis suggests that genome sequencing of isolates in an epidemic could provide useful insights for assessing diagnostic efficacy, as also suggested by previous authors (Khan and Cheung, 2020). We surmise that this could possibly drive policies on the evaluation and approval of assays for screening and diagnosis. The study also highlights the need for rapid and widespread sharing of genomic data of pathogens, as well as molecular probe information, through public archives during pandemics.

      Author contributions

      MR performed the genome analysis and variant calls. BJ performed the quality assessment of the genome dataset. AJ and SM1 co-ordinated the compendium of primers and probes with the help of Bhavya Balaji Krishnan, Manasa Sharma, Sreya Mandal, Teresa Fernandez and Sumayra Sultanji. SM2 contributed to mapping the primers to the genomic loci. AJ performed the analysis of variants mapping to the probe-target sites and was assisted by MR. VS and SS provided the conceptual overview of the analysis. VS, MR and AJ wrote the manuscript; its content and analysis were read and agreed upon by all authors.

      Ethical approval

      The study does not require any ethical approval.

      Declaration of interests

      None

      Acknowledgements

      We acknowledge the researchers who have made the SARS-CoV-2 genomes available in the public domain. A comprehensive list of genomes, contributing laboratories, and acknowledgement is available in Supplementary Data 4. Authors acknowledge Paras Sehgal for constructive comments which enriched the manuscript.
      Authors acknowledge funding from CSIR India through grant MLP2005. AJ, BJ and SM acknowledge a research fellowship from CSIR India. The funders had no role in the preparation of the manuscript or decision to publish.

      Appendix A. Supplementary data

      The following is Supplementary data to this article:

      References

        • Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J Mol Biol. 1990; 215: 403-410
        • Khan K.A., Cheung P. Presence of mismatches between diagnostic PCR assays and coronavirus SARS-CoV-2 genome. R Soc Open Sci. 2020; 7: 200636
        • Li X., Zai J., Zhao Q., Nie Q., Li Y., Foley B.T., et al. Evolutionary history, potential intermediate animal host, and cross-species analyses of SARS-CoV-2. J Med Virol. 2020; 92: 602-611
        • Rice P., Longden I., Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000; 16: 276-277
        • Shen M., Zhou Y., Ye J., Abdullah Al-Maskri A.A., Kang Y., Zeng S., et al. Recent advances and perspectives of nucleic acid detection for coronavirus. J Pharm Anal. 2020; https://doi.org/10.1016/j.jpha.2020.02.010
        • Shu Y., McCauley J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 2017; 22: https://doi.org/10.2807/1560-7917.ES.2017.22.13.30494
        • Wu F., Zhao S., Yu B., Chen Y.-M., Wang W., Song Z.-G., et al. A new coronavirus associated with human respiratory disease in China. Nature. 2020; 579: 265-269
        • Yang J.-R., Kuo C.-Y., Huang H.-Y., Wu F.-T., Huang H.-Y., Cheng C.-Y., et al. Newly emerging mutations in the matrix genes of the human influenza A(H1N1)pdm09 and A(H3N2) viruses reduce the detection sensitivity of real-time reverse transcription-PCR. J Clin Microbiol. 2014; 52: 76-82