Risk of post-treatment Lyme disease in patients with ideally-treated early Lyme disease: A prospective cohort study

Introduction
Lyme disease (LD) is a vector-borne disease caused by various genospecies of the spirochetal bacteria Borrelia burgdorferi sensu lato (Steere et al., 2016). When untreated, the infection can cause specific early and/or late dermatologic, cardiac, neurologic, and joint manifestations (Steere et al., 2016). Following appropriate antibiotic treatment, LD has also been associated with persistent symptoms in a subset of patients, particularly fatigue, musculoskeletal pain, and cognitive difficulties, which can impact quality of life (Klempner et al., 2001;Rebman et al., 2017b). The association of infectious disease with a proximate chronic condition is not unique to LD (Carapetis et al., 2005;Hickie et al., 2006;Nalbandian et al., 2021).
In the United States and Europe, a small number of longitudinal studies have examined the prevalence and severity of persistent symptoms following treatment for LD, (Aucott et al., 2013b;Smith et al., 2002;Weitzner et al., 2015;Wills et al., 2016) with much of this data coming from early clinical trials (Dattwyler et al., 1997;Luft et al., 1996;Luger et al., 1995;Robert et al., 1992;Ursinus et al., 2021;Wormser et al., 2003). Variability in treatment paradigms, study design, enrollment criteria, and especially outcome classification have led to a wide range in estimated prevalence of persistent symptoms in this population (0-50%, however 10-20% is frequently cited) (Marques, 2008). Even fewer prospective cohort studies have included controls unexposed to LD so as to compare symptom prevalence to previously uninfected populations. In both our prior work (Bechtold et al., 2017) and in a recent publication by Wormser et al., (Wormser et al., 2020) overall symptom reporting among patients as a whole months to years following treatment for early LD did not differ from unexposed controls. However, also in both studies, a patient subset could be identified as having post-treatment Lyme disease (PTLD), either through standardized symptom surveys (Bechtold et al., 2017) or medical record review by a clinician (Wormser et al., 2020). A recent European study of PTLD showed that in well-defined treated LD patients, persistent symptoms were significantly more prevalent and severe than in patients without LD (Ursinus et al., 2021).
The Infectious Diseases Society of America (IDSA) previously proposed a research case definition for studying patients with PTLD, a distinct subgroup of patients with persistent symptoms who may present with a clinical diagnosis of chronic LD (Feder Jr. et al., 2007;Rebman et al., 2017b;Wormser et al., 2006). Patients who meet criteria for PTLD have prior documented LD, do not have other specific co-morbidities, and have both specific symptoms and functional impact of symptoms at least 6 months from their initial LD diagnosis and treatment. In previous publications, we have operationalized this IDSA research case definition through use of survey instruments and applied it to patients with a history of LD (Aucott et al., 2013a;Bechtold et al., 2017).
In the current study, we applied these PTLD-defining symptom and functional impact criteria to two cohorts of participants followed prospectively: those exposed and those unexposed to early LD. The primary aim of this study was to assess whether those with a history of LD would be more likely to meet these criteria than those without. This represents the first time that such differences have been examined using standardized criteria for PTLD in a large US sample.

Study Participants
For this longitudinal, prospective cohort study, adult patients (≥ 18 years) with early LD self-referred or were recruited from primary or urgent care settings in 2008-2020. Eligible participants were primarily enrolled at study sites in Maryland, with a small number enrolled at a satellite site in southeastern Pennsylvania. At enrollment, participants were required to have a visible erythema migrans (EM) ≥ 5 cm in diameter diagnosed by a health care provider and to have received ≤ 72 hours of appropriate antibiotic treatment for early LD. Additional details and baseline clinical characteristics of this sample have been previously published (Rebman et al., 2021).
Participants without a clinical or serologic history of LD were recruited from similar primary care settings, through community recruitment using flyers, and through online advertising. This cohort was required to be two-tier seronegative for antibodies to B. burgdorferi at the time of enrollment and at all subsequent visits, as well as to be free of any history of prior clinical LD. All participants in both groups were excluded for the same range of self-reported prior medical conditions, (Wormser et al., 2006) specifically: chronic fatigue syndrome, fibromyalgia, unexplained chronic pain, sleep apnea or narcolepsy, autoimmune disease, chronic neurologic disease, liver disease, HIV, cancer or malignancy in the past 2 years, major psychiatric illness, or drug or alcohol abuse. We did not exclude participants from either group based on medical conditions not referenced in the IDSA's PTLD proposed case definition.
The Institutional Review Board of the Johns Hopkins University School of Medicine approved this study, and all participants signed written consent prior to initiation of study activities.

Data Collection
At the baseline study visit, all participants self-reported demographic characteristics and prior physician-diagnosed co-morbidities. Participants also completed the Life Events Checklist (LEC), a measure of potential prior traumatic life events. Additionally, all participants completed the following: the Fatigue Severity Scale (FSS), a 9-item fatigue metric with summary scores ranging from 9 to 63 (Krupp et al., 1989); the Short-Form McGill Pain Questionnaire (SF-MPQ), a 15-item measure of pain with summary scores ranging from 0 to 45 (Melzack, 1975); the Pittsburgh Sleep Quality Index (PSQI), a sleep metric with summary scores ranging from 0 to 21 (Buysse et al., 1989); and the Beck Depression Inventory-II (BDI-II), a 21-item depression metric which can be separated into somatic and affective subscales (Beck et al., 1996;Storch et al., 2004). For these measures, higher scores indicate a worse experience of symptoms. Finally, functional impact was measured by the Short-Form Health Survey, Version 2 (SF-36). This 36-item quality of life measure has 8 norm-based subscale scores as well as Physical and Mental Component summary scores (PCS and MCS, respectively) which can also be compared with the US population mean (50 ± 10) (Ware et al., 2000). For this measure, a higher score indicates better life functioning.

Outcome Definition
Outcome data for participants with prior LD were drawn from two follow-up study visits conducted approximately 6 months and 1 year following the end of appropriate antibiotic treatment for their early LD. Similarly, data from participants without prior LD were extracted from two study visits occurring 6 months apart.
At each visit, participants were considered to meet criteria for 'symptoms' if fatigue, pain, and/or cognitive complaints were present at the 'moderate' or 'severe' level on the PLQS. Similarly, participants were considered to meet criteria for 'functional impact' if an average composite score of 4 specific norm-based subscales on the SF-36 was 0·5 SD below the population mean, as previously identified (Aucott et al., 2013a).
At each visit, we categorized all participants into 3 outcome groups based on whether they met these 'symptoms'/'functional impact' criteria. Those meeting criteria for both symptoms and functional impact met our operationalized case definition for PTLD ('PTLD' group). Those meeting criteria for symptoms but not functional impact were considered a 'Symptoms Only' group. The remaining participants were considered 'Return to Health.' Final status determination across the two visits was hierarchically determined. Participants meeting PTLD criteria at either visit were determined to be in this group. Among remaining participants, those meeting Symptoms Only criteria at either visit were determined to be in this group. All remaining participants were considered Return to Health, including the small (4·2%) proportion meeting criteria at either visit for functional impact but not symptoms.
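The per-visit categorization and the hierarchical final status described above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the `Visit` fields and function names are hypothetical, the four SF-36 subscales are not named in this section, and the cutoff assumes that "0·5 SD below the population mean" means a composite at or below 45 on the norm-based scale (mean 50, SD 10).

```python
from dataclasses import dataclass

POP_MEAN, POP_SD = 50.0, 10.0                # SF-36 norm-based scoring
IMPACT_CUTOFF = POP_MEAN - 0.5 * POP_SD      # 45.0; "at or below" is an assumption
SYMPTOM_LEVELS = {"moderate", "severe"}

# Higher rank wins when combining the two visits.
RANK = {"Return to Health": 0, "Symptoms Only": 1, "PTLD": 2}


@dataclass
class Visit:
    """One study visit (hypothetical field names)."""
    fatigue: str        # reported severity, e.g. "none", "mild", "moderate", "severe"
    pain: str
    cognitive: str
    subscales: tuple    # the four norm-based SF-36 subscale scores used in the composite


def classify_visit(v: Visit) -> str:
    """Apply the 'symptoms' and 'functional impact' criteria to a single visit."""
    symptoms = bool({v.fatigue, v.pain, v.cognitive} & SYMPTOM_LEVELS)
    impact = sum(v.subscales) / len(v.subscales) <= IMPACT_CUTOFF
    if symptoms and impact:
        return "PTLD"
    if symptoms:
        return "Symptoms Only"
    # Includes visits with functional impact but no qualifying symptoms.
    return "Return to Health"


def final_status(first: Visit, second: Visit) -> str:
    """Hierarchical status across two visits: PTLD > Symptoms Only > Return to Health."""
    return max(classify_visit(first), classify_visit(second), key=RANK.get)
```

Taking the maximum over an explicit ranking mirrors the stated hierarchy: meeting PTLD criteria at either visit takes precedence, followed by Symptoms Only at either visit.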
To assess symptom differences by outcome status, PLQS, FSS, SF-MPQ, PSQI, and BDI-II scores were averaged across the two time points.

Statistical Analyses
To test the hypothesis that participants exposed to prior, treated LD would be more likely to meet criteria for "PTLD" than unexposed control participants, we first tested for statistical difference between groups using Fisher's exact test. We then fit a multinomial logistic regression model with symptoms/functional impact status as the outcome variable and participant cohort as the primary exposure. Relevant demographic, psychosocial, and co-morbidity variables were compared between participant groups, and both p<0·15 and clinical relevance were used to select potential confounders for inclusion in the adjusted model.
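For reference, a multinomial logistic regression of this kind can be written as below. The notation is ours and is an assumption rather than the authors' specification: Return to Health (RTH) is taken as the reference category, \(\mathrm{LD}_i\) is an indicator for prior LD, and \(\mathbf{x}_i\) collects the selected confounders.

```latex
\log \frac{P(Y_i = k)}{P(Y_i = \mathrm{RTH})}
  = \beta_{0k} + \beta_{1k}\,\mathrm{LD}_i + \boldsymbol{\gamma}_k^{\top}\mathbf{x}_i,
  \qquad k \in \{\mathrm{PTLD},\ \mathrm{Symptoms\ Only}\}
```

Under this form, \(\exp(\beta_{1k})\) is the ratio comparing participants with and without prior LD in their relative likelihood of outcome \(k\) versus Return to Health, holding the confounders fixed.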
Further analyses then explored specific symptom differences by participant cohort. First, we presented the distribution of scores for the PLQS, FSS, SF-MPQ, PSQI, and BDI-II. Next, these scores, as well as a subset of core PTLD symptoms chosen based on review of the literature by a clinical expert (JA) from the PLQS, were compared between participant groups in adjusted analyses. We fit individual linear regression models with each score as the outcome and participant group as the primary exposure, adjusting for potential confounders. Finally, we conducted analyses using individual logistic regression models to compare the odds of 'moderate' or 'severe' symptom reporting between participant groups for specific PLQS symptoms.
A p<0·05 was considered statistically significant. All analyses were performed using R (version 4.0.2).

Results
A total of 293 participants with and 54 without prior LD were enrolled in the study. Of those enrolled at the time of early LD diagnosis, 87% and 83% returned for their 6-month and 1-year follow-up visit, respectively. Ninety-three percent of those without prior LD completed both time points. A final sample of 234 participants with and 49 without prior LD were available for analysis, as described in Table 1. Although the differences were not statistically significant, participants with LD had a higher median age, included a lower percentage of females, and had fewer years of education (Table 1). A significantly higher percentage of participants with prior LD were white, non-Hispanic. The two groups were similar in LEC score. Among self-reported co-morbidities at the baseline visit, those present in ≥ 5% of the sample were: thyroid disease, heart disease/hypertension, migraine headaches, and carpal tunnel syndrome. Although none differed significantly by group, a higher percentage of participants with prior LD reported heart disease/hypertension. There was an overall statistically significant difference in the distribution of symptoms/functional impact status by participant group (p = 0·038), with a significantly higher percentage of participants with prior LD who met criteria for PTLD (13·7% vs. 4·1%). A total of 43·1% of participants with prior LD met criteria for either PTLD or Symptoms Only compared to 24·5% without a history of prior LD (p=0·016).
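To illustrate the kind of comparison reported here, the following Python sketch computes a two-sided Fisher's exact test for a 2×2 table using only the standard library. The cell counts (101 of 234 and 12 of 49) are back-calculated from the reported percentages and sample sizes, so they are approximations made for illustration and not the study's data; the function name is hypothetical, and the authors' own analysis was run in R.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Approximate counts (assumption): rows are cohorts, columns are
# (PTLD or Symptoms Only, Return to Health).
p_value = fisher_exact_two_sided(101, 133, 12, 37)
print(f"two-sided p = {p_value:.3f}")
```

The small relative tolerance in the comparison guards against floating-point ties between equally likely tables.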
We then fit a multinomial logistic regression model with outcome status as the response and patient group as the primary predictor, adjusting for sex, race, years of education, LEC, and heart disease/hypertension (Table 2). Although no significant differences in LEC score were found between groups, it was included as we considered it potentially relevant to outcome status. We found that participants with prior LD were approximately 5·28 times as likely as those without to meet PTLD rather than Return to Health criteria (p = 0·042). We also found that females were 4·17 times as likely as males to meet PTLD rather than Return to Health criteria (p=0·001), and that for each 1-unit increase in LEC score, reflecting an additional potentially traumatic life event, the risk of meeting PTLD criteria increased 30% (p=0·003).
Figure 1 shows the distribution of average PLQS scores, as well as the percentage reporting each symptom at an average moderate or severe level by group. Descriptively, participants with prior LD had a higher severity of many symptoms. Among the 10 symptoms we selected, on average participants with prior LD had scores approximately 0·2 to 0·3 points higher than those without in joint pain, memory changes, and depression (p=0·006, 0·027, and 0·046, respectively), adjusting for sex and LEC (Table 3). They also had 2 to 3 times the odds of developing moderate or severe fatigue and muscle pain (p=0·002 and 0·047, respectively).
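Because the model is linear on the log scale, a per-unit ratio compounds multiplicatively over larger differences in a covariate. The short Python sketch below works through that arithmetic for the reported 30% increase per LEC unit. The helper name is hypothetical, the combined figure assumes no interaction terms between covariates (none are described in the text), and these derived values are illustrative arithmetic, not results reported by the study.

```python
def ratio_for_increase(per_unit_ratio: float, units: float) -> float:
    """Ratio implied by a log-linear model for a `units` increase in a covariate."""
    return per_unit_ratio ** units


# A 30% increase per LEC unit corresponds to a per-unit ratio of 1.30.
three_events = ratio_for_increase(1.30, 3)            # 1.30 ** 3, about 2.197
# Illustrative only: female sex combined with two additional life events,
# assuming the two effects simply multiply (no interaction term).
female_plus_two = 4.17 * ratio_for_increase(1.30, 2)  # 4.17 * 1.69, about 7.047

print(f"{three_events:.3f} {female_plus_two:.3f}")
```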
The distributions of the scores for the standardized self-administered questionnaires are presented in Figure 2. Individual linear regression models adjusting for sex and LEC showed that participants with prior LD had significantly worse scores on the SF-MPQ and the SF-36 PCS compared to those without (Table 4, p= 0·006 for both).

Discussion
In our prospective study, participants with history of early LD were more likely to meet symptom and functional impact criteria for PTLD than those without. These differences remained significant in multivariate analyses controlling for relevant co-morbidity, demographic, and psychosocial factors. This is important because it represents the first time that such differences have been examined and found using the standardized IDSA criteria for PTLD in a large US sample.
Studying PTLD is challenging, primarily due to its symptom-based phenotype. Without specific physical findings or practical laboratory biomarkers linked to a defined pathophysiology, PTLD diagnosis, progression, and response to treatment remain vague. Even B. burgdorferi antibody levels are not specific or sensitive in the post-treatment period, and are not included in the IDSA proposed case definition (Wormser et al., 2006). Investigators are therefore left to interpret and operationalize 'symptoms' and 'functional impact' in study-specific ways. These are often the circumstances for other infection-associated conditions as well, including recently described "post-acute" or "long" COVID following SARS-CoV-2 infection, (Nalbandian et al., 2021) which anecdotally shares a similar clinical phenotype to PTLD. While others have relied on physician evaluation, (Wormser et al., 2020) in the current study and in our prior publication, (Bechtold et al., 2017) operationalization of PTLD criteria was performed using surveys.
Using the instruments detailed above among a Mid-Atlantic cohort of participants, we found that PTLD criteria can be fulfilled in approximately 14% with history of prior LD compared to approximately 4% of those without. The 14% is within the 10-20% range of estimated persistent symptom prevalence frequently cited in the literature (Marques, 2008). It is also similar to the 20% recently reported by Wormser and colleagues based on physician assessment and review of records (Wormser et al., 2020). While these estimates indicate that the majority of patients promptly treated for early LD do generally return to their pre-morbid health, they also suggest that PTLD is not necessarily rare, nor an inconsequential public health concern. Especially considering ongoing tick habitat expansion, (Sonenshine, 2018) LD now affects an estimated 476,000 patients annually in the United States (Kugeler et al., 2021). Furthermore, although the majority of patients may not progress to PTLD, this should not invalidate the experience of the subset of those patients who do. The focus of longitudinal studies, including the current one, has historically been to examine whether samples of patients with LD as a whole report symptoms over time, regardless of clinical outcome. This is a different research aim from understanding the clinical phenotype and life impact among the subgroup of LD patients with PTLD, which can be significant (Klempner et al., 2001;Rebman et al., 2017a, 2017b). Indeed, the distribution of SF-14 of participants with prior LD, despite the statistically significant difference in SF-36 PCS scores between groups. The most frequently identified PTLD symptoms are fatigue, pain, and cognitive dysfunction (Rebman et al., 2017b;Wormser et al., 2006). These symptoms are often non-specific and are often reported in general clinical populations as well as in patients with prior LD.
In the past, this has led to the conclusion that symptoms in PTLD are likely representative of an expected background "noise" of daily life (Wormser et al., 2006). These statements have historically relied upon extrapolating symptom frequencies across sample populations and symptom measures never intended nor designed to be comparable, potentially masking severity differences and risking false conclusions. In the current study, 24% of participants without a history of prior LD reported these symptoms at a moderate or severe level (our estimate of symptom background "noise"), significantly lower than the 43% of those with prior LD (p=0·016). On both our PLQS symptom survey and more broadly-utilized standardized instruments, specific cohort differences appeared most prominently in these known PTLD symptoms.
The current study is also the first to examine and control for the impact of potential confounders on the relationship between cohort group and PTLD. Sex, race, and education were included in the initial model, given statistically significant or borderline significant demographic differences between the cohort groups. Furthermore, as the presence of co-morbid conditions has been found to be associated with increased symptom reporting in LD, (Wills et al., 2016) we also controlled for heart disease/hypertension, the only condition present in ≥ 5% of the sample which appeared to differ by group. Lastly, mental health conditions and prior exposure to traumatic life events have previously been found and/or theorized to be associated with persistent symptoms following LD (Sigal and Hassett, 2005;Solomon et al., 1998;Wills et al., 2016;Wormser et al., 2006). Therefore, we included LEC score, a measure of potential prior traumatic life events, in multivariate models, and we specifically examined depression both in our PLQS survey and through the use of the BDI-II.
Notably, after controlling for these other factors, we found that participants with prior LD remained over 5 times as likely as those without to meet PTLD rather than Return to Health criteria, and there was a statistical trend in both the descriptive and multivariate analyses for these groups to differ in the risk of Symptoms Only as well. We also found that female sex and a higher LEC score remained significant predictors in multivariate models, indicating that these factors are independently associated with meeting PTLD criteria. The reasons for these findings are unclear.
It is possible that female sex and/or increased exposure to traumatic life events impact the initial biologic response to B. burgdorferi infection and increase risk for persistent symptoms (Klein and Flanagan, 2016;Seiler et al., 2020). However, as these factors remained significant after controlling for cohort group, this supports an association with increased symptoms and functional impact unrelated to prior LD exposure (Barsky et al., 2001;McFarlane, 2010).
Finally, although cohort group did predict PLQS depression score, it did not predict moderate/severe level for this variable, nor did it predict BDI-II or SF-36 MCS score. This is consistent with our previous publications that mild or moderate depression is likely not a primary driver of increased symptom reporting in PTLD, at least among patients who are promptly diagnosed and treated (Bechtold et al., 2017).
This study does have limitations, in particular the smaller number of control participants. We accounted for this as best we could by: a) generally recruiting both cohorts from similar underlying geographic and clinical populations, and b) controlling statistically for demographic and clinical factors observed to vary by group. Additionally, in order to increase specificity of our outcomes, and to mirror the IDSA proposed case definition, we excluded participants for a range of conditions with significant symptom overlap with PTLD. Nonetheless, patients with these conditions can be exposed to B. burgdorferi infection as well; therefore, future studies will be needed to assess generalizability of our findings to more clinically and immunologically diverse patient populations. Furthermore, our study of ideally-diagnosed and treated participants with early LD limits generalizability to those with risk factors for PTLD, namely delayed diagnosis and treatment. Lastly, a small number of participants (9 cases and 3 controls) met criteria for functional impact but not symptoms, and were characterized as Return to Health. To confirm that this did not influence interpretation of our results, we conducted a sensitivity analysis with these 12 participants removed, and it did not considerably change our reported results.
In this US Mid-Atlantic prospective cohort study, participants ideally diagnosed and treated for prior Lyme disease reported more symptoms on standardized surveys and were more likely to meet an operationalized definition of PTLD than those without, even after controlling for relevant confounding factors. These findings contribute to a growing literature with the common goal of better understanding and characterizing PTLD.

Contributions
All authors contributed to the study conception and design. Funding acquisition was performed by JNA. Data collection was performed and supervised by JNA, IY, DP, SAG, and AWR. Data curation and statistical analyses were performed by TY and AWR. The first draft of the manuscript was written by AWR and TY, and all authors commented on previous versions of the manuscript. All authors verify the underlying data, and have read and approved the final manuscript.
We are grateful to the research participants who contributed their time and effort towards this study. We would also like to thank Erica Mihm and Cheryl Novak for their efforts in patient recruitment and conducting study visits.

Ethical Approval
We have read the policy of the journal on ethical consent and the standards of animal care. This work has been approved by the Johns Hopkins Institutional Review Board and complies with its human subject research informed consent regulations. No animals were part of this research.

Conflicts of Interest/Competing Interests
The following financial relationships have been disclosed by the authors, all are not directly or other agency. All authors had full access to the data in the study and accept responsibility to submit for publication.

Availability of data and material
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Code availability
The code generated for analyses during the current study is available from the corresponding author on reasonable request.