Variability in clinical assessment of clade IIb mpox lesions

Highlights

• There are challenges in categorizing mpox lesions.
• We assessed the inter-rater reliability and agreement among clinicians.
• We found moderate reliability and agreement in mpox lesion assessment.
• This study has implications for interpreting trial results.
• There is a need for standardized lesion classification in clinical trials.


Introduction
Mpox (formerly known as monkeypox) is an infectious disease caused by the mpox virus. The 2022-2023 global outbreak of mpox is caused by clade IIb mpox virus, the focus of this study [1].
The clinical presentation of clade IIb mpox differs from that of previous outbreaks in Africa (clade I/IIa disease), with a predominance of perianal and genitourinary lesions, new complications such as proctitis, and lower mortality [2]. There is a broader spectrum of lesions that defies the classification systems previously used for Orthopoxviruses.
Classifying whether a lesion is active or resolved is important for clinical decisions about treatment and for public health decisions about self-isolation. Accurate assessment is also critical for research [2]. Several clinical trials have been launched worldwide to investigate potential treatments, all of which use lesion resolution as their primary endpoint. Urgent investigation is therefore warranted to ensure that clinical trials and clinical decision-making are based on reliable lesion assessment.

Participants

The inclusion criteria were clinicians practicing in emergency medicine, internal medicine, dermatology, general practice, or sexual health who had managed clade IIb human mpox virus disease. The exclusion criteria were clinicians not practicing clinically or completing their internship or foundation years.

Data collection process
Participants were presented with 20 slides, each containing a single image of a clade IIb mpox lesion. The participants were asked to categorize each lesion independently, choosing one of the following options: active; crusted and scabbed; resolved and healed; or unable to classify. Participants were provided with the World Health Organization's working definition of each of these categories [4]. Participants had 30 minutes to complete the questionnaire and could complete it only once. Each clinician's specialty, country of practice, estimated number of mpox patients seen, self-rated confidence in assessing mpox lesions, and perception of the availability of clinical guidelines for managing mpox patients were also recorded. Participants did not receive training for the study.

Data analysis
All analyses were performed using R Statistical Software (v4.1.2; R Core Team, 2021). Categorical variables were described as proportions. Continuous variables were described as mean and standard deviation if normally distributed, or as median and interquartile range (IQR) otherwise. Inter-rater reliability was measured using Fleiss' kappa coefficient, which is suited to nominal data and studies involving multiple raters [5]. Because the 'irr' package in R used to calculate Fleiss' kappa does not report a confidence interval, one was generated by bootstrapping using the 'raters' package and Monte Carlo simulation. Inter-rater agreement was measured as the proportion of exact agreement. The levels of agreement across all lesions were plotted to investigate clinicians' agreement patterns. A range of sensitivity analyses was conducted to assess the robustness of the findings (see Appendix B for these methods).
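The analysis itself was performed in R, but the core computation is compact. As an illustrative sketch (not the authors' code), Fleiss' kappa and a percentile-bootstrap confidence interval that resamples lesions with replacement can be computed as follows; the counts are hypothetical stand-ins for the real ratings:

```python
import random

def fleiss_kappa(counts):
    """Fleiss' kappa for an N x k table, where counts[i][j] is the number
    of raters assigning lesion i to category j. Every row must sum to the
    same number of raters n."""
    N = len(counts)
    n = sum(counts[0])                       # raters per lesion
    k = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(k)]
    p = [t / (N * n) for t in totals]        # overall category proportions
    P_e = sum(x * x for x in p)              # chance agreement
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in counts) / N       # mean observed agreement
    return (P_bar - P_e) / (1 - P_e)

def bootstrap_ci(counts, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling lesions with replacement
    (one common choice; the paper used the R 'raters' package)."""
    rng = random.Random(seed)
    N = len(counts)
    stats = []
    for _ in range(n_boot):
        sample = [counts[rng.randrange(N)] for _ in range(N)]
        try:
            stats.append(fleiss_kappa(sample))
        except ZeroDivisionError:            # degenerate resample: all one category
            continue
    stats.sort()
    return (stats[int((alpha / 2) * len(stats))],
            stats[int((1 - alpha / 2) * len(stats)) - 1])

# Hypothetical counts: 4 lesions, 5 raters, 3 categories
# (active / crusted and scabbed / resolved and healed)
toy = [[5, 0, 0],
       [0, 5, 0],
       [2, 2, 1],
       [1, 3, 1]]
print(round(fleiss_kappa(toy), 3))   # -> 0.353
print(bootstrap_ci(toy))
```

With 20 lesions rated into four categories by 53 clinicians, the same functions apply unchanged; only the shape of the count table differs.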

Description of participants
A total of 109 potential participants received and opened the survey, 53 of whom completed it. All participants who started the survey completed it. Participants' specialty, country of practice, number of mpox patients seen, and views on the adequacy of available guidelines for managing mpox are described in Table 1. On a self-reported confidence scale from one (no confidence) to ten (complete confidence), the median confidence rating for assessing mpox patients was 7 (IQR: 5-8) (see Appendix C for the distribution).

Inter-rater agreement
The proportion of exact agreement between the participants was found to be 61%. The proportion of agreement per lesion question is shown in Figure 1.
Sensitivity analyses are described in Appendix B. These included: recalculating the inter-rater reliability after removing question 20, which had a high proportion of 'unable to classify' responses; excluding individual participants one at a time to account for possible outlier participants; and restricting the analysis to clinicians who had seen more than 20 mpox patients in their clinical practice. The inter-rater agreement was also recalculated after removing all 'unable to classify' responses.
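The paper does not publish its code, but "proportion of exact agreement" is commonly computed as the proportion of agreeing rater pairs, averaged first within and then across items. A minimal sketch under that assumption, with hypothetical ratings:

```python
from itertools import combinations

def exact_agreement(ratings):
    """Proportion of agreeing rater pairs, per lesion and overall.
    ratings[i][r] is the category rater r chose for lesion i."""
    n_raters = len(ratings[0])
    pairs = list(combinations(range(n_raters), 2))
    per_lesion = [sum(row[a] == row[b] for a, b in pairs) / len(pairs)
                  for row in ratings]
    return per_lesion, sum(per_lesion) / len(per_lesion)

# Hypothetical ratings: 3 lesions, 4 raters
toy = [["active"] * 4,
       ["active", "active", "crusted", "crusted"],
       ["active", "crusted", "resolved", "active"]]
per_lesion, overall = exact_agreement(toy)
print(per_lesion)   # unanimous first lesion scores 1.0
print(overall)      # close to 0.5 for this toy data
```

The per-lesion values are what Figure 1 plots; the overall mean corresponds to the 61% summary figure.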

Discussion
Clinicians with experience in treating mpox show only moderate agreement when classifying lesions from single time-point photographs. The proportion of exact agreement between participants in this study was 61%, and the Fleiss' kappa for inter-rater reliability was 0.417. This has clear implications for decisions regarding initiation of treatment, infection prevention and control procedures, and outcome assessment in clinical trials.
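The description of 0.417 as "moderate" follows the widely used Landis and Koch (1977) descriptive bands for kappa. A small helper (illustrative, not from the paper) makes the mapping explicit:

```python
def interpret_kappa(kappa):
    """Landis & Koch (1977) descriptive bands for kappa values."""
    if kappa < 0:
        return "poor"
    bands = [(0.00, "slight"), (0.21, "fair"), (0.41, "moderate"),
             (0.61, "substantial"), (0.81, "almost perfect")]
    label = bands[0][1]
    for cutoff, name in bands:
        if kappa >= cutoff:
            label = name
    return label

print(interpret_kappa(0.417))   # -> moderate
```

These bands are conventional rather than principled cut-offs, which is worth bearing in mind when comparing kappa values across studies.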
When expert clinicians cannot agree on lesion status despite being provided with a descriptive framework, this creates a real challenge.
Lesion assessment cannot be restricted to a very small subset of specifically trained dermatologists, because this does not reflect the reality of the broad range of treatment providers for a disease with over 80,000 cases. Furthermore, our anecdotal experience from lesion assessment working groups suggests that uncertainty persists even when decision-making is restricted to super-specialists. This will be an issue for the creation of 'gold standard' guidance. Some trials are using patient self-assessment of lesions, and the accuracy and reliability of this approach are uncertain.
Interestingly, these findings occur despite participants' relative confidence in their ability to assess mpox lesions (median = 7, IQR = 5-8). It is uncertain whether clinicians who feel less confident in classifying lesions were less likely to complete the questionnaire, and whether they would be more or less likely to agree with others' assessments. Our sensitivity analysis restricted to clinicians who had seen more than 20 mpox patients suggests that those with more experience are only minimally more likely to agree.
In terms of clinical assessment, lesion status is used to decide whether to refer to specialist care and whether to initiate or continue antiviral treatment; it may contribute to decisions around hospitalization and, in some jurisdictions, is the basis of guidance on self-isolation and infection prevention and control requirements. Disentangling the influence of lesion assessment on these decisions from the myriad other factors that contribute to a clinical decision is difficult, but more robust and objective assessment options would clearly assist the development of clinical guidelines and help standardize infection prevention procedures. Selecting lesion resolution as an endpoint for clinical guidelines and trials was a very reasonable extrapolation from clade I mpox and smallpox (which show a more uniform progression of lesions) at the time these were developed, early in this outbreak. How to balance refinement of these outcomes as more data accumulate is a challenge. Clinical characterization protocols are critical, but waiting for their outcomes means that trials would potentially launch too late; in the interim, patients would continue to receive treatment under compassionate care circumstances with minimal assessment of efficacy or safety. At the time of writing, no superior endpoint for clinical trials is obvious.

Limitations
This study reports agreement rather than accuracy because there is no 'correct answer' for comparison. Assessment of a lesion from a single photograph is likely to be inferior to assessment in real-life clinical settings, or to assessment over time, where the evolution of the lesion can be taken into account. The study had a modest effective response rate of 49%. Most (81%) respondents were from the same specialty (infectious diseases), which, at this point in the epidemic, tends to be responsible for the management of these patients in the study locations. This may limit the generalizability of the findings to other specialties managing clade IIb mpox disease. It may also reflect response bias, because clinicians with more expertise in mpox disease may have been more likely to participate. It is, therefore, plausible that the true inter-rater reliability and agreement are lower than reported.

Conclusion
This study assessed the inter-rater reliability and agreement of clade IIb mpox lesion assessment. We demonstrate moderate levels of reliability and agreement, providing valuable insights into the consistency of lesion categorization in clinical settings and into the validity of lesion assessment as an endpoint in mpox clinical trials.

Table 1
Description of study participants.
Figure 1. Proportion of agreement for each question.