Highlights
- Commonly used outcome measures are highly collinear, falling into two distinct clusters that correspond to respiratory status and motor function.
- This mixed cohort of adults with SMA showed stable outcome measure scores over 12 months.
- Future research should explore new outcome measure selection algorithms and establish anchor-based minimal clinically important differences.
Introduction
Spinal Muscular Atrophy (SMA) is an autosomal-recessive neuromuscular disorder affecting approximately 1 in 21,472 births in Canada and 1 in 10,000 live births globally. It is characterized by progressive muscle atrophy and other systemic complications resulting from the degeneration of alpha motor neurons in the brainstem and the anterior horn of the spinal cord. Reference Sugarman, Nagan and Zhu1–Reference Price, Hodgkinson and Westbury4 An estimated 95% of SMA cases are caused by a homozygous deletion of the survival motor neuron 1 (SMN1) gene, which encodes the SMN protein. Reference Sugarman, Nagan and Zhu1–Reference Verhaart, Robertson and Wilson3 SMA has a broad spectrum of clinical phenotypes, owing to varying levels of functional SMN protein produced by the survival motor neuron 2 (SMN2) gene. Reference Farrar, Park and Vucic2 The most commonly used classification system stratifies patients by age at symptom onset and highest achieved motor milestone (Types 1–4), while a second system stratifies by current functional status (non-sitters, sitters, and walkers). Reference Wirth, Karakaya, Kye and Mendoza-Ferreira5 Despite these classification systems, the SMA phenotype is recognized to exist on a continuum. Reference Wirth, Karakaya, Kye and Mendoza-Ferreira5
Disease-modifying treatments (DMT) have altered the expected natural history and disease progression of SMA. DMT has been adopted primarily in pediatric populations, whereas evidence of benefit for adults with SMA (awSMA) remains incomplete and uncertain. Reference Wirth, Karakaya, Kye and Mendoza-Ferreira5–Reference Baranello, Darras and Day9 Given this limited evidence, clinicians, researchers, and SMA community members have advocated for improved outcome measures (OMs) that are sensitive, reliable, and responsive to change, so that disease state can be measured meaningfully. Reference Farrar, Park and Vucic2,Reference Duong, Wolford and McDermott6,Reference Querin, Lenglet and Debs10–Reference Montes, Gordon, Pandya, De Vivo and Kaufmann13 In this environment, the selection, revision and refinement of OMs that capture the meaningful experiences and clinical progression of awSMA have become an evolving area of clinical and research development. Reference Duong, Wolford and McDermott6,Reference Slayter, Hodgkinson and Lounsberry14–Reference Trundell, Skalicky and Staunton19
The Spinal Muscular Atrophy Recommended Toolkit (SMART) is a Canadian expert consensus-derived core set of eight OMs for use in awSMA. Reference Slayter, Hodgkinson and Lounsberry14 Despite its regular use in clinical and research settings, the SMART has incomplete validation evidence, which limits understanding of how its OMs perform in real-world settings. Reference Slayter, Hodgkinson and Lounsberry14,Reference Slayter, Casey and O’Connell15,Reference Sansone, Walter and Attarian17,Reference Oskoui and Potter20 Since the SMART was introduced in 2021, no studies have examined how the complete toolkit performs, and few validation studies have compared the included OMs head-to-head. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21–Reference Pera, Coratti and Forcina24
Frequent and intensive monitoring is burdensome for awSMA and their family members, as assessment visits can cause fatigue and are costly because of the long distances traveled to specialty clinics. For clinicians and researchers, inefficient OM use risks prolonged clinical trials, suboptimal research results, excessive costs, and ineffective resource allocation in the clinic. Reference Fogel25,Reference Coster26 Understanding the relationships between OMs, and the shared and unique latent constructs they measure, could reduce unnecessary costs borne by patients, clinicians and researchers. Understanding OM performance will help optimize the frequency of assessment, reduce unnecessary OM collection, and guide OM refinement, ultimately improving clinicians’ and researchers’ understanding of disease progression.
This 12-month observational study aimed to examine the performance of the OMs included in the SMART by evaluating their validity and sensitivity to change in an awSMA population. A secondary aim was to identify potential modifications to the SMART that would enhance its clinical utility and reduce unnecessary testing, and to identify future areas of development to improve clinical measurement for awSMA.
Methods
Participant recruitment & study design
Participants were recruited consecutively from a single-site interdisciplinary tertiary adult SMA rehabilitation clinic between December 2021 and November 2023. This referral center serves all of New Brunswick and part of Atlantic Canada. In line with national recommendations for routine monitoring (assessment every 6 months for the first year after DMT initiation and annually thereafter), participants completed the SMART at both a baseline and a 12-month visit. Reference Slayter, Hodgkinson and Lounsberry14,Reference Hodgkinson, Chapman and Izenberg27
All participants provided written informed consent. To be eligible, participants had to be above 16 years of age and have an existing diagnosis of SMA types 1 to 4; no specific genotype was required. For participants between 16 and 18 years old, additional informed consent was obtained from a substitute decision-maker, along with written informed assent from the participant. Participants had to be able to complete an interview in English or French, either directly or with the support of a caregiver. Exclusion criteria included any active physical or cognitive comorbidity not attributable to SMA that, in the clinician’s opinion, could confound the assessment of OMs. At the time of recruitment, a mixed coverage model meant there was no consistent method for determining treatment eligibility, which therefore varied between participants. Reference Price, Hodgkinson and Westbury4,Reference Hodgkinson, Chapman and Izenberg27 Based on their functional status, participants were stratified into one of three groups: non-sitters, sitters or walkers. Non-sitters could not sustain an unassisted seated position for more than three seconds, sitters could remain seated without assistance for more than three seconds, and walkers could ambulate at least four steps without assistance.
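The grouping rule above can be expressed as a small decision function. This is a minimal sketch, not part of the study's tooling: the function name `classify_functional_group` and its two inputs (longest unassisted sitting time in seconds, number of unassisted steps) are hypothetical, and it assumes the groups are hierarchical, so walking ability is checked first and a person who can both sit and walk is classed as a walker.

```python
from enum import Enum


class FunctionalGroup(Enum):
    NON_SITTER = "non-sitter"
    SITTER = "sitter"
    WALKER = "walker"


def classify_functional_group(unassisted_sit_seconds: float,
                              unassisted_steps: int) -> FunctionalGroup:
    """Hypothetical helper applying the operational definitions in the text.

    Assumption: groups are hierarchical, so walking is checked before sitting.
    """
    # Walkers: ambulate at least four steps without assistance.
    if unassisted_steps >= 4:
        return FunctionalGroup.WALKER
    # Sitters: remain seated without assistance for more than three seconds.
    if unassisted_sit_seconds > 3:
        return FunctionalGroup.SITTER
    # Non-sitters: cannot sustain unassisted sitting for more than three seconds.
    return FunctionalGroup.NON_SITTER


print(classify_functional_group(unassisted_sit_seconds=30, unassisted_steps=0).value)  # prints sitter
```

Encoding the thresholds as strict comparisons (`> 3` seconds, `>= 4` steps) mirrors the wording "more than three seconds" and "at least four steps".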
Outcome measures
The SMART is a consensus-derived toolkit of OMs designed for use in an awSMA population. It includes eight OMs stratified by functional group (non-sitter, sitter, walker), covering the primary domains of gross and fine motor function, respiratory function, and global patient-reported outcome measures (PROMs). Reference Slayter, Hodgkinson and Lounsberry14 Figure 1 presents the complete set of OMs included in SMART, by functional group. The SMART is a representative core outcome set, in use by the Canadian Neurological Diseases Registry (CNDR), that includes several OMs used in other studies. Reference Duong, Wolford and McDermott6,Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21,Reference Côté, Hodgkinson, Nury, Bastenier-Boutin and Rodrigue28–Reference Pitarch Castellano, Cabrera-Serrano and Calvo Medina30 Each OM was completed by the most appropriate clinician (i.e., physician, physiotherapist, occupational therapist or respiratory therapist) who was trained to administer the OM. Whenever possible, the same clinician completed the follow-up measurements to minimize inter-rater variability.

Figure 1. SMART outcomes. CHOP-ATEND = Children’s Hospital of Philadelphia Adult Test of Neuromuscular Disorders; FVC = Forced Vital Capacity; HFMSE = Hammersmith Functional Motor Scale – Expanded; PCF = Peak Cough Flow; PROM = Patient-Reported Outcome Measure; RULM = Revised Upper Limb Module; SD = Standard Deviation; SMAFRS = Spinal Muscular Atrophy Functional Rating Scale; SMART = Spinal Muscular Atrophy Recommended Toolkit; TUG = Timed Up and Go; 6MWT = Six Minute Walk Test.
Motor OMs include the Revised Upper Limb Module (RULM) Reference Mazzone, Mayhew and Montes31 , completed by sitters and non-sitters; the Hammersmith Functional Motor Scale – Expanded (HFMSE) Reference O’Hagen, Glanzman and McDermott32 , completed by sitters and walkers; and the Children’s Hospital of Philadelphia Adult Test of Neuromuscular Disorders (CHOP-ATEND) Reference Duong, Wolford and McDermott6 , completed by non-sitters. The 6-minute walk test (6MWT) Reference Young, Montes and Kramer33 and the Timed Up and Go (TUG) Reference Dunaway, Montes and Garber34 were completed by walkers.
All participants completed respiratory function OMs, which included the Forced Vital Capacity (FVC), measured in liters (L) and percent predicted (% Pred), and the Peak Cough Flow (PCF), measured in liters per minute (L/min). All participants also completed the Spinal Muscular Atrophy Functional Rating Scale (SMAFRS), a 10-question clinician-administered scale that assesses patient-reported ratings of their ability to complete functional tasks, such as eating, dressing, transferring, ambulation, and hygiene. Reference Elsheikh, Prior and Zhang35,Reference Sadjadi, Kelly and Glanzman36
To identify clinically significant responses, minimal clinically important differences (MCIDs) of 3 points for the HFMSE and 2 points for the RULM have been suggested based on expert opinion. Reference Pera, Coratti and Mazzone23,Reference Pera, Coratti and Forcina24,37 The minimum detectable change (MDC) threshold for the 6MWT in awSMA is estimated to be 30 m. Reference Young, Montes and Kramer33 The remaining OMs included in the SMART do not have established MCID or MDC values in awSMA.
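Applying these thresholds to paired scores amounts to counting participants whose baseline-to-12-month change meets or exceeds the threshold. The sketch below assumes improvement is the direction of interest (higher scores and longer distances are better for all three measures); the helper name `count_meeting_threshold` and the example scores are hypothetical, not study data.

```python
# Thresholds quoted in the text for awSMA; other SMART OMs have none established.
THRESHOLDS = {
    "HFMSE": 3.0,   # MCID, points (expert opinion)
    "RULM": 2.0,    # MCID, points (expert opinion)
    "6MWT": 30.0,   # MDC, metres
}


def count_meeting_threshold(measure, paired_scores):
    """Count (baseline, 12-month) pairs whose improvement reaches the threshold.

    Hypothetical helper; incomplete pairs should be removed before calling it.
    """
    if measure not in THRESHOLDS:
        raise KeyError(f"no established MCID/MDC for {measure} in awSMA")
    threshold = THRESHOLDS[measure]
    return sum(1 for baseline, follow_up in paired_scores
               if follow_up - baseline >= threshold)


# Hypothetical RULM scores for three participants.
rulm_pairs = [(10, 12), (20, 21), (5, 8)]
print(count_meeting_threshold("RULM", rulm_pairs))  # prints 2
```

Raising an error for measures without a threshold keeps the code from silently inventing cut-offs for OMs such as the SMAFRS or PCF.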
Statistical & psychometric analysis
All data were recorded in a study data log during participants’ routine clinical visits and then transcribed into Microsoft Excel (Version 16.81). For missing or incomplete data, the participant’s medical chart was reviewed to identify any missing results. If data remained missing, the affected individual result or pair of results was excluded from the relevant analysis. No data imputation was performed, because imputation risked artificially altering estimates of validity and sensitivity to change. Descriptive statistics were calculated in Microsoft Excel and R 4.2.3. Reference Coratti, Bovis and Pera38 Hypothesis testing, correlational analysis and data visualization were performed with R 4.2.3 Reference Coratti, Bovis and Pera38 and RStudio39. An alpha of 0.05 was set a priori to indicate statistical significance. The Wilcoxon signed-rank test was used to compare OM scores between the baseline and 12-month visits, and the Benjamini-Hochberg procedure was used to correct for multiple comparisons. 40 The data analysis file is available upon reasonable request.
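The analysis itself was run in R; the stdlib-only Python sketch below is not the authors' code and only illustrates the two steps described above. It runs an exact Wilcoxon signed-rank test on complete pairs (zero differences dropped, tied absolute differences given average ranks, two-sided p-value from the exact sign-flip distribution of those ranks) and then applies Benjamini-Hochberg adjustment across OMs. All function names and example scores are hypothetical. Note that R's `wilcox.test` falls back to a normal approximation when ties or zero differences are present, so p-values from the two approaches can differ slightly.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank(baseline, follow_up):
    """Two-sided exact signed-rank test on paired scores; returns (W+, p)."""
    diffs = [b - a for a, b in zip(baseline, follow_up) if b != a]  # drop zero changes
    if not diffs:
        return 0.0, 1.0
    # Doubling average ranks keeps every rank an integer, even with ties.
    doubled = [round(2 * r) for r in average_ranks([abs(d) for d in diffs])]
    w_plus = sum(r for r, d in zip(doubled, diffs) if d > 0)
    total = sum(doubled)
    # counts[s]: number of the 2**n sign patterns whose doubled W+ equals s.
    counts = [1] + [0] * total
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    n_patterns = 2 ** len(diffs)
    p_lower = sum(counts[: w_plus + 1]) / n_patterns
    p_upper = sum(counts[w_plus:]) / n_patterns
    return w_plus / 2, min(1.0, 2 * min(p_lower, p_upper))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical paired scores for two OMs (not study data).
raw_p = [
    wilcoxon_signed_rank([20, 25, 31, 18, 40, 22], [21, 27, 31, 19, 43, 21])[1],
    wilcoxon_signed_rank([250, 310, 180, 400, 275], [260, 300, 190, 440, 270])[1],
]
print(benjamini_hochberg(raw_p))
```

With only five or six non-zero pairs, the smallest attainable two-sided p-value is 0.0625 or 0.03125, which is one concrete way a small cohort limits the power to detect change.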
Criterion validity is the degree to which an OM reflects a gold standard and is further subdivided into concurrent validity (CV) and predictive validity (PV). 41,Reference Benjamini and Hochberg42 CV is the degree to which an OM relates to other measures collected at the same time, including agreement with measures of related constructs (convergent validity) and lack of agreement with measures of unrelated constructs (divergent validity). Reference Benjamini and Hochberg42 PV is the degree to which an OM predicts future criterion measures. Reference Benjamini and Hochberg42 Both CV and PV were assessed using the Spearman correlation coefficient (SCC) Reference Prinsen, Vohra and Rose43 by comparing the OM of interest against other known OMs; for PV, baseline OM scores were correlated with 12-month scores. To reduce the risk of reporting unstable correlations, any SCC with fewer than five complete pairwise data points or with a non-significant p-value (>0.05) after correction was not reported. Correlation coefficients were interpreted according to previously published recommendations as very strong (>0.9), strong (0.7–0.89), moderate (0.4–0.69), or weak (<0.4). Reference Schober, Boer and Schwarte45 Sensitivity to change was estimated with the standardized response mean (SRM), defined as the mean change divided by the standard deviation of the change, and with the mean interval score difference. The SRM was interpreted according to Cohen’s thresholds as trivial (<0.2), small (0.2–0.5), moderate (0.5–0.8) or large (>0.8). Reference Streiner, Norman and Cairney44
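The correlation and reporting rules above can be sketched as follows. This is an illustration in stdlib-only Python, not the authors' R code: Spearman's rho is computed as the Pearson correlation of average ranks over complete pairs only, labelled with the strength bands quoted above, and then filtered by the reporting rule. Because the stdlib has no t distribution, the Benjamini-Hochberg-adjusted p-value is treated as an input computed elsewhere (for example, with R's `cor.test(..., method = "spearman")`). For PV, the same function would take one OM's baseline scores as `x` and another OM's 12-month scores as `y`. All names and data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho over complete pairs; returns (rho, number of pairs).

    Raises ZeroDivisionError if either variable is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    rx = average_ranks([a for a, _ in pairs])
    ry = average_ranks([b for _, b in pairs])
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y), len(pairs)


def strength(rho):
    """Label |rho| using the bands quoted in the Methods."""
    r = abs(rho)
    if r > 0.9:
        return "very strong"
    if r >= 0.7:
        return "strong"
    if r >= 0.4:
        return "moderate"
    return "weak"


def report_cell(rho, p_adjusted, n_pairs, alpha=0.05, min_pairs=5):
    """Return '*' if fewer than five pairs or adjusted p > alpha, else the labelled rho."""
    if n_pairs < min_pairs or p_adjusted > alpha:
        return "*"
    return f"{rho:.2f} ({strength(rho)})"


# Hypothetical baseline scores of one OM against 12-month scores of another.
rho, n = spearman_rho([5, 12, 20, None, 33, 41], [2, 8, 15, 20, 30, 36])
print(report_cell(rho, p_adjusted=0.01, n_pairs=n))  # prints 1.00 (very strong)
```

Dropping incomplete pairs inside `spearman_rho` and returning the pair count keeps the "fewer than five pairs" rule tied to the data actually used.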
Results
Sixteen awSMA were eligible and consented to participate. One participant was lost to follow-up before completing the study and was excluded from the analysis. No screened individuals were excluded based on the predetermined exclusion criteria. Fifteen participants completed the study; 46.7% were male, and the mean age was 35.5 years (SD 16.7). Two-thirds of participants (66.7%) received DMT, six receiving nusinersen and four receiving risdiplam (Table 1). At recruitment, DMT duration ranged from less than 1 year to 4 years. The final sample consisted of seven non-sitters, one sitter and seven walkers, representing a broad spectrum of functional abilities (Table 1; Figure 2).

Figure 2. Paired scatter plot of FVC (L) compared with the remaining SMART outcomes by functional group. Lines connect each individual’s values across time points. CHOP-ATEND = Children’s Hospital of Philadelphia Adult Test of Neuromuscular Disorders; FVC (L) = Forced Vital Capacity measured in liters; FVC (% Pred) = Forced Vital Capacity measured as a percentage of predicted value; HFMSE = Hammersmith Functional Motor Scale – Expanded; RULM = Revised Upper Limb Module; SMAFRS = Spinal Muscular Atrophy Functional Rating Scale; SMART = Spinal Muscular Atrophy Recommended Toolkit; TUG = Timed Up and Go.
Table 1. Participant Demographics.

SMA = Spinal Muscular Atrophy; SD = Standard Deviation.
As expected, OM scores varied substantially: SMAFRS scores ranged from 0 to 50, RULM scores from 0 to 37, and FVC from 0.52 to 6.13 L. The SMAFRS and RULM discriminated best between non-sitters and walkers but did not differentiate well among walkers (Figure 3, Panel K). When the FVC exceeded 2 L, the SMAFRS and RULM exhibited ceiling effects (Figure 2, Panels C and D). A similar ceiling effect in both the SMAFRS and RULM was observed when the PCF exceeded 200 L/min (Figure 3, Panels N and O). The FVC and PCF did not exhibit ceiling or floor effects and remained discriminative throughout the studied sample (Figure 2, Panels A and B).

Figure 3. Highlighted paired scatter plots of the remaining SMART outcomes by functional group. Lines connect each individual’s values across time points. CHOP-ATEND = Children’s Hospital of Philadelphia Adult Test of Neuromuscular Disorders; FVC (L) = Forced Vital Capacity measured in liters; FVC (% Pred) = Forced Vital Capacity measured as a percentage of predicted value; HFMSE = Hammersmith Functional Motor Scale – Expanded; RULM = Revised Upper Limb Module; SMAFRS = Spinal Muscular Atrophy Functional Rating Scale; SMART = Spinal Muscular Atrophy Recommended Toolkit; TUG = Timed Up and Go.
Longitudinal change in SMART outcomes
None of the SMART OMs changed significantly between baseline and 12 months (Table 2). Changes were heterogeneous across participants for every OM: interval changes in FVC ranged from a loss of 0.1 L to a gain of 0.6 L, and changes in the RULM from a 1-point loss to a 3-point gain (Figure 4). One participant achieved the MCID for the HFMSE, and three achieved the MCID for the RULM. The CHOP-ATEND showed interval improvement in five of six participants (Figure 4, Panel F). Stratifying by treatment status revealed no divergence of effects (Figure 5), consistent with the absence of statistically significant changes reported in Table 2. Only the CHOP-ATEND demonstrated a large sensitivity to change, with an SRM of 1.01, although its 95% CI (−0.03 to 2.06) included zero. The remaining OMs showed trivial-to-moderate sensitivity to change as measured by the SRM (Table 2).
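The SRM and its Cohen label can be reproduced from per-participant change scores. The minimal stdlib-only sketch below is illustrative: because the method used for the 95% CIs in Table 2 is not described here, it substitutes a percentile bootstrap purely as an assumption, and the change scores are hypothetical. With only a handful of participants, resampled SRMs vary widely, which is one reason intervals such as the CHOP-ATEND's can span zero.

```python
import random
import statistics


def srm(changes):
    """Standardized response mean: mean change divided by the SD of change."""
    return statistics.mean(changes) / statistics.stdev(changes)


def cohen_label(value):
    """Cohen's bands as quoted in the Methods."""
    v = abs(value)
    if v < 0.2:
        return "trivial"
    if v < 0.5:
        return "small"
    if v <= 0.8:
        return "moderate"
    return "large"


def bootstrap_srm_ci(changes, n_boot=5000, level=0.95, seed=1):
    """Percentile bootstrap CI for the SRM (an assumed method, not necessarily the paper's)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(changes) for _ in changes]
        if statistics.stdev(sample) > 0:  # skip resamples with no variation
            estimates.append(srm(sample))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * len(estimates))]
    upper = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lower, upper


# Hypothetical 12-month change scores for six participants (not study data).
changes = [2, 1, 0, 3, 1, 2]
print(round(srm(changes), 2), cohen_label(srm(changes)), bootstrap_srm_ci(changes))
```

A fixed seed makes the interval reproducible; the percentile interval is only one of several bootstrap variants.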

Figure 4. Waterfall plot of change in SMART outcomes between baseline and 12 months by functional group. *Horizontal dashed line indicates mean difference. CHOP-ATEND = Children’s Hospital of Philadelphia Adult Test of Neuromuscular Disorders; FVC (L) = Forced Vital Capacity measured in liters; FVC (% Pred) = Forced Vital Capacity measured as a percentage of predicted value; HFMSE = Hammersmith Functional Motor Scale – Expanded; PCF = Peak Cough Flow (L/min); RULM = Revised Upper Limb Module; SMAFRS = Spinal Muscular Atrophy Functional Rating Scale; SMART = Spinal Muscular Atrophy Recommended Toolkit; TUG = Timed Up and Go; 6MWT = Six Minute Walk Test.

Figure 5. Waterfall plot of change in SMART outcomes between baseline and 12 months by treatment status. *Horizontal dashed line indicates mean difference. CHOP-ATEND = Children’s Hospital of Philadelphia Adult Test of Neuromuscular Disorders; FVC (L) = Forced Vital Capacity measured in liters; FVC (% Pred) = Forced Vital Capacity measured as a percentage of predicted value; HFMSE = Hammersmith Functional Motor Scale – Expanded; PCF = Peak Cough Flow (L/min); RULM = Revised Upper Limb Module; SMAFRS = Spinal Muscular Atrophy Functional Rating Scale; SMART = Spinal Muscular Atrophy Recommended Toolkit; TUG = Timed Up and Go; 6MWT = Six Minute Walk Test.
Table 2. Longitudinal change in SMART outcomes between baseline and 12-month visits

^Wilcoxon signed-rank test using only complete pairwise data, with p-values adjusted by the Benjamini-Hochberg procedure.
CHOP-ATEND = Children’s Hospital of Philadelphia Adult Test of Neuromuscular Disorders; CI = Confidence Interval; FVC (L) = forced vital capacity measured in liters; FVC (% Pred) = Forced Vital Capacity measured as a percentage of predicted value; HFMSE = Hammersmith Functional Motor Scale – Expanded; PCF = Peak Cough Flow; RULM = Revised Upper Limb Module; SD = Standard Deviation; SMAFRS = Spinal Muscular Atrophy Functional Rating Scale; SMART = Spinal Muscular Atrophy Recommended Toolkit; TUG = Timed Up and Go; 6MWT = Six Minute Walk Test.
Psychometric analysis
Concurrent validity
SMART OMs showed variable collinearity across different latent constructs (i.e., motor function, respiratory function and overall functional status). Table 3 presents the CV results at the baseline and 12-month visits. The SMAFRS was most frequently correlated with other measures, showing broad correlations with both motor and respiratory OMs. The SMAFRS exhibited strong correlations with the FVC (L), FVC (% Pred) and PCF. It also correlated very strongly with the RULM (0.96 and 0.92 at baseline and 12 months, respectively) and strongly to very strongly with the HFMSE (0.97, 0.86). The SMAFRS and CHOP-ATEND were not significantly correlated at baseline, though a very strong correlation emerged at the 12-month visit (0.94).
Table 3. Concurrent Validity Matrix of SMART Outcomes at Baseline and 12 Months

The upper triangle of the matrix denotes baseline visit correlations, while the lower triangle denotes the 12-month visit correlations.
6MWT and TUG did not exhibit any statistically significant correlations at either time point and were subsequently omitted.
Concurrent validity assessed with Spearman’s Correlation Coefficient, omission of correlation coefficients when P > 0.05 or N <5 is denoted by *. P-values determined after correction with Benjamini-Hochberg procedure.
CHOP-ATEND = Children’s Hospital of Philadelphia Adult Test of Neuromuscular Disorders; FVC (L) = forced vital capacity measured in liters; FVC (% Pred) = Forced Vital Capacity measured as a percentage of predicted value; HFMSE = Hammersmith Functional Motor Scale – Expanded; PCF = Peak Cough Flow; RULM = Revised Upper Limb Module; SMAFRS = Spinal Muscular Atrophy Functional Rating Scale; SMART = Spinal Muscular Atrophy Recommended Toolkit; TUG = Timed Up and Go; 6MWT = Six Minute Walk Test.
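The layout of Table 3 (baseline coefficients above the diagonal, 12-month coefficients below it, and "*" where a coefficient is omitted) can be generated from two sets of pairwise results. In this sketch, the helper name `combined_matrix` and every input value are hypothetical; each pair of measures maps to a (rho, adjusted p, number of complete pairs) tuple.

```python
def combined_matrix(measures, baseline, month12, alpha=0.05, min_pairs=5):
    """Upper triangle from `baseline`, lower triangle from `month12`.

    Each dict maps frozenset({measure_a, measure_b}) -> (rho, p_adjusted, n_pairs).
    A cell is '*' when the pair is missing, n_pairs < min_pairs, or p_adjusted > alpha.
    """
    def cell(stats):
        if stats is None:
            return "*"
        rho, p_adjusted, n_pairs = stats
        if n_pairs < min_pairs or p_adjusted > alpha:
            return "*"
        return f"{rho:.2f}"

    size = len(measures)
    grid = [["-"] * size for _ in range(size)]  # '-' marks the diagonal
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            source = baseline if i < j else month12  # row < column: upper triangle
            grid[i][j] = cell(source.get(frozenset((measures[i], measures[j]))))
    return grid


# Hypothetical inputs (not study results).
measures = ["SMAFRS", "RULM", "PCF"]
baseline = {
    frozenset({"SMAFRS", "RULM"}): (0.80, 0.001, 14),
    frozenset({"RULM", "PCF"}): (0.60, 0.20, 14),
}
month12 = {
    frozenset({"SMAFRS", "RULM"}): (0.75, 0.002, 13),
    frozenset({"SMAFRS", "PCF"}): (0.70, 0.01, 4),
}
for name, row in zip(measures, combined_matrix(measures, baseline, month12)):
    print(f"{name:>7}", *(f"{c:>5}" for c in row))
```

Keying pairs with `frozenset` makes the lookup order-independent, so (SMAFRS, RULM) and (RULM, SMAFRS) refer to the same coefficient.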
The RULM correlated very strongly with the CHOP-ATEND at 12 months (0.93), but this correlation was not significant at baseline. The HFMSE showed a strong correlation (0.73) only at the 12-month visit (Table 3). The 6MWT and TUG did not exhibit any statistically significant correlations.
The respiratory OMs showed very strong correlations between FVC (L) and PCF (0.93, 0.99). FVC (% Pred) correlated less strongly with FVC (L) (0.85, 0.83) and PCF (0.76, 0.83). Respiratory OMs exhibited moderate-to-strong correlations with the RULM and SMAFRS (Table 3).
Predictive validity
When PV was examined from baseline to 12 months, correlations within each OM were very strong, ranging from 0.94 to 1. Table 4 presents the PV matrix results. As in the concurrent analysis, FVC (L) and FVC (% Pred) were less strongly correlated with each other. Among the motor OMs, the baseline RULM correlated very strongly with the SMAFRS (0.93) and CHOP-ATEND (0.94) at 12 months. The baseline CHOP-ATEND and 6MWT did not exhibit any significant correlations. In contrast, the baseline TUG showed strong to very strong correlations with the SMAFRS (−0.95), HFMSE (−0.82) and TUG (0.94) at 12 months. The baseline SMAFRS had very strong correlations with the RULM (0.97), CHOP-ATEND (0.99) and HFMSE (0.99) at 12 months.
Table 4. Predictive Validity of SMART Outcomes

Predictive validity assessed by Spearman’s Correlation Coefficients, omission of correlation coefficients when P > 0.05 or N <5 is denoted by *. P-values determined after correction with Benjamini-Hochberg procedure.
Baseline CHOP-ATEND and 6MWT scores did not exhibit statistically significant correlations and were omitted.
CHOP-ATEND = Children’s Hospital of Philadelphia Adult Test of Neuromuscular Disorders; FVC (L) = Forced Vital Capacity measured in liters; FVC (% Pred) = Forced Vital Capacity measured as a percentage of predicted value; HFMSE = Hammersmith Functional Motor Scale – Expanded; PCF = Peak Cough Flow; RULM = Revised Upper Limb Module; SMAFRS = Spinal Muscular Atrophy Functional Rating Scale; SMART = Spinal Muscular Atrophy Recommended Toolkit; TUG = Timed Up and Go; 6MWT = Six Minute Walk Test.
Discussion
This prospective 12-month observational study provides preliminary validation evidence for the OMs comprising the SMART, which is used across Canada for awSMA but whose OMs have not been directly compared since its inception. Reference Slayter, Hodgkinson and Lounsberry14 This study adds to the limited validation evidence available for awSMA and directly compares several commonly used OMs in this population. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21, Reference Stolte, Bois and Bolz22,Reference Schober, Boer and Schwarte45 The results suggest that SMART OMs measure two distinct latent constructs in a real-world setting but are relatively insensitive to clinical change over 12 months. This relative insensitivity to change is consistent with previous literature in awSMA and remains a major limitation of current OMs for this population. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21,Reference Schober, Boer and Schwarte45–Reference Annoussamy, Seferian and Daron47
Longitudinal changes
The lack of statistically significant changes over 12 months, together with generally low-to-moderate sensitivity to change and no visual trends in Figure 3, suggests that frequently used OMs did not detect change over the study period. The small sample size and mixed-treatment cohort may have reduced the ability to detect change in mean scores by reducing statistical power. However, the studied cohort is representative of the real-world clinical environment, where treatment decisions are made using these OMs in a clinically heterogeneous patient population. Consistent with our findings, previous literature has also reported low-to-moderate responsiveness and sensitivity to change for commonly used OMs in awSMA. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21–Reference Pera, Coratti and Forcina24,Reference Schober, Boer and Schwarte45,Reference Muni-Lofra, Coratti and Duong48 Although subgroup analysis by treatment status was not performed because of the small sample size, there was no visual divergence between treated and untreated participants, and treatment status did not explain the heterogeneity of OM results over 12 months (Figure 5). This suggests that the small mean interval score differences are less likely to be due to the mixed-treatment cohort.
The 12-month changes in OM scores (Table 2) consistently fell between those expected in treated awSMA cohorts and those reported in natural history studies. However, differences in the functional populations studied may limit the comparability of results. The observed mean increase in HFMSE score of 0.38 (SD 2.33) points over 12 months falls between natural history studies, which suggest an expected decline of up to 0.5 points annually, and interventional studies, which found increases of 1.7 to 3 points in awSMA treated with nusinersen, depending on functional type. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio29,Reference Yeo, Tizzano and Darras49–Reference Mercuri, Finkel and Montes52 The observed 0.4-point increase on the RULM at 12 months is likewise above the expected loss of 0.4 points in untreated awSMA but below the previously reported increase of up to 1.6 points in awSMA treated with nusinersen, depending on functional type. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio29,Reference Mercuri, Finkel and Montes52,Reference Pera, Coratti and Mazzone23 Similarly, the 8.75 m increase in 6MWT over 12 months falls between the results of Mazzone et al., who reported a gain of 18.06 m among untreated individuals with type 3b SMA, and Günther et al., who found a 30.86 m improvement at 14 months among treated awSMA. Reference Muni-Lofra, Coratti and Duong48,Reference Mercuri, Finkel and Montes52
The small mean interval differences suggest either slower disease progression in this cohort than in other studies or an effect moderated by the mixed-treatment status. Figure 5, Panels A, B, and F, do not suggest a moderated effect, as the individuals with the largest interval improvements were not receiving DMT. Alternatively, measurement error may account for most of the observed change, consistent with the small SRM values reported in Table 2.
Overall, the longitudinal results of this study suggest that currently used OMs have difficulty identifying change attributable to disease progression at 12 months, and that longer measurement intervals may be needed to distinguish such change from measurement error. Current OMs should be refined to improve their responsiveness, and patient-reported, anchor-based MCIDs should be evaluated and established to aid the clinical interpretation of OMs in awSMA.
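The link between measurement error and interval length can be made concrete with the standard error of measurement (SEM) and the minimal detectable change: SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. This study did not estimate test-retest reliability, so the SD of 10 points and ICC of 0.96 below are hypothetical; the 0.5-point annual HFMSE decline is the upper natural-history figure cited above. Under these assumptions, steady change smaller than the MDC per year would take many years to exceed it.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement from a between-person SD and a reliability (e.g., ICC)."""
    return sd * math.sqrt(1 - reliability)


def mdc95(sd, reliability):
    """Smallest individual change exceeding measurement error with ~95% confidence."""
    return 1.96 * math.sqrt(2) * sem(sd, reliability)


def years_to_exceed(mdc, annual_change):
    """Years of steady change needed before the cumulative change reaches the MDC."""
    return mdc / abs(annual_change)


# Hypothetical inputs: SD of 10 points and ICC of 0.96 (not estimated in this study).
threshold = mdc95(sd=10, reliability=0.96)
print(round(threshold, 1))                          # prints 5.5
print(round(years_to_exceed(threshold, -0.5), 1))   # prints 11.1
```

The √2 factor reflects that a change score combines the error of two measurements, one at each visit.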
Psychometric validation
The criterion validity results from this study suggest that the SMART OMs measure only two distinct clusters of latent constructs. The first cluster comprises the respiratory OMs, as shown by the high collinearity between the FVC and PCF. The second cluster comprises the motor OMs, including the RULM, HFMSE, CHOP-ATEND, TUG and 6MWT, which also clustered together. The strong correlation between the SMAFRS and RULM suggests substantial shared latent constructs, which may reflect the importance of upper limb motor function for functional independence or the heavy weighting of upper extremity function in both measures. Given the strong correlations between the SMAFRS and RULM and between the SMAFRS and HFMSE, the indications for completing these OMs simultaneously should be reconsidered to reduce the testing burden on patients and the clinician resources used for clinical monitoring of awSMA. Our results support the previously identified need to develop an OM that is comprehensive, valid, reliable and responsive to patient-reported changes across the wide spectrum of awSMA. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21,Reference Cohen46,Reference Annoussamy, Seferian and Daron47
When the FVC exceeded 2 L or the PCF exceeded 200 L/min, the RULM and SMAFRS appeared to exhibit ceiling effects, suggesting that both may differentiate poorly between levels of disease severity in patients above these thresholds. Similar ceiling effects may not have been seen in the 6MWT, TUG, CHOP-ATEND and HFMSE because each was completed by only a subgroup of participants. Ceiling effects have been reported previously: in a larger prospective cohort comparing the HFMSE, RULM, 6MWT and EK2, Vázquez-Costa et al. observed ceiling effects in the RULM and HFMSE and floor effects in the HFMSE. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21 Additional studies have also reported ceiling effects in the RULM, most apparent among walkers. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21–Reference Pera, Coratti and Mazzone23 Further research should investigate whether the FVC or PCF could be used as an adjunct tool to optimize OM selection, ensuring the use of the right OM for the right patient at the right time.
FVC (L) correlated more strongly with the other respiratory measures than FVC (% Pred) did. The weaker correlations of FVC (% Pred) may reflect measurement error introduced through standard correction protocols, because height and weight can be difficult to estimate accurately, particularly in non-sitters and sitters, who are known to have higher rates of scoliosis and respiratory changes, even in childhood. Reference Kaufmann, McDermott and Darras53 Further research should examine the factors contributing to potential measurement error in FVC (% Pred) to determine whether FVC (L) or FVC (% Pred) should be used in an awSMA population.
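A first-order error propagation illustrates why FVC (% Pred) is exposed to anthropometric error while FVC (L) is not. As a simplifying assumption (real reference equations are more complex and also depend on age and sex), let the predicted value scale with height $h$ as a power $h^{k}$:

$$
\%\text{Pred} = 100 \times \frac{\text{FVC}_{\text{measured}}}{\text{FVC}_{\text{pred}}(h)}, \qquad \text{FVC}_{\text{pred}}(h) \propto h^{k}
$$

$$
\ln(\%\text{Pred}) = \text{const} + \ln \text{FVC}_{\text{measured}} - k \ln h
\quad\Longrightarrow\quad
\frac{\Delta(\%\text{Pred})}{\%\text{Pred}} \approx -k\,\frac{\Delta h}{h}
$$

With an illustrative exponent of $k \approx 2$, underestimating height by 5% ($\Delta h / h = -0.05$) would inflate FVC (% Pred) by roughly 10%, whereas the measured FVC (L) is unaffected.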
Study limitations
This study has several limitations. Most importantly, it included only fifteen participants, with an overrepresentation of non-sitters and walkers and only one sitter. This unequal distribution of SMA functional types limited subgroup analysis and reduced the generalizability of this study. Furthermore, the small sample size limits the power of this study, affecting the certainty and generalizability of the results.
To mitigate the effects of the small sample size, analyses focused primarily on within-subject differences, which avoids error arising from between-subject comparisons. Second, calculating CV at both time points allowed an informal assessment of the stability of the correlations. For example, the correlations between the RULM and SMAFRS were consistent at both time points, whereas the CHOP-ATEND and SMAFRS produced inconsistent SCCs, suggesting a less reliable result. To reduce the risk of reporting unreliable comparisons, we did not report pairwise comparisons with fewer than five pairs and limited reporting to correlations that remained statistically significant after correction. Sensitivity to change results should be interpreted in light of the exploratory nature of this study and should not be viewed as definitive, as reflected in the wide 95% confidence intervals of the SRM reported in Table 2. Despite these mitigation strategies, the small sample size remains a limitation, and larger studies should seek to replicate our results to establish their generalizability. While reflective of the real-world clinical setting, the mixed-treatment cohort may have moderated the longitudinal results by introducing further heterogeneity into an already highly heterogeneous population.
Conclusions
This preliminary, small-sample study attempts to fill gaps in the literature by providing validation evidence for the SMART, a core outcome set used across Canada in awSMA. Previous studies have examined several of these tools, but none has investigated this full set of OMs in combination. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21–Reference Pera, Coratti and Forcina24 The results suggest that, while the OMs measure distinct latent constructs in a real-world setting, they were insensitive to change over 12 months. Our results are consistent with previous reports that OMs in awSMA are valid but generally poorly responsive, which warrants caution when determining treatment efficacy over relatively short measurement intervals. Reference Vázquez-Costa, Povedano and Nascimiento-Osorio21,Reference Stolte, Bois and Bolz22,Reference Schober, Boer and Schwarte45,Reference Günther, Wurster and Brakemeier54 Organizations must consider the limitations of the tools used in awSMA, ensuring that decisions regarding DMT efficacy at the individual patient level reflect true disease progression rather than measurement artifacts. This requires a holistic evaluation of the patient rather than decisions based on a single test.
The ongoing development of robust clinical OMs, biomarkers, and other markers of disease activity will support further treatment development and the characterization of the evolving natural history of SMA in the treatment era. Continued revision of OMs is needed, including review of OM selection and monitoring strategies, re-evaluation of highly collinear OMs, reduction of test frequency and OM number, identification of patient-reported anchor-based MCIDs, and development of simpler OMs. These improvements may minimize unnecessary or inefficient testing, reduce patient fatigue, and optimize the clinical resources required for monitoring awSMA.
Supplementary Material
Acknowledgments
This study was approved by the Horizon Health Network Research Ethics Board (#101329). The study authors thank the participants and their families for providing their time. We would also like to thank the SMA clinic’s research team and clinical staff at the SCCR for their valuable contributions and support of this project. We would like to acknowledge ResearchNB and the Faculty of Medicine at Dalhousie University for providing student grants, as well as Hoffmann-La Roche Limited for their financial support of this study. Data analysis files are available upon reasonable request.
Author contributions
JS conceptualized the study, completed data analysis, interpretation, writing and manuscript review. CO conceptualized the study, acquired funding, supervised and facilitated project administration, and wrote and revised the manuscript. LC completed data curation, data analysis, and study administration and wrote and reviewed the manuscript. DD completed the investigation, data curation, project administration and manuscript review. AC completed the investigation, data curation and manuscript review. SM facilitated project administration, supervision, and manuscript review.
Funding statement
We would like to acknowledge ResearchNB and the Faculty of Medicine at Dalhousie University for providing student grants, as well as Hoffmann-La Roche Limited for their financial support of this study.
Competing interests
JS reports receiving funding to support the study from the Dalhousie University Faculty of Medicine and ResearchNB. DD reports consulting fees and personal compensation from Hoffmann-La Roche Limited. AC reports receiving personal compensation and honoraria from Hoffmann-La Roche Limited and the Neuromuscular Disease Network for Canada (NMD4C). CO reports receiving grants, payments or honoraria from Hoffmann-La Roche Limited and Biogen and has served on their advisory boards; has received grants from the Canadian Neuromuscular Disease Registry (CNDR); is a member of the medical and scientific advisory committee of Muscular Dystrophy Canada; and is a grant co-applicant with the NMD4C. LC and SM have no declared conflicts of interest.