The world-renowned experts at JAMA® explain statistical analysis and the methods used in medical research.
Written in the language and style appropriate for clinicians and researchers, this new JAMA Guide to Statistics and Methods provides explanations and expert discussion of the statistical analytic approaches and methods used in the medical research reported in articles appearing in JAMA and the JAMA Network journals.
This addition to the JAMAevidence® series is particularly timely and necessary because today's physicians and other health care professionals must pursue lifelong learning to keep up with the ever-expanding universe of new medical science and evidence-based clinical information. Readers and users of research articles must have a firm grasp of the myriad new statistical, analytic, and methodologic approaches used in contemporary medical studies. To provide concrete examples, the explanations in the book link to research articles that incorporate the specific statistical test or methodological approach being discussed.
Points to drive home: 1. Noninferiority trial - Goal: To demonstrate that the new treatment is not unacceptably worse than the old treatment. - Method: Using a predetermined acceptable noninferiority margin; using 1-tailed statistical tests. - Caveats: The sample size is usually larger than in a placebo-controlled trial.
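To make the margin logic concrete, here is a minimal sketch of the usual confidence-interval view of noninferiority (the `noninferior` helper and the trial counts are hypothetical, not from the book): the new arm is declared noninferior if the lower bound of the one-sided CI for (new - old) stays above minus the margin.

```python
from math import sqrt
from statistics import NormalDist

def noninferior(success_new, n_new, success_old, n_old, margin, alpha=0.025):
    """Declare noninferiority if the lower bound of the one-sided
    (1 - alpha) CI for the difference (new - old) exceeds -margin."""
    p_new, p_old = success_new / n_new, success_old / n_old
    se = sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    z = NormalDist().inv_cdf(1 - alpha)   # 1.96 for alpha = 0.025
    lower = (p_new - p_old) - z * se
    return lower > -margin

# Hypothetical data: 78% vs 80% success, 10-point margin
print(noninferior(312, 400, 320, 400, margin=0.10))   # True
print(noninferior(78, 100, 80, 100, margin=0.10))     # False: too small to rule out the margin
```

Note how the same observed 2-point deficit passes at n = 400 per arm but fails at n = 100, which is the sample-size caveat above.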
2. Dose-finding trial - Goal: To find the optimal dose, typically in a phase 2 trial of the drug development process.
3. Pragmatic trial - Goal: To help typical clinicians and patients make difficult decisions in typical clinical care settings by maximizing the chance that the trial results will apply to the patients usually seen in practice (external validity).
4. Stepped-wedge clinical trial - A type of cluster design in which the clusters are randomized to the order in which they receive the experimental regimen. All clusters begin the study with the control intervention; by the end of the trial, all clusters are receiving the experimental regimen. - All participants in all clusters ultimately receive the experimental regimen, thereby ensuring that every participant has an opportunity to potentially benefit from the intervention. - Limitations: Requires a larger sample size; the time factor must be accounted for.
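A small sketch of what a stepped-wedge schedule looks like (the `stepped_wedge_schedule` helper is my own illustration, assuming one cluster crosses over per period): every cluster starts on control (0) and, in a randomized order, switches to the experimental regimen (1).

```python
import random

def stepped_wedge_schedule(n_clusters, seed=0):
    """Rows = clusters, columns = time periods. Each cluster starts on
    control (0) and crosses over to the experimental regimen (1) at a
    randomized step; by the last period all clusters are on 1."""
    order = list(range(n_clusters))
    random.Random(seed).shuffle(order)        # randomize crossover order
    periods = n_clusters + 1                  # baseline + one step per cluster
    schedule = []
    for cluster in range(n_clusters):
        step = order.index(cluster) + 1       # period at which this cluster switches
        schedule.append([1 if t >= step else 0 for t in range(periods)])
    return schedule

sched = stepped_wedge_schedule(4)
for row in sched:
    print(row)   # first column all 0s, last column all 1s
```

The staircase pattern is why time must be modeled: later periods contain more intervention exposure, so secular trends can masquerade as treatment effects.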
5. Cluster randomized trials - Goal: To evaluate the efficacy of treatments that typically involve changes at the level of a health care practice or hospital unit. - Method: Participants are randomized in groups so that all members of a single group are in either the experimental or the control arm. - Limitations: Participants from the same cluster tend to be more similar to each other; this violates a common assumption of most statistical tests, namely, that individual observations are independent of each other; risk of contamination between treatments; nearly impossible to maintain blinding of treatment assignment. - Caveats: Consider whether the use of clustering was well justified; would it have been possible to use individual-level randomization?; what would be the likelihood of contamination?; sources of bias; whether the intracluster correlation was appropriately accounted for in the design, analysis, and interpretation; whether the number, size, and similarity of clusters and the loss to follow-up were adequate.
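The standard way the intracluster correlation (ICC) enters the sample-size math is the design effect, DEFF = 1 + (m - 1) x ICC for average cluster size m. A quick sketch with hypothetical numbers:

```python
def design_effect(cluster_size, icc):
    """Variance inflation from cluster randomization:
    DEFF = 1 + (m - 1) * ICC, where m is the average cluster size."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical: 20 patients per cluster, ICC = 0.05
deff = design_effect(20, 0.05)
n_individual = 300                  # sample size an individually randomized design would need
n_cluster = n_individual * deff     # total needed once clustering is accounted for
print(round(deff, 2), round(n_cluster))   # 1.95 585
```

Even a modest ICC of 0.05 nearly doubles the required sample here, which is why ignoring clustering makes trials look better powered than they are.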
6. Sample size calculation for a hypothesis test - Goal: To calculate sample size, a power analysis is conducted. The power of a hypothesis test is the probability of obtaining a statistically significant result when there is a true difference in treatments. - "Power can also be thought of as the probability of the complement of a type II error. If we accept a 20% type II error for a difference in rates of size d, we are saying that there is a 20% chance that we do not detect the difference between groups when the difference in their rates is d. The complement of this, 0.8 = 1 - 0.2, or the statistical power, means that when a difference of d exists, there is an 80% chance that our statistical test will detect it." - To calculate the sample size, the baseline rate and the minimum detectable difference (MDD) are required. The baseline rate is typically available from the literature, since it is often based on a therapy that has been studied. The MDD choice is more subjective: it should be a clinically meaningful rate difference, a scientifically important rate difference, or both, that is also feasible to detect. - Caveats: The baseline rate might be incorrect; the choice of MDD is arguable; we should not interpret a lack of significance for an outcome other than the one on which the power analysis was based as confirmation that no difference exists, because the analysis is specific to those parameter settings.
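The calculation above can be sketched with the standard normal-approximation formula for comparing two proportions (the baseline rate of 20% and 10-point MDD below are hypothetical, chosen only to show the mechanics):

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p_baseline, mdd, alpha=0.05, power=0.80):
    """Sample size per group for detecting a difference `mdd` in event
    rates (two-sided alpha, normal approximation to two proportions)."""
    p1, p2 = p_baseline, p_baseline + mdd
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
    z_b = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / mdd ** 2)

# Hypothetical: 20% baseline rate, minimum detectable difference of 10 points
print(n_per_group(0.20, 0.10))   # 294 per group
```

Shrinking the MDD is expensive: halving it roughly quadruples the required sample, since n scales with 1/d squared.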
7. Minimal clinically important difference - The MCID is a patient-centered concept that defines the smallest amount an outcome must change to be meaningful to patients. - Statistical significance is linked to sample size: given a large enough sample, statistically significant differences between groups may occur with very small differences that are clinically meaningless. - The MCID can be calculated using consensus (also known as Delphi), anchor-based, or distribution-based (should not be used) methods.
8. Randomization in clinical trials: Permuted blocks and stratification - Goal: To minimize imbalance between groups in a small-sample-size study. - Method: These are restricted randomization processes. + Permuted blocks: To ensure balance in the number of patients in each group. + Stratification: To balance 1 or a few prespecified prognostic characteristics between treatment groups.
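Permuted-block randomization is simple enough to sketch directly (the helper below is my own toy version for two arms with a fixed block size; real trials vary block sizes to prevent guessing the next assignment):

```python
import random

def permuted_block_randomization(n_patients, block_size=4, seed=0):
    """Assign patients to arms A/B in shuffled blocks, so the running
    group counts can never differ by more than block_size / 2."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_patients:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)              # permute within the block
        assignments.extend(block)
    return assignments[:n_patients]

arms = permuted_block_randomization(20)
print(arms.count("A"), arms.count("B"))   # 10 10: balanced by construction
```

With simple (unrestricted) randomization, a 20-patient trial could easily end up 13 vs 7; the blocks rule that out.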
9. The "utility" in composite outcome measures - A composite end point is an outcome that is defined as occurring if 1 or more of its components occur. - Goal: Because a composite outcome occurs more frequently than its individual components, composites can reduce the number of study participants required to achieve the desired power of a study, making it easier and less expensive to conduct a clinical trial. - Limitations: All components are weighted equally, which is rarely the case in practice, so an apparent beneficial effect can be misleading if it is driven mainly by the least important component.
10. Missing data - Methods for handling missing values, each of which is based on different assumptions and has different limitations, include: complete case analysis, single imputation (last observation carried forward, baseline observation carried forward, mean value imputation, random imputation), and multiple imputation. - Key questions to consider when selecting a method: (1) Why are data missing? (2) How do patients with missing and complete data differ? (3) Do the observed data help predict the missing values? - The process that causes data to be missing is called censoring. Noninformative censoring is when the censoring provides no information about what the missing data should be. - 3 ways by which data may be missing: MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random) - the most problematic. - In general, multiple imputation is the best approach for modeling the effects of missing data.
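A toy contrast of the two simplest approaches above, complete case analysis and single (mean) imputation, on a hypothetical outcome list where `None` marks a missing value:

```python
def complete_case(values):
    """Complete case analysis: drop records with missing outcomes."""
    return [v for v in values if v is not None]

def mean_impute(values):
    """Single imputation: replace each missing value with the observed mean."""
    observed = complete_case(values)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

data = [4.0, None, 6.0, 5.0, None]   # hypothetical outcomes; None = missing
print(complete_case(data))           # [4.0, 6.0, 5.0]
print(mean_impute(data))             # [4.0, 5.0, 6.0, 5.0, 5.0]
```

Note the limitation the notes hint at: mean imputation fills every gap with the same value, shrinking the apparent variance, whereas multiple imputation draws several plausible values per gap and pools the results, which is why it is generally preferred.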
11. The intention-to-treat (ITT) principle - Under ITT, study participants are analyzed as members of the treatment group to which they were randomized, regardless of their adherence to, or whether they received, the intended treatment. - Alternatives include per-protocol and modified intent-to-treat (MITT) analyses. - Since all patients must be analyzed under ITT, it is essential that all patients be followed up and their primary outcomes determined. - ITT should not be applied for assessing safety.
12. Analyzing repeated measurements using mixed models - In longitudinal studies, repeated measurements are usually performed; these measurements are likely to be more similar to each other for a particular patient than across different patients. - Mixed models are ideally suited to settings in which the individual trajectory of a particular outcome for a study participant over time is influenced both by factors that can be assumed to be the same for many patients (e.g., the effect of an intervention) - the fixed effects - and by characteristics that are likely to vary substantially from patient to patient (e.g., the severity of the ankle fracture) - the random effects.
13. Logistic regression: Relating patient characteristics to outcomes - LR has the ability to "adjust" for confounding factors, ie, factors that are associated with both other predictor variables and the outcome. - LR quantitatively links one or more predictors thought to influence a particular outcome to the odds of that outcome. - Limitations: + Collinearity: Multiple variables convey closely related information. + When 2 variables provide overlapping information, minor random variation in the data can greatly and unpredictably influence how much of the association is attributed to one factor vs the other in the model. + A continuous variable is assumed to have a constant magnitude of association across its range of values. + LR analyses assume that the effect of one predictor is not influenced by the value of another predictor (ie, no interaction).
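To see how the "odds of the outcome" works mechanically, here is a minimal sketch of how a fitted logistic model turns its linear predictor (log-odds) into a probability; the intercept and coefficient below are hypothetical, not estimates from any study:

```python
from math import exp

def predicted_probability(intercept, coefs, x):
    """Logistic model: log-odds = b0 + sum(bi * xi);
    probability = odds / (1 + odds)."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    odds = exp(log_odds)
    return odds / (1 + odds)

# Hypothetical fitted model: intercept -2.0, coefficient 0.7 for one predictor
p0 = predicted_probability(-2.0, [0.7], [0])
p1 = predicted_probability(-2.0, [0.7], [1])
print(round(p0, 3), round(p1, 3))   # 0.119 0.214
# a 1-unit increase in the predictor multiplies the odds by exp(0.7), about 2.0
```

This is also why coefficients are reported as odds ratios: the multiplicative effect on the odds is constant, while the effect on the probability depends on where you start.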
14. LR diagnostics - The accuracy of an LR model is mainly judged by considering 'discrimination' and 'calibration.' - Discrimination is the ability of the model to correctly assign a higher risk of an outcome to the patients who are truly at higher risk (ie, ordering them correctly). - Calibration is the ability of the model to assign the correct average absolute level of risk (ie, accurately estimate the probability of the outcome for a patient or group of patients). - The C statistic, equivalent to the AUROC, is short for 'concordance' between model estimates of risk and the observed risk. - To test whether a model is well calibrated, use the Hosmer-Lemeshow test or a calibration plot. - Limitations: Using the AUROC alone as a metric assumes that a false-positive result is just as bad as a false-negative result; with large sample sizes the Hosmer-Lemeshow statistic can yield false-positive results and thus falsely suggest that a model is poorly calibrated; the Hosmer-Lemeshow test depends on the number of risk groups; with sample sizes smaller than 500, the test has low power and can fail to identify poorly calibrated models.
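The 'concordance' interpretation of the C statistic can be computed directly from its definition (the four-patient example is hypothetical): take every (event, non-event) pair and count the fraction in which the event patient got the higher predicted risk.

```python
from itertools import product

def c_statistic(labels, scores):
    """C statistic / AUROC: fraction of (event, non-event) pairs in which
    the event patient received the higher predicted risk (ties count 1/2)."""
    events = [s for y, s in zip(labels, scores) if y == 1]
    nonevents = [s for y, s in zip(labels, scores) if y == 0]
    total = concordant = 0.0
    for e, n in product(events, nonevents):
        total += 1
        if e > n:
            concordant += 1
        elif e == n:
            concordant += 0.5
    return concordant / total

y = [0, 0, 1, 1]                  # observed outcomes (1 = event)
risk = [0.1, 0.4, 0.35, 0.8]      # hypothetical model estimates
print(c_statistic(y, risk))       # 0.75: 3 of 4 pairs correctly ordered
```

Because only the ordering matters, doubling every predicted risk leaves the C statistic unchanged; that is exactly why discrimination says nothing about calibration.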
15. Number needed to treat (NNT) - The NNT is the number of patients who need to be treated with one therapy vs another for 1 additional patient to have the desired outcome. - Calculated as the reciprocal of the absolute risk reduction (ARR) between the groups; equivalently, 100 divided by the ARR expressed as a percentage. - The number needed concept may be applied to both therapeutic and diagnostic studies; e.g., when a therapy increases adverse events, it is the number needed to harm. - Why is the NNT important? It is intuitively understandable by patients and clinicians. - Limitations: + The same NNT may represent increases in treatment success (e.g., from 5% to 15% or from 85% to 95%) that may be viewed differently by patients and clinicians. + It can be challenging to compare and integrate different NNTs since they have different denominators. + The NNT aligns more closely with patients ('my chance of benefit is 1 in X'), while the benefit per hundred aligns more closely with clinicians ('out of 100 patients I treat, I will help X'). + The NNT reflects the number, not the importance, of events. + The NNT does not convey the financial costs and benefits of treatment. + When patient outcomes vary over time, the NNT should capture the time-varying benefits (e.g., early-, middle-, and late-stage NNTs).
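The arithmetic is one line, shown here with the 5% -> 15% example from the limitations above (the `nnt` helper is my own wrapper around NNT = 1 / ARR):

```python
def nnt(rate_control, rate_treatment):
    """NNT = 1 / ARR, where ARR is the absolute difference in the
    rate of the desired outcome between the two groups."""
    arr = abs(rate_treatment - rate_control)
    return 1 / arr

# Hypothetical: success improves from 5% to 15% -> ARR = 0.10
print(round(nnt(0.05, 0.15), 1))   # 10.0: treat 10 patients for 1 extra success
print(round(nnt(0.85, 0.95), 1))   # 10.0: same NNT, very different framing
```

The two calls returning the same value is exactly the first limitation: identical NNTs can describe situations patients judge very differently.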