About the p-hacking email: several remarks for anyone wishing to go down that rabbit hole ...
1. In the larger realm of scientific misconduct, some of p-hacking's nearest kin are researcher degrees of freedom (addressed in earlier SGU episodes), HARKing, and data dredging. As Steve suggested, some variants of these practices may be appropriate (e.g., with proper planning, interpretation, and reporting), but often they're not. (A small simulation of what data dredging does to false positive rates follows below.)
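To make the dredging point concrete, here's a minimal sketch (my own toy example, not anything from the show; it assumes Python with numpy and scipy, and the 20 hypotheses and n = 30 per group are arbitrary choices): run 20 two-sample t-tests on pure noise and count how often at least one comes out "significant."

    # Toy demonstration of data dredging: 20 truly null hypotheses,
    # yet "something significant" turns up in most simulated studies.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_sims, n_tests, alpha = 5_000, 20, 0.05

    false_alarms = 0
    for _ in range(n_sims):
        a = rng.normal(size=(n_tests, 30))  # group A: no real effects
        b = rng.normal(size=(n_tests, 30))  # group B: no real effects
        pvals = stats.ttest_ind(a, b, axis=1).pvalue  # one p per hypothesis
        false_alarms += (pvals < alpha).any()  # did we "find" anything?

    print(f"P(at least one p < {alpha}) ~ {false_alarms / n_sims:.2f}")
    # Roughly 1 - 0.95**20 ~ 0.64, not the nominal 0.05.

And the inflation only gets worse as the number of hypotheses tested grows.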
2. How to rigorously manage stopping early (e.g., based on planned "interim looks") in a trial falls under adaptive trial design, a major topic in the science of clinical trials. Related to Cara's comment about "observational information first," adaptive designs entail altering the trial based on what we've learned from its previous data. Besides simply ending a trial early, other adaptive changes include adding or dropping an arm/group or modifying entry criteria or interventions. (The sketch below shows why unmanaged interim looks are a problem in the first place.)
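Here's a minimal sketch of why that management matters (again my own toy example; the five looks and batch size of 20 are arbitrary assumptions): test after every batch of subjects, stop at the first p < .05, and the trial-level false positive rate climbs well past 5%.

    # Toy demonstration of naive "peeking": repeated unadjusted interim
    # tests on a null effect inflate the overall false positive rate.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_sims, looks, batch, alpha = 5_000, 5, 20, 0.05

    stopped_on_noise = 0
    for _ in range(n_sims):
        a = b = np.empty(0)
        for _ in range(looks):
            a = np.concatenate([a, rng.normal(size=batch)])
            b = np.concatenate([b, rng.normal(size=batch)])  # no true effect
            if stats.ttest_ind(a, b).pvalue < alpha:
                stopped_on_noise += 1  # "success" declared on pure noise
                break

    print(f"False positive rate with {looks} naive looks ~ "
          f"{stopped_on_noise / n_sims:.2f}")
    # Typically ~0.13-0.14 rather than the nominal 0.05.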
3. Steve seemed to suggest that stopping early is largely a statistical matter, though he mentioned cost/risk criteria. How this is handled varies, but in some (better?) trials an independent data monitoring committee (DMC) makes such decisions. A DMC may include or use input from one or more statisticians (e.g., via the study's protocol or generic guidelines for trials) but tends to comprise experts in other relevant domains (e.g., focal medical/health topics, ethics).
4. Adaptive trial design relates to more broadly applicable stats problems in sequential analysis as well as value of information. Both of these can be used to decide whether we have enough data about available options/actions to stop collecting more and choose an option. For instance, sequential analysis, which may involve things like alpha-spending functions and monitoring boundaries (one such function is sketched below), has historical roots in quality control (e.g., control charts for manufacturing processes) dating back to at least WWII.
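As a concrete example of the alpha-spending idea, here's a sketch of the Lan-DeMets O'Brien-Fleming-type spending function, one standard choice (the Python rendering and the five equally spaced looks are my assumptions):

    # Cumulative alpha "spent" by information fraction t under an
    # O'Brien-Fleming-type spending function: almost nothing is spent at
    # early looks, so early stopping requires overwhelming evidence.
    import numpy as np
    from scipy import stats

    def obf_spend(t, alpha=0.05):
        """Cumulative type I error spent at information fraction 0 < t <= 1."""
        z = stats.norm.ppf(1 - alpha / 2)
        return 2 * (1 - stats.norm.cdf(z / np.sqrt(t)))

    for t in (0.2, 0.4, 0.6, 0.8, 1.0):
        print(f"info fraction {t:.1f}: alpha spent so far = {obf_spend(t):.5f}")
    # At t = 1.0 the full 0.05 has been spent; the first look gets ~0.00001.

This is what keeps a planned series of interim looks from inflating the error rate the way the naive peeking above does.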
5. Stopping early is linked to multiple comparisons, subgroup analyses, and outcome reporting bias, all of which Steve mentioned or alluded to, in that they involve the statistical notion of multiplicity: various ways to draw more than one "inference" (i.e., extrapolation from a data sample to its parent population/universe) from the same data set. Experts disagree about whether or how to handle multiplicity, and it's one area where classical/frequentist strategies may differ markedly from Bayesian approaches (a small simulation of the basic problem, and one classical fix, follows below). To wit, Andrew Gelman's thoughts about multiple comparisons: http://andrewgelman.com/2014/10/14/one-lifes-horrible-ironies-wrote-paper-usually-dont-worry-multiple-comparisons-now-spend-lots-time-worrying-multiple-comparisons/
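For a concrete feel for multiplicity, one last toy sketch (mine, not Gelman's; it assumes m = 10 independent, truly null tests): the family-wise error rate is 1 - (1 - alpha)^m without correction, and a classical Bonferroni adjustment pulls it back to alpha.

    # Toy demonstration of multiplicity: m = 10 independent null tests.
    # Under a true null, p-values are uniform on (0, 1), so we can
    # simulate them directly.
    import numpy as np

    rng = np.random.default_rng(2)
    n_sims, m, alpha = 5_000, 10, 0.05

    naive = bonferroni = 0
    for _ in range(n_sims):
        pvals = rng.uniform(size=m)
        naive += (pvals < alpha).any()           # uncorrected
        bonferroni += (pvals < alpha / m).any()  # classical correction

    print(f"Analytic FWER, uncorrected: {1 - (1 - alpha) ** m:.2f}")  # ~0.40
    print(f"Simulated, uncorrected:     {naive / n_sims:.2f}")
    print(f"Simulated, Bonferroni:      {bonferroni / n_sims:.2f}")   # ~0.05

Bonferroni is just the bluntest classical tool here; the disagreement Gelman describes is partly about whether corrections like this are the right framing at all.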