## Tips

This is a brief description of common statistical misunderstandings that often appear in manuscripts.

1. The greatest problem in medical research is insufficient statistical testing.

No, evaluations of inferential uncertainty may be necessary, but hypothesis testing is not. The greatest problems in medical research are related to inadequate research questions, flawed study designs, and confused interpretation of findings.

2. Why are p-values controversial?

P-values are often misunderstood and incorrectly interpreted as descriptive measures: a finding in a sample is considered practically important when p<0.05, and p>0.05 is taken as an indication of equivalence. P-values are, however, uncertainty measures. A statistically significant finding is not necessarily scientifically relevant; scientific relevance has to be shown by other means than p-values. Conversely, statistical nonsignificance cannot be used to claim equivalence, as p>0.05 just reflects uncertainty. This incorrect use of p-values has evolved into an unfortunate standard and become a substitute for scientific reasoning.
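
To illustrate why a p-value is an uncertainty measure rather than a measure of practical importance, here is a minimal sketch (with made-up numbers) of a two-sample z-test, assuming a known standard deviation: the identical mean difference yields completely different p-values depending only on the sample size.

```python
import math

def z_test_p(effect, sd, n):
    """Two-sided p-value for a two-sample z-test of a mean
    difference, assuming a known standard deviation."""
    se = sd * math.sqrt(2.0 / n)  # standard error of the difference
    z = effect / se
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# The same (hypothetical) mean difference of 0.2 SD units:
p_small = z_test_p(effect=0.2, sd=1.0, n=10)    # n = 10 per group
p_large = z_test_p(effect=0.2, sd=1.0, n=1000)  # n = 1000 per group

print(f"n=10:   p = {p_small:.3f}")   # not significant
print(f"n=1000: p = {p_large:.6f}")  # highly significant
```

The effect is equally large (or small) in both cases; only the uncertainty about it differs.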

3. What measure can be used to show the uncertainty of an estimated treatment effect?

Estimation uncertainty needs to be considered when the clinical relevance of an estimated effect is evaluated. The p-value cannot be used for this, as it measures the uncertainty of the relation between the null hypothesis and the data, not of the estimated effect size. The correct uncertainty measure of an estimated effect is its confidence interval.
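
As a sketch with hypothetical summary data, an approximate 95% confidence interval for a difference in means can be computed directly from the estimate and its standard error; the question of clinical relevance then becomes whether the interval covers clinically relevant effect sizes.

```python
import math

# Hypothetical summary data: mean change in blood pressure (mmHg)
# in two groups of equal size n, with a common standard deviation sd.
n, sd = 50, 12.0
mean_treated, mean_control = -8.4, -3.1

estimate = mean_treated - mean_control  # estimated treatment effect
se = sd * math.sqrt(2.0 / n)            # standard error of the difference
ci_low, ci_high = estimate - 1.96 * se, estimate + 1.96 * se  # approx. 95% CI

print(f"Effect: {estimate:.1f} mmHg, 95% CI {ci_low:.1f} to {ci_high:.1f}")
```

The interval shows the range of effect sizes compatible with the data, which a p-value alone cannot.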

4. Why are odds ratios controversial?

The odds ratio is in some cases (e.g. in case-control studies) a relevant measure in itself, but in other cases (e.g. cohort studies) it is used as an approximation of the relative risk of an exposure. The approximation is good when the baseline risk is low, but otherwise two similar odds ratios can have different clinical interpretations (and two different odds ratios the same) because:
`RR = OR/(1-R+OR*R)`
where R = baseline risk, RR = relative risk, and OR = odds ratio. The clinical significance of a treatment effect cannot always be evaluated if the studied effect is presented as an odds ratio. The problem can be avoided by using a statistical method that provides direct estimates of the relative risk.
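
The formula above can be sketched directly in code; with made-up numbers, the same odds ratio corresponds to quite different relative risks at different baseline risks.

```python
def rr_from_or(odds_ratio, baseline_risk):
    """Convert an odds ratio to a relative risk for a given
    baseline risk: RR = OR / (1 - R + OR*R)."""
    r = baseline_risk
    return odds_ratio / (1.0 - r + odds_ratio * r)

# The same odds ratio of 3.0 at different baseline risks:
print(rr_from_or(3.0, 0.01))  # ~2.94: OR approximates RR well
print(rr_from_or(3.0, 0.50))  # 1.5: OR overstates RR substantially
```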

5. When analysing data, it is important to check that all continuous variables have Gaussian distributions.

No. Some statistical methods, such as Student's t-test, are based on an underlying assumption of a Gaussian distribution, but why should all continuous variables in a research project have a Gaussian distribution? Furthermore, the p-value from a distributional test is, like all other p-values, a measure of uncertainty; it cannot directly show whether or not a variable has a Gaussian distribution. Moreover, in some cases it is not the observed variables but a derived one that is assumed to have a Gaussian distribution, such as the residuals of a linear model, and these can have a Gaussian distribution even when the original variables do not.
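
The last point can be illustrated with a small simulation (simulated data, least-squares fit done by hand): the outcome of a linear model with a skewed exposure is clearly non-Gaussian, while its residuals are approximately Gaussian.

```python
import random, statistics

random.seed(1)
n = 5000
# Skewed exposure (exponential) and an outcome that is linear in it
# with Gaussian noise:
x = [random.expovariate(1.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 1.0) for xi in x]

def skewness(v):
    """Standardized third moment; ~0 for a Gaussian distribution."""
    m = statistics.fmean(v)
    s = statistics.pstdev(v)
    return statistics.fmean([((vi - m) / s) ** 3 for vi in v])

# Least-squares fit of y on x (slope = cov(x, y) / var(x)):
xbar, ybar = statistics.fmean(x), statistics.fmean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"skewness of y:         {skewness(y):.2f}")      # clearly non-Gaussian
print(f"skewness of residuals: {skewness(resid):.2f}")  # close to 0
```

Testing y itself for normality would thus be beside the point; the model assumption concerns the residuals.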

6. My data are non-parametric, so I have used non-parametric tests.

First, a null hypothesis may or may not include assumptions about a parameter, and a non-parametric null hypothesis can often be tested using a distribution-free test, but the term non-parametric has no specific implications for data. Second, distribution-free tests provide p-values but not necessarily effect size estimates, and p-values are controversial (see 2), which means that such tests are not useful for evaluation of clinical significance.
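
As a sketch of the second point, here is a Wilcoxon rank-sum (Mann-Whitney) test with its normal approximation, on made-up data without ties: it returns a p-value, but no estimate of how large the difference between the groups actually is.

```python
import math

def rank_sum_p(a, b):
    """Two-sided p-value for the Wilcoxon rank-sum (Mann-Whitney)
    test, normal approximation, assuming no ties."""
    combined = sorted(a + b)
    r1 = sum(combined.index(v) + 1 for v in a)  # rank sum of group a
    n1, n2 = len(a), len(b)
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

p = rank_sum_p([1.1, 2.3, 3.5, 4.2], [5.0, 6.1, 7.3, 8.4])
print(f"p = {p:.3f}")  # a p-value, but no effect size estimate
```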

7. Why shouldn't I use Bonferroni corrections?

Multiplicity issues (related to the testing of multiple null hypotheses) are important to address in confirmatory studies, and one way is to use a Bonferroni correction, i.e. lowering the significance level by a factor of 1/m, where m is the number of tested null hypotheses. However, to avoid subjectivity the adjustment should be pre-specified, and as it reduces the statistical power of the comparisons, it should also be accounted for in the sample size calculation, which increases patient numbers and costs. Multiplicity problems can often be avoided in the study design by careful endpoint definitions, or solved by using closed test procedures or more efficient adjustment methods such as Holm's or Hochberg's methods. In addition, while multiplicity is a problem in confirmatory studies, it is not relevant in exploratory or hypothesis-generating studies. Furthermore, the statistical analysis of observational studies needs to include validity considerations, as selection and confounding bias cannot be prevented in the study design, which implies that detailed pre-specification is not practically possible. Moreover, the strategy, common in laboratory studies, of Bonferroni correcting for the number of exposure groups while ignoring that multiple endpoints are tested does not solve the multiplicity problem.
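
To sketch why Holm's method is more efficient than Bonferroni, here is a comparison on hypothetical p-values from five endpoints; both control the familywise error rate at alpha, but Holm's step-down procedure can reject more null hypotheses.

```python
def bonferroni_rejections(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def holm_rejections(pvalues, alpha=0.05):
    """Holm's step-down procedure: test ordered p-values against
    increasingly lenient thresholds; stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if pvalues[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break
    return reject

# Hypothetical p-values from five endpoints:
pvals = [0.001, 0.012, 0.020, 0.040, 0.300]
print(sum(bonferroni_rejections(pvals)))  # 1 rejection
print(sum(holm_rejections(pvals)))        # 2 rejections
```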

8. I have always performed my lab experiments in triplicate, and now the statistical reviewer complains about n=3.

Is the sample size of 3 really based on a sample size calculation with acceptable risks of false-positive and false-negative outcomes? Or are these risks unknown? If the uncertainty of the test result is too great, the result is not reliable; it is thus important to know the statistical precision. It seems to me that a statistical test based on a sample size of 3 is unlikely to provide reliable empirical evidence, and publishing scientific findings based on clairvoyance instead of empirical evidence is not easy, at least not in journals claiming to present scientific work.
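
A rough simulation sketch (with an assumed true effect of 1 SD, which is large by most standards) shows how little power a two-sample t-test with n=3 per group has; the critical value 2.776 is t(0.975) with 4 degrees of freedom.

```python
import math, random

random.seed(7)

def two_sample_t(a, b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Simulated power: true effect = 1 SD, n = 3 per group,
# two-sided test at the 5% level (critical value t(0.975, df=4) = 2.776).
reps, hits = 4000, 0
for _ in range(reps):
    a = [random.gauss(1.0, 1.0) for _ in range(3)]
    b = [random.gauss(0.0, 1.0) for _ in range(3)]
    if abs(two_sample_t(a, b)) > 2.776:
        hits += 1
print(f"power \u2248 {hits / reps:.2f}")  # far below the conventional 80%
```

Even a 1-SD effect is missed most of the time; smaller, more realistic effects fare far worse.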

9. Predictors, covariates, regressors, independent variables, and risk factors.

To be continued...