Measuring Statistical Evidence

The concept of “statistical evidence” is central to statistics, and yet most approaches to developing a theory of statistics are vague about how it is to be measured. At the very least this creates ambiguity, and at worst it leaves one with the impression that statistics is completely lacking in any logical foundations. If the worst case applies, why would one have any confidence in the inferences drawn from a statistical analysis of real data?

There are of course several attempts to deal with statistical evidence in the statistical literature. Perhaps the most prominent is the commonly used p-value, but this suffers from numerous well-documented difficulties as a measure of evidence. In fact, it is fair to say that the p-value is not a valid measure of evidence. Pure likelihood theory comes closer to dealing adequately with the concept, but there are basic gaps that need to be filled, and this seems unlikely to be possible without adding a prior to the problem. With the addition of a prior we do have a valid measure of statistical evidence, namely, the Bayes factor. But even here there are issues that need to be addressed. First, there is the definition of the Bayes factor, which can be approached in several ways, some leading to better results than others. Second, and perhaps most important, there is the issue of calibration: when does a Bayes factor reflect strong evidence in favor, strong evidence against, and so on?

The book Measuring Statistical Evidence Using Relative Belief discusses these issues. Furthermore, it presents a measure of statistical evidence together with a calibration of this measure, and shows how these determine a theory of statistical inference (estimation, prediction, hypothesis assessment, etc.). The basic measure of evidence is called the relative belief ratio and it is closely related to the Bayes factor. The approach produces inferences with many optimal properties, as is demonstrated in the text.
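As a rough illustration of the idea, the relative belief ratio at a parameter value can be taken as the posterior density divided by the prior density at that value, so a ratio above 1 indicates that the data have increased belief in the value and a ratio below 1 indicates a decrease. The sketch below assumes that definition and applies it to a simple binomial model with a Beta prior. The model, the function names, and the numbers are hypothetical choices made for this example only, and the calibration step is not shown.

```python
import math


def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    log_kernel = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return math.exp(log_norm + log_kernel)


def relative_belief_ratio(theta, successes, trials, a=1.0, b=1.0):
    """Posterior density over prior density at theta.

    Assumes (for illustration only) binomial data with a Beta(a, b) prior,
    so the posterior is Beta(a + successes, b + trials - successes).
    """
    prior = beta_pdf(theta, a, b)
    posterior = beta_pdf(theta, a + successes, b + trials - successes)
    return posterior / prior


# Hypothetical data: 7 successes in 10 trials, uniform Beta(1, 1) prior.
rb_near_data = relative_belief_ratio(0.7, successes=7, trials=10)
rb_far_from_data = relative_belief_ratio(0.2, successes=7, trials=10)

# A ratio above 1 is evidence in favor of the value, below 1 evidence against.
print(rb_near_data > 1, rb_far_from_data < 1)
```

With no data at all, the posterior equals the prior and the ratio is 1 everywhere, which matches the reading of 1 as the dividing line between evidence in favor and evidence against.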

Of course, this requires the prescription of a (proper) prior and many object to this addition because of its subjectivity. The issues surrounding objectivity and subjectivity are discussed in the text. The following quote somewhat summarizes the point of view taken.

“No matter how the ingredients are chosen they may be wrong in the sense that they are unreasonable in light of the data obtained. So, as part of any statistical analysis, it is necessary to check the ingredients chosen against the data. If the ingredients are determined to be flawed, then any inferences based on these choices are undermined with respect to their validity. Also, checking the model and the prior against the data, is part of how statistics can deal with the inherent subjectivity in any statistical analysis. There should be an explicit recognition that subjectivity is always part of a statistical analysis. A positive consequence of accepting this is that it is now possible to address an important problem, namely, the necessity of assessing and controlling the effects of subjectivity. This seems like a more appropriate role for statistics in science as opposed to arguing for a mythical objectivity or for the virtues of subjectivity based on some kind of coherency.”

So objectivity is indeed the (unattainable) goal in any scientific work, and the necessity of subjectivity is dealt with, as much as is possible, through statistical tools. For example, with a measure of evidence we can assess the extent to which a prior induces bias into a statistical analysis.

Comments on these or any other issues associated with measuring evidence are welcome.