Several times I’ve encountered the comment that inferences based on relative belief are the same as inferences based on integrated likelihoods. This is correct in a formal sense, but it misses an important point. The purpose of this post is to make that point clear. In what follows, take all probability measures to be discrete; the argument is unchanged in the continuous case except that some irrelevant Jacobians have to be added. Also, all priors are proper.

Suppose interest is in inference about a parameter psi=PSI(theta), where theta is the model parameter. Anything that locates the value of theta within the inverse image PSI^{-1}{psi} is a nuisance parameter, and the standard, and appropriate, Bayesian approach is to integrate these out. Let f(x|theta) be the density of the data x when theta is true, and let L(theta|x) denote the likelihood. Note that L(theta|x)=cf(x|theta) for some c>0. So a likelihood is defined only up to a positive constant multiple, and any function of theta in this equivalence class serves as a likelihood function.

Let pi(theta) be the prior on theta and pi(theta|psi) be the conditional prior of theta given psi. So

pi(theta|psi) = pi(theta)/(sum pi(theta) over theta in PSI^{-1}{psi}) = pi(theta)/pi_PSI(psi),   for theta in PSI^{-1}{psi},

is the conditional prior density of theta given psi and pi_PSI(psi) is the marginal prior of psi. The integrated likelihood of psi is given by

sum L(theta|x)pi(theta|psi) over theta in PSI^{-1}{psi}

= c times sum f(x|theta)pi(theta|psi) over theta in PSI^{-1}{psi}.

The relative belief ratio of psi is given by

RB(psi|x) = pi_PSI(psi|x)/ pi_PSI(psi)

= (sum pi(theta|x) over theta in PSI^{-1}{psi})/ pi_PSI(psi)

where pi(theta|x) is the posterior density of theta and pi_PSI(psi|x) is the posterior density of psi. Now

pi(theta|x) = f(x|theta)pi(theta)/m(x) = L(theta|x)pi(theta)/(c m(x))

where

m(x) = sum f(x|theta)pi(theta) over all theta

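is the prior predictive density of the data x. Substituting this into the expression for RB(psi|x) and using pi(theta)/pi_PSI(psi) = pi(theta|psi), it is immediate that

RB(psi|x) = (sum L(theta|x)pi(theta|psi) over theta in PSI^{-1}{psi})/(c m(x))

= (integrated likelihood)/(c m(x)).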
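The identity can be checked numerically on a tiny discrete model. The sketch below is purely illustrative: the four-point parameter space, the map PSI (first coordinate of theta), the prior, the values f(x|theta) for a single observed x, and the constant c = 7 are all hypothetical choices. Exact fractions are used so the equality holds exactly rather than up to rounding.

```python
from fractions import Fraction as F

# Hypothetical toy model: theta takes four values and psi = PSI(theta) is its first coordinate.
thetas = [(0, 0), (0, 1), (1, 0), (1, 1)]

def PSI(theta):
    return theta[0]

# Hypothetical proper prior pi(theta) and sampling density f(x|theta) at one observed x.
prior = {(0, 0): F(1, 10), (0, 1): F(2, 10), (1, 0): F(3, 10), (1, 1): F(4, 10)}
f_x = {(0, 0): F(1, 2), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(1, 16)}

c = F(7)  # arbitrary positive constant: L(theta|x) = c f(x|theta)
L = {t: c * f_x[t] for t in thetas}

def marginal_prior(psi):
    # pi_PSI(psi): prior mass of the inverse image PSI^{-1}{psi}
    return sum(prior[t] for t in thetas if PSI(t) == psi)

def conditional_prior(t, psi):
    # pi(theta|psi), defined for theta in PSI^{-1}{psi}
    return prior[t] / marginal_prior(psi)

def integrated_likelihood(psi):
    return sum(L[t] * conditional_prior(t, psi) for t in thetas if PSI(t) == psi)

m_x = sum(f_x[t] * prior[t] for t in thetas)  # prior predictive m(x)
posterior = {t: f_x[t] * prior[t] / m_x for t in thetas}

def relative_belief(psi):
    # RB(psi|x) = pi_PSI(psi|x) / pi_PSI(psi)
    post_psi = sum(posterior[t] for t in thetas if PSI(t) == psi)
    return post_psi / marginal_prior(psi)

for psi in (0, 1):
    assert relative_belief(psi) == integrated_likelihood(psi) / (c * m_x)
    print(psi, relative_belief(psi))  # prints "0 80/39" then "1 50/91"
```

Changing c changes every integrated likelihood value, but the assertion still holds because c cancels against the c in the denominator.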

So a relative belief ratio is indeed an integrated likelihood (namely, the one obtained by taking c = 1/m(x)). Since relative belief inferences for psi are determined by the ordering induced by the relative belief ratios, this implies that these inferences are the same as those induced by the integrated likelihood ordering.

So what is the difference? The difference lies in the interpretation of these quantities, and this is significant. Note that RB(psi|x) measures the change in belief from a priori to a posteriori. By the basic principle of evidence (this isn’t a theorem but an axiom), if the data have caused the probability to go up, then there is evidence in favor, and if the probability has gone down, then there is evidence against. So RB(psi|x) > 1 means there is evidence that psi is the true value, RB(psi|x) < 1 means there is evidence that psi is not the true value, and RB(psi|x) = 1 means there is no evidence either way. Contrast this with the value that an integrated likelihood takes: the specific value is meaningless, as c > 0 is arbitrary. Any likelihood can at most determine the relative evidence between values.
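A minimal sketch of the contrast, using hypothetical prior and posterior probabilities for three values of psi. The function `evidence` (a name introduced here for illustration) applies the principle of evidence to a relative belief ratio. Rescaling the ratios by an arbitrary k > 0, which is all a likelihood allows, preserves their ordering but makes the cutoff at 1 meaningless.

```python
from fractions import Fraction as F

def evidence(rb):
    """Classify a relative belief ratio by the principle of evidence."""
    if rb > 1:
        return "evidence for"
    if rb < 1:
        return "evidence against"
    return "no evidence either way"

# Hypothetical prior and posterior probabilities for three values of psi.
prior = {"psi1": F(1, 2), "psi2": F(1, 4), "psi3": F(1, 4)}
posterior = {"psi1": F(1, 2), "psi2": F(3, 8), "psi3": F(1, 8)}

rb = {psi: posterior[psi] / prior[psi] for psi in prior}
for psi, r in rb.items():
    print(psi, r, evidence(r))

# Multiplying by an arbitrary k > 0 (as any likelihood allows) keeps the ordering
# but changes every classification: the value itself carries no evidential meaning.
for k in (F(1, 10), F(10)):
    scaled = {psi: k * r for psi, r in rb.items()}
    assert sorted(scaled, key=scaled.get) == sorted(rb, key=rb.get)
    print(k, {psi: evidence(v) for psi, v in scaled.items()})
```

With k = 1/10 every value would be labelled "evidence against" and with k = 10 every value "evidence for", even though nothing about the data or prior has changed.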

Does this difference matter? Yes, in many ways, but as an illustration consider a well-known example, called the prosecutor’s fallacy, where the role of the relative belief ratio as a measure of evidence clarifies some issues. This is discussed in the book in Examples 4.5.4, 4.6.3 and 4.7.4, where the relevant numerical computations are provided. In this example the prosecutor notes that a defendant shares a trait with the perpetrator of a crime and, since the trait is rare in the population, concludes (even calculating an erroneous probability of guilt) that this is overwhelming evidence of guilt. A statistician calculating the appropriate posterior probability of guilt finds that it is very small and concludes that this is evidence of innocence. Both are wrong! It defies common sense to suppose that the defendant sharing the trait with the perpetrator is not evidence in favor of guilt, and indeed the relative belief ratio for guilt is greater than 1. The real question is whether this is strong or weak evidence of guilt, and to determine this it is necessary to calibrate the value of the relative belief ratio. The calibration issue for relative belief ratios is discussed in the book and, when the proposal for calibration is followed in this example, it is determined that the evidence for guilt is weak (which doesn’t mean the evidence for innocence is strong, since there is evidence for guilt and against innocence). So it is (hopefully) unlikely that we would convict based only on weak evidence of guilt.
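The qualitative pattern (a small posterior probability of guilt together with a relative belief ratio for guilt above 1) can be reproduced with a two-hypothesis calculation. The numbers below are hypothetical and are not the ones used in the book's examples; the sketch also does not implement the book's calibration of strength, it only computes the ratios and the posterior.

```python
# Hypothetical numbers only; these are NOT the values used in the book's examples.
prior_guilt = 1e-5          # prior probability that the defendant is the perpetrator
p_trait_if_guilty = 1.0     # the perpetrator is known to have the trait
p_trait_if_innocent = 1e-3  # the trait is rare in the population

# Prior predictive probability of observing that the defendant has the trait.
m_trait = prior_guilt * p_trait_if_guilty + (1 - prior_guilt) * p_trait_if_innocent
post_guilt = prior_guilt * p_trait_if_guilty / m_trait

# Relative belief ratios: posterior probability divided by prior probability.
rb_guilt = post_guilt / prior_guilt
rb_innocent = (1 - post_guilt) / (1 - prior_guilt)

print(f"posterior P(guilt | trait) = {post_guilt:.4f}")
print(f"RB(guilt | trait)          = {rb_guilt:.1f}")
print(f"RB(innocent | trait)       = {rb_innocent:.4f}")
```

With these inputs the posterior probability of guilt is around 0.01, yet RB(guilt | trait) is far above 1 and RB(innocent | trait) is just below 1: the data are evidence for guilt and against innocence, even though guilt remains improbable.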

But now change the circumstances of the problem while keeping exactly the same numbers. In this case the question is whether a person is a carrier of a deadly infectious disease, when the data tell us that the individual has been in an area where the disease is rampant. So there is evidence that the person is a carrier, and again it is weak, but should the person be quarantined or not? The answer to this question, as with the legal case, has nothing to do with statistics. The role of statistics is to provide a measure of the evidence and its strength. Once this is done, other factors involving ethics, risks, etc. determine the outcome.

None of this follows from the integrated likelihood, as it is not a measure of evidence. A central thesis of the book is that any theory of statistics has to be built on a measure of evidence, as that is the main issue in applications of statistics.

There are a number of other benefits that arise from having a measure of evidence. The book discusses many of these, but one is particularly notable. Given a measure of evidence, you can measure a priori the bias in the prior; namely, you can calculate the prior probability, based on a particular amount of data, of obtaining evidence for or evidence against a particular hypothesis. For example, if you are told that the data provide evidence in favor of (against) a hypothesis, would you regard this as relevant if you were also told that the prior probability of obtaining evidence in favor (against) is very large? In other words, the prior may be such that the conclusion is foregone, and that is what bias measures. Using integrated likelihood does not provide a means to answer such a question because it is not a measure of evidence, whereas the relative belief ratio is a measure of evidence and so leads directly to a measure of bias.
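A minimal sketch of such an a priori calculation, assuming a hypothetical binomial model with theta restricted to a finite grid, a uniform prior on that grid, n = 10 observations, and the hypothesis H0: theta = 0.5. It computes the prior probability (under the prior predictive m) that the data will give evidence for or against H0, and the probability of evidence against H0 when H0 is true. These are illustrative quantities only; the book's precise definitions of bias may differ in detail.

```python
from math import comb

# Hypothetical setup: x ~ Binomial(n, theta), theta on a finite grid with a
# discrete uniform prior, and the hypothesis H0: theta = theta0.
n = 10
grid = [i / 10 for i in range(1, 10)]     # 0.1, 0.2, ..., 0.9
prior = {t: 1 / len(grid) for t in grid}  # uniform prior (an assumption)
theta0 = 0.5

def f(x, theta):
    # sampling density f(x | theta)
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

def m(x):
    # prior predictive density of x
    return sum(f(x, t) * prior[t] for t in grid)

def rb_h0(x):
    # RB(theta0 | x) = pi(theta0 | x) / pi(theta0) = f(x | theta0) / m(x)
    return f(x, theta0) / m(x)

xs = range(n + 1)

# Prior probability (under m) that the data will give evidence for / against H0.
p_for = sum(m(x) for x in xs if rb_h0(x) > 1)
p_against = sum(m(x) for x in xs if rb_h0(x) < 1)

# Probability of evidence against H0 computed as if H0 were true, i.e. under f(x | theta0).
p_against_if_true = sum(f(x, theta0) for x in xs if rb_h0(x) < 1)

print(f"P_m(evidence for H0)         = {p_for:.3f}")
print(f"P_m(evidence against H0)     = {p_against:.3f}")
print(f"P(evidence against H0 | H0)  = {p_against_if_true:.3f}")
```

If either of the first two probabilities were close to 1, evidence in that direction would be almost guaranteed before any data were seen, which is the "foregone conclusion" situation described above. Such calculations depend only on the model, the prior and n, so they can be done before data collection.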

So I do not agree that relative belief is just using integrated likelihood and too much that is relevant is lost by thinking of relative belief in this way. In fact, there is no need to even mention the concept of likelihood from the perspective of relative belief.

Hi Mike, just a minor suggestion: you might consider using MathJax in WordPress for LaTeX symbols.

https://docs.mathjax.org/en/v1.1-latest/platforms/wordpress.html

Best,

Jun
