Demystifying the Bayesian analysis of ego-depletion

At SPSP 2018, there was the big reveal of the awesome Many Labs replication report of ego depletion effects.
Thirty-six labs participated, and not even the labs had yet seen the full results.
I was in my room decompressing a bit (conferences are exhausting), but following along via Sanjay Srivastava’s live-tweeting of the session.

The anticipation for the results was palpable even on Twitter, so I rushed down there and ran in just in time to hear them myself.

Wagenmakers couldn’t be there to deliver the results of a Bayesian analysis himself, unfortunately.
The speaker did their best to convey the results, but I had the sense that some nuances were missed a bit.
I want to reiterate: The speaker did a good job! But these analyses have subtleties that you will miss or under-emphasize unless you’re a fairly experienced Bayesian.
So this post isn’t to criticize the speaker at all!

The gist I got from the Bayesian results was that:

  • The data are approximately 4x more likely under the point-mass null model than under the a-priori specified model.
  • The meta-analytic effect is approximately .08. (I mistakenly read it as .06, but others told me .08; I was in the back, sorry!)

But it seemed like the take-home message was “The null was favored; but the meta-analytic effect is .08, so let’s instead talk about how we think about small effects”.

Those I spoke with after the session had the same take-home message — The effect exists, but it’s tiny: $d = .08$.

It’s not actually this simple. The “real” best-estimate is, according to Wagenmakers, about $d = .016$.
The “problem” is that the Bayesian meta-analytic method used provides a conditional effect estimate: an estimate conditioned on the assumption that the effect is non-zero and positive.
To non-Bayesians, this is easily ignored. To Bayesians, this is a critical piece of information, because that condition completely changes the point estimates.

Let’s dig into the method, as best as I currently understand it.
Keep in mind, please, I have absolutely nothing to do with this analysis; I’m not involved in this project.
Many things I type here may be incorrect, because I simply do not know the details of the analysis.
Also worth mentioning: I have no vested interest in the approach being correct, either professionally or personally.
Quite honestly, I have had a lot of disputes with Wagenmakers et al.’s approach to inference, because not all Bayesians are alike, and I generally hate model averaging and Bayes factors.
With that said, if I am correct about how the Bayesian approach was performed, I think it was a good approach for the purposes of this project.
Kudos to the entire Many Labs project, and kudos to Wagenmakers et al. for their analysis.

The two meta-analytic models

The slides mentioned that both a fixed-effects and a random-effects meta-analysis were performed, and the overall effect was estimated using both.
This does not mean the data were used to “choose” a fixed or random-effects model.
Instead, this method weights the “overall” estimate by the meta-analytic model probability.

A fixed-effects meta-analysis in Bayes looks like this:

$$
p(\theta|y_1,\ldots,y_k,MA=1) \propto p(y_1,\ldots,y_k|\theta)p(\theta|MA=1)
$$
Or with sufficient statistics…
$$
p(\theta|\hat\theta_k,\sigma_{\hat\theta_k},MA=1) \propto p(\hat\theta_k|\sigma_{\hat\theta_k},\theta,MA=1)p(\theta|MA=1)
$$

Essentially, either one is simply stating that there is one fixed effect, and each dataset ($y_k$) or estimate-information combination ($\hat\theta_k,\sigma_{\hat\theta_k}$) is an instance of that fixed effect.

The random-effects meta-analysis looks similar, except that each study’s true effect ($\theta_k$) is assumed to come from a distribution of effects with an unknown mean ($\Theta$) and an unknown variance (governed by $\sigma_{\theta_k}$).
The mean, $\Theta$, is taken to be the expected value of the effect.

$$
p(\Theta|y_1,\ldots,y_k,\theta_k,\sigma_{\theta_k},MA=2) \propto p(y_k|\theta_k,\Theta,\sigma_{\theta_k})p(\theta_k|\Theta,\sigma_{\theta_k})p(\Theta,\sigma_{\theta_k}|MA=2)
$$
Or with sufficient statistics…
$$
p(\Theta|\hat\theta_k,\sigma_{\hat\theta_k},\sigma_{\theta_k},MA=2) \propto p(\hat\theta_k|\sigma_{\hat\theta_k},\theta_k,\Theta,\sigma_{\theta_k})p(\theta_k|\Theta,\sigma_{\theta_k})p(\Theta,\sigma_{\theta_k}|MA=2)
$$

In order to get just the posterior for $\theta$ or $\Theta$ (the overall effect in the fixed-effects and random-effects models, respectively), you’d have to integrate out the other unknowns, which I’ll skip for the time being, because MCMC does that for you.

But across fixed-effects and random-effects models, what is the posterior distribution? Well, you have to integrate over the particular models themselves.
Assuming we’re working with the raw data vector, $y$:
$$
\begin{align}
p(\Theta|y) &= p(\Theta|y,MA=2)p(MA=2|y) + p(\theta|y,MA=1)p(MA=1|y) \\
p(MA=1|y) &= \frac{p(y|MA=1)p(MA=1)}{p(y)} \\
p(MA=2|y) &= \frac{p(y|MA=2)p(MA=2)}{p(y)} \\
p(y|MA=1) &= \int p(y|\theta,MA=1)p(\theta|MA=1)d\theta \\
p(y|MA=2) &= \iiint p(y|\theta_k,\Theta,\sigma_{\theta_k},MA=2)p(\theta_k|\Theta,\sigma_{\theta_k},MA=2)p(\Theta,\sigma_{\theta_k}|MA=2)d\theta_k d\Theta d\sigma_{\theta_k} \\
p(y) &= p(y|MA=1)p(MA=1) + p(y|MA=2)p(MA=2)
\end{align}
$$

This all looks really convoluted, but in words:

  • The posterior probability distribution of the fixed effect is marginalized across the two different meta-analytic methods (fixed and random effect models), weighted by the posterior probability of each meta-analytic model given the data.
  • The posterior probability of a fixed-effects model is the probability of the data under that model times the prior probability (probably .5), divided by the marginal probability of the data across both. This is just Bayes theorem.
  • The posterior probability of a random effects model is the probability of the data under that model times the prior probability (probably .5), divided by the marginal probability of the data across both. This is just Bayes theorem.
  • The probability of the data under each model is the probability of the data given the parameters in each model, marginalized over the a-priori specified priors for each unknown parameter. This is where the $\text{Normal}^+(.3,.15)$ prior comes in for both $\theta$ and $\Theta$.

All of this comes from probability theory and Bayes theorem.
In a nutshell, you estimate a fixed-effects model and a random-effects model, assuming that the hypothesis about the effect is true and is expressed statistically as $\text{Normal}^+(.3,.15)$.
You get the marginal probability of the data under each model using bridge sampling, because computing it by hand is intractable.
This marginal probability represents the probability of observing the data they did given the a-priori hypothesis-informed uncertainty about the fixed effects.
The posterior probability distribution of the fixed effect is then informed by both the fixed and random-effects models, weighted by their respective posterior model probabilities (which, with equal prior model probabilities, are proportional to the marginal probabilities).
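
To make the weighting step concrete, here is a minimal sketch of how posterior model probabilities could be computed from log marginal likelihoods. The function name and the two log marginal likelihood values are hypothetical, chosen purely for illustration; they are not numbers from the actual analysis.

```python
import math

def posterior_model_probs(log_marginals, prior_probs):
    """Bayes theorem over models, computed on the log scale for stability."""
    # log p(y | model) + log p(model) for each model
    log_joint = [lm + math.log(p) for lm, p in zip(log_marginals, prior_probs)]
    # Subtract the maximum before exponentiating to avoid underflow
    shift = max(log_joint)
    weights = [math.exp(v - shift) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log marginal likelihoods for MA=1 (fixed) and MA=2 (random),
# with prior model probabilities of .5 each (an assumption).
probs = posterior_model_probs([-120.0, -120.0 + math.log(3.0)], [0.5, 0.5])
print(probs)
```

In this made-up case the data are three times more likely under the random-effects model, so the weights come out to .25 and .75. The model-averaged posterior is then the mixture of the two models' posteriors with those weights.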

Both of these models assume the effect exists, and is positive.
A-priori, the hypothesis states that the ego-depletion effect is positive, with uncertainty represented by $\text{Normal}^+(.3,.15)$.
The posterior distribution for the ego-depletion effect, assuming such an effect exists as hypothesized, is informed by both the fixed and random-effects models, but weighted by the probability of observing the data they did under each model.
Really, the posterior distribution is: $p(\Theta|y,H1)$, meaning it is conditioned on both the observed data and that the hypothesized effect is true.

The hypothesis-driven prior on $\Theta$ (d) looks a bit like this:

And the posterior might look something like this:

But remember! This posterior is conditioned on the effect being positive, non-zero, because the prior constrains it to be so.
And this is likely not what the posterior actually looks like; it is just a best-guess approximation.
Something else to consider…

Posterior distributions and Bayesian point estimates are not like modal estimates

This is a quick aside.
Bayesian posteriors are distributions of effects, representing the data-informed posterior probability density or mass of parameters.
Outside of Bayesian inference, people are really accustomed to modal estimates, with an accompanying standard error or confidence interval.
In Bayesian methods, people often don’t report modal estimates (maximum a-posteriori), but rather medians or expectations with an accompanying credible interval.

When a prior induces a posterior boundary, these will not equal one another.
Just to build the intuition, let’s assume you fit a model without any boundaries, and the posterior looks like this:

This posterior suggests a mean, median, and mode of zero.

Now let’s use a prior to restrict negative values altogether. The posterior looks like this:

The modal estimate is still zero, but the median is about .629, and the mean (expected value) is about .765.

Why is this important?
The Bayesian MA posterior, conditioned on the effect being positive, has a median estimate of .08.
Because that posterior is bounded at zero and skewed to the right, the mean estimate will be higher, and the modal estimate lower.
For what we currently know, that posterior could literally look like this:

In this posterior, the median is .08 (the same effect estimated by the Bayesian MA model), but the modal estimate is zero.

This is where Bayesian estimates and non-Bayesian estimates diverge: Maximizing estimators would say the effect is actually zero, whereas the Bayesian median and mean would be .08 and .10, respectively.
It’s not that Bayesian estimates are wrong, but they are really descriptions of a posterior distribution.
They are not intuitively the same as maximum-likelihood or other estimators. In this case, that posterior is really just a normal distribution, centered at zero, with all of the negative values cut off by the prior and the remaining positive half renormalized.
In other words, if the prior had not constrained the posterior to positivity, that posterior would have a mean, median, and mode of about zero.
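
As a rough check on those numbers, here is a small sketch that assumes the posterior is exactly a zero-centered normal truncated at zero (a half-normal). That shape is my assumption for illustration, not something the presented analysis confirmed. It backs out the scale implied by a median of .08 and then computes the implied mean.

```python
import math
from statistics import NormalDist

def half_normal_scale_from_median(median):
    """Scale sigma of a half-normal whose median equals `median`."""
    # The median of |Normal(0, sigma)| is sigma times the .75 quantile of Normal(0, 1)
    return median / NormalDist().inv_cdf(0.75)

def half_normal_mean(sigma):
    """Expected value of |Normal(0, sigma)|."""
    return sigma * math.sqrt(2.0 / math.pi)

sigma = half_normal_scale_from_median(0.08)
summary = {"mode": 0.0, "median": 0.08, "mean": half_normal_mean(sigma)}
print(sigma, summary)
```

Under that assumption the implied scale is about .12 and the mean is roughly .09, in line with the approximate .10 quoted above, while the mode stays at zero.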

This is important to recognize, because having a median effect of .08 is not particularly impressive or exciting.
In reality, that suggests a 50% posterior probability that the effect is between 0 and .08, assuming it can only be positive. Even effects that are in actuality zero can have a median estimate of .08 when positivity is induced.

The actual posterior

Wagenmakers and others compared two models.
One model was the combined-meta-analytic approach discussed above.
The other model assumed a point-mass of zero on the effect.

The Bayes Factor for this was $BF_{01} \approx 4$, from my best recollection.
This means the data are four times more likely to occur if the true effect were precisely zero than if the hypothesized (meta-analytic) effect of $\text{Normal}^+(.3,.15)$ were true.
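
To show what goes into such a Bayes factor, here is a simplified sketch. It is not the analysis that was presented: it uses only a fixed-effect model, replaces bridge sampling with a simple grid integration, and runs on hypothetical estimates and standard errors that I made up. The function names are mine as well.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood(theta, estimates, std_errors):
    """Fixed-effect likelihood: each estimate is Normal(theta, its standard error)."""
    value = 1.0
    for est, se in zip(estimates, std_errors):
        value *= normal_pdf(est, theta, se)
    return value

def bf01_fixed_effect(estimates, std_errors, prior_mean=0.3, prior_sd=0.15,
                      upper=2.0, n_grid=4001):
    """BF01 for theta = 0 versus theta ~ Normal+(prior_mean, prior_sd)."""
    # Mass of the untruncated normal prior above zero, used to renormalize it
    positive_mass = 1.0 - NormalDist(prior_mean, prior_sd).cdf(0.0)
    step = upper / (n_grid - 1)
    # Likelihood times truncated prior on a grid over [0, upper]
    integrand = []
    for i in range(n_grid):
        theta = i * step
        prior = normal_pdf(theta, prior_mean, prior_sd) / positive_mass
        integrand.append(likelihood(theta, estimates, std_errors) * prior)
    # Trapezoid rule for the marginal likelihood under H1
    marginal_h1 = step * (sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))
    # Under H0 there is nothing to integrate: theta is fixed at zero
    marginal_h0 = likelihood(0.0, estimates, std_errors)
    return marginal_h0 / marginal_h1

# Hypothetical estimates and standard errors, for illustration only
near_zero = bf01_fixed_effect([0.02, -0.05, 0.04], [0.10, 0.10, 0.10])
near_prior = bf01_fixed_effect([0.30, 0.25, 0.35], [0.10, 0.10, 0.10])
print(near_zero, near_prior)
```

When the made-up estimates hover around zero, the ratio favors the point null (above 1); when they sit near the hypothesized .3, it favors the alternative (below 1).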

If you want to know the overall posterior distribution for the effect, you have to marginalize across these models.
$$
p(\Theta|y) = p(\Theta|y,H1)p(H1|y) + p(\Theta|y,H0)p(H0|y)
$$

This posterior will look a bit like this:

The mean estimate is about .016.
That is, assuming the effect could either be zero or positive, the expected effect is about .016.

This is not the traditional meta-analytic effect size estimate. But given the a-priori hypothesized effect size range and the alternative hypothesized effect of zero, the meta-analytic effect would be approximately .016, not .08.
Once again, .08 assumes that the effect is positive and exists, which we are not certain of.
If we marginalize across the uncertainty in the models themselves, the expected estimate is about .016.
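
The arithmetic behind that number can be sketched as follows. Two assumptions are mine: the two hypotheses get equal prior probability, and the reported conditional estimate of .08 stands in for the conditional mean. The function names are hypothetical.

```python
def posterior_prob_h0(bf01, prior_h0=0.5):
    """Posterior probability of H0 from BF01 and the prior probability of H0."""
    prior_odds = prior_h0 / (1.0 - prior_h0)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

def model_averaged_mean(bf01, mean_given_h1, prior_h0=0.5):
    """Mean of the mixture: a point mass at zero under H0, the H1 posterior otherwise."""
    p_h0 = posterior_prob_h0(bf01, prior_h0)
    return (1.0 - p_h0) * mean_given_h1 + p_h0 * 0.0

p_h0 = posterior_prob_h0(4.0)                # BF01 of about 4
averaged = model_averaged_mean(4.0, 0.08)    # conditional estimate of about .08
print(p_h0, averaged)
```

With $BF_{01} \approx 4$ and equal prior odds, H0 gets a posterior probability of about .8 and H1 about .2, so the model-averaged mean is about $.2 \times .08 = .016$.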

Hopefully, a more traditional meta-analytic estimate will be reported in the paper as well.
For example, using a zero-symmetric estimation prior (rather than the hypothesis-testing prior employed in the presented analyses) permits the effect to be any real number, weighted by the prior.
This would serve to probabilistically down-weight insane estimates (like $d\geq1$).
I am guessing that the meta-analytic effect would be between 0 and .05.
I am further guessing that if they used a prior mass at zero along with a zero-symmetric prior, the mean, median, and modal estimates would all be essentially zero (between 0 and .02).

Summary

From what I can gather:

  • The meta-analytic effect used a hypothesis-informed prior for the effect, meaning the meta-analytic effect is estimated assuming it is positive.
  • The data are about 4 times more likely to come from a parameter value of zero than from the hypothesized distribution of effects.
  • Assuming the hypothesis is correct, the median estimate is about .08.
  • Marginalizing across whether H1 or H0 is correct, the mean estimate is about .016.
  • Do not take the meta-analytic effect size estimate as the take-home message, because that neglects the possibility that zero is a fairly probable estimate. I.e., don’t forget that the estimate is conditioned on the effect existing as hypothesized, and the data are about 4x less likely under that hypothesis than under the null.
