Updating? Pooled data? Meta-analysis? Yes.

Meta-psychology is a journal that is founded on the principles of total transparency. Manuscripts are open. Reviews are open. Open commentary is permitted. Each draft is visible. It’s as though the journal cares about science or something.
Rickard mentioned a new submission about the utility of Bayesian updating, or as the authors call it – Posterior passing.

My comment can be read here by clicking on the upper-right comment button. Due to the excruciating 1000 character limit, you will have to hit the ‘+’ sign to read all of them.

Although I generally appreciate the paper, I think it misses an important point.
Their proposal to use “posterior passing” instead of meta-analysis or full raw datasets is strange once you realize that, under a few reasonable assumptions, posterior passing is the same as having all of the raw data, and the same as a fixed-effect meta-analysis.

Posterior Passing

The gist of posterior passing is that a researcher can take the posterior of a parameter computed from a previous sample, and use that posterior as a prior.
Or, as the rest of the world calls it – Bayesian updating.
The prior is then updated by the new sample data into a new posterior.
That new posterior can serve as someone else’s prior, and so on.
Over repeated samples, this results in decreasing uncertainty (narrower posteriors) about parameter values.

Posterior passing looks like the following. Assume $Y_j$ represents some current observation vector, and $Y_{-j}$ represents a set of previous observation vectors. Or: $Y_j$ is your data, and $Y_{-j}$ is not your data.
\begin{align}
p(\theta,Y_j|Y_{-j}) & \propto p(Y_{j}|\theta,Y_{-j})\overbrace{p(\theta|Y_{-j})}^{\text{Your prior, their posterior}}
\end{align}

That’s about it. Obtain data, estimate a posterior, use that posterior as a prior for the next sample. Keep doing it ad nauseam, and you have posterior passing.
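
To see why the uncertainty shrinks, consider the conjugate normal case with known data variance $\sigma^2$ (a toy illustration of my own, not from the paper):

\begin{align}
\theta &\sim N(\mu_0, \tau_0^2) \\
\bar{y} \mid \theta &\sim N(\theta, \sigma^2/n) \\
\theta \mid \bar{y} &\sim N\!\left(\frac{\mu_0/\tau_0^2 + n\bar{y}/\sigma^2}{1/\tau_0^2 + n/\sigma^2},\ \frac{1}{1/\tau_0^2 + n/\sigma^2}\right)
\end{align}

Each pass adds $n/\sigma^2$ to the posterior precision, so the posterior can only narrow as studies accumulate.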

The argument is that this method is underutilized in the sciences, and that engaging in posterior passing will permit researchers to quickly converge toward a more certain, accurate estimate.
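
Here is a minimal sketch of that convergence in code, using the conjugate normal model above (my own toy example; none of this comes from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def pass_posterior(mu0, tau0_sq, y, sigma_sq):
    """One posterior-passing step: normal prior, normal likelihood, known variance."""
    prec = 1 / tau0_sq + len(y) / sigma_sq            # precisions add
    mu1 = (mu0 / tau0_sq + y.sum() / sigma_sq) / prec
    return mu1, 1 / prec                              # posterior mean and variance

theta, sigma_sq = 0.4, 1.0     # "true" parameter and known data variance
mu, tau_sq = 0.0, 100.0        # diffuse initial prior

for study in range(1, 6):      # five successive studies, n = 300 each
    y = rng.normal(theta, np.sqrt(sigma_sq), size=300)
    mu, tau_sq = pass_posterior(mu, tau_sq, y, sigma_sq)
    print(f"study {study}: posterior mean {mu:.3f}, sd {np.sqrt(tau_sq):.4f}")
```

Each study's posterior becomes the next study's prior, and the posterior sd shrinks with every pass.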

I do not disagree with the idea of Bayesian updating, of course. It’s just a mathematical fact, and it is underutilized. People are way too afraid to incorporate prior information into a model.
With that said, the proposal has a few problems, some they mention and some they do not.

  • The proposed use of it assumes a single set of generative parameters across all of the studies. Given that multiple studies have minor to major differences between them, surely their corresponding data are not generated from precisely the same set of parameters. It would be difficult to find three, let alone 60, studies that are feasibly generated from the exact same parameters.
  • The quick convergence toward the “true value” depends strongly on the quality of the available posteriors from previous studies. $p$osterior-hacked studies and publication bias produce a misleading assortment of posterior distributions, and though your own data may overwhelm those priors, that is not guaranteed. Indeed, had they simulated hacking and publication bias, I suspect the performance would be underwhelming (see the toy simulation after this list).
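
To gesture at that second point, a toy simulation of my own construction (the paper does not include this): suppose only studies with a “significant” positive result get published, so only their posteriors are available to pass along. The chain then converges confidently to the wrong value.

```python
import numpy as np

rng = np.random.default_rng(2)

theta, sigma_sq, n = 0.0, 1.0, 50   # the true effect is exactly zero
mu, tau_sq = 0.0, 100.0             # diffuse initial prior
se = np.sqrt(sigma_sq / n)

published = 0
while published < 20:
    y = rng.normal(theta, np.sqrt(sigma_sq), size=n)
    if y.mean() / se < 1.96:        # crude one-sided publication filter
        continue                    # unpublished: this posterior is never passed on
    prec = 1 / tau_sq + n / sigma_sq
    mu = (mu / tau_sq + y.sum() / sigma_sq) / prec
    tau_sq = 1 / prec
    published += 1

print(f"passed posterior: mean {mu:.3f}, sd {np.sqrt(tau_sq):.4f} (true value: 0)")
```

Twenty passed posteriors later, you have a tight posterior centered well away from zero.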

My bigger disagreement is with advocating it over meta-analysis, or even comparing it to having the full data available.

They’re the same thing! Advocating repeated Bayesian updating across studies to obtain a highly informative posterior is the same as advocating full-data analysis or fixed-effect meta-analysis.

Bayesian updating vs full-data analysis

Imagine two researchers are arguing about their next three samples.
One researcher wants to collect 300 observations, then another 300, then another 300.
She then wants to compute a posterior from the first 300, feed that in as the prior for the second 300, and finally feed the resulting posterior in as the prior for the third.
Specify an initial prior, then chain posteriors into the next model’s prior, until we get a three-study Bayesian update on the state of information.
The other researcher thinks that’s inferior to the full-data method.
He wants to just collect 900 and be done with it; analyze the whole thing.
Specify an initial prior, then throw all 900 observations into a single model.
They argue for hours. Friendships are severed. Children are neglected. They use all caps when texting because their voices have given out.

Neither researcher is right or wrong.
These two approaches will accomplish the exact same end result.
They should have just worked out the math and grabbed a beer instead.

\begin{align}
p(\theta,Y_j,Y_{-j}) &\propto p(Y_j|\theta,Y_{-j})\overbrace{p(\theta|Y_{-j})}^\text{A posterior} \\
&\propto p(Y_j|\theta,Y_{-j})p(Y_{-j}|\theta)p(\theta) \\
\text{If IID…} &\\
&\propto p(Y_\cdot|\theta)p(\theta)
\end{align}

Assuming your data and the other data are independent given $\theta$ (the IID step above), you can either use a posterior as your prior or combine all the data.
The result is the same. The joint distribution decomposes into either one.

Consequently, using all available data in a single model is no better than using posteriors derived from all previous data to inform the current data.
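
A quick numerical check of that decomposition, again in the conjugate normal toy model (my sketch, not anyone's actual analysis):

```python
import numpy as np

rng = np.random.default_rng(3)
batches = [rng.normal(0.4, 1.0, size=300) for _ in range(3)]  # three studies

def update(mu0, tau0_sq, y, sigma_sq=1.0):
    prec = 1 / tau0_sq + len(y) / sigma_sq
    return (mu0 / tau0_sq + y.sum() / sigma_sq) / prec, 1 / prec

# Researcher 1: chain three posteriors, 300 observations at a time
mu, tau_sq = 0.0, 100.0
for y in batches:
    mu, tau_sq = update(mu, tau_sq, y)

# Researcher 2: one model, all 900 observations, same initial prior
mu_full, tau_full = update(0.0, 100.0, np.concatenate(batches))

print(mu, mu_full)       # identical up to floating-point error
print(tau_sq, tau_full)  # identical up to floating-point error
```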

Bayesian updating vs fixed-effect meta-analysis

A fixed-effect meta-analysis is defined as follows.
Some parameter, $\theta$, exists that is estimated by several $\hat\theta_j$. A variance estimate, $\sigma^2_{\hat\theta_j}$, is computed for each $\hat\theta_j$.
It follows then, that:
$$
p(\theta | \hat\theta_\cdot,\sigma_{\hat\theta_\cdot}) \propto p(\hat\theta_\cdot | \theta, \sigma_{\hat\theta_\cdot})p(\theta)
$$

Decomposing this further,

\begin{align}
p(\theta | \hat\theta_\cdot,\sigma_{\hat\theta_\cdot}) &\propto p(\hat\theta_\cdot | \theta, \sigma_{\hat\theta_\cdot})p(\theta) \\
&\propto p(\hat\theta_j | \theta, \sigma_{\hat\theta_j})p(\hat\theta_{-j}|\theta,\sigma_{\hat\theta_{-j}})p(\theta) \\
&\propto p(\hat\theta_j | \theta, \sigma_{\hat\theta_j})p(\theta | \hat\theta_{-j},\sigma_{\hat\theta_{-j}})
\end{align}

If $\hat\theta_j, \sigma_{\hat\theta_j}$ are sufficient statistics for $Y_j$, then
$$
p(\hat\theta_j | \theta, \sigma_{\hat\theta_j})p(\theta | \hat\theta_{-j},\sigma_{\hat\theta_{-j}}) = p(Y_j|\theta)p(\theta|Y_{-j})
$$

Assuming each study’s $\hat\theta_j, \sigma_{\hat\theta_j}$ calculations are sufficient statistics and are independent of one another, the fixed-effect meta-analysis will provide the same information as posterior passing or full-data analysis.
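
And a numerical check of that claim in the same toy model, under a flat prior (my sketch; here the sample mean and its standard error are sufficient statistics):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.0
studies = [rng.normal(0.4, sigma, size=n) for n in (100, 300, 500)]

# Fixed-effect meta-analysis: inverse-variance-weighted per-study estimates
est = np.array([y.mean() for y in studies])           # theta-hat_j
var = np.array([sigma**2 / len(y) for y in studies])  # variance of theta-hat_j
w = 1 / var
ma_mean, ma_var = (w * est).sum() / w.sum(), 1 / w.sum()

# Full-data analysis with a flat prior: posterior mean and variance
pooled = np.concatenate(studies)
full_mean, full_var = pooled.mean(), sigma**2 / len(pooled)

print(ma_mean, full_mean)  # identical
print(ma_var, full_var)    # identical
```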

Summary

In essence, whether you want to utilize posterior passing, incorporate all data into a single model, or use a fixed-effect MA, you will produce the same information in the posterior.

I think, anyway. Feel free to critique the above.

Based on the above, it is inconsequential which version you use. They all suffer the same problems, and they share the same troubling assumption: that all included data were generated from the same single parameter.

Practically speaking, I believe the fixed-effect MA would be the simplest of the three to use. Regardless, the assumption of one true effect across multiple studies is seriously improbable in practice, and I would recommend a random-effects meta-analysis instead.
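
For reference, the standard random-effects formulation replaces the single $\theta$ with study-level parameters drawn from a population distribution:

\begin{align}
\hat\theta_j &\sim N(\theta_j, \sigma^2_{\hat\theta_j}) \\
\theta_j &\sim N(\mu, \tau^2)
\end{align}

Here $\tau^2$ absorbs the between-study heterogeneity that the one-true-effect assumption wishes away; setting $\tau^2 = 0$ recovers the fixed-effect model.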
