I’m working on [yet another] Bayesian CFA model. I’ve grown to really like using Stan for latent variable models like CFA, SEM, mixtures, etc.

The problem is that the docs are rather sparse about how to do this, or even where to begin.

This post won’t get you to the point of writing the Stan code, but I can at least give the basic gist while I wait for Stan to finish.

Most CFA packages operate on covariance matrices. The “aim” is to specify a set of linear equations such that the *implied* covariance matrix adequately matches the observed covariance matrix.

This has a host of “problems” that people have been solving for years.

Some examples:

- Latent Interactions are hard
- Distributional assumptions are hard to include
- Assume normality, then tweak every fit statistic to be robust against the inherent non-normality.
- The observed covariance matrix is often treated as a “known parameter” in the minimization formula (e.g., this manifests in treating computing the density of the implied covariance matrix given the observed covariance matrix)

Although you can still take a covariance-comparison approach in Bayes, there isn’t really a need to if you have access to raw data.

Let’s say you have one factor and ten indicators.

For the sake of simplicity, we’ll include a normal assumption on the indicators.

Also for the sake of simplicity, we will standardize the data and assume a standardized latent variable (mean of zero, sd of 1).

The basic assumptions are as follows:

\begin{align}

\theta_i &\sim N(0,1) \\

y_{ij} &\sim N(\theta_i \lambda_j, \sigma_j) \\

\lambda_j &\sim N(0,1)^+ \\

\sigma_j &\sim Cauchy(0,1)^+

\end{align}

where $\theta_i$ is the latent score for person $i$, $y_ij$ is the observation of item $j$ for person $i$, $\lambda_j$ is a factor loading, and $\sigma_j$ is the residual standard deviation.

It’s really about that simple. Note that this method gives you the posterior distributions (and expected values) for factor scores *for free*.

That also means that in structural models, or moderated CFAs, you can very easily include interactions, as $\theta_i\times x_i$ or $\theta_i\times \gamma_i$ (where $\gamma_i$ is some other latent variable).

The nice thing about this is that it’s flexible. Want differing assumptions about the item likelihoods? Go for it; don’t use a normal likelihood for each item. Use ordered logit, ordered probit, skew-normal, skew-T, T, or whatever the heart desires.

Want to model residual covariance? Go for it. Want to allow everything to possibly residually covary using a soft constraint? Sure! Use a multivariate normal for all manifest variable likelihoods, and put a strong LKJ prior on $\Sigma_r$ to have a soft-constraint.

Do you want non-normal latent score distributions? Sure! Don’t assume latent scores are distributed normally; just use a different prior for $\theta_i$ and estimate any other parameters you need to.

This was a really simple model, but it’s very easy to code into Stan. Extensions are just about as easy.

If you want multiple factors to covary, use a MVN prior with an unknown correlation matrix.

If you want multiple factors to potentially load onto multiple indicators beyond those hypothesized, just use priors to place soft-constraints. This is equivalent to exploratory CFA/SEM modeling. You can have all factors load onto all items, just with non-hypothesized paths having peaked zero-priors for model identification.

Of course, the remaining problem is — How do you assess model fit?

Well, two ways come to mind.

First, you could actually estimate the observed covariance matrix for all observed variables and the model-implied covariance matrix within the same model, then compute the standard fit indices based on those matrices.

This would provide not only the fit indices of interest, but uncertainty about those fit indices.

This sounds like a terrible pain though.

Second, you can ignore model fit indices (e.g., RMSEA, SRMR, CFI) and just focus on predictive adequacy. This comes in multiple flavors.

You can obtain the log-likelihood of each observation within each manifest variable, and examine for which people the model is inadequate for which items using point-wise LOOIC.

This is a killer diagnostic tool.

You can also obtain the log-likelihood of each observation across all items jointly, and compare models’ person-wise LOOIC values. This would give you a metric akin to the AIC, but in the Bayesian world. It’s essentially the approximate joint leave-one-out error of the model. The best model is the one with the lowest LOOIC.

Another nice thing about the elpd (which is computed in the LOOIC), is that when used in model comparison, it’s akin to assessing the posterior predictive performance in a similar manner to an “informed” Bayes factor.

This isn’t the time to really get into what this means, but suffice it to say that IF you computed the posteriors of a model, then plugged those in as priors for a second equally-sized sample and computed the marginal likelihood, it would approximately equal what the elpd within the LOOIC attempts to estimate (I will take another post to explain why that is, but you can test that yourself if you want).