I was helping a labmate with a power analysis problem. He’s planning a study and will use SEM. Unfortunately, the scale he’s using only reported Cronbach’s $\alpha = .80$ with 14 items.
This is irritating, to say the least. But nevertheless, there are two approaches to this.
One, you could simulate “true” scores that have some desired relationship with another variable, then simulate 14 realizations of those true scores using a standard error that equates to $\alpha$.
But I wanted to see whether we could construct a latent variable model that implies an $\alpha = .80$.
Reliability in SEM is a complicated thing, with many definitions. Some are interested in AVE (average variance explained). Then there’s $\omega_1, \omega_2, \omega_3$, the definitions of which can be found in the semTools::reliability
help page.
$\omega_1$ and $\omega_2$ are more similar to each other than either are to $\omega_3$, but generally the definition can be thought of as “IF there is a latent variable, and we see J realizations of that latent variable, what percentage of the variance in the sum-scores is attributable to the latent variable?”
This is an estimate of the “true” reliability, which can be thought of as the proportion of variance in $\hat\theta$ explained by $\theta$. Of course, in reality we don’t have $\theta$, so we settle for the proportion of variance in some observed metric that is explained by the latent estimate. With greater N, this estimates converges with the “true” reliability.
Side note: If you have high N but a crappy measure, your reliability does not improve, it just means your estimate of your crappy reliability is more accurate of the true crappiness.
Here’s the basic, probably imperfect, proof of reliability:
$$
\tilde x_i = \sum x_{ij}$$
$$x_{ij} = \hat x_{ij} + \epsilon_{ij}$$
$$x_{ij} = \lambda_j\theta_i + \epsilon_{ij}$$
$$\tilde x_i = \sum_j \lambda_j\theta_i + \sum_j\epsilon_{ij}$$
$$Var(\tilde x_i) = Var(\sum_j \lambda_j\theta_i + \sum_j\epsilon_{ij})$$
$$Var(\tilde x_i) = (\sum_j\lambda_j)^2Var(\theta_i) + \sum_j Var(\epsilon_{ij})$$
$$Reliability_{\omega_2} = \frac{(\sum_j\lambda_j)^2Var(\theta_i)}{(\sum_j\lambda_j)^2 Var(\theta_i) + \sum_j Var(\epsilon_{ij})}$$
If variance is set to 1, and outcomes standardized: $$\omega_2 = \frac{(\sum_j\lambda_j)^2}{(\sum_j\lambda_j)^2 + \sum_j(1 – \delta_j)}$$
Well, what we have is $\alpha$, not $\omega_2$. However, $\alpha$ can be approximated in SEM assuming that the residual variance and loadings are equal across items.
This further simplifies things.
$$\alpha = \frac{(J\lambda)^2}{(J\lambda)^2 + J\delta} \
= \frac{(J\lambda)^2}{(J\lambda)^2 + J(1 – \lambda^2)}$$
Solving for $\lambda$ gives you:
$$\lambda = \sqrt{\frac{\alpha}{(J – \alpha(J-1))}}$$
This represents the $\lambda$ needed to obtain $\alpha$ with $J$ standardized items, assuming a standardized latent variable.
We tested it to be sure; we simulated data using lavaan in R, and sure enough, running psych::alpha(x)
gave $\alpha \approx .80$.
This is not to say that when scales have $\alpha=.80$ that the factor loadings are indeed equal to the above (they won’t be), but if you only have reliability and you want to simulate a measurement model that would suggest such reliability, then there you go.
This is also not for doing power analysis for the factor loadings themselves, of course.
But if you wanted to do a power analysis for a correlation or structural coefficient with one or more variables being latents + measurement error, this approach may work when information is lacking.