While grading tests today for our undergraduate advanced stats course, I had a funny (I think, anyway) thought about the inconsistency of model simplifications.

The course is focused on linear models. It's "advanced" statistics for this reason (advanced is… relative). So they learn to do independent-samples t-tests, dependent-samples t-tests, and ANOVA, all within a linear model framework.

Then, of course, they learn about [normal-assumptive] multiple regression, and we teach them to think about statistics themselves as a super simple linear model of sorts ($\bar x_{0,s,p} = \mu_{0,p} + \epsilon_{0,s,p}$).

Anyway, on the test they have to interpret a paired-samples "t-test" (a linear model on paired score differences). That led me to a realization that I think highlights why I don't particularly care about model simplification, and some inconsistent practices we seem to engage in.

When the analyst *knows* that scores are paired and wants to conduct a t-test, they wind up using a dependent-samples t-test, which explicitly includes the covariance in the computation of the standard error (twice the covariance is subtracted from the sum of the two variances).
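To make the "twice subtracted" part concrete, here is a minimal sketch (standard library only; the function names and the pre/post scores are hypothetical, made up for illustration). It computes the paired-samples standard error two ways: from the difference scores, and from $\operatorname{Var}(x) + \operatorname{Var}(y) - 2\operatorname{Cov}(x, y)$. The two agree; dropping the covariance term gives the much larger standard error you would use if you ignored the pairing.

```python
import math
import statistics


def se_from_differences(x, y):
    """Paired-samples SE computed the usual way: from the difference scores."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.stdev(d) / math.sqrt(len(d))


def se_from_covariance(x, y, include_cov=True):
    """Same SE written as var(x) + var(y) - 2*cov(x, y); include_cov=False drops the covariance."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)  # sample covariance
    var_d = statistics.variance(x) + statistics.variance(y)
    if include_cov:
        var_d -= 2 * s_xy  # the covariance, "twice subtracted out"
    return math.sqrt(var_d / n)


# Hypothetical pre/post scores for five people, made up for illustration
pre = [10.0, 12.0, 9.0, 14.0, 11.0]
post = [12.0, 13.0, 11.0, 15.0, 14.0]

print(se_from_differences(pre, post))                    # ≈ 0.3742
print(se_from_covariance(pre, post))                     # ≈ 0.3742, same value
print(se_from_covariance(pre, post, include_cov=False))  # ≈ 1.1136, pairing ignored
```

With these made-up scores the sample covariance is positive (2.75), so accounting for it shrinks the standard error by roughly a factor of three.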

However, I don’t know if anyone actually bothers to see whether the paired-sample correlation is “significant” before doing so.

They simply acknowledge that a dependency in scores exists due to the design and units collected. It’s a design feature, and it’s a data-generative feature that is assumed to exist.

Basically, the researcher knows that some dependency probably exists, and then they use the “appropriate” model or test for such a dependency, all most likely without actually assessing whether that assumed dependency is “significant” and should be included.

I say that to contrast what we tend to do with other models.

I was helping a labmate with some multilevel modeling problems shortly before grading the tests, and this is why these ideas clicked together.

A researcher collects a sample that inevitably consists of both known and unknown subpopulations and clusters for which dependencies *may* exist: if you know someone is of a certain demographic, then there's a *chance* that their score correlates with the scores of others in the same demographic, and hence there is a dependency.
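As a sketch of what that dependency looks like in the simplest case, assume a random-intercept model for person $i$ in cluster $j$, with the cluster effect $u_j$ independent of the residual $\epsilon_{ij}$ (the symbols $u_j$, $\tau^2$, and $\sigma^2$ are introduced here for illustration):

$$
y_{ij} = \beta_0 + u_j + \epsilon_{ij}, \qquad u_j \sim N(0, \tau^2), \qquad \epsilon_{ij} \sim N(0, \sigma^2)
$$

$$
\operatorname{Cov}(y_{ij}, y_{i'j}) = \tau^2, \qquad \operatorname{Corr}(y_{ij}, y_{i'j}) = \frac{\tau^2}{\tau^2 + \sigma^2}, \qquad i \neq i'
$$

Two people in the same cluster share $u_j$, which is the whole source of their correlation. When $\tau^2 = 0$ that correlation vanishes and the model reduces to an ordinary linear model.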

But what do we do?

Some MLM books and programs recommend testing, or compute for you by default, the significance of the random effects or random correlations.

These tests basically include likelihood ratio tests (the likelihood with the correlation over the likelihood without it), Wald tests (horrible for variance parameters, so I'm not going to discuss those), and variance tests (a $\chi^2$ test of whether the random variance is greater than the [estimated] expected sampling-error variance).
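For reference, a likelihood ratio test of a single random-intercept variance can be sketched as follows. This assumes you already have the maximized log-likelihoods of two fits with identical fixed effects, one with and one without the random intercept; the function name and the log-likelihood values are hypothetical. Because the null value $\tau^2 = 0$ sits on the boundary of the parameter space, a plain $\chi^2_1$ reference tends to be conservative; a common correction uses a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$, which halves the p-value.

```python
import math


def lrt_random_variance(loglik_without, loglik_with):
    """LR test for one random-intercept variance (H0: tau^2 = 0).

    Returns the LR statistic, the naive chi-square(1) p-value, and the
    boundary-corrected p-value from a 50:50 mixture of chi-square(0) and chi-square(1).
    """
    lr = max(0.0, 2.0 * (loglik_with - loglik_without))
    p_naive = math.erfc(math.sqrt(lr / 2.0))  # P(chi2_1 > lr) = P(|Z| > sqrt(lr))
    # The chi-square(0) half is a point mass at 0, so it contributes nothing when lr > 0
    p_boundary = 1.0 if lr == 0.0 else 0.5 * p_naive
    return lr, p_naive, p_boundary


# Hypothetical log-likelihoods from two fits that differ only in the random intercept
print(lrt_random_variance(-100.0, -98.0))  # LR = 4.0; p ≈ 0.0455 naive, ≈ 0.0228 corrected
```

The sketch only covers a single variance component; testing a random correlation (which is not on a boundary) or several components at once needs a different reference distribution.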

A researcher may determine whether to include the random variance or not, depending on the significance, *despite* knowing that these clusters do exist and *could* have dependencies.

The funny realization is this: in these harder models, where more thoughtwork is needed, researchers are told to check for these dependencies via some test or model comparison, and to include them in the model only if deemed necessary.

But I'd hazard a guess that these same people don't do the same with paired-samples t-tests: they probably don't test whether the paired-samples correlation is significant before using a dependent-samples t-test.

Why? Because in the paired-samples t-test, you acknowledge the *design* and *structure* that presumably does exist in the data, and doing so improves your inferences. Yet we don’t do the same for linear models!

This dovetails with other recommendations that random effects models should really be the default. We know there is most likely a dependency among scores to *some degree*; we don't really need to test for it before fitting the model. If the researcher knows that a dependency may exist, just use an MLM, period, just as you would by default use a paired-samples t-test on two paired data vectors that you assume are dependent.

Basically, what I am saying is: be consistent.

If you are willing to consider the design and its possible dependencies, and on that basis are OK with using a paired-samples t-test without actually assessing the significance of the paired correlation, you should do exactly the same thing with linear models.

If you suspect a dependency due to pairings or clusters, use an MLM instead of an LM, just as you would choose a paired-samples t-test over an independent-samples t-test.

I am *not* suggesting that you actually check for significance before using a dependent-samples t-test or an MLM. On the contrary, your model should express what you know and can feasibly model. If subpopulations exist that could *feasibly* be dependent, just use an MLM: easy, no checking necessary. Best-case scenario, your inference improves; worst-case scenario, it'll basically collapse back into an LM anyway, just as a dependent t-test with zero covariance winds up being an independent-samples t-test.
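A small sketch of that collapse, using hypothetical scores constructed so that the sample covariance between the two vectors is exactly zero (the data and function names are illustrative, standard library only). With equal group sizes and zero covariance, the dependent-samples t and the pooled-variance independent-samples t come out identical.

```python
import math
import statistics


def paired_t(x, y):
    """Dependent-samples t: mean difference over its SE; df = n - 1."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1


def independent_t(x, y):
    """Pooled-variance independent-samples t; df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2


# Hypothetical scores constructed so the sample covariance of x and y is exactly 0
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 1.0, 1.0, 3.0]

print(paired_t(x, y))       # t ≈ 0.5774 with df = 3
print(independent_t(x, y))  # t ≈ 0.5774 with df = 6
```

The statistics coincide, but the degrees of freedom do not (n - 1 versus 2n - 2), so the paired test's critical value is slightly larger. That is the sense in which the two tests are "basically" the same when the covariance is zero.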

This inconsistency has bothered me for years. More generally, using any preliminary significance test to decide whether to use a significance test (or which test to use) is likely to mess up the Type I error rate and power of the test you are actually interested in.

Yep, that's certainly an angle to consider. It's a garden-of-forking-paths problem that the p-value typically used doesn't account for! The distribution of t (or p) values you would obtain if you always used dependent t-tests differs from the distribution you would obtain if you first tested the paired correlation and then ran an independent or dependent test depending on that result. That means the p-value from the dependent or independent t-test is not strictly valid, unless you generated that two-stage sampling distribution yourself and compared your observed t to it.
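Generating that distribution is straightforward by simulation. The sketch below is illustrative, not a definitive study: it assumes normal data with a true mean difference of zero, n = 20 pairs, a true within-pair correlation `rho`, and a pre-test of the paired correlation at α = .05 that decides between the dependent and independent test. All names and settings are hypothetical. Because the standard library has no t distribution, the two-sided 5% critical values are hard-coded from standard t tables and are only valid for N = 20. The function estimates the rejection rate of each strategy under the null, so either can be compared with the nominal .05.

```python
import math
import random

N = 20  # pairs per simulated sample; the critical values below assume this N
T_CRIT_PAIRED = 2.093  # t_.975 with df = N - 1 = 19, from standard t tables
T_CRIT_INDEP = 2.024   # t_.975 with df = 2N - 2 = 38
T_CRIT_CORR = 2.101    # t_.975 with df = N - 2 = 18, for testing r = 0


def mean(v):
    return sum(v) / len(v)


def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def paired_t(x, y):
    # SE from var(x) + var(y) - 2 cov(x, y), i.e. the variance of the differences
    return (mean(x) - mean(y)) / math.sqrt((var(x) + var(y) - 2 * cov(x, y)) / len(x))


def indep_t(x, y):
    # Equal-n pooled-variance SE: the same formula with the covariance term dropped
    return (mean(x) - mean(y)) / math.sqrt((var(x) + var(y)) / len(x))


def corr_significant(x, y):
    r = cov(x, y) / math.sqrt(var(x) * var(y))
    return abs(r * math.sqrt((len(x) - 2) / (1 - r ** 2))) > T_CRIT_CORR


def simulate(rho, reps=2000, seed=1):
    """Rejection rates under a true null for (always paired, two-stage) strategies."""
    rng = random.Random(seed)
    always_paired = two_stage = 0
    for _ in range(reps):
        x, y = [], []
        for _ in range(N):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x.append(z1)
            y.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)  # corr(x, y) = rho, equal means
        always_paired += int(abs(paired_t(x, y)) > T_CRIT_PAIRED)
        if corr_significant(x, y):  # stage 1 says "dependent": use the paired test
            two_stage += int(abs(paired_t(x, y)) > T_CRIT_PAIRED)
        else:                       # stage 1 says "independent": use the pooled test
            two_stage += int(abs(indep_t(x, y)) > T_CRIT_INDEP)
    return always_paired / reps, two_stage / reps


for rho in (0.0, 0.3, 0.6):
    print(rho, simulate(rho))
```

How far the two-stage rate drifts from .05 depends on `rho` and N, which is exactly the point: the p-value reported at the end of the two-stage procedure does not come from the sampling distribution that procedure actually produces.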

I only wish we encouraged psychologists in particular to think generatively about their models and assumptions, even in simple situations like this.