This semester I was permitted to give one brief lecture on maximum likelihood in the advanced statistics undergraduate course.
The course focuses primarily on the linear model, so we talk about OLS linear models, and variance-based approaches to model selection.
I have free reign over three lectures each semester, and my final lecture was on maximum likelihood approaches, and how we can use them.
The knitted lecture notes are embedded below. Note: I know some content is incorrect; I throw the term “likelihood” around pretty freely and incorrectly, in order to focus on the intuition rather than the correctness of that term. So before anyone comments on how I used the term likelihood incorrectly — I know I did. It was intentional, and with only one lecture to spend, it wasn’t particularly worth it to me to go into what precisely likelihood means, or why the $p(Y|\mu,\sigma)$ is not Fisher-friendly (because it treats parameters as random values to be conditioned on, even if mathematically they are the same).
Anyway, these notes got good reviews from undergraduates who had previously never seen or heard of maximum likelihood; even a couple of graduate students thought the visualizations were helpful to them.
The undergraduates were also given a fairly basic homework assignment wherein they fit logistic regression models, computed and used LRT and AIC for model selection, and made predictions (e.g., if high in twitter addiction, the model would suggest a high probability of being a graduate student).