I’ve had a very clear plan for this blog for a long time— really, I have. For about three years, to be exact.
Three years ago, right before I left Chapel Hill for East Lansing, my very dear friend Sam Hawke (who now holds his PhD in biostatistics from the University of North Carolina at Chapel Hill) told me his definition of the term “modeling” over a casual dinner. Sam defined modeling as the process of making a set of explicit assumptions about the data-generating process in order to draw inferences from observed data.
In an instant, Sam’s definition of a seemingly simple term made years of grueling methods classes click. The many hours and tears spent trying (and often failing) to figure out proofs during my first couple of years of graduate school suddenly felt like time well spent. After all, those proofs were showing us how we could exploit certain assumptions about our data to draw trustworthy conclusions with our models, and also what breaks when those assumptions don’t hold.
Sam’s definition also made me realize something else important: if models were really about assumptions regarding data-generating processes, then, as someone with a knack for programming, I could simulate the approximate data-generating process of my own studies. I could examine what happens to my model parameters when those assumptions are violated to varying degrees. I also realized I could take a similar approach to teach myself more advanced techniques, especially for more complex study designs, which are often more likely to violate the assumptions of simpler models. For instance, I could simulate what happens to coefficient estimates and standard errors when the i.i.d. assumption is violated in linear regression, and how multilevel modeling addresses the problem.
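To make that concrete, here is a minimal sketch of the kind of simulation I have in mind. The tool choices (Python with numpy, pandas, and statsmodels) and every parameter value are just illustrative assumptions on my part, not a recipe: it generates clustered data where observations share a group-level effect (so the i.i.d. assumption fails), then fits a naive OLS regression and a multilevel model to the same data.

```python
# A minimal sketch: simulate clustered data that violates the i.i.d.
# assumption, then compare naive OLS to a multilevel (mixed-effects) model.
# All parameter values below are arbitrary choices for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

n_groups, n_per_group = 30, 20
group = np.repeat(np.arange(n_groups), n_per_group)

# A group-level predictor plus group-level random intercepts:
# observations within the same group are correlated, so they
# are not independent draws the way OLS assumes.
x = rng.normal(0, 1, n_groups)[group]
group_effect = rng.normal(0, 1.5, n_groups)[group]
y = 1.0 + 0.5 * x + group_effect + rng.normal(0, 1, group.size)

df = pd.DataFrame({"y": y, "x": x, "group": group})

# Naive OLS ignores the clustering entirely...
ols = smf.ols("y ~ x", data=df).fit()

# ...while a multilevel model with a random intercept per group
# explicitly models the dependence.
mlm = smf.mixedlm("y ~ x", data=df, groups=df["group"]).fit()

print(f"OLS slope: {ols.params['x']:.3f}, SE: {ols.bse['x']:.3f}")
print(f"MLM slope: {mlm.params['x']:.3f}, SE: {mlm.bse['x']:.3f}")
```

The naive OLS standard error will typically come out much smaller than the multilevel one, not because OLS is more precise, but because it wrongly treats 600 correlated observations as 600 independent ones when there are really only 30 independent groups.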
I’m not going to pretend that I take the simulation approach all the time, or even most of the time. After all, I’m a busy graduate student with way too much to do and not enough hours in the day. It would also be dishonest to say the approach is not time-consuming, or, for some problems, well beyond my capabilities. But this year, I’ve been lucky enough to work with a stellar team of undergraduates in my awesome lab, and some really bright junior graduate students in my program. They’re all capable of learning the more advanced techniques we use in our studies, but they come from wildly different math and stats backgrounds. Some are (or were) humanities or arts majors; others come from STEM fields.
The way I write these simulations, I think, allows people with very different backgrounds to understand the consequences for our science when the assumptions our models make about the data-generating process don’t hold, and how alternative techniques can solve some of the problems that emerge. So, really, I’m finally getting around to starting this blog for them.
I also intend to use this blog to clear up common (but often very understandable!) areas of confusion, like the difference between correlation and regression, in a tangible and intuitive way.
Now, about the blog name. I went with “Non-Technically Speaking” for several reasons. First, I don’t take myself too seriously; on social media, and in everyday life, I’m almost never “technically speaking.” But drawing on another meaning of the phrase, I want the blog to be approachable, non-intimidating, and beginner-friendly. I’ll use equations only when necessary and rely mostly on simulation and visualization to communicate key points. Finally, if you are someone who tends to be highly technical, this blog is probably not for you, and it may even drive you a bit crazy.
I don’t plan to go too far into the weeds on the topics I cover here. My goal is simply to provide an accessible overview of:
1) the substantive problem a method solves,
2) the consequences of that problem for traditional approaches,
3) how the method solves the problem, and
4) how to interpret the results.
Anyway, happy reading! If all goes according to plan, I hope to cover roughly one topic a month through the summer.