
The p-value’s on the periodic table: it’s the element of surprise.
The p-value says, “If I’m living in a world where I should be taking that default action, how unsurprising is my evidence?” The lower the p-value, the more the data are yelling, “Whoa, that’s surprising, maybe you should change your mind!”

To perform the test, compare that p-value with a threshold called the significance level. This is a knob you use to control how much risk you want to tolerate. It’s your maximum probability of stupidly leaving your cozy comfy default action. If you set the significance level to 0, that means you refuse to make the mistake of leaving your default incorrectly. Pens down! Don’t analyze any data, just take your default action. (But that means you might end up stupidly NOT leaving a bad default action.)
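The compare-p-to-threshold step can be sketched in code. This is a minimal illustration with made-up numbers (the sample, the null mean of 100, and the 0.05 threshold are all my assumptions, not from the article), using a standard one-sample t-test:

```python
# Sketch (assumed example): default action presumes the true mean is 100.
# A low p-value means the data are surprising under that assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=104, scale=10, size=50)  # hypothetical sample

significance_level = 0.05  # your risk knob -- choose it BEFORE looking at data
result = stats.ttest_1samp(data, popmean=100)  # null hypothesis: mean == 100

if result.pvalue < significance_level:
    print("Surprising! Leave the default action.")
else:
    print("Not surprising enough. Keep the default action.")
```

Note that setting `significance_level = 0` reproduces the "pens down" scenario: no p-value can ever fall below zero, so you never leave the default.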

[Figure: How to use p-values to get the outcome of your hypothesis test. (No one will suspect that my xkcd is a knockoff.)]
A confidence interval is simply a way to report your hypothesis test results. To use it, check whether it overlaps with your null hypothesis. If it does overlap, learn nothing. If it doesn’t, change your mind.

[Figure: Only change your mind if the confidence interval doesn’t overlap with your null hypothesis.]
While a confidence interval’s technical meaning is a little bit weird (I’ll tell you all about it in a future post, it’s definitely not simple like the credible interval we met earlier, and wishing does not make it so), it also has two useful properties which analysts find helpful in describing their data: (1) the best guess is always in there and (2) it’s narrower when there’s more data. Beware that neither it nor the p-value was designed to be nice to talk about, so don’t expect pithy definitions. They’re just ways to summarize test results. (If you took a class and found the definitions impossible to remember, that’s why. On behalf of statistics: it’s not you, it’s me.)
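Both useful properties are easy to see in a quick sketch. Everything below is an assumed example of mine (the samples, the 95% level, and the null value of 0 are not from the article); it builds a standard t-based interval for a mean:

```python
# Sketch (assumed example): two properties of a confidence interval --
# (1) the best guess (sample mean) is always inside it, and
# (2) it gets narrower as the sample grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def mean_ci(sample, confidence=0.95):
    """t-based confidence interval for the mean of a sample."""
    m = sample.mean()
    sem = stats.sem(sample)
    lo, hi = stats.t.interval(confidence, len(sample) - 1, loc=m, scale=sem)
    return m, lo, hi

small = rng.normal(loc=5, scale=2, size=20)     # little data
large = rng.normal(loc=5, scale=2, size=2000)   # lots of data

m_s, lo_s, hi_s = mean_ci(small)
m_l, lo_l, hi_l = mean_ci(large)

print("best guess inside interval:", lo_s < m_s < hi_s)
print("more data -> narrower:", (hi_l - lo_l) < (hi_s - lo_s))

# Using it as a test report: null value outside the interval -> change your mind.
null_value = 0.0
print("change your mind:", not (lo_l <= null_value <= hi_l))
```

The last line is exactly the recipe above: check for overlap with the null hypothesis, and only change your mind when there is none.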

What’s the point? If you do your testing the way I just described, the math guarantees that your risk of making a mistake is capped at the significance level you chose (which is why it’s important that you, ahem, choose it… the math is there to guarantee you the risk settings you picked, which is kind of pointless if you don’t bother to pick ‘em).
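That guarantee can be checked by simulation. The setup below is my own illustration (the null world, sample size, and number of runs are all assumptions): when the default action is actually correct, the fraction of runs where you stupidly leave it hovers around the significance level you chose, not above it.

```python
# Simulation sketch (assumed example): in a world where the null is TRUE,
# the rate of wrongly abandoning the default stays near the chosen
# significance level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05          # the risk cap you picked
n_sims, n = 5000, 30  # simulated studies, observations per study

false_alarms = 0
for _ in range(n_sims):
    sample = rng.normal(loc=0, scale=1, size=n)  # null world: true mean is 0
    p = stats.ttest_1samp(sample, popmean=0).pvalue
    if p < alpha:
        false_alarms += 1  # mistakenly left the default action

rate = false_alarms / n_sims
print(f"Empirical false-alarm rate: {rate:.3f}")  # should sit near alpha
```

Which is the whole point: the math delivers the risk setting you asked for, so it pays to actually ask for one.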