Post-test power analysis
As regular readers here know, we’re passionate about testing at Loyalty Builders. It’s the quickest way to improve your marketing campaigns.
We recently got to thinking and talking about power calculations made after a test comparing campaigns A and B has been completed (remember: power is the ability of a test to detect a real difference when one exists). Suppose the sample response to B is somewhat better than the sample response to A, but the test does not find the difference significant. If the power is high, say the 90% or 95% we regularly urge our customers to use, everyone has confidence in the results of the experiment. Even if the power is merely very good, say 80%, the marketer has learned that there is only a 20% chance that a real difference produced by the new treatment in campaign B is going undetected. B is probably not much better than A, and there is little value in working to improve B.
But suppose that the calculated power is low, say 40%. Instead of throwing out the experiment as telling us nothing, we realized that there is a 60% chance the test would miss a real difference, so B could actually yield better results than A. This is a great, counter-intuitive point: if your test concludes no difference, but the power is low, there is a fair chance that there is an actual difference that you’re missing. You should probably work to improve B or conduct another, better test.
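For readers who want to try this on their own numbers, here is a minimal sketch of one way such a power figure could be computed. It is not necessarily the calculation our team ran: it assumes a two-sided z-test on two response rates with a normal approximation, and the function name `two_proportion_power`, the response rates, and the mailing sizes are all made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_a, p_b, n_a, n_b, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two response rates.

    p_a, p_b: assumed true response rates of campaigns A and B
    n_a, n_b: number of customers contacted in each campaign
    alpha:    significance level of the test
    """
    std_normal = NormalDist()
    # Standard error of the difference between the two sample response rates.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Critical value the test statistic must exceed to call a difference.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the true difference shifts the test statistic, in SE units.
    shift = abs(p_b - p_a) / se
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical campaigns: A responds at 2.0%, B at 2.4%.
    for n in (2_000, 10_000, 30_000):
        power = two_proportion_power(0.020, 0.024, n, n)
        print(f"n per campaign = {n:>6}: power = {power:.0%}, "
              f"chance of missing a real difference = {1 - power:.0%}")
```

Running it for a few mailing sizes shows the pattern described above: the smaller the test, the lower the power and the larger the chance that a real difference between A and B slips by unnoticed.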
This conclusion wiped a lot of statistics snobbery off our figurative face. No matter how small your sample size, there may be useful information to be gathered from your test. Sometimes what looks bad is more promising than we thought. Sort of forced us to eat our humble pie [chart].
[My thanks to Dave Meeker and Bill Vorias, two of our math experts who did the work. I was the one who turned his nose up at the low power.]