P-values and hypothesis testing are probably among the concepts that almost everyone struggles with at first, given that schools and statistics classes usually teach how to calculate a p-value rather than what it means.

When people are first introduced to hypothesis testing and p-values, they often fail to understand why they are doing what they’re doing, and resort to memorising formulas, searching online for p-value calculators, or cramming the golden rule:

  • p < 0.05 implies the result is significant
  • p > 0.05 implies the result is not significant.

But today we are here with an intuitive way of understanding what a p-value means, how we calculate it, and why we even need it, to clear some of the mental fog that surrounds it.

Let us begin with a simple example to explain p-value.


Imagine you’re a teacher in a school and you have a class of 30 students. You give them a test, most of them fail miserably, and the principal comes in and tells you to do something or you’re fired!

With tension looming and sleep eluding you, you somehow devise a new method of teaching and try it out on your students for the next month, after which they take the test again.

Fortunately, the students nail the exam this time and everyone is happy. You still have your job, but in the middle of the night you start overthinking:

Did the students really perform well because of my new teaching method, or did this happen by chance and have nothing to do with my efforts? It is quite possible that it happened by chance (a class of 30 is a small sample, after all).

After thinking about it for a long time, you decide to get your hands dirty and employ some statistics.

  • Firstly, you decide that comparing the average scores under the two conditions (old teaching method and new teaching method) is the way to go. The claim that your observation seems to support, namely that the new method really improves scores, is the alternative hypothesis. In terms of formulas:
    • Mean score (new) > Mean score (old)
  • Next, you ask yourself what would happen if the scores in the two cases followed the same distribution (a fancy term for a continuous histogram), so that the new method made no difference. This is called your null hypothesis (another fancy term). Again in terms of formulas:
    • Mean score (new) = Mean score (old)
  • Now you devise a plan and say to yourself:
    • “I will assume the null hypothesis to be true” (my hypothesis, my rules)
    • “Then I will find the probability of seeing a result at least as extreme as what has really occurred” (using more maths). This probability is the p-value
    • If that probability is really low, chance alone is an unlikely explanation for the effect (the thing which has happened). Going by the terminology used, this is called a significant effect
    • But if that probability is not low, you cannot rule out that the effect occurred by chance
    • The threshold you compare your probability to is called alpha (or the significance level) and is usually set at 0.05
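
To make the plan above a little more concrete, here is a minimal sketch in Python. It is not the t-test; it swaps in a simpler shuffling (permutation) approach that follows the same logic: assume the null hypothesis is true, so the labels "old" and "new" do not matter, then count how often a random relabelling produces a difference in means at least as large as the one observed. The scores, the function name permutation_p_value, and the number of shuffles are all made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(old_scores, new_scores, n_shuffles=10_000, seed=0):
    """Estimate a one-sided p-value by shuffling the method labels."""
    rng = random.Random(seed)
    observed = mean(new_scores) - mean(old_scores)
    pooled = list(old_scores) + list(new_scores)
    n_old = len(old_scores)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the labels are interchangeable,
        # so every reshuffling of the pooled scores is equally plausible.
        rng.shuffle(pooled)
        diff = mean(pooled[n_old:]) - mean(pooled[:n_old])
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical scores, invented for this example only.
old = [35, 42, 38, 50, 45, 41, 39, 47, 36, 44]
new = [55, 61, 58, 49, 66, 60, 57, 63, 52, 59]

alpha = 0.05  # significance level
p = permutation_p_value(old, new)
print(f"estimated p-value: {p:.4f}, significant at alpha={alpha}: {p < alpha}")
```

With these invented scores the gap between the two means is large, so the estimated p-value lands far below alpha. Feeding the same list in as both "old" and "new" gives a p-value of roughly one half, which is what "no effect" looks like.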

Now a glaring question comes into view: “How do you get to this p-value from your data?”

By using something called a hypothesis test. There are probably hundreds of hypothesis tests out there, but the question comes down to which one is best suited to your data. Which test to choose is a detailed discussion in its own right, and we’ll save it for another post (let us know in the comments if that is something you want).

Test scores usually follow an approximately normal distribution (this has been observed in populations), and the difference in the means of independent samples drawn from two normal distributions, once divided by its estimated standard error, follows a particular type of distribution called the t-distribution.

Without going into the details, two independent samples from two normal distributions can be compared using a t-test, and that is what we will use. This part involves a lot of calculations. Just remember that to get to p-values from data, we need some distributions.

In short, we assume that the difference in sample mean scores (of the two conditions), divided by its standard error, follows the t-distribution, and apply the t-test to get a t-statistic that can be translated into a probability, which is our p-value.
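
For the curious, the "lot of calculations" can be sketched in a few lines of Python. This is only an illustration under some assumptions: the scores are made up, the two groups are treated as independent samples with equal variances (the classic pooled two-sample t-test), and the helper names t_pdf, upper_tail and two_sample_t_test are our own. In practice you would normally reach for a library routine such as SciPy's scipy.stats.ttest_ind rather than integrating the t-distribution by hand.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T >= t), via Simpson's rule on [0, |t|] (steps must be even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    # The distribution is symmetric around 0, so half the mass lies on each side.
    return max(0.0, 0.5 - area) if t >= 0 else min(1.0, 0.5 + area)


def two_sample_t_test(old_scores, new_scores):
    """One-sided pooled t-test of 'mean(new) > mean(old)'."""
    n_old, n_new = len(old_scores), len(new_scores)
    df = n_old + n_new - 2
    pooled_var = ((n_old - 1) * variance(old_scores)
                  + (n_new - 1) * variance(new_scores)) / df
    std_err = math.sqrt(pooled_var * (1 / n_old + 1 / n_new))
    t_stat = (mean(new_scores) - mean(old_scores)) / std_err
    return t_stat, upper_tail(t_stat, df)


# Hypothetical scores, invented for this example only.
old = [35, 42, 38, 50, 45, 41, 39, 47, 36, 44]
new = [55, 61, 58, 49, 66, 60, 57, 63, 52, 59]

t_stat, p_value = two_sample_t_test(old, new)
print(f"t-statistic: {t_stat:.2f}, one-sided p-value: {p_value:.2g}")
```

With these invented scores you get a large positive t-statistic and a p-value far below 0.05. The test is one-sided because our alternative hypothesis is specifically that the new mean is higher, not merely different.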

You do all this and see that the p-value is indeed less than 0.05, and say to yourself, “If my new method had made no difference, an improvement this large would be very unlikely to happen by mere chance, so the result is significant at the 0.05 level.” Feeling proud of your achievement, you let sleep come to you. 😇


Also published on Medium
