Hypothesis testing Explained

What is Hypothesis testing?

Ans: The process of using probability and statistics to set up an experimental situation and decide whether or not to reject the “status quo” hypothesis based on sample data is called hypothesis testing.

Before getting into the details of how hypothesis testing works, let us get familiar with some related terminology.

Null Hypothesis [$H_0$]: It is the “status quo” or “prior belief”. It assumes that the observation is due to a chance factor. The null hypothesis is assumed to be true unless proven otherwise.
Alternative Hypothesis [$H_1$]: Contrary to the null hypothesis, the alternative hypothesis shows that observations are the result of a real effect. We reject the null hypothesis in favor of the alternative hypothesis only if there is convincing statistical evidence against $H_0$. The alternative hypothesis is sometimes referred to as the research hypothesis.

Example: Suppose we want to determine whether a coin is fair (unbiased). The null hypothesis states that half the flips would result in Heads and half in Tails. This can be written mathematically as follows:

$H_0: P(Head) = 0.5$

Now suppose the coin is tossed 10 times and 7 Heads and 3 Tails are observed. The alternative hypothesis states that the coin is biased, which would explain why we did not observe an equal number of Heads and Tails in our experiment.

$H_1: P(Head){\neq} 0.5$

Now let’s revisit our definition of $H_{0}$: "It assumes that the observation is due to a chance factor". A chance factor is an influence that contributes randomly to each observation and is unpredictable.

In simple terms, the null hypothesis argues that the prior belief ($P(Head) = 0.5$ in this case) is true and that the deviation seen in the experiment is due to randomness and hence can be ignored.

The definition of $H_1$ says: "Contrary to the null hypothesis, the alternative hypothesis shows that observations are the result of a real effect".

Therefore, the alternative hypothesis argues that the experiment produced unequal counts of Heads and Tails because the coin itself is biased, not because of some chance factor.
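To get a feel for how much a "chance factor" alone can do, here is a minimal simulation sketch. It assumes a perfectly fair coin ($H_0$ true) and counts how often 10 tosses give 7 or more Heads. The function name, the number of repetitions, and the seed are illustrative choices, not part of the original example.

```python
import random

def simulate_heads_at_least(n_tosses=10, threshold=7, n_experiments=20_000, seed=0):
    """Fraction of simulated fair-coin experiments with at least `threshold` Heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_experiments):
        # Each toss is Heads with probability 0.5, i.e. H0 is true by construction
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if heads >= threshold:
            hits += 1
    return hits / n_experiments

fraction = simulate_heads_at_least()
print(f"A fair coin gave 7 or more Heads in {fraction:.1%} of the experiments")
```

If that fraction is not small, then 7 Heads out of 10 is an outcome a fair coin produces fairly often, which is exactly the null hypothesis's argument.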

Now that we are done with the null and alternative hypotheses, let’s move on to the next terms.

The two types of hypothesis tests, based on the alternative hypothesis $ H_1$, are:

- Two-sided, or two-tailed, tests: When you want to detect a difference on either side of the mean, the test is said to be two-tailed and takes the form $H_1: \mu \neq value$. The two-sided test for the above example can be given as $$ H_1: P(Head) \neq 0.5.$$ Graphically, a two-tailed test places the rejection region in both tails of the distribution.
- One-sided, or one-tailed, tests: When you want to detect a difference on only one side of the mean, the test is said to be one-sided and takes the form $H_1: \mu < value$ or $H_1: \mu > value$. A one-sided test for the above problem can be given as either $$ H_1: P(Head) > 0.5$$ or $$ H_1: P(Head) < 0.5.$$

Note: Although the plots show a normal distribution, hypothesis tests are not limited to normally distributed data.

When a hypothesis test is performed, we either reject the null hypothesis or fail to reject it. The possible errors that may occur are:

Type-I error:
- A Type I error occurs when the researcher rejects a null hypothesis when it is true.
- The probability of committing a Type I error is called the significance level.
- This probability is also called alpha, and is often denoted by α.
Type I error can be written as $$\alpha = P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$$
Type-II error:
- A Type II error occurs when the researcher fails to reject a null hypothesis that is false.
- The probability of committing a Type II error is called beta, and is often denoted by β.
- The probability of not committing a Type II error, $1 - \beta$, is called the power of the test.
Type II error can be written as $$\beta = P(\text{Type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$$

Note: We would like the probability of committing either one of these errors to be as small as possible. Unfortunately, for a fixed sample size, decreasing the probability of committing one type of error generally increases the probability of committing the other. In practice we fix the Type I error rate in advance, so our main focus of interest is the Type I error, i.e. $\alpha$.
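The trade-off can be made concrete with the coin example. The sketch below assumes 10 tosses, a symmetric rejection rule ("reject $H_0$ when the number of Heads is at most `cutoff` or at least `10 - cutoff`"), and a hypothetical biased coin with $P(Head) = 0.7$ for computing β. All three choices are illustrative assumptions.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k Heads in n tosses) for a coin with P(Head) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def error_rates(n, cutoff, p_true):
    """Return (alpha, beta) for the rule: reject H0 if Heads <= cutoff or Heads >= n - cutoff.

    alpha is computed under H0 (P(Head) = 0.5); beta under the assumed true bias p_true.
    """
    reject = [k for k in range(n + 1) if k <= cutoff or k >= n - cutoff]
    alpha = sum(binom_pmf(k, n, 0.5) for k in reject)     # reject although H0 is true
    power = sum(binom_pmf(k, n, p_true) for k in reject)  # reject when H0 is false
    return alpha, 1 - power

for cutoff in (1, 2, 3):
    alpha, beta = error_rates(n=10, cutoff=cutoff, p_true=0.7)
    print(f"cutoff={cutoff}: alpha={alpha:.4f}, beta={beta:.4f}, power={1 - beta:.4f}")
```

Widening the rejection region (a larger `cutoff`) raises α and lowers β; with the number of tosses fixed, the only way to shrink both at once is to collect more data.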

Significance level $\alpha$

Can we explain graphically what the significance level means?

The significance level determines how far out from the null hypothesis value we draw the cutoff line on the graph. To draw a significance level of 0.05, we shade the 5% of the distribution that is furthest away from the null hypothesis value.

The shaded region is also called the critical region. If the sample mean falls into that region, we reject the null hypothesis $H_0$.
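As a rough sketch of where that cutoff line sits, the snippet below computes the boundary of the critical region. It assumes the test statistic follows a standard normal distribution under $H_0$ (as in the plots) and uses `statistics.NormalDist` from the Python standard library; the helper name `critical_value` is made up for illustration.

```python
from statistics import NormalDist

def critical_value(alpha, two_sided=True):
    """Upper cutoff of the critical region for a standard normal test statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # A two-sided test splits alpha equally between the two tails
    tail_area = alpha / 2 if two_sided else alpha
    return std_normal.inv_cdf(1 - tail_area)

z_two = critical_value(0.05, two_sided=True)
z_one = critical_value(0.05, two_sided=False)
print(f"two-sided: reject H0 if |z| > {z_two:.3f}")
print(f"one-sided (upper tail): reject H0 if z > {z_one:.3f}")
```

For a two-sided test the shaded 5% is split into 2.5% per tail, which is why the two-sided cutoff is further from zero than the one-sided one.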

What does a significance level of α = 0.05 mean?

It means that if $H_0$ is actually true and the hypothesis test is repeated on different random samples of data from the same population, then we would expect $H_0$ to be incorrectly rejected 5% of the time.
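This repeated-sampling reading can be checked with a small simulation. The sketch assumes a population that really is normal with mean 0 and known standard deviation 1 (so $H_0: \mu = 0$ is true), and runs a two-sided z-test on many independent samples. The sample size, number of repetitions, and seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def false_rejection_rate(alpha=0.05, n=30, n_repeats=5000, seed=1):
    """Fraction of two-sided z-tests that reject H0: mu = 0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    rejections = 0
    for _ in range(n_repeats):
        # Draw a fresh sample from the H0 population: mean 0, standard deviation 1
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / n ** 0.5)  # (x_bar - 0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_repeats

rate = false_rejection_rate()
print(f"H0 was incorrectly rejected in {rate:.1%} of the repeated tests")
```

The printed rate should land close to 5%, up to simulation noise.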

$pValue$

The p-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the null hypothesis is true. $pValue = P(\text{result at least as extreme as the observed one} \mid H_0 \text{ is true})$

We fail to reject the null hypothesis $H_0$ if $pValue > \alpha$ and reject it if $pValue < \alpha$.
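For the coin example (7 Heads in 10 tosses) the p-value can be computed exactly from the binomial distribution. The sketch below is one way to do it; the function name is hypothetical, the two-sided value is obtained by doubling the smaller tail (reasonable here because the null distribution is symmetric), and $\alpha = 0.05$ is an assumed significance level.

```python
from math import comb

def binomial_p_values(heads, n_tosses):
    """Exact p-values for H0: P(Head) = 0.5 given `heads` Heads in `n_tosses` tosses.

    Returns (one_sided_upper, two_sided).
    """
    total = 2 ** n_tosses  # number of equally likely toss sequences under H0
    upper = sum(comb(n_tosses, k) for k in range(heads, n_tosses + 1)) / total
    lower = sum(comb(n_tosses, k) for k in range(0, heads + 1)) / total
    two_sided = min(1.0, 2 * min(upper, lower))  # double the smaller tail
    return upper, two_sided

alpha = 0.05  # assumed significance level for this illustration
one_sided, two_sided = binomial_p_values(heads=7, n_tosses=10)
for name, p in (("one-sided (P(Head) > 0.5)", one_sided), ("two-sided", two_sided)):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: pValue = {p:.4f} -> {decision}")
```

Both p-values are well above 0.05, so 7 Heads in 10 tosses is not convincing evidence against a fair coin.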

<font color='orange' size=5>The Misunderstood p Value </font>

The p value is one of the most misunderstood quantities in psychological research. Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the p value is the probability that the null hypothesis is true—that the sample result occurred by chance. For example, a misguided researcher might say that because the p value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population.
But this is incorrect. The p value is really the probability of a result at least as extreme as the sample result if the null hypothesis were true. So a p value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the p value is not the probability that any particular hypothesis is true or false. Instead, it is the probability of obtaining the sample result if the null hypothesis were true.
<font size=2>Credit: https://opentextbc.ca/researchmethods/chapter/understanding-null-hypothesis-testing/</font>

(z-Score)

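A z-score (aka a standard score) indicates how many standard deviations an element is from the mean. For a single observation $x$ it is $z = \frac{x-\mu}{\sigma}$. For a sample mean $\overline{x}$ computed from $n$ observations, the standard deviation is replaced by the standard error $\frac{\sigma}{\sqrt{n}}$, which gives $$z =\frac{(\overline{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}$$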

Here is how to interpret z-scores.

- A z-score less than 0 represents an element less than the mean.
- A z-score greater than 0 represents an element greater than the mean.
- A z-score equal to 0 represents an element equal to the mean.
- A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.
- A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.
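A short sketch of the arithmetic, using made-up numbers (a population with mean 100 and standard deviation 15). It shows the z-score of a single element, $(x - \mu)/\sigma$, next to the z-score of a sample mean, which divides by the standard error $\sigma/\sqrt{n}$ instead. The function names are illustrative.

```python
from math import sqrt

def z_score(x, mu, sigma):
    """How many standard deviations a single element x lies from the mean mu."""
    return (x - mu) / sigma

def z_statistic(sample_mean, mu, sigma, n):
    """How many standard errors a sample mean (of n observations) lies from mu."""
    return (sample_mean - mu) / (sigma / sqrt(n))

# Hypothetical values chosen only to illustrate the arithmetic
print(z_score(130, mu=100, sigma=15))            # (130 - 100) / 15 = 2.0
print(z_statistic(103, mu=100, sigma=15, n=25))  # (103 - 100) / (15 / 5) = 1.0
```

The same 3-point gap that would be negligible for a single element becomes a full standard error once it is the mean of 25 observations.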

Enough theory; now let’s jump into hypothesis testing in practice with some examples.

Example 1

<font size=2>credit: https://xkcd.com/882/</font>

Example 2

A survey shows that the average Black Friday spending of males is much higher ($500) than that of females. A company planning its Black Friday sales wants to know whether this is true, so it takes samples of different sizes (100, 500, and 1000) from the population and records their Black Friday spending. The company wants to know whether there is really a difference in spending or whether it is just due to chance. Can you help the company reach a conclusion using the data from the different samples?

Stating Null Hypothesis and Alternate Hypothesis
- Null Hypothesis $H_0$: The average spending of males and females is the same, i.e., $\mu_m = \mu_f$
- Alternative Hypothesis $H_a$: The average spending of males is greater than that of females, i.e., $\mu_m > \mu_f$
Choosing significance level
- As it was not mentioned in the problem, we take the significance level $\alpha=0.15$ (a more lenient threshold than the commonly used $0.05$)
Setting up Test Statistic
- How do we decide whether or not to reject the null hypothesis $H_0$?
  a. We start by determining a test statistic from our sample data
- What is a test statistic?
  a. It is a quantity computed from the sample that measures the evidence against the null hypothesis
  b. The most natural choice for a test statistic of the difference in population means is the difference in sample means $\overline{x}_m-\overline{x}_f$.
Calculating the P-value using Permutation test
Refer- https://www.appliedaicourse.com/course/applied-ai-course-online/lessons/resampling-and-permutation-test-3/
- Q: Assuming that $H_0$ is true, what’s the probability of obtaining a difference $\overline{x}_m - \overline{x}_f$ at least as large as $diff_{100}$, the difference observed for the random samples of 100 data points each?

import random
import numpy as np

def calculate_p_value(sample1, sample2, alpha):
    # Step 1 - observed difference between the two sample means
    diff = np.mean(sample1) - np.mean(sample2)

    # Step 2 - permutation test

    # Step 2.1 - merge the samples into one pooled array
    difference = []
    total_sample = np.concatenate([sample1, sample2])

    # Step 2.2 - resample the pooled data 1000 times
    for _ in range(1000):
        # Step 2.3 - pick 100 distinct random positions in the pooled data
        # (each round uses only 100 pooled points, whatever the sample sizes)
        samples = random.sample(range(len(total_sample)), 100)

        # Step 2.4 - mean of the first 50 picked points (set 1)
        set1 = total_sample[samples[:50]].mean()

        # Step 2.5 - mean of the next 50 picked points (set 2)
        set2 = total_sample[samples[50:]].mean()

        # Step 2.6 - difference between the two set means
        difference.append(set1 - set2)

    # Step 3 - count the resampled differences that exceed the observed one
    difference.sort()
    count = sum((d > diff) and (d > 0) for d in difference)
    pValue = count / len(difference)
    print("% of values > than the difference", diff, " =", pValue * 100, "%")
    print("The pValue=", pValue, "and P(Reject H0 when H0 is true)=", alpha)
    if pValue > alpha:
        print("We fail to reject the null hypothesis")
    else:
        print("We can reject the null hypothesis")

    print('_' * 50)
    return difference
  ========================
  For Sample Size:  200
  The average spendings 100 male = 9881.62
  The average spendings 100 female= 7703.87
  The difference between mean of male and female spendings = 2177.750
  Percentage of values greater than the difference 2177.750  = 0.8 %
  The pValue =  0.008 and the P(Reject H0 when H0 is true)= 0.15
  We can reject the null hypothesis

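In the above results, for sample size 100 we get pValue = 0.008, for sample size 500 we get pValue = 0.161, and for sample size 1000 we get pValue = 0.289. With $\alpha = 0.15$, we therefore reject $H_0$ only for the sample of size 100 and fail to reject it for the samples of size 500 and 1000.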
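The function above draws only 100 pooled values per round and splits them 50/50, whatever the sample sizes are. A more conventional permutation test reshuffles every pooled value and splits at the original group sizes. Here is a minimal sketch of that variant, offered as an alternative and not as the method used to produce the numbers above. The function name, the number of permutations, and the synthetic spending data are all assumptions made for illustration.

```python
import numpy as np

def permutation_p_value(sample1, sample2, n_permutations=2000, seed=0):
    """One-sided permutation test for mean(sample1) > mean(sample2).

    Returns (observed_difference, p_value).
    """
    rng = np.random.default_rng(seed)
    sample1 = np.asarray(sample1, dtype=float)
    sample2 = np.asarray(sample2, dtype=float)
    observed = sample1.mean() - sample2.mean()
    pooled = np.concatenate([sample1, sample2])
    n1 = len(sample1)

    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # reshuffle every pooled point
        # Split at the original group sizes so each mean uses as many points as the data did
        diff = shuffled[:n1].mean() - shuffled[n1:].mean()
        if diff >= observed:
            count += 1
    return observed, count / n_permutations

# Synthetic spending data, made up for illustration only (not the survey data)
data_rng = np.random.default_rng(42)
male = data_rng.normal(loc=10_000, scale=3_000, size=100)
female = data_rng.normal(loc=7_000, scale=3_000, size=100)

observed, p_value = permutation_p_value(male, female)
print(f"observed difference = {observed:.2f}, pValue = {p_value:.4f}")
```

Because every round uses group sizes equal to the real ones, the spread of the shuffled differences matches the spread expected of the observed difference under $H_0$, so p-values from different sample sizes are comparable.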

If we reject the null hypothesis, we do not prove the alternative hypothesis is true. We merely state there is sufficient evidence to reject the null hypothesis. If we fail to reject the null hypothesis, we do not prove the null hypothesis is true. We merely state there is not sufficient evidence to reject the null hypothesis. Unfortunately, whatever the decision, there is always a chance we made an error!
