Chapter 11: Fundamentals of Hypothesis Testing - Statistics for LIS with Open Source R

Hypothesis testing refers to the process of choosing between two hypothesis statements about a probability distribution based on observed data from the distribution. Hypothesis testing is a step-by-step methodology that allows you to make inferences about a population parameter by analyzing differences between the results observed (the sample statistic) and the results that can be expected if some underlying hypothesis is actually true.

The methodology behind hypothesis testing:
1. State the null hypothesis.
2. Select the distribution to use.
3. Determine the rejection and non-rejection regions.
4. Calculate the value of the test statistic.
5. Make a decision.

Step 1. State the null hypothesis
In this step, you set up two statements to determine the validity of a statistical claim: a null hypothesis and an alternative hypothesis.

The null hypothesis is a statement containing a null, or zero, difference. It is the null hypothesis that undergoes the testing procedure, whether it is the original claim or not. The null hypothesis, denoted H₀, represents the status quo or what is assumed to be true. It always contains the equal sign.

The alternative hypothesis, denoted H₁, is the statement that must be true if the null hypothesis is false. It is the opposite of the null and is what you wish to support. It never contains the equal sign.
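As an illustration of the notation, the hypotheses for a test about a population mean are commonly written as below. The symbol μ₀ is an assumed name for the hypothesized value of the mean; it does not appear in the text above. Note that H₀ carries the equal sign in each case and H₁ never does.

```latex
\begin{align*}
\text{Two-tailed:}   \quad & H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu \neq \mu_0 \\
\text{Right-tailed:} \quad & H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu > \mu_0 \\
\text{Left-tailed:}  \quad & H_0: \mu = \mu_0 \quad \text{versus} \quad H_1: \mu < \mu_0
\end{align*}
```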

Step 2. Select the distribution to use
Your data may come from a sample or from the entire population. The distribution you select depends on what you know about the mean and standard deviation of the population or the sample.

2.1 Population mean
If you know the standard deviation of a population, you can calculate a confidence interval (CI) for the mean, or average, of that population. You estimate the population mean, μ, by using a sample mean plus or minus a margin of error. The result is called a confidence interval for the population mean.

2.2 Confidence Intervals for Unknown Mean and Known Standard Deviation
For a population with unknown mean μ and known standard deviation σ, a confidence interval for the population mean, based on a simple random sample of size n, is $\bar{x} \pm z^{*} \frac{\sigma}{\sqrt{n}}$, where $\bar{x}$ is the sample mean, C is the confidence level, and $z^{*}$ is the upper (1-C)/2 critical value for the standard normal distribution.
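The interval for a known standard deviation can be sketched in a few lines of Python using only the standard library. The function name `z_confidence_interval` and the numbers in the example are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) bounds for the population mean when sigma is known."""
    # z* is the upper (1 - C)/2 critical value of the standard normal distribution.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error: z* times the standard deviation of the sample mean.
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical data: sample mean 50, known sigma 10, sample size 25.
lower, upper = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=25)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (46.08, 53.92)
```

Raising the confidence level widens the interval, and increasing n narrows it, because the margin of error shrinks with the square root of the sample size.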

When the population standard deviation is unknown, σ is replaced by the sample standard deviation s, and the quantity $s/\sqrt{n}$ is known as the standard error. Since s is only an estimate of the true standard deviation, the distribution of the sample mean is no longer normal with mean μ and standard deviation $\sigma/\sqrt{n}$. Instead, the sample mean follows the t distribution with mean μ and standard deviation $s/\sqrt{n}$. The t distribution is also described by its degrees of freedom; for a sample of size n, it has n - 1 degrees of freedom.
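For reference, the corresponding interval and test statistic for an unknown standard deviation can be written as follows. Here $t^{*}$ is assumed to denote the upper (1-C)/2 critical value of the t distribution with n - 1 degrees of freedom, and $\mu_0$ is an assumed name for the hypothesized mean; neither symbol appears in the text above.

```latex
\[
\bar{x} \pm t^{*} \frac{s}{\sqrt{n}},
\qquad
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}},
\qquad
\text{df} = n - 1
\]
```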

Step 3. Determine the rejection and non-rejection regions
In this step we choose the significance level, which determines the size of the rejection region. The significance level, denoted alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
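To make the link between α and the rejection region concrete, the sketch below computes the critical values of a z test for a chosen significance level. It assumes the test statistic follows the standard normal distribution; the function name `critical_values` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist


def critical_values(alpha=0.05, tail="two-sided"):
    """Return the critical value(s) bounding the rejection region of a z test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        # Split alpha evenly between the two tails.
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    raise ValueError("tail must be 'two-sided', 'right', or 'left'")


low, high = critical_values(alpha=0.05)
print(f"Reject H0 if z < {low:.3f} or z > {high:.3f}")  # -1.960 and 1.960
```

A smaller α pushes the critical values further from zero, which shrinks the rejection region and lowers the risk of rejecting a true null hypothesis.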

Step 4. Calculate the value of the test statistic
The critical values of the test statistic separate the rejection and non-rejection regions.
Rejection region: the set of values for the test statistic that leads to rejection of H₀.
Non-rejection region: the set of values not in the rejection region that leads to non-rejection of H₀.
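A minimal sketch of this step for a two-sided z test follows, assuming the population standard deviation is known. The names `z_statistic`, `in_rejection_region`, and `mu0` (the mean stated in H₀), as well as the example numbers, are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """Distance between the sample mean and mu0, in standard deviations of the sample mean."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def in_rejection_region(z, alpha=0.05):
    """True if z falls in the two-sided rejection region at significance level alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit


# Hypothetical data: H0 states mu = 50; the sample of 25 has mean 53; sigma is 10.
z = z_statistic(sample_mean=53.0, mu0=50.0, sigma=10.0, n=25)
print(z)                       # 1.5
print(in_rejection_region(z))  # False, so H0 is not rejected
```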

The p-value: Another quantitative measure for reporting the result of a hypothesis test is the p-value. The p-value is the probability, computed under the assumption that H₀ is true, of obtaining a test statistic equal to or more extreme than the value actually observed. The lower the p-value, the less compatible the observed data are with H₀, so a low p-value is a good indication that the results are not due to random chance alone. The p-value is not the probability that H₀ is true.
We then compare the p-value with α:
1. If p-value < α, reject H₀.
2. If p-value >= α, do not reject H₀.
3. “If p-value is low, then H₀ must go.”
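The comparison rule above can be sketched for a two-sided z test as follows. The function names `two_sided_p_value` and `decide`, and the observed value z = 1.5, are hypothetical and used only for illustration.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability, under H0, of a z statistic at least as extreme as the one observed."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


p = two_sided_p_value(1.5)  # hypothetical observed test statistic
print(f"p-value = {p:.4f}")  # p-value = 0.1336
print(decide(p))             # do not reject H0
```

The p-value approach and the rejection-region approach lead to the same decision: a test statistic falls in the rejection region exactly when its p-value is below α.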

As mentioned in Chapter 8, the logic of hypothesis testing is to reject the null hypothesis if the sample data are not consistent with the null hypothesis. Thus, one rejects the null hypothesis if the observed test statistic is more extreme in the direction of the alternative hypothesis than one can tolerate.

Step 5. Make a decision
Based on the result, you either reject or fail to reject the null hypothesis. However, when the results of a hypothesis test are reported in academic journals, it is common to find that the author provides only the test statistic and its p-value alongside the conclusions drawn from the data.

