Unit VIII Small Sample Test

By Notes Vandar

Small Sample Test

  •   t-test: difference between two means of small samples with unknown common variance.
  • Chi-square test: significance of independence
  • r-test: significance test for the correlation coefficient

 

  •   t-test: difference between two means of small samples with unknown common variance.

The t-test for the difference between two means of small samples with unknown common variance is used to compare the means of two independent samples when:

  • Sample sizes are small ($n_1, n_2 < 30$),
  • The population variances are unknown, and
  • We assume the two populations have the same or common variance.

This is known as the two-sample t-test with pooled variance.

Assumptions:

  1. Both samples are random and independent of each other.
  2. The populations from which the samples are drawn follow a normal distribution.
  3. The two populations have equal variances.

Test Statistic:

When we assume the variances of the two populations are equal, we use a pooled estimate of the variance to calculate the test statistic. The test statistic for the t-test is given by:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Where:

  • $\bar{X}_1$ and $\bar{X}_2$ are the sample means,
  • $n_1$ and $n_2$ are the sample sizes,
  • $\mu_1 - \mu_2$ is the hypothesized difference between the two population means (typically 0 when testing for no difference),
  • $s_p$ is the pooled standard deviation, calculated as:

$$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$

Where $s_1^2$ and $s_2^2$ are the sample variances.

Degrees of Freedom:

The degrees of freedom (df) for the t-test are calculated as:

$$df = n_1 + n_2 - 2$$
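The three formulas above (pooled standard deviation, test statistic, degrees of freedom) can be combined into one short function. This is a minimal sketch working from summary statistics; the function name `pooled_t_statistic` is hypothetical, and it assumes `sd1` and `sd2` are sample standard deviations computed with the $n - 1$ divisor.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Two-sample t statistic with pooled variance, from summary statistics.

    sd1 and sd2 are sample standard deviations (divisor n - 1).
    Returns (t, df, s_p).
    """
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their df
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    s_p = math.sqrt(pooled_var)
    # Standard error of the difference between the two sample means
    se = s_p * math.sqrt(1 / n1 + 1 / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / se
    return t, df, s_p
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=True)` computes the same pooled statistic and also returns a p-value, which can serve as a cross-check.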

Steps for Performing the t-Test:

  1. State the hypotheses:
    • Null hypothesis ($H_0$): $\mu_1 = \mu_2$ (no difference in population means).
    • Alternative hypothesis ($H_1$): $\mu_1 \neq \mu_2$, $\mu_1 > \mu_2$, or $\mu_1 < \mu_2$ for a two-tailed, right-tailed, or left-tailed test, respectively.
  2. Calculate the test statistic (t): Use the formula above to compute the t-statistic.
  3. Determine the critical value from the t-distribution table based on the degrees of freedom and the chosen significance level (e.g., 0.05).
  4. Compare the calculated t-value with the critical value:
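    • If $|t|$ is greater than the critical value, reject the null hypothesis. For a one-tailed test, use the one-tailed critical value and reject only if $t$ lies beyond it in the direction stated by $H_1$.
    • If $|t|$ is less than or equal to the critical value, fail to reject the null hypothesis.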
  5. Draw conclusions based on the comparison.
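When raw observations are available instead of summary statistics, the steps above can be sketched as follows. This is an illustrative sketch, not a standard routine: the function name `two_sample_t_test` and its `tail` parameter are hypothetical, and the critical value is assumed to be read from a t-table (one-tailed or two-tailed, to match `tail`) rather than computed.

```python
import math
import statistics


def two_sample_t_test(sample1, sample2, t_critical, tail="two"):
    """Pooled-variance two-sample t-test on raw data.

    t_critical is read from a t-table for df = n1 + n2 - 2.
    tail is "two" (H1: mu1 != mu2), "right" (H1: mu1 > mu2)
    or "left" (H1: mu1 < mu2). Returns (t, df, reject_h0).
    """
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # statistics.variance uses the n - 1 divisor (sample variance)
    s1_sq = statistics.variance(sample1)
    s2_sq = statistics.variance(sample2)
    s_p = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df)
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / (
        s_p * math.sqrt(1 / n1 + 1 / n2)
    )
    # Decision rule: compare t with the critical value on the side(s) of H1
    if tail == "two":
        reject = abs(t) > t_critical
    elif tail == "right":
        reject = t > t_critical
    elif tail == "left":
        reject = t < -t_critical
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return t, df, reject
```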

Example:

Suppose we have two small independent samples, and we want to test whether there is a significant difference in their population means. The data is:

  • Sample 1: $\bar{X}_1 = 50$, $s_1 = 5$, $n_1 = 10$
  • Sample 2: $\bar{X}_2 = 45$, $s_2 = 4$, $n_2 = 12$

We are testing at the 5% significance level to see if there is a difference in means.

Steps:

  1. Hypotheses:
    • $H_0$: $\mu_1 = \mu_2$ (no difference in population means).
    • $H_1$: $\mu_1 \neq \mu_2$ (two-tailed test).
  2. Pooled standard deviation ($s_p$):

    $$s_p = \sqrt{\frac{(10 - 1) \cdot 5^2 + (12 - 1) \cdot 4^2}{10 + 12 - 2}} = \sqrt{\frac{9 \cdot 25 + 11 \cdot 16}{20}} = \sqrt{\frac{225 + 176}{20}} = \sqrt{\frac{401}{20}} = \sqrt{20.05} \approx 4.48$$

  3. Calculate the t-statistic:

    $$t = \frac{50 - 45}{4.48 \sqrt{\frac{1}{10} + \frac{1}{12}}} = \frac{5}{4.48 \sqrt{0.1 + 0.0833}} = \frac{5}{4.48 \sqrt{0.1833}} = \frac{5}{4.48 \times 0.4282} = \frac{5}{1.918} \approx 2.61$$

  4. Degrees of freedom:

    $$df = 10 + 12 - 2 = 20$$

  5. Critical value: For a two-tailed test at the 5% significance level with 20 degrees of freedom, the critical value from the t-table is 2.086.
  6. Compare: Since the calculated $t \approx 2.61$ is greater than the critical value of 2.086, we reject the null hypothesis.
  7. Conclusion: There is enough evidence, at the 5% significance level, to conclude that there is a significant difference between the two population means.
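The worked example can be checked numerically. This short script uses only the example's summary statistics and the table value 2.086; carrying full precision instead of rounding $s_p$ and the square root gives $t \approx 2.608$, so the conclusion is unchanged.

```python
import math

# Summary statistics from the example
mean1, s1, n1 = 50, 5, 10
mean2, s2, n2 = 45, 4, 12
t_critical = 2.086  # two-tailed, alpha = 0.05, df = 20 (from the t-table)

df = n1 + n2 - 2
s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
t = (mean1 - mean2) / (s_p * math.sqrt(1 / n1 + 1 / n2))

print(f"s_p = {s_p:.2f}, t = {t:.2f}, df = {df}")  # s_p = 4.48, t = 2.61, df = 20
print("Reject H0" if abs(t) > t_critical else "Fail to reject H0")  # Reject H0
```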

 

  • Chi-square test: significance of independence

The Chi-square test of independence is used to determine whether there is a significant association between two categorical variables. In other words, it helps assess whether the occurrence of one variable is independent of the occurrence of the other variable.

Key Concepts:

  • Null hypothesis ($H_0$): The two variables are independent (no association).
  • Alternative hypothesis ($H_1$): The two variables are not independent (there is an association).

Formula for the Chi-square statistic ($\chi^2$):

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Where:

  • $O_i$ is the observed frequency for the $i$-th cell in a contingency table.
  • $E_i$ is the expected frequency for the $i$-th cell, calculated as: $E_i = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$
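The expected-frequency formula can be applied to a whole contingency table at once. A minimal sketch, assuming the table is given as a list of rows of observed counts; the function name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(observed):
    """E_ij = (row total * column total) / grand total for every cell.

    observed is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]  # zip(*...) transposes
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]
```

For example, `expected_frequencies([[30, 10], [20, 40]])` returns `[[20.0, 20.0], [30.0, 30.0]]`.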

Steps for Performing the Chi-square Test of Independence:

  1. State the hypotheses:
    • $H_0$: The two variables are independent.
    • $H_1$: The two variables are not independent.
  2. Create a contingency table: This table shows the observed frequencies for the two categorical variables. The rows represent one variable, and the columns represent the other.
  3. Calculate the expected frequencies: For each cell in the contingency table, calculate the expected frequency based on the assumption of independence:

    $$E_i = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$

  4. Compute the Chi-square statistic: Use the formula:

    $$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

    Sum this value over all the cells in the table.

  5. Degrees of freedom (df): The degrees of freedom for a Chi-square test of independence is calculated as:

    $$df = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$$

  6. Determine the critical value from the Chi-square distribution table based on the degrees of freedom and the chosen significance level (e.g., 0.05).
  7. Compare the calculated $\chi^2$ value with the critical value:
    • If the calculated $\chi^2$ value is greater than the critical value, reject the null hypothesis.
    • If the calculated $\chi^2$ value is less than or equal to the critical value, fail to reject the null hypothesis.
  8. Draw conclusions based on the comparison.
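The procedure above, for a table of any size, might look like this in Python. It is a sketch under stated assumptions: the function name `chi_square_independence` is hypothetical, the critical value is supplied from a Chi-square table rather than computed, and no continuity correction is applied (matching the formula in these notes).

```python
def chi_square_independence(observed, chi2_critical):
    """Chi-square test of independence for an r x c table of observed counts.

    chi2_critical is read from a Chi-square table for
    df = (rows - 1) * (columns - 1). Returns (chi2, df, reject_h0).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, df, chi2 > chi2_critical
```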

Example:

Suppose we want to test whether there is a significant association between gender (male/female) and preference for a particular type of beverage (coffee/tea). We collect data on 100 individuals and organize it into a contingency table.

| | Coffee | Tea | Row Total |
|---|---|---|---|
| Male | 30 | 10 | 40 |
| Female | 20 | 40 | 60 |
| Column Total | 50 | 50 | 100 |

Step-by-Step:

  1. Hypotheses:
    • $H_0$: Gender and beverage preference are independent.
    • $H_1$: Gender and beverage preference are not independent.
  2. Expected Frequencies: For each cell, calculate the expected frequency:
    • Expected frequency for males preferring coffee: $E_{\text{male, coffee}} = \frac{40 \times 50}{100} = 20$
    • Expected frequency for males preferring tea: $E_{\text{male, tea}} = \frac{40 \times 50}{100} = 20$
    • Expected frequency for females preferring coffee: $E_{\text{female, coffee}} = \frac{60 \times 50}{100} = 30$
    • Expected frequency for females preferring tea: $E_{\text{female, tea}} = \frac{60 \times 50}{100} = 30$
  3. Observed and Expected Frequencies Table:

| | Coffee | Tea | Row Total |
|---|---|---|---|
| Male | O = 30, E = 20 | O = 10, E = 20 | 40 |
| Female | O = 20, E = 30 | O = 40, E = 30 | 60 |
| Column Total | 50 | 50 | 100 |

  4. Calculate $\chi^2$: Using the formula $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$, we calculate for each cell:
    • For males preferring coffee: $\frac{(30 - 20)^2}{20} = \frac{10^2}{20} = 5$
    • For males preferring tea: $\frac{(10 - 20)^2}{20} = \frac{(-10)^2}{20} = 5$
    • For females preferring coffee: $\frac{(20 - 30)^2}{30} = \frac{(-10)^2}{30} = \frac{100}{30} \approx 3.33$
    • For females preferring tea: $\frac{(40 - 30)^2}{30} = \frac{10^2}{30} = \frac{100}{30} \approx 3.33$

    Now, summing all these values:

    $$\chi^2 = 5 + 5 + \frac{100}{30} + \frac{100}{30} = 10 + \frac{200}{30} \approx 16.67$$

  5. Degrees of freedom (df):

    $$df = (2 - 1)(2 - 1) = 1$$

  6. Critical value: For $df = 1$ and a significance level of 0.05, the critical value from the Chi-square distribution table is 3.841.
  7. Compare: Since the calculated $\chi^2 \approx 16.67$ is greater than the critical value of 3.841, we reject the null hypothesis.
  8. Conclusion: There is a significant association between gender and beverage preference at the 5% significance level.
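The beverage example can be reproduced in a few lines. As an extra check beyond the table lookup, the sketch computes a p-value using the fact that a Chi-square variable with $df = 1$ is the square of a standard normal variable, so $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$; this identity holds only for $df = 1$.

```python
import math

observed = [[30, 10],   # Male:   coffee, tea
            [20, 40]]   # Female: coffee, tea

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n  # expected count
        chi2 += (observed[i][j] - e) ** 2 / e

# df = 1 only: chi-square is Z^2, so P(chi2_1 > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}")  # chi2 = 16.67
print(f"p-value = {p_value:.1e}")
print("Reject H0" if chi2 > 3.841 else "Fail to reject H0")  # Reject H0
```

The p-value comes out far below 0.05, agreeing with the table-based decision. If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should reproduce the same statistic; its default `correction=True` applies Yates' continuity correction to tables with one degree of freedom and therefore gives a smaller value.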

 

  • r-test: significance test for the correlation coefficient

The r-test (or test for the significance of the correlation coefficient) is used to determine whether the correlation coefficient $r$ between two variables is significantly different from zero. In other words, it tests whether there is a statistically significant linear relationship between two variables.

Key Concepts:

  • Null hypothesis ($H_0$): There is no linear relationship between the two variables ($\rho = 0$, where $\rho$ is the population correlation coefficient).
  • Alternative hypothesis ($H_1$): There is a linear relationship between the two variables ($\rho \neq 0$).

Test Statistic (t):

The test statistic for the r-test is calculated as:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

Where:

  • $r$ is the sample correlation coefficient,
  • $n$ is the number of data points (sample size).

Degrees of Freedom:

The degrees of freedom (df) for this test is:

$$df = n - 2$$
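The test statistic and its degrees of freedom translate directly into code. A minimal sketch; the function name `correlation_t` is hypothetical, and it rejects $|r| = 1$ because the formula divides by $\sqrt{1 - r^2}$.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and size n.

    Returns (t, df) with df = n - 2. Requires -1 < r < 1 and n > 2.
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 2:
        raise ValueError("need at least 3 observations")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r**2)
    return t, df
```

The sign of $t$ follows the sign of $r$, so a two-tailed test compares $|t|$ with the critical value.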

Steps for Performing the r-Test:

  1. State the hypotheses:
    • $H_0$: The population correlation coefficient $\rho = 0$ (no linear relationship).
    • $H_1$: The population correlation coefficient $\rho \neq 0$ (there is a linear relationship).
  2. Calculate the correlation coefficient $r$: Use the Pearson correlation coefficient formula, $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$, to find $r$ from your data.
  3. Calculate the t-statistic: Use the formula:

    $$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

    where $r$ is the sample correlation coefficient and $n$ is the sample size.

  4. Determine the critical value from the t-distribution table based on the degrees of freedom ($df = n - 2$) and the chosen significance level (e.g., 0.05).
  5. Compare the calculated t-value with the critical value:
    • If $|t|$ is greater than the critical value, reject the null hypothesis.
    • If $|t|$ is less than or equal to the critical value, fail to reject the null hypothesis.
  6. Draw conclusions based on the comparison.
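Starting from raw paired data, the steps above can be sketched as follows: compute Pearson's $r$ from the deviations about the means, convert it to $t$, and compare with a table value. The function names `pearson_r` and `r_test` are hypothetical, and the critical value is assumed to be supplied from a two-tailed t-table for $df = n - 2$.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_test(x, y, t_critical):
    """Two-tailed test of H0: rho = 0 (assumes |r| < 1).

    t_critical comes from a t-table with df = n - 2.
    Returns (r, t, df, reject_h0).
    """
    r = pearson_r(x, y)
    df = len(x) - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r**2)
    return r, t, df, abs(t) > t_critical
```

If SciPy is available, `scipy.stats.pearsonr(x, y)` returns $r$ together with a two-sided p-value for the same null hypothesis and can be used as a cross-check.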

Example:

Suppose we want to test whether there is a significant correlation between hours studied and exam scores based on a sample of 10 students. The sample correlation coefficient is $r = 0.65$, and we want to test this at a 5% significance level.

Step-by-Step:

  1. Hypotheses:
    • $H_0$: $\rho = 0$ (no linear relationship between hours studied and exam scores).
    • $H_1$: $\rho \neq 0$ (there is a linear relationship).
  2. Sample size and degrees of freedom:
    • Sample size $n = 10$,
    • Degrees of freedom $df = n - 2 = 10 - 2 = 8$.
  3. Calculate the t-statistic: Using the formula for the t-statistic:

    $$t = \frac{0.65\sqrt{10 - 2}}{\sqrt{1 - 0.65^2}} = \frac{0.65\sqrt{8}}{\sqrt{1 - 0.4225}} = \frac{0.65 \times 2.828}{\sqrt{0.5775}} = \frac{1.8382}{0.7599} \approx 2.42$$

  4. Critical value: For $df = 8$ and a 5% significance level in a two-tailed test, the critical value from the t-distribution table is $t_{critical} = 2.306$.
  5. Compare: The calculated $t \approx 2.42$ is greater than the critical value of 2.306, so we reject the null hypothesis.
  6. Conclusion: There is a significant linear relationship between hours studied and exam scores at the 5% significance level.
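The example can be verified numerically. The sketch also inverts $t = r\sqrt{df}/\sqrt{1 - r^2}$ to obtain $r_{critical} = t_{critical}/\sqrt{t_{critical}^2 + df}$, the smallest $|r|$ that would be significant for this sample size; this is an algebraic rearrangement of the same test, not a separate procedure.

```python
import math

r, n = 0.65, 10
t_critical = 2.306  # two-tailed, alpha = 0.05, df = 8 (from the t-table)

df = n - 2
t = r * math.sqrt(df) / math.sqrt(1 - r**2)

# Solving t = r*sqrt(df)/sqrt(1 - r^2) for r gives the critical correlation
r_critical = t_critical / math.sqrt(t_critical**2 + df)

print(f"t = {t:.2f}, r_critical = {r_critical:.3f}")  # t = 2.42, r_critical = 0.632
print("Reject H0" if abs(t) > t_critical else "Fail to reject H0")  # Reject H0
```

Since $r = 0.65$ exceeds $r_{critical} \approx 0.632$, comparing correlations leads to the same decision as comparing $t$ values.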

Interpretation:

Rejecting the null hypothesis means that the correlation coefficient $r = 0.65$ is significantly different from zero, suggesting that there is a statistically significant linear relationship between the two variables.
