13 Mean Difference Inference

This section covers making inferences on the difference between two population means. In other words, we want to compare the means of two populations, often to see whether they differ from each other. Each subsection covers mean difference inference under different conditions and assumptions. If you are looking to compare the means of more than two populations or groups, see the section on ANOVA.

The sections are:

  • Mean Difference Inference (n.u.f): normal, iid, univariate, frequentist, known variance
  • Mean Difference Inference (n.u.f.u): normal, iid, univariate, frequentist, unknown variance

13.1 Normal-Two-Population-Frequentist-Independent-Equal-Variance

This section covers making inferences on the difference between two population means. The following assumptions and conditions are used:

Assumptions:

  • Both populations are normally distributed
  • The variances of the two populations are EQUAL. This is also known as homoscedasticity (homogeneity of variance).
  • Observations are iid
  • Population variance can be known or unknown (a normal distribution always has finite variance)

Other Conditions, Criteria, or Attributes:

  • Univariate mean inference
  • Frequentist perspective
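
To make the assumptions above concrete, here is a minimal sketch of a two-sample T-statistic that relies on them. The pooled-variance form used here is the standard choice when both populations are normal with equal variance; the function name `pooled_t_statistic` and the data are hypothetical and only serve as an illustration.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Two-sample T-statistic for H0: mu_a - mu_b = 0, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Pooled variance: average of the two sample variances (n - 1 denominators),
    # weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / std_err, n_a + n_b - 2


# Hypothetical observations from two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 5.2]
group_b = [4.4, 4.7, 4.1, 4.9, 4.5]

t_stat, dof = pooled_t_statistic(group_a, group_b)
print(f"T = {t_stat:.3f}, degrees of freedom = {dof}")
```

The resulting statistic would be compared against a T-distribution with \(n_a + n_b - 2\) degrees of freedom.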

13.1.1 Overview and Summary

Key Takeaways:

  • With the shape of the population distribution unknown, we will always use the T-statistic and T-distribution to create the confidence interval. This is true regardless of whether we know the population variance.
  • When your sample size exceeds 30 or so (need citation), the results of a T-test will be very similar to those of a Z-test. This is because the T-distribution converges to the Z-distribution (standard normal distribution) as the sample size increases.
  • The rule of thumb is 30, but in practice it depends on the skewness of the population distribution: the more skewed the population, the more samples we need (need citation). Make sure not to confuse this with the CLT. With sample sizes much larger than 30, we could use the Z-statistic for hypothesis testing and confidence interval building, but it’s usually easier to always use the T-statistic.
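
The convergence of the T-distribution to the Z-distribution can be seen by comparing critical values. The sketch below uses only the Python standard library; the T critical values are copied from a standard T-table (two-sided, 95%) rather than computed, and the chosen degrees of freedom are arbitrary.

```python
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (Z) distribution.
z_crit = NormalDist().inv_cdf(0.975)

# Two-sided 95% T critical values by degrees of freedom, copied from a T-table.
t_crit_by_dof = {5: 2.571, 10: 2.228, 30: 2.042, 100: 1.984}

for dof, t_crit in t_crit_by_dof.items():
    gap = t_crit - z_crit
    print(f"dof={dof:>3}  t={t_crit:.3f}  z={z_crit:.3f}  gap={gap:.3f}")
```

The gap between the two critical values shrinks as the degrees of freedom grow, which is why T-based and Z-based results become hard to tell apart for larger samples.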

Overview and Summary:

Here we discuss how to perform univariate mean inference when the population can be any shaped distribution (with finite mean) and our observations are iid. Inference is done from a frequentist’s perspective. This is a common way to perform mean inference.

13.1.2 Point Estimation

Key Takeaways:

  • Under the normality assumption, both maximum likelihood estimation and the method of moments yield the same estimate of the mean as the formula below.
  • The formula below produces an unbiased estimate no matter what the population distribution is. Proof.

Point Estimation:

The following formula can be used to calculate an unbiased estimate of the population mean using sample data:

\(\bar{x} = \frac{\sum^n_{i=1}{x_{i}}}{n}\)

Where:

  • \(\bar{x}\) = unbiased estimate of population mean
  • \(n\) = sample size
  • \(x_{i}\) = your observations

The formula above not only produces an unbiased estimate under the assumptions on this page, but always produces an unbiased estimate of the population mean no matter the distribution (as long as the population mean exists).
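
As a small illustration of the formula, the sample mean can be computed directly or with Python's `statistics` module. The observations below are hypothetical.

```python
import math
import statistics

# Hypothetical sample of n = 5 observations.
observations = [4.2, 5.1, 3.8, 4.9, 5.0]

# Direct translation of the formula: sum of the observations divided by n.
x_bar = sum(observations) / len(observations)

# The standard library gives the same point estimate.
assert math.isclose(x_bar, statistics.mean(observations))

print(f"sample mean = {x_bar:.2f}")
```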

13.1.3 Confidence Interval

With an unknown population distribution, we construct a confidence interval using the T-distribution and T-table.

The formula for the T-statistic is:

\(T = \frac{\bar{x} - \mu}{\frac{S}{\sqrt{n}}}\)

Where:

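  • \(\bar{x}\) = unbiased estimate of population mean
  • \(\mu\) = population mean
  • \(n\) = sample size
  • \(S\) = sample standard deviation, calculated using the following formula:

\(S = \sqrt{\frac{\sum^n_{i=1}{(x_{i} - \bar{x})^2}}{n - 1}}\)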
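
As a rough sketch of how the T-statistic leads to a confidence interval: rearranging it for \(\mu\) gives the interval \(\bar{x} \pm t \cdot \frac{S}{\sqrt{n}}\), where \(t\) is the critical value from the T-table for \(n - 1\) degrees of freedom. The data below are hypothetical, and the critical value is hard-coded from a T-table (95%, two-sided, 9 degrees of freedom) rather than computed.

```python
import math
import statistics

# Hypothetical sample of n = 10 observations.
sample = [4.2, 5.1, 3.8, 4.9, 5.0, 4.4, 4.7, 5.3, 4.1, 4.5]

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Two-sided 95% critical value for n - 1 = 9 degrees of freedom, from a T-table.
t_crit = 2.262

# Margin of error: critical value times the standard error of the mean.
margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

A different confidence level or sample size requires looking up a different critical value.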

13.1.4 Hypothesis Testing

With an unknown population distribution, we can perform hypothesis testing using the T-distribution, T-table, and T-statistic (T-score).
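
A minimal sketch of such a test, assuming a two-sided test of \(H_0: \mu = \mu_0\) at the 5% significance level. The sample, the hypothesised mean `mu_0`, and the hard-coded T-table critical value (9 degrees of freedom) are all illustrative assumptions.

```python
import math
import statistics

# Hypothetical sample of n = 10 observations.
sample = [4.2, 5.1, 3.8, 4.9, 5.0, 4.4, 4.7, 5.3, 4.1, 4.5]
mu_0 = 4.0  # hypothesised population mean under H0 (assumed for illustration)

n = len(sample)
std_err = statistics.stdev(sample) / math.sqrt(n)

# T-statistic: distance between the sample mean and mu_0 in standard-error units.
t_stat = (statistics.mean(sample) - mu_0) / std_err

# Two-sided 5% critical value for n - 1 = 9 degrees of freedom, from a T-table.
t_crit = 2.262

# Reject H0 when the T-statistic falls in either tail beyond the critical value.
reject_h0 = abs(t_stat) > t_crit
print(f"T = {t_stat:.3f}, reject H0: {reject_h0}")
```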

13.1.5 Suggested Steps

Point Estimation:

  • Obtain sample data and ensure it is iid. Random samples are usually iid, by definition.
  • Calculate the sample mean using the following formula:

\(\bar{x} = \frac{\sum^n_{i=1}{x_{i}}}{n}\)

Confidence Interval:

  • Continuing from above, now calculate the confidence interval using: