Lab 3: Midwest College

Learning Objectives

Students will be able to:

  • Increase skill and comfort with basic data management and syntax in R.
  • Review and increase understanding of the basic methodology and syntax of the for loop
  • Simulating the sampling distributions for sample proportions and sample means.
  • Review the concepts of the sampling distribution and the normal approximation.
  • Practice constructing and interpreting confidence intervals.

Readings

To prepare for this lab assignment read the following lecture notes

  1. Sampling and Simulation
  2. Normal Distribution
  3. Introduction to Central Limit Theorem
  4. Confidence Intervals

Required Data and Packages

We will use the data from the “College of the Midwest” used in Section 2.1 in the textbook (Introduction to Statistical Investigations by Tintle et al.)

college <- read.table("http://www.isi-stats.com/isi/data/chap3/CollegeMidwest.txt", 
                      header = TRUE)

Throughout the lab we will use the mosaic package

library(mosaic)

Questions

Question 1

Set the seed to 2035 and take a simple random sample of size 30 from the college data frame. Save the random sample of college as a separate R object named sample_0, and print the first six rows to make sure you saved it correctly

We will treat sample_0 as our observed sample data from the population dataset college

Question 2

Compute the population proportion and sample proportion of students who live on campus, and assign the appropriate symbol to each of the values

Question 3

Create a bar graph that displays the proportions (or relative frequency) of students in your sample from Question 1 who live on campus or not

Hint: Use ?bargraph to find the R documentation for the bargraph() function and find an argument that allows you to change the type of scale (on the y-axis).

Question 4

Create a histogram of the sampling distribution of sample proportions. Superimpose a normal curve by including the argument fit = "normal"

Question 5

Compute the mean and standard error of the sampling distribution of sample proportions based on your simulation

Question 6

Visualize the population distribution and the sample distribution of the cumulative GPA for students at the College of the Midwest. Describe the distributions

Hint: Is the cumulative GPA variable quantitative or categorical?

Question 7

Compute the population mean and sample mean of students’ cumulative GPAs, and assign the appropriate symbol to each of the values

Question 8

Compute the population standard deviation and sample standard deviation of students’ cumulative GPAs, and assign the appropriate symbol to each of the values

Question 9

Set the seed to 9286 and simulate the sampling distribution of the mean cumulative GPA for the students at the College of the Midwest. Simulate the sample means with 1000 random samples of size 30. Provide the code that you use to accomplish this.

Question 10

Create a histogram of the sampling distribution of sample means. Superimpose a normal curve by including the argument fit = "normal"

Question 11

Compute the mean and standard error of the sampling distribution of sample means based on your simulation

Question 12

Do you think the sampling distribution of sample proportions is approximately normal? Be sure to comment on the relevant validity condition(s) and the histogram in Question 4

Hint: Try adjusting the number of histogram bins using the nint argument in the histogram() function.

Question 13

According to the theory-based method (i.e., normal approximation by invoking the Central Limit Theorem), what is the mean and standard deviation of the sampling distribution of sample proportions? How close are these values to your simulation-based values from Question 5?

Question 14

Report and interpret a 95% confidence interval for the proportion of students who live on campus using the theory-based approach. Does this interval contain the population parameter?

Question 15

Do you think the sampling distribution of sample means is approximately normal? Be sure to comment on the relevant validity condition(s) and the histogram in Question 10.

Question 16

According to the theory-based method (i.e., normal approximation by invoking the Central Limit Theorem), what is the mean and standard deviation of the sampling distribution of sample means? How close are these values to your simulation-based values from Question 11?

Question 17

Report and interpret a 95% confidence interval for the mean of the cumulative GPA of students using the theory-based approach. The appropriate multiplier can be obtained by running qt(0.975, df = 29). Does this interval contain the population parameter?