<- read.table("http://www.isi-stats.com/isi/data/chap3/CollegeMidwest.txt",
college header = TRUE)
Lab 3: Midwest College
Learning Objectives
Students will be able to:
- Increase skill and comfort with basic data management and syntax in R.
- Review and increase understanding of the basic methodology and syntax of the for loop
- Simulating the sampling distributions for sample proportions and sample means.
- Review the concepts of the sampling distribution and the normal approximation.
- Practice constructing and interpreting confidence intervals.
Readings
To prepare for this lab assignment read the following lecture notes
- Sampling and Simulation
- Normal Distribution
- Introduction to Central Limit Theorem
- Confidence Intervals
Required Data and Packages
We will use the data from the “College of the Midwest” used in Section 2.1 in the textbook (Introduction to Statistical Investigations by Tintle et al.)
Throughout the lab we will use the mosaic
package
library(mosaic)
Questions
Question 1
Set the seed to 2035 and take a simple random sample of size 30 from the college
data frame. Save the random sample of college
as a separate R object named sample_0
, and print the first six rows to make sure you saved it correctly
We will treat sample_0
as our observed sample data from the population dataset college
Question 2
Compute the population proportion and sample proportion of students who live on campus, and assign the appropriate symbol to each of the values
Question 3
Create a bar graph that displays the proportions (or relative frequency) of students in your sample from Question 1 who live on campus or not
Hint: Use ?bargraph
to find the R documentation for the bargraph()
function and find an argument that allows you to change the type of scale (on the y-axis).
Question 4
Create a histogram of the sampling distribution of sample proportions. Superimpose a normal curve by including the argument fit = "normal"
Question 5
Compute the mean and standard error of the sampling distribution of sample proportions based on your simulation
Question 6
Visualize the population distribution and the sample distribution of the cumulative GPA for students at the College of the Midwest. Describe the distributions
Hint: Is the cumulative GPA variable quantitative or categorical?
Question 7
Compute the population mean and sample mean of students’ cumulative GPAs, and assign the appropriate symbol to each of the values
Question 8
Compute the population standard deviation and sample standard deviation of students’ cumulative GPAs, and assign the appropriate symbol to each of the values
Question 9
Set the seed to 9286 and simulate the sampling distribution of the mean cumulative GPA for the students at the College of the Midwest. Simulate the sample means with 1000 random samples of size 30. Provide the code that you use to accomplish this.
Question 10
Create a histogram of the sampling distribution of sample means. Superimpose a normal curve by including the argument fit = "normal"
Question 11
Compute the mean and standard error of the sampling distribution of sample means based on your simulation
Question 12
Do you think the sampling distribution of sample proportions is approximately normal? Be sure to comment on the relevant validity condition(s) and the histogram in Question 4
Hint: Try adjusting the number of histogram bins using the nint argument in the histogram()
function.
Question 13
According to the theory-based method (i.e., normal approximation by invoking the Central Limit Theorem), what is the mean and standard deviation of the sampling distribution of sample proportions? How close are these values to your simulation-based values from Question 5?
Question 14
Report and interpret a 95% confidence interval for the proportion of students who live on campus using the theory-based approach. Does this interval contain the population parameter?
Question 15
Do you think the sampling distribution of sample means is approximately normal? Be sure to comment on the relevant validity condition(s) and the histogram in Question 10.
Question 16
According to the theory-based method (i.e., normal approximation by invoking the Central Limit Theorem), what is the mean and standard deviation of the sampling distribution of sample means? How close are these values to your simulation-based values from Question 11?
Question 17
Report and interpret a 95% confidence interval for the mean of the cumulative GPA of students using the theory-based approach. The appropriate multiplier can be obtained by running qt(0.975, df = 29)
. Does this interval contain the population parameter?