<- read.table("http://www.isi-stats.com/isi/data/chap3/CollegeMidwest.txt",
college header = TRUE)
Lab 4: Centers for Disease Control and Prevention
Learning Objectives
- Increase skill and comfort with basic data management and syntax in R.
- Review the concepts of the normal distribution.
- Review standardized statistics and introduce \(z\)-scores.
- Compute probabilities from the normal distribution and the \(t\)-distribution.
- Review the concepts of the \(t\)-distribution and the one sample \(t\)-test.
- Carry out an entire test of significance for the difference in two proportions using simulations and theory
Readings
To prepare for this lab assignment read the following lecture notes
For reference on past material covering logical statements, visualizing and analyzing categorical/numerical variables refer to
- Summarizing and Visualizing Numerical Variables
- Summarizing and Visualizing Categorical Variables
- Logical Statements
Required Data and Packages
We will use two data sets in this lab
- We will use the data from the “College of the Midwest” used in Section 2.1 in the textbook (Introduction to Statistical Investigations by Tintle et al.)
- The file “cdc.csv” on CCLE contains data based on a random sample conducted by the Centers for Disease Control and Prevention (CDC).
Throughout the lab we will use the mosaic
package
library(mosaic)
Questions
Question 1
Compute the probability of observing a person with a weight of 145 pounds or more from a population that follows a normal distribution with a mean of 127.1 pounds and a standard deviation of 11.7 pounds
Question 2
Compute the quantity in Question 1 by first converting the value to a z-score and then calculating the probability with respect to the standard normal distribution. Report both the z-score and the probability
Question 3
What is the probability of observing a value of 2.3 or more from statistics that follow a \(t\)-distribution with 35 degrees of freedom?
Question 4
Using a significance level of \(\alpha = 0.01\), state the conclusion for the hypothesis test in the example above in context
Question 5
Use the t.test()
function to compute a 99% confidence interval for the mean cumulative GPA of students at the College of the Midwest with the sample taken in the example above
Question 6
- Use either
&
or|
to indicate (byTRUE
orFALSE
) which entries ofex_vec
are greater than 4 AND less than 6. - Use either
&
or|
to indicate which entries ofex_vec
are less than 4 OR greater than 6.
Question 7
Use the correct notation and the format argument in the tally()
function to answer the following questions.
- What proportion of people in our sample said they are in good health?
- What proportion of people in our sample who said they are in good health also said they did not exercise at least once a week?
- What proportion of people in our sample said they exercise at least once a week and are in good health?
Question 8
What is the observed statistic that was computed above (i.e., what does obs_diff
represent)?
Hint: The obs_diff
value is a difference of what two proportions?
Question 9
Compute the simulation-based p-value
Hint: The abs()
function inputs a numeric vector and outputs the absolute value of each entry. What does mean(abs(diff_props) >= 0.01)
compute?
Question 10
Using a significance level of \(\alpha = 0.05\), decide whether or not you would reject the null hypothesis in favor of the alternative. Interpret this decision in context.
Question 11
- Compute the standardized statistic according to the simulation-based approach.
- Compute the standardized statistic according to the theory-based approach. Hint: Note that in the z-score formula, our value that we are standardizing is our observed difference \(\hat{p}_1 −\hat{p}_2\). The simulation-based and theory-based approaches differ in how the standard error is computed.
Question 12
Compute the theory-based p-value using the pnorm()
function How does this compare to the p-value from Question 9?