<- read.csv('your path to births.csv') births
Lab 2: North Carolina Births
Learning Objectives
Students will be able to:
- Increase skill and comfort with basic data management and syntax in R
- Create statistical graphics to answer questions about a data set
- Review simulating the null distribution for sample proportions
- Understand the basic methodology and syntax of the for loop
- Carry out an entire test of significance for one proportion using simulations
Readings
To prepare for this lab assignment read the following lecture notes
Required Data and Packages
We will continue to work with the NC data on births used in Lab 1
Throughout the lab we will use the mosaic
package
library(mosaic)
Questions
Question 1
What are the names of the variables that indicate whether or not the mother smoked and the weight of the baby?
Question 2
In addition to the variable that indicates whether or not the mother smoked, choose one numerical (quantitative) variable and one categorical variable that you think might also affect the weight of a baby. Explain why
Question 3
Create a data frame that has only the variables Gender, Premie, weight
, and Apgar1
, and the observations for which the baby was not premature. Print out the first few rows using the head()
function.
Question 4
Create a data frame that has only the four variables you identified in Questions 1 and 2 and the observations for which the mother was a smoker. Print out the first few rows using the head()
function
Question 5
Report the mean of
- the weights of all babies.
- the weights of those born to smoking mothers.
- the weights of those born to non-smoking mothers.
The sd()
function computes the standard deviation of the input. The syntax is identical to that of the mean()
function
Question 6
Report the standard deviation of
- the weights of all babies
- the weights of those born to smoking mothers
- the weights of those born to non-smoking mothers
Question 7
Using Births
, create two graphics that help answer this question: Does the birth weight of babies born to smoking mothers differ from those born to non-smoking mothers?
Note: You are not limited to histograms. If you want, you can look into dotPlot(), densityplot(), freqpolygon(), bwplot()
, and mosaicplot()
for other types of plots to visualize data
Question 8
Set the seed to 2023 and simulate tossing 19 coins, where the probability of heads is 0.3 using rflip()
. Print the first six lines of the output
Question 9
Set the seed to 2023 and simulate 1000 repetitions of tossing 19 coins,where the probability of heads is 0.3. Save the output from this simulation, and print the first six lines from the saved data frame
Question 10
Setup your own coin flip simulation. Pick your own seed, sample size, number of repetitions and probability. Print the first six lines of the output.
Question 11
Set the seed to 2350 and choose 25 numbers out of the numbers from 1to 1990, without replacement.
Question 12
Set the seed to 2350 and simulate 25 coin flips using the sample()
function, where the probability of heads (represented by 1) is 0.3. Print the proportion of heads obtained
Hint: What would the mean()
function do given the vector of coded coin flips?