Lab 2: North Carolina Births

Learning Objectives

Students will be able to:

  • Increase skill and comfort with basic data management and syntax in R
  • Create statistical graphics to answer questions about a data set
  • Review simulating the null distribution for sample proportions
  • Understand the basic methodology and syntax of the for loop
  • Carry out an entire test of significance for one proportion using simulations

Readings

To prepare for this lab assignment read the following lecture notes

  1. Logical Statements and Applications
  2. Introduction to for-loops
  3. Sampling and Simulation

Required Data and Packages

We will continue to work with the NC data on births used in Lab 1

births <- read.csv('your path to births.csv')

Throughout the lab we will use the mosaic package

library(mosaic)

Questions

Question 1

What are the names of the variables that indicate whether or not the mother smoked and the weight of the baby?

Question 2

In addition to the variable that indicates whether or not the mother smoked, choose one numerical (quantitative) variable and one categorical variable that you think might also affect the weight of a baby. Explain why

Question 3

Create a data frame that has only the variables Gender, Premie, weight, and Apgar1, and the observations for which the baby was not premature. Print out the first few rows using the head() function.

Question 4

Create a data frame that has only the four variables you identified in Questions 1 and 2 and the observations for which the mother was a smoker. Print out the first few rows using the head() function

Question 5

Report the mean of

  1. the weights of all babies.
  2. the weights of those born to smoking mothers.
  3. the weights of those born to non-smoking mothers.

The sd() function computes the standard deviation of the input. The syntax is identical to that of the mean() function

Question 6

Report the standard deviation of

  1. the weights of all babies
  2. the weights of those born to smoking mothers
  3. the weights of those born to non-smoking mothers

Question 7

Using Births, create two graphics that help answer this question: Does the birth weight of babies born to smoking mothers differ from those born to non-smoking mothers?

Note: You are not limited to histograms. If you want, you can look into dotPlot(), densityplot(), freqpolygon(), bwplot(), and mosaicplot() for other types of plots to visualize data

Question 8

Set the seed to 2023 and simulate tossing 19 coins, where the probability of heads is 0.3 using rflip(). Print the first six lines of the output

Question 9

Set the seed to 2023 and simulate 1000 repetitions of tossing 19 coins,where the probability of heads is 0.3. Save the output from this simulation, and print the first six lines from the saved data frame

Question 10

Setup your own coin flip simulation. Pick your own seed, sample size, number of repetitions and probability. Print the first six lines of the output.

Question 11

Set the seed to 2350 and choose 25 numbers out of the numbers from 1to 1990, without replacement.

Question 12

Set the seed to 2350 and simulate 25 coin flips using the sample() function, where the probability of heads (represented by 1) is 0.3. Print the proportion of heads obtained

Hint: What would the mean() function do given the vector of coded coin flips?