MOSAIC Library
Installation
The goal of the mosaic package is to make effective computation accessible to university-level students at an introductory level.
The following packages from the mosaic suite will be used throughout the labs:
{mosaic}
{mosaicCore}
{mosaicData}
You can install the above packages by running the following command in the Console:
install.packages('mosaic')
If you are installing the package for the first time, be aware that it may take some time, since numerous dependencies need to be installed first (this is done automatically).
You can see the remaining packages that are part of the mosaic suite on the Project MOSAIC Homepage, but they will not be used for this course.
If you have successfully installed the mosaic package, you should be able to run the command library(mosaic) in the Console without any errors. However, you will see the following messages:
library(mosaic)
#> Registered S3 method overwritten by 'mosaic':
#> method from
#> fortify.SpatialPolygonsDataFrame ggplot2
#>
#> The 'mosaic' package masks several functions from core packages in order to add
#> additional features. The original behavior of these functions should not be affected by this.
#>
#> Attaching package: 'mosaic'
#> The following objects are masked from 'package:dplyr':
#> count, do, tally
#> The following object is masked from 'package:Matrix':
#> mean
#> The following object is masked from 'package:ggplot2':
#> stat
#> The following objects are masked from 'package:stats':
#> binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
#> quantile, sd, t.test, var
#>
#> The following objects are masked from 'package:base':
#> max, mean, min, prod, range, sample, sum
Make sure to load the mosaic package first; you only have to do this once at the beginning of your script. From here on, this tutorial assumes you have loaded the mosaic package.
The output above consists of messages printed when the package is first loaded. They warn you that some functions are masked because they share a name with functions in other packages.
For example, the function binom.test() is defined in two different packages, mosaic and stats. If two packages use the same function name, then the package loaded last will hide the function from earlier packages. This is called masking.
While in general the order in which packages are loaded does not matter, if you are using multiple packages that have functions with the exact same name, it is better to call the function explicitly using package_name::function:
stats::binom.test()
mosaic::binom.test()
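As a small illustration of the package_name::function syntax, the sketch below calls both versions of median() on the same vector. The vector x is made up for this example, and median() is used here only because it appears in the list of masked functions shown above.

```r
library(mosaic)

# A made-up numeric vector, used only for illustration
x <- c(2, 4, 4, 5, 7)

# Explicitly call the version from the stats package
stats::median(x)

# Explicitly call the version from the mosaic package;
# on a plain numeric vector both calls give the same value
mosaic::median(x)
```

Writing the package name in front of the function makes it clear to the reader (and to R) which version is intended, regardless of the order in which packages were loaded.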
Commonly used functions
For numerical summaries, the following functions are mosaic specific and will be used throughout the course:
tally(): tabulate categorical data
favstats(): numerical summaries including min, Q1, median, Q3, max, mean, sd, number of observations and missing values
diffmean(): difference in means
prop(): computes proportions for a single level
perc(): computes percents for a single level
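The functions listed above could be tried out as in the following sketch. The data frame scores and its columns group and score are hypothetical and exist only for this illustration; the calls assume the formula-plus-data style that mosaic uses.

```r
library(mosaic)

# Hypothetical data: two groups with three scores each
scores <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  score = c(1, 2, 3, 4, 6, 8)
)

# Numerical summaries of a single variable
favstats(~ score, data = scores)

# Difference between the two group means
diffmean(score ~ group, data = scores)

# Proportion and percent of observations in group "a"
prop(~ group, data = scores, success = "a")
perc(~ group, data = scores, success = "a")
```

Here the success argument tells prop() and perc() which level to report on.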
While the following functions are in base R, mosaic provides equivalent functions with a formula interface:
mosaic::mean()
mosaic::median()
mosaic::sd()
If you are going to use, say, the function mean, make sure to specify that it is the mosaic version, for example mosaic::mean(). If you want to know the differences between the base R and mosaic versions, run ?base::mean or ?mosaic::mean.
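To make the formula interface concrete, here is a minimal sketch using the built-in mtcars data set. The choice of mtcars and of the variables mpg and cyl is an assumption made for illustration only; it is not part of the course material.

```r
library(mosaic)

# Formula interface: mean of one variable in a data frame
mosaic::mean(~ mpg, data = mtcars)

# Formula interface: mean and standard deviation of mpg within each cyl group
mosaic::mean(mpg ~ cyl, data = mtcars)
mosaic::sd(mpg ~ cyl, data = mtcars)

# Base R equivalent of the first call, for comparison
base::mean(mtcars$mpg)
```

The one-sided formula ~ mpg summarizes a single variable, while the two-sided formula mpg ~ cyl summarizes mpg separately for each value of cyl.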
Randomization/Simulation
For randomization or simulation procedures, we will primarily use the following mosaic functions:
rflip(): simulates coin tosses, for those not yet familiar with the binomial distribution or who simply prefer its syntax and verbosity
do(): provides a natural syntax for repetition tuned to assist with replication and resampling methods
For a quick overview of the basics of the mosaic package, refer to the Minimal R for Intro Stats tutorial. Note that we will not use all of the functions in the mosaic package.
Examples
Numerical Summaries
Consider the following vector containing various types of animals (mostly pets):
animals <- c("fish", "cat", "fish", "cat", "bird", "fish", "bird",
             "bird", "dog", "cat", "dog", "dog", "cat", "dog",
             "dog", "cat", "bird", "cat", "bird", "fish")
Loading the mosaic package and using the tally() function,
tally(animals)
#> X
#> bird cat dog fish
#> 5 6 5 4
we can obtain the counts for each animal. By default, tally() gives us counts, but we can also display the proportions using the format argument:
tally(animals, format = 'proportion')
#> X
#> bird cat dog fish
#> 0.25 0.30 0.25 0.20
which is more readable than the base R equivalent:
prop.table(table(animals))
#> animals
#> bird cat dog fish
#> 0.25 0.30 0.25 0.20
According to the documentation (?mosaicCore::tally), format can take one of the following values:
c("count", "proportion", "percent", "data.frame", "sparse", "default")
tally(animals, format = 'percent')
#> X
#> bird cat dog fish
#> 25 30 25 20
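The same tabulations can also be written with a formula and a data frame, which is the style used for the other mosaic functions. The sketch below assumes a small hypothetical data frame pets with a single column animal.

```r
library(mosaic)

# Hypothetical data frame with one categorical column
pets <- data.frame(animal = c("fish", "cat", "fish", "cat", "bird"))

# Counts for each animal, using the formula interface
counts <- tally(~ animal, data = pets)
counts

# The format argument works the same way with a formula
tally(~ animal, data = pets, format = "percent")
```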
Simulation
For simulation, the do() function from mosaic provides a natural syntax for repetition tuned to assist with replication and resampling methods. Consider the following example:
do_hello <- do(3) * 'hello'
do_hello
Interpretation: write 'hello' three times.
#> hello
#> 1 hello
#> 2 hello
#> 3 hello
We are going to "do" the word three times, do(3), using the operator *, which here is not to be confused with multiplication.
str(do_hello)
#> Classes 'do.data.frame' and 'data.frame': 3 obs. of 1 variable:
#> $ hello: chr "hello" "hello" "hello"
#> - attr(*, "lazy")= language ~"hello"
#> ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
#> - attr(*, "culler")=function (object, ...)
do() returns an object of class do.data.frame and data.frame, where each row is the output of performing the task after the * operator one time.
do_hello$hello
#> [1] "hello" "hello" "hello"
If you want to extract the values from the generated data.frame, you can use the $ operator.
In the next example, we are going to write the numbers 1:4 five times:
do(5) * 1:4
#> V1 V2 V3 V4
#> 1 1 2 3 4
#> 2 1 2 3 4
#> 3 1 2 3 4
#> 4 1 2 3 4
#> 5 1 2 3 4
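Again, the output is a data.frame where each row contains the numbers 1:4; you can think of each row as a replication of the desired task. In some ways, it is similar to the base R approach of using replicate(), although replicate() returns a matrix in which each column, rather than each row, is a replication: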
replicate(n=5, 1:4)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 1 1 1
#> [2,] 2 2 2 2 2
#> [3,] 3 3 3 3 3
#> [4,] 4 4 4 4 4
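To see how the two layouts relate, the base R sketch below transposes the replicate() result so that each row holds one replication, matching the row-per-replication layout of do(). The object name reps is chosen here only for illustration.

```r
# replicate() stores each replication as a column of a matrix
reps <- replicate(n = 5, 1:4)
dim(reps)

# Transposing puts one replication per row, as in the do() output
reps_by_row <- as.data.frame(t(reps))
dim(reps_by_row)
reps_by_row
```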
Another function from mosaic to consider is rflip(). The function rflip() simulates coin tosses, with heads as successes and tails as failures.
coin_toss <- rflip(n=5)
coin_toss
Flip a fair coin five times.
#>
#> Flipping 5 coins [ Prob(Heads) = 0.5 ] ...
#>
#> T H T T T
#>
#> Number of Heads: 1 [Proportion Heads: 0.2]
str(coin_toss)
#> 'cointoss' int 1
#> - attr(*, "n")= num 5
#> - attr(*, "prob")= num 0.5
#> - attr(*, "sequence")= chr [1:5] "T" "H" "T" "T" ...
#> - attr(*, "verbose")= logi TRUE
The output of rflip() is a cointoss object, which displays the number of heads alongside the corresponding proportion of heads. We can change the probability of obtaining heads, making the coin unfair. For example,
rflip(n = 10, prob = 0.3)
Flip 10 coins, each with probability 0.3 of landing on heads.
#>
#> Flipping 10 coins [ Prob(Heads) = 0.3 ] ...
#>
#> T T T T T T T H T T
#>
#> Number of Heads: 1 [Proportion Heads: 0.1]
We can also ask rflip() to summarize its output and return a data.frame, from which we can extract information:
coin_toss_data <- rflip(n = 10, prob = 0.3, summarize = TRUE)
coin_toss_data
#> n heads tails prob
#> 1 10 2 8 0.3
str(coin_toss_data)
#> 'data.frame': 1 obs. of 4 variables:
#> $ n : num 10
#> $ heads: int 2
#> $ tails: num 8
#> $ prob : num 0.3
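Because the summarized result is an ordinary data.frame, its columns can be pulled out with the $ operator. The sketch below assumes the column names n, heads and tails shown in the output above; the object name flips and the seed value are arbitrary and used only to make the illustration reproducible.

```r
library(mosaic)

set.seed(123)  # arbitrary seed, only so the flips can be reproduced

# One summarized set of 10 flips of an unfair coin
flips <- rflip(n = 10, prob = 0.3, summarize = TRUE)

# Extract the number of heads
flips$heads

# Observed proportion of heads, computed from the extracted columns
flips$heads / flips$n

# Heads and tails should account for every flip
flips$heads + flips$tails == flips$n
```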
Combining do() and rflip()
Consider the following example:
result <- do(2) * rflip(n=5)
result
Randomly flip a fair coin 5 times and replicate this procedure two times.
#> n heads tails prop
#> 1 5 1 4 0.2
#> 2 5 1 4 0.2
Recall that the output of a do() statement is a data.frame where each row is a replication of the procedure following the * operator. Each row shows the number of tosses, how many heads/tails were obtained, and the corresponding proportion of heads after n tosses.
str(result)
#> Classes 'do.data.frame' and 'data.frame': 2 obs. of 4 variables:
#> $ n : num 5 5
#> $ heads: num 1 1
#> $ tails: num 4 4
#> $ prop : num 0.2 0.2
#> - attr(*, "lazy")= language ~rflip(n = 5)
#> ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
#> - attr(*, "culler")=function (object, ...)
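The same pattern scales to many replications, which is how do() and rflip() would typically be combined in a simulation. The following is a sketch only: the number of replications (1000), the number of flips (10), the seed, and the object name sims are all arbitrary choices made for illustration, and the column names heads and prop are the ones shown in the output above.

```r
library(mosaic)

set.seed(2024)  # arbitrary seed, only so the simulation can be reproduced

# Flip a fair coin 10 times, and repeat that procedure 1000 times
sims <- do(1000) * rflip(n = 10)

# Each row is one replication of 10 flips
head(sims)

# How often each number of heads occurred across the replications
tally(~ heads, data = sims)

# Average proportion of heads across all replications
mosaic::mean(~ prop, data = sims)
```

Since each row is one replication, summaries such as tally() and mean() can be applied directly to the columns of the resulting data.frame.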