MOSAIC Library

Author

Jose Toledo Luna

last updated

October 27, 2023

Installation

The goal of the mosaic package is to make effective computation accessible to university-level students at an introductory level.

The following packages from the mosaic suite will be used throughout the labs.

  • {mosaic}
    • {mosaicCore}
    • {mosaicData}

You can install the above packages by running the following command in the Console:

install.packages('mosaic')
Warning

If you are installing the package for the first time, please be aware that it might take some time, since there are numerous dependencies that need to be installed beforehand (this is done automatically).

You can see the remaining packages that are part of the mosaic suite on the Project MOSAIC Homepage, but they will not be used for this course.

If you have successfully installed the mosaic package, you should be able to run the command library(mosaic) in the console without any errors. However, you will see the following messages:

library(mosaic)
#> Registered S3 method overwritten by 'mosaic':
#>   method                           from   
#>   fortify.SpatialPolygonsDataFrame ggplot2
#> 
#> The 'mosaic' package masks several functions from core packages in order to add 
#> additional features.  The original behavior of these functions should not be affected by this.
#> 
#> Attaching package: 'mosaic'
#> The following objects are masked from 'package:dplyr':
#>     count, do, tally
#> The following object is masked from 'package:Matrix':
#>     mean
#> The following object is masked from 'package:ggplot2':
#>     stat
#> The following objects are masked from 'package:stats':
#>     binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
#>     quantile, sd, t.test, var
#>     
#> The following objects are masked from 'package:base':
#>     max, mean, min, prod, range, sample, sum
Make sure to load the mosaic package first; you only have to do this once, at the beginning of your script. From here on, this tutorial assumes you have loaded the mosaic package.

The output above consists of startup messages warning you that mosaic masks functions that have the same name in other packages.

For example, the function binom.test() is defined in two different packages, mosaic and stats. If two packages define a function with the same name, the package loaded last hides the function from the packages loaded earlier. This is called masking.

In general, the order in which packages are loaded does not matter. However, if you are using multiple packages that have functions with the exact same name, it is better to call the function explicitly using package_name::function:

mosaic::binom.test()
stats::binom.test()
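
As a minimal sketch of masking that runs in base R alone, the lines below define a stand-in function named sd in the workspace (purely hypothetical, for illustration) so that it hides stats::sd(), much as a package loaded later would. The :: form still reaches the original.

```r
# Hypothetical stand-in: a workspace function that shares its name with stats::sd().
sd <- function(x) "my own sd"

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

masked_result   <- sd(x)         # R finds the workspace definition first
explicit_result <- stats::sd(x)  # '::' always selects the version in 'stats'

masked_result
explicit_result

rm(sd)  # remove the stand-in so that sd() refers to stats::sd() again
```

The same pattern applies to mosaic: writing mosaic::binom.test() or stats::binom.test() removes any doubt about which version is called.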

Commonly used functions

For numerical summaries, the following functions are mosaic specific and will be used throughout the course

  • tally(): tabulate categorical data
  • favstats(): numerical summaries including: min, Q1, median, Q3, max, mean, sd, number of observations and missing values
  • diffmean(): difference in means
  • prop(): computes proportions for a single level
  • perc(): computes percents for a single level
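
A short sketch of these functions in use. It assumes the ToothGrowth data set that ships with R (not used elsewhere in this tutorial) and repeats the animals vector from the Examples section so that the block runs on its own. The success argument of prop() and perc() selects which level to report.

```r
library(mosaic)

animals <- c("fish", "cat", "fish", "cat", "bird", "fish", "bird",
             "bird", "dog", "cat", "dog", "dog", "cat", "dog",
             "dog", "cat", "bird", "cat", "bird", "fish")

# Counts for each category
animal_counts <- tally(animals)

# One-row summary of tooth length: min, Q1, median, Q3, max, mean, sd, n, missing
fav <- favstats(~ len, data = ToothGrowth)

# Difference in mean tooth length between the two supplement types
d <- diffmean(len ~ supp, data = ToothGrowth)

# Proportion and percent of animals that are cats
p_cat  <- prop(animals, success = "cat")
pc_cat <- perc(animals, success = "cat")

animal_counts
fav
d
p_cat
pc_cat
```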

While the following functions are in base R, MOSAIC provides equivalent functions with a formula interface

mosaic::mean()
mosaic::median()
mosaic::sd()

If you are going to use, say, the function mean, make sure to specify that it is the mosaic version, for example mosaic::mean(). To see how the two versions differ, run ?base::mean or ?mosaic::mean
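
To illustrate what the formula interface adds, here is a small sketch using the ToothGrowth data set that ships with R (an assumption for illustration; it is not part of the course material). A one-sided formula summarizes a single variable, and a two-sided formula summarizes it within groups.

```r
library(mosaic)

# Overall mean of tooth length (one-sided formula: no grouping variable)
overall <- mosaic::mean(~ len, data = ToothGrowth)

# Mean tooth length within each supplement type (response ~ group)
by_supp <- mosaic::mean(len ~ supp, data = ToothGrowth)

# The same grouped means computed with base R, for comparison
by_supp_base <- tapply(ToothGrowth$len, ToothGrowth$supp, base::mean)

overall
by_supp
by_supp_base
```

mosaic::median() and mosaic::sd() accept the same formula and data arguments.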

Randomization/Simulation

For randomization or simulation procedures we will primarily use the following mosaic functions

  • rflip(): simulates coin tosses, for those who are not yet familiar with the binomial distribution or who simply prefer this syntax and verbosity.
  • do(): provides a natural syntax for repetition tuned to assist with replication and resampling methods

For a quick overview of the basics of the mosaic package, refer to the Minimal R for Intro Stats tutorial. Note that we will not use all of the functions in the mosaic package.

Examples

Numerical Summaries

Consider the following vector containing various types of animals (mostly pets)

animals <- c("fish", "cat",  "fish", "cat"  ,"bird" ,"fish" ,"bird" ,
             "bird" ,"dog"  ,"cat"  ,"dog"  ,"dog"  ,"cat", "dog",  
             "dog",  "cat",  "bird", "cat" , "bird", "fish")

loading the mosaic package and using the tally() function

tally(animals)
#> X
#> bird  cat  dog fish 
#>    5    6    5    4

we can obtain the counts for each animal. By default tally() gives us counts, but we can also display the proportions using the format argument

tally(animals, format = 'proportion')
#> X
#> bird  cat  dog fish 
#> 0.25 0.30 0.25 0.20

which is more readable than using base R

prop.table(table(animals))
#> animals
#> bird  cat  dog fish 
#> 0.25 0.30 0.25 0.20

According to the documentation (?mosaicCore::tally), format can take one of the following values

c("count", "proportion", "percent", "data.frame", "sparse", "default")

For example, to display percentages

tally(animals, format = 'percent')
#> X
#> bird  cat  dog fish 
#>   25   30   25   20

Simulation

For simulation the do() function from mosaic provides a natural syntax for repetition tuned to assist with replication and resampling methods. Consider the following example

do_hello <- do(3) * 'hello'  # interpretation: write 'hello' three times
do_hello
#>   hello
#> 1 hello
#> 2 hello
#> 3 hello

Here we “do” the word three times: do(3) sets the number of repetitions, and the * operator applies it to the expression that follows. In this context, * should not be confused with multiplication.

str(do_hello)
#> Classes 'do.data.frame' and 'data.frame':    3 obs. of  1 variable:
#>  $ hello: chr  "hello" "hello" "hello"
#>  - attr(*, "lazy")= language ~"hello"
#>   ..- attr(*, ".Environment")=<environment: R_EmptyEnv> 
#>  - attr(*, "culler")=function (object, ...)

do() returns an object of class do.data.frame (which is also a data.frame), where each row is the output of performing the task after the * operator one time

If you want to extract the values from the generated data.frame, you can use the $ operator

do_hello$hello
#> [1] "hello" "hello" "hello"

In the next example, we are going to write the numbers 1:4 five times

do(5) * 1:4
#>   V1 V2 V3 V4
#> 1  1  2  3  4
#> 2  1  2  3  4
#> 3  1  2  3  4
#> 4  1  2  3  4
#> 5  1  2  3  4

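Again, the output is a data.frame where each row contains the numbers 1:4; you can think of each row as one replication of the desired task. In some ways this is similar to the base R approach of using replicate(), although replicate() returns a matrix in which each column, rather than each row, is a replication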

replicate(n=5, 1:4)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    1    1    1
#> [2,]    2    2    2    2    2
#> [3,]    3    3    3    3    3
#> [4,]    4    4    4    4    4
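
To make the comparison concrete, the following base R sketch transposes the replicate() result so that each row is one replication, matching the layout that do() produces. The object names are chosen here for illustration only.

```r
reps <- replicate(n = 5, 1:4)          # 4 x 5 matrix: one column per replication
reps_by_row <- t(reps)                 # 5 x 4 matrix: one row per replication
reps_df <- as.data.frame(reps_by_row)  # columns are named V1, ..., V4 by default

reps_df
```

The values are the same either way; do() simply saves the transposing and conversion steps.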

Another function from mosaic to consider is rflip(). The function rflip() simulates coin tosses, with heads as success and tails as failure.

coin_toss <- rflip(n=5)  # flip a fair coin five times
coin_toss
#> 
#> Flipping 5 coins [ Prob(Heads) = 0.5 ] ...
#> 
#> T H T T T
#> 
#> Number of Heads: 1 [Proportion Heads: 0.2]

str(coin_toss)
#>  'cointoss' int 1
#>  - attr(*, "n")= num 5
#>  - attr(*, "prob")= num 0.5
#>  - attr(*, "sequence")= chr [1:5] "T" "H" "T" "T" ...
#>  - attr(*, "verbose")= logi TRUE

The output of rflip() is a cointoss object, which displays the number of heads alongside the corresponding proportion of heads. We can change the probability of obtaining heads, making the coin unfair. For example,

rflip(n = 10, prob = 0.3)  # flip 10 coins, each landing on heads with probability 0.3
#> 
#> Flipping 10 coins [ Prob(Heads) = 0.3 ] ...
#> 
#> T T T T T T T H T T
#> 
#> Number of Heads: 1 [Proportion Heads: 0.1]

We can instead ask rflip() for a summary by setting summarize = TRUE, which returns a data.frame from which we can extract information

coin_toss_data <- rflip(n = 10, prob = 0.3, summarize = TRUE)
coin_toss_data
#>    n heads tails prob
#> 1 10     2     8  0.3

str(coin_toss_data)
#> 'data.frame':    1 obs. of  4 variables:
#>  $ n    : num 10
#>  $ heads: int 2
#>  $ tails: num 8
#>  $ prob : num 0.3
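
For readers who already know the binomial distribution, the sketch below builds a similar one-row summary in base R with rbinom(). It is an analogue for illustration only, not a description of how rflip() is implemented, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only to make the run reproducible

n <- 10
p <- 0.3

heads <- rbinom(1, size = n, prob = p)  # number of heads in n tosses
tails <- n - heads

coin_summary <- data.frame(n = n, heads = heads, tails = tails, prob = p)
coin_summary
```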

Combining do() and rflip()

Consider the following example,

result <- do(2) * rflip(n=5)  # flip a fair coin 5 times; replicate this procedure two times
result
#>   n heads tails prop
#> 1 5     1     4  0.2
#> 2 5     1     4  0.2

Recall that the output of a do() statement is a data.frame in which each row is one replication of the procedure following the * operator. Each row shows the number of tosses, how many heads and tails were obtained, and the corresponding proportion of heads after n tosses

str(result)
#> Classes 'do.data.frame' and 'data.frame':    2 obs. of  4 variables:
#>  $ n    : num  5 5
#>  $ heads: num  1 1
#>  $ tails: num  4 4
#>  $ prop : num  0.2 0.2
#>  - attr(*, "lazy")= language ~rflip(n = 5)
#>   ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>  - attr(*, "culler")=function (object, ...)
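
With more replications, the same pattern gives a simulated distribution of the number of heads. The sketch below is one way this might look; the choice of 1000 replications and the seed are arbitrary assumptions made for illustration.

```r
library(mosaic)

set.seed(123)  # arbitrary seed, only to make the run reproducible

# Flip a fair coin 5 times and replicate this procedure 1000 times
sims <- do(1000) * rflip(n = 5)

# How often each number of heads occurred, as proportions
heads_dist <- tally(~ heads, data = sims, format = "proportion")

# Average proportion of heads across all replications
avg_prop <- mosaic::mean(~ prop, data = sims)

heads_dist
avg_prop
```

Because each row of sims is one replication, the columns heads and prop can be summarized with the functions introduced earlier.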