Intro to R in Epidemiology (Part Two)

Written by Aashna Uppal @auppal

(The content of this course draws from the EpiR Handbook (Chapters 3-8, 17, 30, 40), which is a free online resource developed by Applied Epi)

https://epirhandbook.com/en

  1. Base R vs. tidyverse

  • The tidyverse is like a package of packages, designed specifically for working with data in a “tidy” way.
  • You can still manipulate data without it – this is called the “base R” way of doing it.
  • The difference between the two can be illustrated by a quick example.

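  • In base R, baking a cake means nesting each step inside the next, so the code is read from the inside out:
put_in_oven(mix_all_ingredients_together_into_a_cake_batter(combine(flour, egg, baking_soda, butter, sugar)))
  • With the tidyverse, the same recipe is written as a sequence of steps, read from top to bottom: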
flour %>%
add(egg) %>%
add(baking_soda) %>%
add(butter) %>%
add(sugar) %>%
mix_all_ingredients_together_into_a_cake_batter() %>%
put_in_oven()
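
The cake functions above are made up for the analogy, so they will not run. As a rough sketch of the same contrast with real code, the example below computes the mean age of cases at one hospital both ways. The data frame cases and its columns age and hospital are hypothetical, invented only for illustration.

```r
library(dplyr)  # provides filter(), summarise(), pull() and the %>% pipe

# Hypothetical mini linelist, invented for illustration only
cases <- data.frame(
  age      = c(4, 35, 61, 18, 72),
  hospital = c("A", "B", "A", "B", "A")
)

# Base R: nested calls, read from the inside out
mean_age_base <- round(mean(cases$age[cases$hospital == "A"]), 1)

# tidyverse: the same steps as a pipeline, read from top to bottom
mean_age_tidy <- cases %>%
  filter(hospital == "A") %>%           # keep only hospital A
  summarise(mean_age = mean(age)) %>%   # then average the ages
  pull(mean_age) %>%                    # then extract the single value
  round(1)                              # then round to one decimal place
```

Both versions should give the same number; the difference is only in how the steps are written and read.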

The pipe operator (%>%)

  • Simply explained, the pipe operator (%>%) passes an intermediate output from one function to the next. You can think of it as saying “then”. Many functions can be linked together with %>%.
  • Piping emphasizes the sequence of actions rather than the object the actions are performed on.
  • The %>% pipe comes from the magrittr package and becomes available automatically when you load dplyr or tidyverse.
  • Pipes can make code cleaner, more intuitive, and easier to read.
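
To make "passes an intermediate output from one function to the next" concrete, here is a minimal sketch: the pipe places whatever is on its left into the first argument of the function on its right, so x %>% f(y) is equivalent to f(x, y). The vector ages is a hypothetical example.

```r
library(magrittr)  # provides the %>% pipe

ages <- c(4, 35, 61, 18, 72)  # hypothetical ages, for illustration only

# Nested form: round(mean(ages), 1)
nested <- round(mean(ages), 1)

# Piped form: take ages, then compute the mean, then round to 1 decimal
piped <- ages %>%
  mean() %>%
  round(1)

identical(nested, piped)  # TRUE: both forms run the same calls
```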

  2. Importing data

  • When you import a “dataset” into R, you are generally creating a new data frame object in your R environment and defining it as an imported file (e.g. Excel, CSV, TSV, RDS) that is located in your folder directories at a certain file path/address. We will see an example of this later.
  • You can import/export many types of files, including those created by other statistical programs (SAS, Stata, SPSS). You can also connect to relational databases.

The rio package

  • The R package we recommend for importing data is rio.
  • Its functions import() and export() can handle many different file types (e.g. .xlsx, .csv, .rds, .tsv). When you provide a file path to either of these functions (including the file extension like “.csv”), rio will read the extension and use the correct tool to import or export the file.
  • The alternative to using rio is to use functions from many other packages, each of which is specific to a type of file. For example, read.csv() (base R), read.xlsx() (openxlsx package), and write_csv() (readr package).
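
As a small sketch of how import() and export() pick the right tool from the file extension, the example below writes a data frame to a CSV file and an RDS file and reads both back. The data frame cases and the file names are hypothetical, and the files are written to a temporary folder so nothing in your project is touched.

```r
library(rio)  # provides import() and export()

# Hypothetical data, invented for illustration only
cases <- data.frame(
  case_id = c("a1", "b2", "c3"),
  age     = c(4, 35, 61)
)

# The ".csv" extension tells rio to write and read a CSV file
csv_path <- file.path(tempdir(), "cases.csv")
export(cases, csv_path)
cases_csv <- import(csv_path)

# The ".rds" extension tells rio to write and read an RDS file instead
rds_path <- file.path(tempdir(), "cases.rds")
export(cases, rds_path)
cases_rds <- import(rds_path)
```

The same two functions are used in both cases; only the extension in the file path changes.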

The here package

  • The package here and its function here() make it easy to tell R where to find and to save your files – in essence, it builds file paths.
  • Used in conjunction with an R project, here allows you to describe the location of files in your R project in relation to the R project’s root directory (the top-level folder). This is useful when the R project may be shared or accessed by multiple people/computers.
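  • The here package locates the root folder by searching for a marker, such as the .Rproj file of your R project or a small file called “.here”, which acts as a “benchmark” or “anchor”. Calling here::i_am() at the top of a script additionally declares where that script sits relative to the root.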
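library(rio)   # provides import()
library(here)  # provides here()
here::i_am("cleaning_code.R")  # declare this script's location relative to the project root
linelist <- import(here("data", "ebola_linelist.xlsx"))  # import data/ebola_linelist.xlsx from the root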
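
To see what here() actually builds, you can print the path before importing anything. This is a minimal sketch, assuming it is run from somewhere here can treat as a project root; the folder and file names are the ones from the example above and do not need to exist for the path to be built.

```r
library(here)  # provides here()

# With no arguments, here() returns the project root it has detected
root <- here()

# Each argument is one level of the path, joined onto the root
path <- here("data", "ebola_linelist.xlsx")

print(root)
print(path)

# Check that the file is really there before trying to import it
file.exists(path)
```

Because the path is built from the root rather than typed out in full, the same line should work unchanged on another person's computer.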
