Introduction to R Programming

Merging Data

rbind() & cbind()

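rbind() stacks datasets by row, so make sure each dataset has the same variables with the same classes. cbind() joins datasets side by side, so make sure each dataset has the same number of observations and that the observations are in the same order.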
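As a minimal sketch of the two functions, the example below stacks two small data frames with rbind() and then attaches an extra column with cbind(). The data frames a, b, and scores are hypothetical and exist only for illustration.

```r
# Hypothetical example data: two datasets with the same variables and classes
a <- data.frame(ID = 1:2, Country = c("US", "CA"))
b <- data.frame(ID = 3:4, Country = c("MX", "US"))

# rbind() appends the rows of b below the rows of a
stacked <- rbind(a, b)

# A dataset with one row per row of a, in the same order
scores <- data.frame(Score = c(90, 82))

# cbind() places the columns of scores next to the columns of a
wide <- cbind(a, scores)

print(stacked)
print(wide)
```

Note that cbind() matches rows purely by position, so if the two datasets are sorted differently the values will be paired with the wrong observations.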


merge()

# merge two data frames by variable ID
merge(A, B, by = "ID")

# merge two data frames by variables ID and Country
merge(A, B, by = c("ID", "Country"))  # If "by" is omitted, the default is to use all variables shared by A and B

# merge two data frames by variable ID, keeping all observations from dataset A
merge(A, B, by = "ID", all.x = TRUE)

# merge two data frames by variable ID, keeping all observations from both datasets
merge(A, B, by = "ID", all = TRUE)
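To see how the all.x and all arguments change the result, here is a small sketch with two hypothetical data frames, A and B, that share only some ID values. The data are made up for illustration.

```r
# Hypothetical example data: IDs 2 and 3 appear in both datasets
A <- data.frame(ID = c(1, 2, 3), Country = c("US", "CA", "MX"))
B <- data.frame(ID = c(2, 3, 4), Score = c(82, 95, 71))

# Default: keep only the IDs found in both A and B
inner <- merge(A, B, by = "ID")

# all.x = TRUE: keep every row of A; Score is NA where B has no match
left <- merge(A, B, by = "ID", all.x = TRUE)

# all = TRUE: keep every row of both; unmatched values are filled with NA
full <- merge(A, B, by = "ID", all = TRUE)

print(inner)
print(left)
print(full)
```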

SQL in R

The University of Michigan has provided a great overview: SQL in R

Basics

  • SELECT clause: Used to choose which variables to return.
  • *: Used after SELECT to return all variables in the selected dataset.
  • WHERE clause: Used to filter observations based on specific conditions.
  • LIKE with '%': Retrieves records where a variable matches a pattern; '%' stands for any sequence of characters, so 'A%' matches values that start with "A" and '%a' matches values that end with "a".
  • LIMIT: A clause used to keep the first 'k' observations.
  • GROUP BY: Used to group observations by one or several variables, usually so that a summary such as a count or mean can be computed for each group.
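The clauses above can be tried from within R. The sketch below assumes the sqldf package is installed; this guide does not name a specific package, so treat sqldf as one possible choice rather than the required one. The data frame people is hypothetical.

```r
library(sqldf)  # assumption: sqldf is installed; it runs SQL against data frames

# Hypothetical example data
people <- data.frame(
  ID      = 1:6,
  Name    = c("Ana", "Ben", "Aria", "Cole", "Dana", "Alan"),
  Country = c("US", "US", "CA", "CA", "MX", "US"),
  Score   = c(90, 75, 82, 68, 95, 71),
  stringsAsFactors = FALSE
)

# SELECT chooses variables; WHERE filters observations
high <- sqldf("SELECT Name, Score FROM people WHERE Score > 80")

# * returns all variables; LIKE 'A%' keeps names that start with A
a_names <- sqldf("SELECT * FROM people WHERE Name LIKE 'A%'")

# LIMIT keeps the first k observations (here k = 3)
first3 <- sqldf("SELECT * FROM people LIMIT 3")

# GROUP BY returns one row per Country with a count and a mean
by_country <- sqldf(
  "SELECT Country, COUNT(*) AS n, AVG(Score) AS mean_score
   FROM people
   GROUP BY Country"
)

print(by_country)
```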

dplyr

Commonly Used Functions in the "dplyr" Package

  • filter(): Use this function to select observations by their values.
  • mutate(): Create new variables with functions of existing variables.
  • select(): Pick variables by their name.
  • group_by(): Use this function with the summarize() function to get descriptive statistics for each group.
  • summarize(): Obtain requested summary statistics.
  • arrange(): Reorder the rows.
  • %>%: This operator, also known as the pipe operator, allows you to connect several functions in a sequence.
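The functions listed above are often chained together with the pipe. The following is a minimal sketch, assuming the dplyr package is installed; the data frame scores and the variable names are hypothetical.

```r
library(dplyr)

# Hypothetical example data
scores <- data.frame(
  ID      = 1:6,
  Country = c("US", "US", "CA", "CA", "MX", "US"),
  Score   = c(90, 75, 82, 68, 95, 71)
)

result <- scores %>%
  filter(Score > 70) %>%                 # keep observations with Score above 70
  mutate(ScorePct = Score / 100) %>%     # create a new variable from an existing one
  select(Country, Score, ScorePct) %>%   # keep only these variables
  group_by(Country) %>%                  # one group per Country
  summarize(
    n          = n(),                    # observations per group
    mean_score = mean(Score),
    mean_pct   = mean(ScorePct)
  ) %>%
  arrange(desc(mean_score))              # highest mean first

print(result)
```

Each step receives the output of the previous step as its first argument, which is why the data frame is named only once at the top of the chain.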