Introduction to R Programming
Merging Data
rbind() & cbind()
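rbind() stacks datasets on top of each other (adds rows), so each dataset must have the same variables (same column names) with the same classes. cbind() places datasets side by side (adds columns), so each dataset must have the same number of rows, with the observations in the same order.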
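A minimal sketch of both functions, using small made-up data frames (part1, part2, and ages are hypothetical names chosen for illustration):

```r
# Two small example data frames with the same variables (hypothetical data)
part1 <- data.frame(ID = 1:2, Score = c(90, 85))
part2 <- data.frame(ID = 3:4, Score = c(78, 92))

# rbind(): stack the rows of part2 under part1
stacked <- rbind(part1, part2)
print(stacked)

# cbind(): add a column; row i of ages must describe the same observation as row i of stacked
ages <- data.frame(Age = c(21, 34, 28, 45))
combined <- cbind(stacked, ages)
print(combined)
```

Note that cbind() simply pairs rows by position, so it will not catch rows that are out of order; use merge() when the datasets share a key variable.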
merge()
# merge two data frames by the variable ID
merge(A, B, by = "ID")
# merge two data frames by the variables ID and Country
merge(A, B, by = c("ID", "Country"))
# if `by` is omitted, the default is to use all variables shared by A and B
merge(A, B)
# left join: keep all observations from dataset A, even those with no match in B
merge(A, B, by = "ID", all.x = TRUE)
# full join: keep all observations from both datasets
merge(A, B, by = "ID", all = TRUE)
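To see how the `all` and `all.x` arguments change which rows are kept, here is a small runnable sketch with hypothetical data frames A and B that share the key variable ID:

```r
# Hypothetical example data: IDs 2 and 3 appear in both data frames
A <- data.frame(ID = c(1, 2, 3), Sales = c(100, 200, 300))
B <- data.frame(ID = c(2, 3, 4), Region = c("West", "East", "North"))

inner <- merge(A, B, by = "ID")                # only IDs found in both: 2, 3
left  <- merge(A, B, by = "ID", all.x = TRUE)  # every ID from A: 1, 2, 3
full  <- merge(A, B, by = "ID", all = TRUE)    # every ID from either: 1, 2, 3, 4

print(inner)
print(left)   # Region is NA for ID 1, which has no match in B
print(full)   # Sales is NA for ID 4, which has no match in A
```

Because merge() sorts the result by the key variable by default (sort = TRUE), the rows come back ordered by ID.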
SQL in R
The University of Michigan has provided a great overview: SQL in R.
Basics
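- SELECT clause: Used to choose which variables (columns) to retrieve.
- *: Used with SELECT to retrieve all variables (and, without a WHERE clause, all observations) in the selected dataset.
- WHERE clause: Used to filter observations based on specific conditions.
- LIKE with '%': Matches text patterns, where '%' stands for any sequence of characters; 'A%' retrieves records where a variable starts with "A", and '%A' records where it ends with "A".
- LIMIT: A clause used to keep the first k observations.
- GROUP BY: Used to group observations by one or several variables, typically so that summary functions such as COUNT() or AVG() are computed for each group.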
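One common way to run these clauses on R data frames is the sqldf package, which lets you query a data frame by name inside an SQL string. This is an assumption for illustration; the University of Michigan overview may use a different approach. The sales data frame below is hypothetical.

```r
# Assumes the sqldf package is installed: install.packages("sqldf")
library(sqldf)

# Hypothetical example data
sales <- data.frame(
  Name   = c("Alice", "Bob", "Anna", "Carl", "Amy"),
  Region = c("West", "East", "West", "East", "West"),
  Amount = c(250, 100, 300, 150, 50)
)

# SELECT *: all variables and all observations
all_rows <- sqldf("SELECT * FROM sales")

# WHERE: keep only observations that meet a condition
big <- sqldf("SELECT Name, Amount FROM sales WHERE Amount > 120")

# LIKE with '%': names that start with 'A'
a_names <- sqldf("SELECT Name FROM sales WHERE Name LIKE 'A%'")

# LIMIT: keep the first 2 observations
first_two <- sqldf("SELECT * FROM sales LIMIT 2")

# GROUP BY: one summary row per region
by_region <- sqldf("SELECT Region, SUM(Amount) AS Total FROM sales GROUP BY Region")
print(by_region)
```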
dplyr
Commonly Used Functions in the "dplyr" Package
- filter(): Select observations (rows) by their values.
- mutate(): Create new variables as functions of existing variables.
- select(): Pick variables (columns) by their names.
- group_by(): Use this function with summarize() to compute descriptive statistics for each group.
- summarize(): Obtain the requested summary statistics.
- arrange(): Reorder the rows.
- %>%: This operator, also known as the pipe operator, connects several functions in a sequence by passing the output of each step to the next.
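A minimal sketch that chains every function listed above with the pipe, assuming dplyr is installed and using a hypothetical scores data frame:

```r
# Assumes the dplyr package is installed: install.packages("dplyr")
library(dplyr)

# Hypothetical example data
scores <- data.frame(
  Student = c("Ann", "Ben", "Cal", "Dee", "Eve", "Fay"),
  Class   = c("A", "A", "B", "B", "B", "A"),
  Math    = c(80, 65, 90, 70, 85, 95),
  Reading = c(70, 75, 80, 60, 95, 85)
)

# Each step receives the result of the previous one
result <- scores %>%
  filter(Math >= 70) %>%                 # keep rows with a Math score of 70 or more
  mutate(Total = Math + Reading) %>%     # new variable built from existing ones
  select(Class, Student, Total) %>%      # keep only these variables
  group_by(Class) %>%                    # group rows by Class
  summarize(mean_total = mean(Total),    # one summary row per Class
            n = n()) %>%                 # number of students in each group
  arrange(desc(mean_total))              # sort rows, highest mean first

print(result)
```

Here Ben is dropped by filter(), so class A averages the totals of Ann and Fay (165) and class B those of Cal, Dee, and Eve (160).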
- Last Updated: Feb 14, 2025 11:54 AM
- URL: https://campusguides.lib.utah.edu/r-programming