Lessons
- What’s an Open Lab?
- Why R?
- Learning objectives for the semester
- Setup: R, RStudio
- A quick example
Required Data Files
listings.csv
Optional Reading
R for Data Science Chapters 4, 6, 8
- Reproducibility
- Projects in RStudio
- Importing data
- Objects and classes
- Tables for categorical data
- Exploring continuous data
- Missing data
- Saving output
- ggplot (time allowing)
Required Data Files
Five Thousand Wine Reviews
Optional Reading
R for Data Science Chapter 5
- Review: Starting a New Project in R, loading the tidyverse, and importing data
- Filtering
- Relational and Assignment Operators
- Reordering Data (arrange)
- Selecting Data (select)
- Renaming Columns
- Adding New Variables
- Summarizing Data
- Piping
Required Data Files
Boston AirBnB Data
Optional Reading
R for Data Science Chapter 7
- What is Exploratory Data Analysis?
- What do we have? – dim, str, and summary
- Frequency – Univariate EDA
- Covariation – Two or more variables
- Categorical vs Categorical Variables
- Categorical vs Continuous Variables
Required Data Files
New York Business Inspections
Optional Reading
R for Data Science Chapters 14 and 15
- Getting Started With Strings
- Combining and Subsetting Strings
- Regular Expressions
- Creating Factors
- Altering Factors
Required Data Files
Brazilian E-Commerce
Optional Reading
R for Data Science Chapters 12 and 13
- Merging / Joining Dataframes
- Reshaping with tidyr
Required Data Files
US Cheese Consumption
Optional Reading
R for Data Science Chapter 27
- R Markdown
- Markdown Syntax
- Creating Reproducible Reports
Required Data Files
Weather in Austin, TX
Optional Reading
R for Data Science Chapter 19
- When you should write a function
- Steps to writing a function
- Naming conventions
- Arguments
- Returns
- Conditionals
- Environment
- Getting started with loops
- Output
- While loops
- Loops with conditionals and functions
- Error handling
- Terminology
- Simple Linear Models with Plots
- Multiple Regression – Formula notation in R
- Modeling
- Simulations
- Reproducible simulations
Extras
Have you ever wanted to change your ggplots with the click of a button? Wouldn’t it be nice to use a drop-down menu to filter your data? R Shiny lets you and others interact with your code through a graphical web interface.
This extra shows an easy way to split a loop’s work across multiple cores on your computer so that the iterations run in “parallel”, speeding up large or long-running loops.
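Here is a minimal sketch of the idea. It assumes the `parallel` package that ships with base R, and the extra itself may use a different tool. A loop rewritten as `lapply()` can be spread across worker processes with `parLapply()`. The function `slow_square()` is a hypothetical stand-in for one expensive iteration.

```r
library(parallel)

# Hypothetical stand-in for one slow loop iteration
slow_square <- function(x) {
  Sys.sleep(0.1)  # simulate expensive work
  x^2
}

inputs <- 1:20

# Sequential version: iterations run one after another
seq_results <- lapply(inputs, slow_square)

# Use all but one core (at least 1, at most 4); detectCores() can return NA
n_workers <- min(4, max(1, detectCores() - 1, na.rm = TRUE))

# Start a cluster of separate R worker processes
cl <- makeCluster(n_workers)

# parLapply() splits `inputs` across the workers and collects the results in order
par_results <- parLapply(cl, inputs, slow_square)

# Always shut the workers down when finished
stopCluster(cl)

# Same answers as the sequential loop
identical(seq_results, par_results)
```

The default cluster type from `makeCluster()` works on Windows, macOS, and Linux. Starting workers has its own overhead, so parallelizing pays off only when each iteration does substantial work.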
This extra demonstrates two useful tools for handling missing data in statistical models.
The caret package provides a consistent framework for fitting hundreds of different types of predictive models and then comparing them, using out-of-sample accuracy, to select the most effective ones.
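caret automates this kind of comparison, typically with resampling such as cross-validation. The underlying idea can be sketched in base R with a single train/test split. The built-in `mtcars` data and the two candidate formulas are illustrative assumptions, not part of the extra.

```r
# Hold out a test set so models are judged on data they never saw
set.seed(42)
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Two hypothetical candidate models for fuel economy
candidates <- list(
  weight_only   = mpg ~ wt,
  weight_and_hp = mpg ~ wt + hp
)

# Root mean squared error: lower means better out-of-sample accuracy
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Fit each model on the training set, then score it on the test set
test_rmse <- sapply(candidates, function(f) {
  fit <- lm(f, data = train)
  rmse(test$mpg, predict(fit, newdata = test))
})

test_rmse
names(which.min(test_rmse))  # model with the lowest test error
```

A single split can be noisy. Averaging the error over many splits, as cross-validation does, gives a steadier comparison.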
The stargazer package makes it easy to create publication-quality regression tables in HTML or LaTeX.
The GGally package provides ggpairs(), a function for creating scatterplot matrices. A scatterplot matrix arranges multiple scatterplots on a grid so that they are easy to compare with one another.