R Econometrics
Purpose
This skill helps economists run rigorous econometric analyses in R, including Instrumental Variables (IV), Difference-in-Differences (DiD), and Regression Discontinuity Design (RDD). It generates publication-ready code with proper diagnostics and robust standard errors.
When to Use
- •Running causal inference analyses
- •Estimating treatment effects with panel data
- •Creating publication-ready regression tables
- •Implementing modern econometric methods (two-way fixed effects, event studies)
Instructions
Step 1: Understand the Research Design
Before generating code, ask the user:
- •What is your identification strategy? (IV, DiD, RDD, or simple regression)
- •What is the unit of observation? (individual, firm, country-year, etc.)
- •What fixed effects do you need? (entity, time, two-way)
- •How should standard errors be clustered?
Step 2: Generate Analysis Code
Based on the research design, generate R code that:
- •Uses the fixest package: modern, fast, and feature-rich for panel data
- •Includes proper diagnostics:
  - •For IV: First-stage F-statistics, weak instrument tests
  - •For DiD: Parallel trends visualization, event study plots
  - •For RDD: Bandwidth selection, density tests
- •Uses robust/clustered standard errors appropriate for the data structure
- •Creates publication-ready output using modelsummary or etable
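As an illustration of the IV diagnostics listed above, here is a minimal sketch on simulated data. The variable names (`y`, `x`, `z`, `firm`) and the data-generating process are assumptions made up for this example; only the fixest calls themselves reflect the package's interface.

```r
library(fixest)

# Simulated data (illustrative only): z is the instrument, x is endogenous
set.seed(123)
n    <- 2000
firm <- sample(1:100, n, replace = TRUE)  # hypothetical cluster / fixed-effect id
z    <- rnorm(n)                          # instrument
u    <- rnorm(n)                          # unobserved confounder
x    <- 0.8 * z + u + rnorm(n)            # endogenous regressor
y    <- 1.5 * x + 2 * u + rnorm(n)        # true effect of x on y is 1.5
sim  <- data.frame(y, x, z, firm)

# fixest IV syntax: outcome ~ exogenous controls | fixed effects | endogenous ~ instruments
iv_model <- feols(y ~ 1 | firm | x ~ z, data = sim, cluster = ~firm)

# First-stage F-statistic and first-stage Wald test (the Wald test uses the clustered VCOV)
fitstat(iv_model, ~ ivf + ivwald)

# Inspect the first-stage regression itself
summary(iv_model, stage = 1)

# Second-stage results; the instrumented regressor is reported as "fit_x"
summary(iv_model)
```

In a real analysis the simulated block would be replaced by the user's data, and the choice of fixed effects and cluster variable should follow the answers gathered in Step 1.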
Step 3: Structure the Output
Always include:
```r
# 1. Setup and packages
# 2. Data loading and preparation
# 3. Descriptive statistics
# 4. Main specification
# 5. Robustness checks
# 6. Visualization
# 7. Export results
```
Step 4: Add Documentation
Include comments explaining:
- •Why each specification choice was made
- •Interpretation of key coefficients
- •Limitations and assumptions
Example Prompts
- •"Run a DiD analysis with state and year fixed effects, clustering at the state level"
- •"Estimate the effect of X on Y using Z as an instrument"
- •"Create an event study plot showing treatment effects by year"
- •"Run a sharp RDD with optimal bandwidth selection"
Example Output
```r
# ============================================
# Difference-in-Differences Analysis
# ============================================

# Setup
library(tidyverse)
library(fixest)
library(modelsummary)

# Load data
df <- read_csv("data.csv")

# Prepare treatment variables
# `treatment_year` (a single year) and `treatment_states` (a character vector)
# must be defined before this step
df <- df %>%
  mutate(
    post       = as.integer(year >= treatment_year),
    treated    = as.integer(state %in% treatment_states),
    treat_post = treated * post
  )

# ----------------------------------------
# Main DiD Specification
# ----------------------------------------

# Two-way fixed effects; the coefficient on treat_post is the DiD estimate
did_model <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state
)

# View results
summary(did_model)

# ----------------------------------------
# Event Study
# ----------------------------------------

# Create relative time variable
df <- df %>%
  mutate(rel_time = year - treatment_year)

# Event study regression; ref = -1 makes the year before treatment the baseline
event_study <- feols(
  outcome ~ i(rel_time, treated, ref = -1) | state + year,
  data = df,
  cluster = ~state
)

# Plot coefficients (pre-treatment estimates near zero support parallel trends)
iplot(event_study,
      main = "Event Study: Effect on Outcome",
      xlab = "Years Relative to Treatment")

# ----------------------------------------
# Robustness: Alternative Specifications
# ----------------------------------------

# Different clustering
did_robust <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state + year  # Two-way clustering
)

# ----------------------------------------
# Export Results
# ----------------------------------------

# Make sure the output directory exists before writing the table
dir.create("results", showWarnings = FALSE)

modelsummary(
  list("Main" = did_model, "Two-way Cluster" = did_robust),
  stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),
  output = "results/did_table.tex"
)
```
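The example above covers DiD only. For the RDD diagnostics named in Step 2 (bandwidth selection, density tests), one possible sketch follows. It assumes the rdrobust and rddensity packages, which are not in the Requirements list of this skill, and it uses simulated data with made-up names (`running`, `treated`, `outcome`).

```r
library(rdrobust)   # assumption: installed separately, not part of this skill's Requirements
library(rddensity)  # assumption: installed separately

# Simulated sharp RDD (illustrative only): cutoff at 0, true jump of 2
set.seed(42)
n       <- 2000
running <- runif(n, -1, 1)                # running variable
treated <- as.integer(running >= 0)       # sharp assignment rule
outcome <- 0.5 * running + 2 * treated + rnorm(n, sd = 0.5)

# Local-linear RD estimate with a data-driven bandwidth
rd_fit <- rdrobust(y = outcome, x = running, c = 0)
summary(rd_fit)

# Bandwidths selected by the procedure
rd_fit$bws

# Density test for manipulation of the running variable at the cutoff
dens_test <- rddensity(X = running, c = 0)
summary(dens_test)

# Binned scatter plot around the cutoff
rdplot(y = outcome, x = running, c = 0)
```

A useful robustness check is to re-estimate with the bandwidth fixed at, say, half and double the selected value via the `h` argument of `rdrobust()` and confirm the estimate is stable.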
Requirements
Software
- •R 4.0+
Packages
- •fixest - Fast fixed effects estimation
- •modelsummary - Publication-ready tables
- •tidyverse - Data manipulation
- •ggplot2 - Visualization (installed as part of tidyverse)
Install with:
```r
install.packages(c("fixest", "modelsummary", "tidyverse"))
```
Best Practices
- •Always cluster standard errors at the level of treatment assignment
- •Run pre-trend tests for DiD designs
- •Report first-stage F-statistics for IV (rule of thumb: F > 10; lower values suggest weak instruments)
- •Use feols over lm for panel data (faster and more features)
- •Document all specification choices in your code comments
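A pre-trend test can complement the event study plot with a joint test that all pre-treatment coefficients are zero. The sketch below uses `base_did`, an example panel shipped with fixest in which treatment starts in period 6; the object names `es` and `pre_test` are arbitrary.

```r
library(fixest)

# Example panel shipped with fixest (periods 1-10, treatment from period 6)
data(base_did)

# Event study with period 5 (the last pre-treatment period) as the reference
es <- feols(
  y ~ x1 + i(period, treat, ref = 5) | id + period,
  data = base_did,
  cluster = ~id
)

# Joint Wald test that the pre-treatment interactions (periods 1 to 4) are all zero;
# `keep` is a regular expression matched against the coefficient names
pre_test <- wald(es, keep = "period::[1-4]:treat")
```

Failing to reject the null is consistent with parallel trends but does not prove them, so the test is best reported alongside the event study plot rather than in place of it.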
Common Pitfalls
- •❌ Not clustering standard errors at the right level
- •❌ Ignoring weak instruments in IV estimation
- •❌ Using TWFE with staggered treatment timing (use the did package or fixest's sunab() instead)
- •❌ Not reporting robustness checks
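For the staggered-timing pitfall, a minimal sketch of the sunab() route is shown below. It uses `base_stagg`, a simulated staggered-adoption panel shipped with fixest; in that dataset `year_treated` is the cohort (first treatment year) and `year` is the calendar period. The object name `sa_model` is arbitrary.

```r
library(fixest)

# Simulated staggered-adoption panel shipped with fixest
data(base_stagg)

# Sun and Abraham interaction-weighted event study:
# sunab(cohort, period) replaces the usual relative-time dummies
sa_model <- feols(
  y ~ x1 + sunab(year_treated, year) | id + year,
  data = base_stagg,
  cluster = ~id
)

# Event-study plot of the aggregated relative-period effects
iplot(sa_model)

# Collapse the dynamic effects into a single average treatment effect on the treated
summary(sa_model, agg = "att")
```

With real data, the cohort variable needs a consistent coding for never-treated units; checking how `base_stagg` encodes them is a reasonable starting point before adapting the call.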
References
- •fixest documentation
- •Cunningham (2021) Causal Inference: The Mixtape
- •Angrist & Pischke (2009) Mostly Harmless Econometrics
Changelog
v1.0.0
- •Initial release with IV, DiD, RDD support