Elastic Net, LASSO, and Ridge Regression

Individual Exercise Solution

Use fl2003.RData, which is a cleaned up version of the data from Fearon and Laitin (2003). Fit a model where onset is explained by all variables. Use cv.glmnet() to fit elastic net models for a variety of \(\alpha\) values, using a loss function that is appropriate for the binomial nature of the data. Present plots of the model’s predictive accuracy for different \(\alpha\) values. Fit a model with glmnet() using the \(\alpha\) value you found that minimizes predictive error. Report coefficient estimates for all variables, and plot the changes in coefficient values vs. the L1 norm, log-lambda value, and deviance explained.

Randomly sample five cases where onset = 0 and five where onset = 1. Fit an elastic net with the optimal \(\alpha\) value you found for the whole dataset. Are the most important coefficients the same?

library(dplyr)
library(glmnet)
library(caret)
library(parallel)
load('fl2003.RData')

# create predictors and response
fl_x <- as.matrix(fl[, -1])
fl_y <- as.factor(fl$onset)

# sequences of alpha values to evaluate
alphas <- seq(0, 1, by = .1)

# cross validation elastic nets for different penalty parameters
fits <- mclapply(alphas, function(x) cv.glmnet(fl_x, fl_y, type.measure = 'auc',
                                            alpha = x, family = 'binomial'))

# plot AUC for different penalty parameters
par(mfrow = c(3,1))
plot(fits[[1]])
plot(fits[[6]])
plot(fits[[11]])

# penalty parameter w/ highest AUC
alpha_best <- which.max(sapply(fits, function(x) max(x$cvm)))

# fit elastic net w/ best penalty parameters
best_fit <- glmnet(fl_x, fl_y, family = 'binomial', alpha = alphas[alpha_best])

# plot coefficients
par(mfrow = c(3,1))
plot(best_fit, xvar = 'norm', label = T)
plot(best_fit, xvar = 'lambda', label = T)
plot(best_fit, xvar = 'dev', label = T)

# coefficients from whole dataset
betas <- varImp(best_fit, lambda = best_fit$lambda[length(best_fit$lambda)])

# sample 10 observations for each outcome
fl_samp <- fl %>% group_by(onset) %>% sample_n(10)

# create predictors and response
fl_x_samp <- as.matrix(fl_samp[, -1])
fl_y_samp <- as.factor(fl_samp$onset)

# fit sample data w/ best penalty parameter from entire dataset
best_fit_samp <- glmnet(fl_x_samp, fl_y_samp, family = 'binomial', alpha = alphas[alpha_best])

# plot coefficients
plot(best_fit_samp, xvar = 'norm', label = T)
plot(best_fit_samp, xvar = 'lambda', label = T)
plot(best_fit_samp, xvar = 'dev', label = T)

# coefficients for sampled data
betas_samp <- varImp(best_fit_samp, lambda = best_fit_samp$lambda[length(best_fit_samp$lambda)])

# compare coefficients for both models
kable(data.frame(betas, betas_samp) %>% rename(All = Overall, Sample = Overall.1))

	All	Sample
warl	1.0222	0.0000
gdpenl	0.2353	2.3865
lpopl1	0.2668	7.3504
lmtnest	0.2198	3.8555
ncontig	0.7021	12.0521
Oil	0.6626	1.0948
nwstate	1.6557	0.0000
instab	0.6097	8.2590
polity2l	0.0247	0.8997
ethfrac	0.0407	7.2016
relfrac	0.0010	21.7585
western	1.9047	0.0000
eeurop	0.6906	0.0000
lamerica	0.3673	9.8158
ssafrica	0.0650	1.0215
asia	0.2107	0.2941

Elastic Net, LASSO, and Ridge Regression

Rob Williams

November 15, 2018

Individual Exercise Solution

References