Lasso Regression Example using glmnet package in R
For more details, see https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html#lin
This post shows how to use the glmnet package to fit a lasso regression and how to visualize the output. A description of the data is given here.
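As a minimal sketch of the fitting step, the snippet below fits a lasso path with glmnet. The data are simulated purely for illustration (an assumption; the dataset used in this post is not reproduced here), and the names x, y and fit are placeholders.

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

# alpha = 1 selects the lasso penalty (this is also the default)
fit <- glmnet(x, y, alpha = 1)
```

glmnet fits the model over a whole sequence of \lambda values at once, so fit holds an entire regularization path rather than a single model.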
1 Visualize the coefficients
1.1 Label the path
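A coefficient path with labelled curves can be drawn roughly as follows. This is a sketch on simulated data (an assumption, not the post's dataset); each curve is one variable, and label = TRUE prints the variable index next to its curve.

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

fit <- glmnet(x, y, alpha = 1)

# Plot coefficients against log(lambda) and label each path
plot(fit, xvar = "lambda", label = TRUE)
```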
The summary table below shows, from left to right, the number of nonzero coefficients (Df), the percent of null deviance explained (%Dev), and the value of \lambda (Lambda).
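The table is produced by printing the fitted object. The sketch below does that on simulated data (an assumption) and also rebuilds the same three columns from the df, dev.ratio and lambda components of the fit; the name path.summary is a placeholder.

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

fit <- glmnet(x, y, alpha = 1)

# Printing the fit shows one row per lambda value
print(fit)

# The same information assembled by hand from the fit's components
path.summary <- data.frame(Df = fit$df,
                           Dev = round(100 * fit$dev.ratio, 2),
                           Lambda = fit$lambda)
head(path.summary)
```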
We can get the actual coefficients at a specific \lambda within the range of the sequence:
The fitted object can also be used to make predictions at a specific \lambda for new input data:
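For instance, the coefficients at \lambda = 0.1 can be extracted with coef and its s argument. The value 0.1 and the simulated data are assumptions chosen for illustration.

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

fit <- glmnet(x, y, alpha = 1)

# Sparse one-column matrix: the intercept plus one row per variable
coefs <- coef(fit, s = 0.1)

# Show only the coefficients that are not exactly zero
coefs[coefs[, 1] != 0, , drop = FALSE]
```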
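A sketch of prediction on new inputs, again using simulated data (an assumption). The matrix newx is a hypothetical set of five new observations, and the two \lambda values are arbitrary.

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

fit <- glmnet(x, y, alpha = 1)

# Five hypothetical new observations with the same number of columns as x
newx <- matrix(rnorm(5 * p), 5, p)

# One column of predictions per requested lambda value
pred <- predict(fit, newx = newx, s = c(0.1, 0.05))
pred
```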
cv.glmnet is the function that performs cross-validation here.
Plotting the object shows the cross-validated mean squared error across the \lambda sequence, with the selected \lambda values marked.
We can view the selected \lambda values and the corresponding coefficients. For example,
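A minimal sketch of cross-validation and its plot, on simulated data (an assumption). The choice of 10 folds matches the default and is written out only for clarity.

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

# 10-fold cross-validation over the lambda sequence
cv.fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

# Error curve with error bars; the vertical dotted lines mark
# lambda.min and lambda.1se
plot(cv.fit)
```

Because the folds are assigned at random, the selected \lambda values change from run to run unless a seed is set.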
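A sketch of how the selected value and its coefficients might be inspected, using simulated data (an assumption):

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

cv.fit <- cv.glmnet(x, y, alpha = 1)

# The lambda with the smallest mean cross-validated error
cv.fit$lambda.min

# Coefficients of the model at that lambda
coef.min <- coef(cv.fit, s = "lambda.min")
coef.min
```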
lambda.min returns the value of \lambda that gives the minimum mean cross-validated error. The other saved \lambda is lambda.1se, which gives the most regularized model such that the error is within one standard error of the minimum. To use it, we only need to replace lambda.min with lambda.1se above.
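To make the difference concrete, the sketch below compares the two saved values on simulated data (an assumption). The helper names nonzero.min and nonzero.1se are placeholders.

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

cv.fit <- cv.glmnet(x, y, alpha = 1)

# lambda.1se is never smaller than lambda.min, so it penalizes at least as much
c(lambda.min = cv.fit$lambda.min, lambda.1se = cv.fit$lambda.1se)

# Count the nonzero coefficients (intercept included) under each choice
nonzero.min <- sum(coef(cv.fit, s = "lambda.min")[, 1] != 0)
nonzero.1se <- sum(coef(cv.fit, s = "lambda.1se")[, 1] != 0)
c(nonzero.min = nonzero.min, nonzero.1se = nonzero.1se)
```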
# Convert the coefficients of a glmnet or cv.glmnet fit into a data.frame
library(magrittr)  # provides the %>% pipe used below
coeff2dt <- function(fitobject, s) {
  coeffs <- coef(fitobject, s)
  # Keep only the nonzero entries stored in the sparse coefficient matrix
  coeffs.dt <- data.frame(name = coeffs@Dimnames[[1]][coeffs@i + 1], coefficient = coeffs@x)
  # Order the variables by decreasing coefficient value
  return(coeffs.dt[order(coeffs.dt$coefficient, decreasing = TRUE), ])
}
coeff2dt(fitobject = cv.fit, s = "lambda.min") %>% head(20)
library(ggplot2)
coeffs.table <- coeff2dt(fitobject = cv.fit, s = "lambda.min")
# Bar chart of the nonzero coefficients, colored by sign
ggplot(data = coeffs.table) +
  geom_col(aes(x = name, y = coefficient, fill = coefficient > 0)) +
  xlab(label = "") +
  ggtitle(expression(paste("Lasso Coefficients with ", lambda, " = 0.0275"))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none")
2 Elastic net
As an example, we can set \alpha=0.2, which mixes the lasso penalty (\alpha=1) with the ridge penalty (\alpha=0):
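A minimal elastic net sketch with that mixing value, on simulated data (an assumption). Only the alpha argument changes relative to a lasso fit.

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

# alpha = 0.2 puts most of the weight on the ridge part of the penalty
enet.fit <- glmnet(x, y, alpha = 0.2)

plot(enet.fit, xvar = "lambda", label = TRUE)
```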
According to the default internal settings, the computations stop if either the fractional change in deviance down the path is less than 10^{-5} or the fraction of explained deviance reaches 0.999.
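These two thresholds are exposed through glmnet.control as fdev and devmax. The sketch below reads the current settings, turns off the deviance-change rule so that more of the path is computed, and then restores the factory defaults. The data are simulated (an assumption), and setting fdev to 0 is just one illustrative choice.

```r
library(glmnet)

# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1) + rnorm(n))

# Called with no arguments, glmnet.control() returns the current settings
ctrl <- glmnet.control()
ctrl$fdev
ctrl$devmax

fit.default <- glmnet(x, y)

# fdev = 0 disables stopping on a small fractional change in deviance
glmnet.control(fdev = 0)
fit.full <- glmnet(x, y)

# Restore the factory defaults so later fits are unaffected
glmnet.control(factory = TRUE)

c(default = length(fit.default$lambda), full = length(fit.full$lambda))
```

These settings are global to the R session, which is why the reset at the end matters.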