library(glmnet)
# load the data: a list containing the design matrix X and the response y
dt <- readRDS(url("https://s3.amazonaws.com/pbreheny-data-sets/whoari.rds"))
attach(dt)
fit <- glmnet(X, y)
For more details, please refer to the glmnet vignette: https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html#lin
This post shows how to use the glmnet package to fit a lasso regression and how to visualize the output. A description of the data is available here.
1 Visualize the coefficients
plot(fit)
1.1 Label the path
plot(fit, label = TRUE)
The summary table below shows, from left to right, the number of nonzero coefficients (Df), the percentage of null deviance explained (%Dev), and the value of \lambda (Lambda).
We can get the actual coefficients at a specific \lambda within the range of the fitted sequence:
coeffs <- coef(fit, s = 0.1)
# keep only the nonzero entries of the sparse coefficient matrix
coeffs.dt <- data.frame(name = coeffs@Dimnames[[1]][coeffs@i + 1], coefficient = coeffs@x)
# reorder the variables in terms of coefficients
coeffs.dt[order(coeffs.dt$coefficient, decreasing = TRUE), ]
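The same table of nonzero coefficients can also be built without touching the sparse-matrix slots directly. The sketch below is only an illustration on simulated data: the names x_sim, y_sim, fit_sim, beta, and nonzero are made up for this example and are not part of the original analysis.

```r
library(glmnet)

set.seed(1)
# Simulated data (an assumption for illustration): 100 observations, 10 predictors,
# of which only the first two have a true effect on the response.
x_sim <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
colnames(x_sim) <- paste0("V", 1:10)
y_sim <- 2 * x_sim[, 1] - 1.5 * x_sim[, 2] + rnorm(100)

fit_sim <- glmnet(x_sim, y_sim)

# Convert the sparse one-column coefficient matrix to an ordinary matrix
beta <- as.matrix(coef(fit_sim, s = 0.1))
keep <- beta[, 1] != 0

# Data frame of nonzero coefficients, largest first
nonzero <- data.frame(name = rownames(beta)[keep],
                      coefficient = beta[keep, 1],
                      row.names = NULL)
nonzero[order(nonzero$coefficient, decreasing = TRUE), ]
```

The dense conversion is convenient for small problems; for very wide design matrices the slot-based version avoids materializing all the zeros.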
The fitted object can also be used to make predictions at specific values of \lambda with new input data:
nx <- matrix(rnorm(nrow(dt$X) * ncol(dt$X)), nrow = nrow(dt$X), ncol = ncol(dt$X))
pred <- predict(fit, newx = nx, s = c(0.1, 0.05))
head(pred, 20)
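To make the shape of the prediction output concrete, here is a small sketch on simulated data (x_sim, y_sim, fit_sim, new_x, and pred_sim are hypothetical names used only here). The result has one row per new observation and one column per requested value of s.

```r
library(glmnet)

set.seed(2)
# Simulated training data (an assumption for illustration)
x_sim <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
y_sim <- x_sim[, 1] + rnorm(100)
fit_sim <- glmnet(x_sim, y_sim)

# New inputs must have the same number of columns as the training matrix
new_x <- matrix(rnorm(5 * 10), nrow = 5, ncol = 10)
pred_sim <- predict(fit_sim, newx = new_x, s = c(0.1, 0.05))

# Rows: new observations; columns: one per value of s
dim(pred_sim)
```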
cv.glmnet is the function that performs cross-validation here.
X <- dt$X
y <- dt$y
cv.fit <- cv.glmnet(X, y)
Plotting the object shows the cross-validation curve, that is, the mean-squared error for each \lambda, with the selected values of \lambda marked.
plot(cv.fit)
We can view the selected values of \lambda and the corresponding coefficients. For example:
cv.fit$lambda.min
cv.fit$lambda.1se
lambda.min returns the value of \lambda that gives the minimum mean cross-validated error. The other saved value is lambda.1se, which gives the most regularized model such that the error is within one standard error of the minimum. To use it, we only need to replace lambda.min with lambda.1se above.
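As a rough illustration of the difference between the two choices, the sketch below fits a cross-validated lasso to simulated data and compares the two selected values and the number of nonzero coefficients each one keeps. All object names (x_sim, y_sim, cv_sim, n_min, n_1se) are assumptions made for this example.

```r
library(glmnet)

set.seed(3)
# Simulated data (an assumption for illustration): 200 observations, 20 predictors
x_sim <- matrix(rnorm(200 * 20), nrow = 200, ncol = 20)
y_sim <- x_sim[, 1] - x_sim[, 2] + rnorm(200)

cv_sim <- cv.glmnet(x_sim, y_sim)

# lambda.1se is never smaller than lambda.min, so it regularizes at least as strongly
c(lambda.min = cv_sim$lambda.min, lambda.1se = cv_sim$lambda.1se)

# Count nonzero coefficients (excluding the intercept) under each choice
n_min <- sum(as.matrix(coef(cv_sim, s = "lambda.min"))[-1, 1] != 0)
n_1se <- sum(as.matrix(coef(cv_sim, s = "lambda.1se"))[-1, 1] != 0)
c(n_min = n_min, n_1se = n_1se)
```

In practice lambda.1se tends to give a sparser model, at the cost of a slightly larger cross-validated error.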
library(magrittr)  # provides the %>% pipe used below
# create a function to transform the coefficients of a glmnet or cv.glmnet fit into a data.frame
coeff2dt <- function(fitobject, s) {
  coeffs <- coef(fitobject, s)
  coeffs.dt <- data.frame(name = coeffs@Dimnames[[1]][coeffs@i + 1], coefficient = coeffs@x)
  # reorder the variables in terms of coefficients
  return(coeffs.dt[order(coeffs.dt$coefficient, decreasing = TRUE), ])
}
coeff2dt(fitobject = cv.fit, s = "lambda.min") %>% head(20)
library(ggplot2)
coeffs.table <- coeff2dt(fitobject = cv.fit, s = "lambda.min")
# bar chart of the nonzero coefficients, colored by sign
ggplot(data = coeffs.table) +
  geom_col(aes(x = name, y = coefficient, fill = coefficient > 0)) +
  xlab(label = "") +
  ggtitle(expression(paste("Lasso Coefficients with ", lambda, " = 0.0275"))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none")
2 Elastic net
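For reference, the role of \alpha is easiest to see in the penalized objective. The formula below follows the notation of the glmnet vignette for the Gaussian case without observation weights; the symbols N, x_i, \beta_0, and \beta are not defined elsewhere in this post and are introduced here only for illustration. Setting \alpha = 1 gives the lasso and \alpha = 0 gives ridge regression.

```latex
\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2
  + \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
```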
As an example, we can set \alpha = 0.2, supply observation weights, and compute the path over 20 values of \lambda:
fit2 <- glmnet(X, y, alpha = 0.2, weights = c(rep(1, 716), rep(2, 100)), nlambda = 20)
print(fit2, digits = 3)
According to the default internal settings, the computations stop if either the fractional change in deviance down the path is less than 10^{-5} or the fraction of explained deviance reaches 0.999.
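These thresholds can be inspected from R. As far as I recall from the package documentation, calling glmnet.control() with no arguments returns the current internal settings as a list, with fdev and devmax holding the two values mentioned above; the sketch assumes a fresh session in which the defaults have not been changed.

```r
library(glmnet)

# With no arguments, glmnet.control() returns the current internal settings
ctl <- glmnet.control()

ctl$fdev    # minimum fractional change in deviance before the path stops
ctl$devmax  # maximum fraction of explained deviance before the path stops
```

Passing a value, for example glmnet.control(fdev = 0), changes the setting for the rest of the session, and glmnet.control(factory = TRUE) restores the defaults.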
plot(fit2, xvar = "lambda", label = TRUE)
# plot against %deviance
plot(fit2, xvar = "dev", label = TRUE)
predict(fit2, newx = X[1:5, ], type = "response", s = 0.03)