Getting Equivalent Results from `glmnet` and `CVXR`
Introduction
We’ve had several questions of the following type:

> When I fit the same model in `glmnet` and `CVXR`, why are the results different?

For example, see this.
Obviously, unless one actually solves the same problem in both places, there’s no reason to expect the same result. The documentation for `glmnet::glmnet` clearly states the optimization objective, so one just has to ensure that the `CVXR` objective matches it. We illustrate below.
Lasso
Consider a simple Lasso fit from the `glmnet` example, for a fixed \(\lambda\).
library(glmnet)      # lasso/elastic-net fits
library(CVXR)        # convex optimization
library(kableExtra)  # comparison tables below

set.seed(123)
n <- 100; p <- 20; thresh <- 1e-12; lambda <- .05
x <- matrix(rnorm(n * p), n, p); xDesign <- cbind(1, x)
y <- rnorm(n)
fit1 <- glmnet(x, y, lambda = lambda, thresh = thresh)
The `glmnet` documentation notes that, in the default invocation, the objective being minimized is
\[ \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda \|\beta_{-1}\|_1, \]
where \(\beta_{-1}\) is the vector \(\beta\) excluding its first component, the intercept. Yes, the intercept is not penalized in the default invocation!
So we will use this objective in the `CVXR` problem specification. Note that `xDesign` above prepends a column of ones to `x`, so that `beta[1]` plays the role of the (unpenalized) intercept that `glmnet` fits internally.
beta <- Variable(p + 1)  # intercept plus p coefficients
obj <- sum_squares(y - xDesign %*% beta) / (2 * n) + lambda * p_norm(beta[-1], 1)  # beta[-1] excludes the intercept from the penalty
prob <- Problem(Minimize(obj))
result <- solve(prob, FEASTOL = thresh, RELTOL = thresh, ABSTOL = thresh, verbose = TRUE)
## -----------------------------------------------------------------
## OSQP v0.6.3 - Operator Splitting QP Solver
## (c) Bartolomeo Stellato, Goran Banjac
## University of Oxford - Stanford University 2021
## -----------------------------------------------------------------
## problem: variables n = 141, constraints m = 140
## nnz(P) + nnz(A) = 2380
## settings: linear system solver = qdldl,
## eps_abs = 1.0e-05, eps_rel = 1.0e-05,
## eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
## rho = 1.00e-01 (adaptive),
## sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
## check_termination: on (interval 25),
## scaling: on, scaled_termination: off
## warm start: on, polish: on, time_limit: off
##
## iter objective pri res dua res rho time
## 1 -8.0000e+00 8.00e+00 3.95e+01 1.00e-01 6.01e-04s
## 125 3.7110e-01 8.16e-06 2.07e-08 9.06e-01 1.78e-03s
## plsh 3.7110e-01 3.72e-16 1.07e-16 -------- 2.15e-03s
##
## status: solved
## solution polish: successful
## number of iterations: 125
## optimal objective: 0.3711
## run time: 2.15e-03s
## optimal rho estimate: 2.90e+00
We can print the coefficients from `glmnet` and `CVXR` side by side to compare. The results below should be close; any minor differences are due to the different solver implementations.
est.table <- data.frame("CVXR.est" = result$getValue(beta), "GLMNET.est" = as.vector(coef(fit1)))
rownames(est.table) <- paste0("$\\beta_{", 0:p, "}$")
knitr::kable(est.table, format = "html", digits = 3) %>%
kable_styling("striped") %>%
column_spec(1:3, background = "#ececec")
|                   | CVXR.est | GLMNET.est |
|-------------------|---------:|-----------:|
| \(\beta_{0}\)     |   -0.125 |     -0.126 |
| \(\beta_{1}\)     |   -0.022 |     -0.028 |
| \(\beta_{2}\)     |    0.000 |     -0.002 |
| \(\beta_{3}\)     |    0.101 |      0.104 |
| \(\beta_{4}\)     |    0.000 |      0.000 |
| \(\beta_{5}\)     |    0.000 |      0.000 |
| \(\beta_{6}\)     |    0.000 |      0.000 |
| \(\beta_{7}\)     |    0.000 |      0.000 |
| \(\beta_{8}\)     |   -0.094 |     -0.091 |
| \(\beta_{9}\)     |    0.000 |      0.000 |
| \(\beta_{10}\)    |    0.000 |      0.000 |
| \(\beta_{11}\)    |    0.106 |      0.105 |
| \(\beta_{12}\)    |    0.000 |      0.000 |
| \(\beta_{13}\)    |   -0.057 |     -0.063 |
| \(\beta_{14}\)    |    0.000 |      0.000 |
| \(\beta_{15}\)    |    0.000 |      0.000 |
| \(\beta_{16}\)    |    0.000 |      0.000 |
| \(\beta_{17}\)    |    0.000 |      0.000 |
| \(\beta_{18}\)    |    0.000 |      0.000 |
| \(\beta_{19}\)    |    0.000 |      0.000 |
| \(\beta_{20}\)    |   -0.087 |     -0.083 |
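As a quick numeric check (a sketch we add here; it was not part of the original output), the largest discrepancy between the two estimates can be computed directly:

# Maximum absolute difference between the CVXR and glmnet coefficients;
# from the table above, this should be a few thousandths at most.
max(abs(result$getValue(beta) - as.vector(coef(fit1))))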
Testthat Results: No output is good
A Penalized Logistic Example
We now consider a logistic fit, again with a penalty term and a specified \(\lambda\).
lambda <- .025
y2 <- sample(x = c(0, 1), size = n, replace = TRUE)
fit2 <- glmnet(x, y2, lambda = lambda, thresh = thresh, family = "binomial")
For logistic regression, the `glmnet` documentation states that the objective minimized is the negative log-likelihood divided by \(n\), plus the penalty term, which once again excludes the intercept in the default invocation. Below is the `CVXR` formulation, where we use the `logistic` atom as noted earlier in our other example on logistic regression.
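Concretely, writing \(\eta_i = \beta_0 + x_i^T\beta\) for the linear predictor of observation \(i\), the objective is (our transcription; the `glmnet` documentation gives the authoritative form)
\[ -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \eta_i - \log\left(1 + e^{\eta_i}\right) \right] + \lambda \|\beta_{-1}\|_1. \]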
beta <- Variable(p + 1)
obj2 <- (sum(xDesign[y2 <= 0, ] %*% beta) + sum(logistic(-xDesign %*% beta))) / n +
lambda * p_norm(beta[-1], 1)
prob <- Problem(Minimize(obj2))
result <- solve(prob, FEASTOL = thresh, RELTOL = thresh, ABSTOL = thresh)
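To see that `obj2` above is exactly this objective (a short derivation we add here for clarity), write each observation’s contribution to the negative log-likelihood in terms of the atom \(\mathrm{logistic}(u) = \log(1 + e^u)\):
\[ \log\left(1 + e^{\eta_i}\right) - y_i \eta_i =
\begin{cases}
\mathrm{logistic}(-\eta_i), & y_i = 1, \\
\eta_i + \mathrm{logistic}(-\eta_i), & y_i = 0.
\end{cases} \]
So summing \(\mathrm{logistic}(-\eta_i)\) over all observations and adding \(\eta_i\) only for those with \(y_i = 0\), which is the `xDesign[y2 <= 0, ] %*% beta` term, reproduces the negative log-likelihood while keeping the expression convex in `beta`, as `CVXR` requires.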
Once again, the results below should match closely.
est.table <- data.frame("CVXR.est" = result$getValue(beta), "GLMNET.est" = as.vector(coef(fit2)))
rownames(est.table) <- paste0("$\\beta_{", 0:p, "}$")
knitr::kable(est.table, format = "html", digits = 3) %>%
kable_styling("striped") %>%
column_spec(1:3, background = "#ececec")
|                   | CVXR.est | GLMNET.est |
|-------------------|---------:|-----------:|
| \(\beta_{0}\)     |   -0.228 |     -0.226 |
| \(\beta_{1}\)     |    0.000 |      0.000 |
| \(\beta_{2}\)     |    0.044 |      0.048 |
| \(\beta_{3}\)     |    0.000 |      0.000 |
| \(\beta_{4}\)     |    0.250 |      0.252 |
| \(\beta_{5}\)     |    0.000 |      0.000 |
| \(\beta_{6}\)     |    0.000 |      0.000 |
| \(\beta_{7}\)     |   -0.786 |     -0.785 |
| \(\beta_{8}\)     |    0.000 |      0.000 |
| \(\beta_{9}\)     |   -0.083 |     -0.076 |
| \(\beta_{10}\)    |    0.018 |      0.016 |
| \(\beta_{11}\)    |    0.091 |      0.084 |
| \(\beta_{12}\)    |    0.198 |      0.203 |
| \(\beta_{13}\)    |   -0.307 |     -0.323 |
| \(\beta_{14}\)    |    0.266 |      0.269 |
| \(\beta_{15}\)    |   -0.110 |     -0.114 |
| \(\beta_{16}\)    |   -0.004 |     -0.028 |
| \(\beta_{17}\)    |    0.000 |      0.000 |
| \(\beta_{18}\)    |    0.000 |      0.000 |
| \(\beta_{19}\)    |    0.000 |      0.000 |
| \(\beta_{20}\)    |    0.000 |      0.000 |
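As a further sanity check (again a sketch, not output from the original post), one can evaluate the same penalized objective at the `glmnet` solution and compare it with the optimal value `CVXR` reports; the two should agree to within solver tolerance:

# Penalized negative log-likelihood at glmnet's estimate, computed
# with ordinary R arithmetic, versus the optimal value from CVXR.
beta_glmnet <- as.vector(coef(fit2))
eta <- drop(xDesign %*% beta_glmnet)
obj_glmnet <- (sum(eta[y2 <= 0]) + sum(log1p(exp(-eta)))) / n +
    lambda * sum(abs(beta_glmnet[-1]))
c(CVXR = result$value, GLMNET = obj_glmnet)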
Testthat Results: No output is good
Session Info
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] glmnet_4.1-8 Matrix_1.7-1 kableExtra_1.4.0 CVXR_1.0-15
## [5] testthat_3.2.1.1 here_1.0.1
##
## loaded via a namespace (and not attached):
## [1] shape_1.4.6.1 xfun_0.49 bslib_0.8.0 lattice_0.22-6
## [5] vctrs_0.6.5 tools_4.4.2 Rmosek_10.2.0 fansi_1.0.6
## [9] highr_0.11 desc_1.4.3 assertthat_0.2.1 lifecycle_1.0.4
## [13] compiler_4.4.2 stringr_1.5.1 brio_1.1.5 munsell_0.5.1
## [17] gurobi_11.0-0 codetools_0.2-20 ECOSolveR_0.5.5 htmltools_0.5.8.1
## [21] sass_0.4.9 cccp_0.3-1 yaml_2.3.10 gmp_0.7-5
## [25] pillar_1.9.0 jquerylib_0.1.4 rcbc_0.1.0.9001 Rcplex_0.3-6
## [29] clarabel_0.9.0.1 cachem_1.1.0 iterators_1.0.14 foreach_1.5.2
## [33] digest_0.6.37 stringi_1.8.4 slam_0.1-54 bookdown_0.41
## [37] splines_4.4.2 rprojroot_2.0.4 fastmap_1.2.0 grid_4.4.2
## [41] colorspace_2.1-1 cli_3.6.3 magrittr_2.0.3 utf8_1.2.4
## [45] survival_3.7-0 Rmpfr_0.9-5 scales_1.3.0 bit64_4.5.2
## [49] rmarkdown_2.29 bit_4.5.0 blogdown_1.19 evaluate_1.0.1
## [53] knitr_1.48 Rglpk_0.6-5.1 viridisLite_0.4.2 rlang_1.1.4
## [57] Rcpp_1.0.13-1 glue_1.8.0 xml2_1.3.6 osqp_0.6.3.3
## [61] pkgload_1.4.0 svglite_2.1.3 rstudioapi_0.17.1 jsonlite_1.8.9
## [65] R6_2.5.1 systemfonts_1.1.0