Getting Equivalent Results from `glmnet` and `CVXR`

Introduction

We’ve had several questions of the following type:

When I fit the same model in glmnet and CVXR, why are the results different?

For example, see this.

Obviously, unless one actually solves the same problem in both places, there’s no reason to expect the same result. The documentation for glmnet::glmnet clearly states the optimization objective, so one just has to ensure that the CVXR objective matches it.

We illustrate below.
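
Throughout, we assume the following packages are attached; kableExtra is used only to style the coefficient tables (and provides the pipe operator used there):

library(CVXR)
library(glmnet)
library(kableExtra)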

Lasso

Consider a simple Lasso fit from the glmnet example, for a fixed \(\lambda\).

set.seed(123)
n <- 100; p <- 20; thresh <- 1e-12; lambda <- .05
x <-  matrix(rnorm(n * p), n, p); xDesign <- cbind(1, x)
y <-  rnorm(n)
fit1 <-  glmnet(x,y, lambda = lambda, thresh = thresh)

The glmnet documentation states that the objective being minimized, in the default invocation, is

\[ \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda \|\beta_{-1}\|_1, \]

where \(\beta_{-1}\) is the beta vector excluding the first component, the intercept. Yes, the intercept is not penalized in the default invocation!

So we will use this objective with CVXR in the problem specification.

beta <- Variable(p + 1)
obj <- sum_squares(y - xDesign %*% beta) / (2 * n) + lambda * p_norm(beta[-1], 1)
prob <- Problem(Minimize(obj))
result <- solve(prob, FEASTOL = thresh, RELTOL = thresh, ABSTOL = thresh, verbose = TRUE)
## -----------------------------------------------------------------
##            OSQP v0.6.3  -  Operator Splitting QP Solver
##               (c) Bartolomeo Stellato,  Goran Banjac
##         University of Oxford  -  Stanford University 2021
## -----------------------------------------------------------------
## problem:  variables n = 141, constraints m = 140
##           nnz(P) + nnz(A) = 2380
## settings: linear system solver = qdldl,
##           eps_abs = 1.0e-05, eps_rel = 1.0e-05,
##           eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
##           rho = 1.00e-01 (adaptive),
##           sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
##           check_termination: on (interval 25),
##           scaling: on, scaled_termination: off
##           warm start: on, polish: on, time_limit: off
## 
## iter   objective    pri res    dua res    rho        time
##    1  -8.0000e+00   8.00e+00   3.95e+01   1.00e-01   4.59e-04s
##  125   3.7110e-01   8.16e-06   2.07e-08   9.06e-01   1.52e-03s
## plsh   3.7110e-01   3.72e-16   1.07e-16   --------   1.78e-03s
## 
## status:               solved
## solution polish:      successful
## number of iterations: 125
## optimal objective:    0.3711
## run time:             1.78e-03s
## optimal rho estimate: 2.90e+00
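
As a quick sanity check, we can evaluate the glmnet objective by hand at both solutions; the two values should agree closely. (This is a small sketch; lasso_objective is a helper we define here, not part of either package.)

# Evaluate the documented glmnet objective at a coefficient vector b
lasso_objective <- function(b) {
    sum((y - xDesign %*% b)^2) / (2 * n) + lambda * sum(abs(b[-1]))
}
lasso_objective(result$getValue(beta))
lasso_objective(as.vector(coef(fit1)))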

We can print the coefficients from glmnet and CVXR side by side to compare. The results below should be close; the minor differences that remain are due to the different solver implementations.

est.table <- data.frame("CVXR.est" = result$getValue(beta), "GLMNET.est" = as.vector(coef(fit1)))
rownames(est.table) <- paste0("$\\beta_{", 0:p, "}$")
knitr::kable(est.table, format = "html", digits = 3) %>%
    kable_styling("striped") %>%
    column_spec(1:3, background = "#ececec")
CVXR.est GLMNET.est
\(\beta_{0}\) -0.125 -0.126
\(\beta_{1}\) -0.022 -0.028
\(\beta_{2}\) 0.000 -0.002
\(\beta_{3}\) 0.101 0.104
\(\beta_{4}\) 0.000 0.000
\(\beta_{5}\) 0.000 0.000
\(\beta_{6}\) 0.000 0.000
\(\beta_{7}\) 0.000 0.000
\(\beta_{8}\) -0.094 -0.091
\(\beta_{9}\) 0.000 0.000
\(\beta_{10}\) 0.000 0.000
\(\beta_{11}\) 0.106 0.105
\(\beta_{12}\) 0.000 0.000
\(\beta_{13}\) -0.057 -0.063
\(\beta_{14}\) 0.000 0.000
\(\beta_{15}\) 0.000 0.000
\(\beta_{16}\) 0.000 0.000
\(\beta_{17}\) 0.000 0.000
\(\beta_{18}\) 0.000 0.000
\(\beta_{19}\) 0.000 0.000
\(\beta_{20}\) -0.087 -0.083
## Testthat Results: No output is good
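
We can also summarize the agreement in a single number; the maximum absolute difference between the two sets of estimates should be small:

# Largest discrepancy between the CVXR and glmnet estimates
max(abs(est.table$CVXR.est - est.table$GLMNET.est))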

A Penalized Logistic Example

We now consider a logistic fit, again with a penalty term and a specified \(\lambda\).

lambda <- .025
y2 <- sample(x = c(0, 1), size = n, replace = TRUE)
fit2 <-  glmnet(x, y2, lambda = lambda, thresh = thresh, family = "binomial")

For logistic regression, the glmnet documentation states that the objective minimized is the negative log-likelihood divided by \(n\), plus the penalty term, which once again excludes the intercept in the default invocation. Below is the CVXR formulation, using the logistic atom as noted in our earlier example on logistic regression.
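
To see why this matches the negative log-likelihood, note that for \(y_i \in \{0, 1\}\) with linear predictor \(\eta_i = x_i^T\beta\), the per-observation contribution can be written as

\[ -\log L_i(\beta) = \log\left(1 + e^{-\eta_i}\right) + (1 - y_i)\,\eta_i, \]

so summing over all observations gives the logistic term plus the sum of \(\eta_i\) over observations with \(y_i = 0\), exactly the two pieces in the code below.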

beta <- Variable(p + 1)
obj2 <- (sum(xDesign[y2 <= 0, ] %*% beta) + sum(logistic(-xDesign %*% beta))) / n +
    lambda * p_norm(beta[-1], 1)
prob <- Problem(Minimize(obj2))
result <- solve(prob, FEASTOL = thresh, RELTOL = thresh, ABSTOL = thresh)
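
As before, a quick check (a sketch; logistic_obj is a helper we define here) evaluates the penalized logistic objective at both solutions:

# Evaluate the penalized logistic negative log-likelihood at b;
# log1p(exp(-eta)) is the logistic term from the derivation above
logistic_obj <- function(b) {
    eta <- as.vector(xDesign %*% b)
    (sum(eta[y2 <= 0]) + sum(log1p(exp(-eta)))) / n + lambda * sum(abs(b[-1]))
}
logistic_obj(result$getValue(beta))
logistic_obj(as.vector(coef(fit2)))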

Once again, the results below should be close enough.

est.table <- data.frame("CVXR.est" = result$getValue(beta), "GLMNET.est" = as.vector(coef(fit2)))
rownames(est.table) <- paste0("$\\beta_{", 0:p, "}$")
knitr::kable(est.table, format = "html", digits = 3) %>%
    kable_styling("striped") %>%
    column_spec(1:3, background = "#ececec")
CVXR.est GLMNET.est
\(\beta_{0}\) -0.228 -0.226
\(\beta_{1}\) 0.000 0.000
\(\beta_{2}\) 0.044 0.048
\(\beta_{3}\) 0.000 0.000
\(\beta_{4}\) 0.250 0.252
\(\beta_{5}\) 0.000 0.000
\(\beta_{6}\) 0.000 0.000
\(\beta_{7}\) -0.786 -0.785
\(\beta_{8}\) 0.000 0.000
\(\beta_{9}\) -0.083 -0.076
\(\beta_{10}\) 0.018 0.016
\(\beta_{11}\) 0.091 0.084
\(\beta_{12}\) 0.198 0.203
\(\beta_{13}\) -0.307 -0.323
\(\beta_{14}\) 0.266 0.269
\(\beta_{15}\) -0.110 -0.114
\(\beta_{16}\) -0.004 -0.028
\(\beta_{17}\) 0.000 0.000
\(\beta_{18}\) 0.000 0.000
\(\beta_{19}\) 0.000 0.000
\(\beta_{20}\) 0.000 0.000
## Testthat Results: No output is good

Session Info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
## [1] glmnet_4.1-8     Matrix_1.7-0     kableExtra_1.4.0 CVXR_1.0-15     
## [5] testthat_3.2.1.1 here_1.0.1      
## 
## loaded via a namespace (and not attached):
##  [1] shape_1.4.6.1     xfun_0.45         bslib_0.7.0       lattice_0.22-6   
##  [5] vctrs_0.6.5       tools_4.4.1       Rmosek_10.2.0     fansi_1.0.6      
##  [9] highr_0.11        desc_1.4.3        assertthat_0.2.1  lifecycle_1.0.4  
## [13] compiler_4.4.1    stringr_1.5.1     brio_1.1.5        munsell_0.5.1    
## [17] gurobi_11.0-0     codetools_0.2-20  ECOSolveR_0.5.5   htmltools_0.5.8.1
## [21] sass_0.4.9        cccp_0.3-1        yaml_2.3.9        gmp_0.7-4        
## [25] pillar_1.9.0      jquerylib_0.1.4   rcbc_0.1.0.9001   Rcplex_0.3-6     
## [29] clarabel_0.9.0    cachem_1.1.0      iterators_1.0.14  foreach_1.5.2    
## [33] digest_0.6.36     stringi_1.8.4     slam_0.1-50       bookdown_0.40    
## [37] splines_4.4.1     rprojroot_2.0.4   fastmap_1.2.0     grid_4.4.1       
## [41] colorspace_2.1-0  cli_3.6.3         magrittr_2.0.3    utf8_1.2.4       
## [45] survival_3.7-0    Rmpfr_0.9-5       scales_1.3.0      bit64_4.0.5      
## [49] rmarkdown_2.27    bit_4.0.5         blogdown_1.19     evaluate_0.24.0  
## [53] knitr_1.48        Rglpk_0.6-5.1     viridisLite_0.4.2 rlang_1.1.4      
## [57] Rcpp_1.0.12       glue_1.7.0        xml2_1.3.6        osqp_0.6.3.3     
## [61] pkgload_1.4.0     svglite_2.1.3     rstudioapi_0.16.0 jsonlite_1.8.8   
## [65] R6_2.5.1          systemfonts_1.1.0

Source

R Markdown