Getting Equivalent Results from `glmnet` and `CVXR`

Introduction

We’ve had several questions of the following type:

When I fit the same model in glmnet and CVXR, why are the results different?

For example, see this.

Obviously, unless one actually solves the same problem in both places, there’s no reason to expect the same result. The documentation for glmnet::glmnet clearly states the optimization objective, so one just has to ensure that the CVXR objective matches it.

We illustrate below.

Lasso

Consider a simple Lasso fit from the glmnet example, for a fixed \(\lambda\).

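library(glmnet)
library(CVXR)
library(kableExtra)  # used for the comparison tables

set.seed(123)
n <- 100; p <- 20; thresh <- 1e-12; lambda <- .05
x <- matrix(rnorm(n * p), n, p)
xDesign <- cbind(1, x)  # design matrix with a leading intercept column
y <- rnorm(n)
fit1 <- glmnet(x, y, lambda = lambda, thresh = thresh)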

The glmnet documentation notes that the objective being minimized, in the default invocation, is

\[ \frac{1}{2n}\|(y - X\beta)\|_2^2 + \lambda \|\beta_{-1}\|_1, \]

where \(X\) is the design matrix with a leading column of ones (`xDesign` in the code) and \(\beta_{-1}\) is the coefficient vector excluding its first component, the intercept. Yes, the intercept is not penalized in the default invocation!

So we will use this objective with CVXR in the problem specification.

beta <- Variable(p + 1)
obj <- sum_squares(y - xDesign %*% beta) / (2 * n) + lambda * p_norm(beta[-1], 1)
prob <- Problem(Minimize(obj))
result <- solve(prob, FEASTOL = thresh, RELTOL = thresh, ABSTOL = thresh, verbose = TRUE)
## -----------------------------------------------------------------
##            OSQP v0.6.0  -  Operator Splitting QP Solver
##               (c) Bartolomeo Stellato,  Goran Banjac
##         University of Oxford  -  Stanford University 2019
## -----------------------------------------------------------------
## problem:  variables n = 141, constraints m = 140
##           nnz(P) + nnz(A) = 2380
## settings: linear system solver = qdldl,
##           eps_abs = 1.0e-05, eps_rel = 1.0e-05,
##           eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
##           rho = 1.00e-01 (adaptive),
##           sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
##           check_termination: on (interval 25),
##           scaling: on, scaled_termination: off
##           warm start: on, polish: on, time_limit: off
## 
## iter   objective    pri res    dua res    rho        time
##    1  -8.0000e+00   8.00e+00   3.95e+01   1.00e-01   2.74e-04s
##  125   3.7110e-01   8.16e-06   2.07e-08   9.06e-01   1.34e-03s
## plsh   3.7110e-01   3.72e-16   1.07e-16   --------   1.53e-03s
## 
## status:               solved
## solution polish:      successful
## number of iterations: 125
## optimal objective:    0.3711
## run time:             1.53e-03s
## optimal rho estimate: 2.90e+00

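We can print the coefficients from glmnet and CVXR side by side to compare. The estimates below should be close; the remaining differences are minor and stem from the different solver implementations. Note that the verbose log above shows OSQP running with eps_abs = eps_rel = 1.0e-05, so the FEASTOL, RELTOL and ABSTOL arguments do not appear to have been applied to this solver.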

est.table <- data.frame("CVXR.est" = result$getValue(beta), "GLMNET.est" = as.vector(coef(fit1)))
rownames(est.table) <- paste0("$\\beta_{", 0:p, "}$")
knitr::kable(est.table, format = "html", digits = 3) %>%
    kable_styling("striped") %>%
    column_spec(1:3, background = "#ececec")
Coefficient CVXR.est GLMNET.est
\(\beta_{0}\) -0.125 -0.126
\(\beta_{1}\) -0.022 -0.028
\(\beta_{2}\) 0.000 -0.002
\(\beta_{3}\) 0.101 0.104
\(\beta_{4}\) 0.000 0.000
\(\beta_{5}\) 0.000 0.000
\(\beta_{6}\) 0.000 0.000
\(\beta_{7}\) 0.000 0.000
\(\beta_{8}\) -0.094 -0.091
\(\beta_{9}\) 0.000 0.000
\(\beta_{10}\) 0.000 0.000
\(\beta_{11}\) 0.106 0.105
\(\beta_{12}\) 0.000 0.000
\(\beta_{13}\) -0.057 -0.063
\(\beta_{14}\) 0.000 0.000
\(\beta_{15}\) 0.000 0.000
\(\beta_{16}\) 0.000 0.000
\(\beta_{17}\) 0.000 0.000
\(\beta_{18}\) 0.000 0.000
\(\beta_{19}\) 0.000 0.000
\(\beta_{20}\) -0.087 -0.083
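
One way to judge which set of estimates is closer to the optimum is to evaluate the stated objective at each of them and compare the values. The sketch below is a minimal base-R helper for that purpose. The name `lasso_objective` and the toy inputs are illustrative assumptions, not part of glmnet or CVXR.

```r
# Hypothetical helper (not part of glmnet or CVXR): evaluates
#   (1 / (2n)) * ||y - X beta||_2^2 + lambda * ||beta[-1]||_1
# where the first entry of `beta` is the unpenalized intercept.
lasso_objective <- function(beta, x, y, lambda) {
  beta <- as.vector(beta)
  x_design <- cbind(1, x)                      # prepend the intercept column
  resid <- y - as.vector(x_design %*% beta)    # residuals y - X beta
  sum(resid^2) / (2 * length(y)) + lambda * sum(abs(beta[-1]))
}

# Toy check on made-up numbers: the residuals are (0, 3), so the loss is
# 9 / 4 = 2.25 and the penalty is 0.5 * (1 + 1) = 1.
toy_value <- lasso_objective(c(0, 1, -1), diag(2), c(1, 2), lambda = 0.5)
print(toy_value)
```

With the objects defined earlier, `lasso_objective(result$getValue(beta), x, y, lambda)` and `lasso_objective(as.vector(coef(fit1)), x, y, lambda)` could then be compared directly; the smaller value indicates the solution that better minimizes this particular objective.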

A Penalized Logistic Example

We now consider a logistic fit, again with a penalty term and a specified \(\lambda\).

lambda <- .025
y2 <- sample(x = c(0, 1), size = n, replace = TRUE)
fit2 <-  glmnet(x, y2, lambda = lambda, thresh = thresh, family = "binomial")

For logistic regression, the glmnet documentation states that the objective minimized is the negative log-likelihood divided by \(n\), plus the penalty term, which once again excludes the intercept in the default invocation. Below is the CVXR formulation, where we use the logistic atom as noted earlier in our other example on logistic regression.

beta <- Variable(p + 1)
obj2 <- (sum(xDesign[y2 <= 0, ] %*% beta) + sum(logistic(-xDesign %*% beta))) / n +
    lambda * p_norm(beta[-1], 1)
prob <- Problem(Minimize(obj2))
result <- solve(prob, FEASTOL = thresh, RELTOL = thresh, ABSTOL = thresh)
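
The first sum in `obj2` may look unfamiliar, so here is a short derivation of where it comes from. As notation introduced only for this sketch, let \(x_i\) be the \(i\)-th row of `xDesign`, \(\eta_i = x_i^\top \beta\), and \(p_i = 1 / (1 + e^{-\eta_i})\). The negative log-likelihood is

\[ -\ell(\beta) = -\sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]. \]

Substituting

\[ \log p_i = -\log(1 + e^{-\eta_i}), \qquad \log(1 - p_i) = -\eta_i - \log(1 + e^{-\eta_i}), \]

and collecting terms gives

\[ -\ell(\beta) = \sum_{i:\, y_i = 0} \eta_i + \sum_{i=1}^{n} \log(1 + e^{-\eta_i}). \]

Since the CVXR atom `logistic(z)` represents \(\log(1 + e^{z})\), the second sum corresponds to `sum(logistic(-xDesign %*% beta))` and the first to `sum(xDesign[y2 <= 0, ] %*% beta)`.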

Once again, the results below should be close enough.

est.table <- data.frame("CVXR.est" = result$getValue(beta), "GLMNET.est" = as.vector(coef(fit2)))
rownames(est.table) <- paste0("$\\beta_{", 0:p, "}$")
knitr::kable(est.table, format = "html", digits = 3) %>%
    kable_styling("striped") %>%
    column_spec(1:3, background = "#ececec")
Coefficient CVXR.est GLMNET.est
\(\beta_{0}\) -0.228 -0.226
\(\beta_{1}\) 0.000 0.000
\(\beta_{2}\) 0.044 0.048
\(\beta_{3}\) 0.000 0.000
\(\beta_{4}\) 0.250 0.252
\(\beta_{5}\) 0.000 0.000
\(\beta_{6}\) 0.000 0.000
\(\beta_{7}\) -0.786 -0.785
\(\beta_{8}\) 0.000 0.000
\(\beta_{9}\) -0.083 -0.076
\(\beta_{10}\) 0.018 0.016
\(\beta_{11}\) 0.091 0.084
\(\beta_{12}\) 0.198 0.203
\(\beta_{13}\) -0.307 -0.323
\(\beta_{14}\) 0.266 0.269
\(\beta_{15}\) -0.110 -0.114
\(\beta_{16}\) -0.004 -0.028
\(\beta_{17}\) 0.000 0.000
\(\beta_{18}\) 0.000 0.000
\(\beta_{19}\) 0.000 0.000
\(\beta_{20}\) 0.000 0.000
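
As a sanity check on the form of the objective used in `obj2`, the sketch below evaluates it in base R and compares it with the textbook binomial log-likelihood on made-up data. The helper name `logistic_objective` and all `*_toy` objects are assumptions introduced for illustration only.

```r
# Hypothetical helper: penalized logistic objective written the same way as
# obj2, i.e. (sum of eta over y == 0 + sum of log(1 + exp(-eta))) / n + penalty.
logistic_objective <- function(beta, x, y, lambda) {
  beta <- as.vector(beta)
  eta <- as.vector(cbind(1, x) %*% beta)       # linear predictor, intercept first
  nll <- sum(eta[y <= 0]) + sum(log1p(exp(-eta)))
  nll / length(y) + lambda * sum(abs(beta[-1]))
}

# Made-up data and coefficients, used only to compare the two formulations.
set.seed(1)
x_toy <- matrix(rnorm(40), 10, 4)
y_toy <- rep(c(0, 1), 5)
beta_toy <- c(0.3, -1, 0, 0.5, 2)

# Textbook version: mean negative Bernoulli log-likelihood plus the penalty.
prob_toy <- plogis(as.vector(cbind(1, x_toy) %*% beta_toy))
textbook <- -mean(dbinom(y_toy, 1, prob_toy, log = TRUE)) +
  0.025 * sum(abs(beta_toy[-1]))

gap <- abs(logistic_objective(beta_toy, x_toy, y_toy, 0.025) - textbook)
print(gap)  # should be at the level of floating-point rounding
```

The same helper could be applied to `result$getValue(beta)` and `as.vector(coef(fit2))` with `x`, `y2` and `lambda` from above to see which estimate attains the lower objective value.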

Session Info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin19.5.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS/LAPACK: /usr/local/Cellar/openblas/0.3.10_1/lib/libopenblasp-r0.3.10.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
## [1] glmnet_4.0-2     Matrix_1.2-18    kableExtra_1.1.0 CVXR_1.0-9      
## 
## loaded via a namespace (and not attached):
##  [1] shape_1.4.4       xfun_0.15         slam_0.1-47       splines_4.0.2    
##  [5] lattice_0.20-41   Rmosek_9.2.3      colorspace_1.4-1  vctrs_0.3.2      
##  [9] htmltools_0.5.0   viridisLite_0.3.0 yaml_2.2.1        gmp_0.6-0        
## [13] survival_3.1-12   rlang_0.4.7       pillar_1.4.6      glue_1.4.1       
## [17] Rmpfr_0.8-1       Rcplex_0.3-3      bit64_0.9-7       foreach_1.5.0    
## [21] lifecycle_0.2.0   stringr_1.4.0     munsell_0.5.0     blogdown_0.19    
## [25] gurobi_9.0.3.1    rvest_0.3.5       codetools_0.2-16  evaluate_0.14    
## [29] knitr_1.28        cccp_0.2-4        highr_0.8         Rcpp_1.0.5       
## [33] readr_1.3.1       scales_1.1.1      osqp_0.6.0.3      webshot_0.5.2    
## [37] bit_1.1-15.2      hms_0.5.3         digest_0.6.25     stringi_1.4.6    
## [41] bookdown_0.19     Rglpk_0.6-4       grid_4.0.2        ECOSolveR_0.5.3  
## [45] tools_4.0.2       magrittr_1.5      tibble_3.0.3      crayon_1.3.4     
## [49] pkgconfig_2.0.3   ellipsis_0.3.1    rcbc_0.1.0.9001   xml2_1.3.2       
## [53] assertthat_0.2.1  rmarkdown_2.3     httr_1.4.2        rstudioapi_0.11  
## [57] iterators_1.0.12  R6_2.4.1          compiler_4.0.2
