Logistic Regression with $\ell_1$ Regularization

Author

CVXPY Developers and Balasubramanian Narasimhan

Introduction

In this example, we use CVXR to train a logistic regression classifier with $\ell_1$ regularization. We are given data $(x_i, y_i)$ for $i = 1, \ldots, m$. The $x_i \in \mathbf{R}^n$ are feature vectors, while the $y_i \in \{0, 1\}$ are associated boolean classes.

Our goal is to construct a linear classifier $\hat{y} = \mathbb{1}[\beta^T x > 0]$, which is $1$ when $\beta^T x$ is positive and $0$ otherwise. We model the posterior probabilities of the classes given the data linearly, with

$$\log \frac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} = \beta^T x.$$

This implies that

$$\Pr(Y = 1 \mid X = x) = \frac{\exp(\beta^T x)}{1 + \exp(\beta^T x)}, \qquad \Pr(Y = 0 \mid X = x) = \frac{1}{1 + \exp(\beta^T x)}.$$

We fit $\beta$ by maximizing the log-likelihood of the data minus a regularization term $\lambda \|\beta\|_1$ with $\lambda > 0$:

$$\ell(\beta) = \sum_{i=1}^m \left[ y_i \beta^T x_i - \log\bigl(1 + \exp(\beta^T x_i)\bigr) \right] - \lambda \|\beta\|_1.$$

Because $\ell$ is a concave function of $\beta$, this is a convex optimization problem.
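Concavity can be checked term by term; a sketch of the decomposition:

```latex
\ell(\beta) =
  \underbrace{\sum_{i=1}^m y_i \beta^T x_i}_{\text{linear in } \beta}
  \;-\; \underbrace{\sum_{i=1}^m \log\bigl(1 + \exp(\beta^T x_i)\bigr)}_{\text{softplus of an affine map: convex, negated}}
  \;-\; \underbrace{\lambda \|\beta\|_1}_{\text{convex norm, negated}}
```

Each summand is concave in $\beta$, and a sum of concave functions is concave.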

Generating Data

In the following code we generate data with $n = 50$ features by randomly choosing $x_i$ and supplying a sparse $\beta_{\text{true}} \in \mathbf{R}^n$. We then set $y_i = \mathbb{1}[\beta_{\text{true}}^T x_i + z_i > 0]$, where the $z_i$ are i.i.d. normal random variables. We use a training set of $m = 50$ examples and a test set of $2m = 100$ examples.

library(CVXR)
library(ggplot2)
library(tidyr)

set.seed(1)
n <- 50
m <- 50

sigmoid <- function(z) 1 / (1 + exp(-z))

beta_true <- c(1, 0.5, -0.5, rep(0, n - 3))
X <- (matrix(runif(m * n), nrow = m, ncol = n) - 0.5) * 10
Y <- round(sigmoid(X %*% beta_true + rnorm(m, sd = 0.5)))

X_test <- (matrix(runif(2 * m * n), nrow = 2 * m, ncol = n) - 0.5) * 10
Y_test <- round(sigmoid(X_test %*% beta_true + rnorm(2 * m, sd = 0.5)))
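As a quick sanity check (a sketch reusing the `sigmoid` defined above), rounding the sigmoid at $0.5$ is equivalent to thresholding its argument at zero, so the generated labels match the indicator definition $y_i = \mathbb{1}[\beta_{\text{true}}^T x_i + z_i > 0]$:

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# round(sigmoid(z)) is 1 exactly when z > 0, since sigmoid(z) > 0.5 iff z > 0
z <- c(-2, -0.1, 0.1, 3)
identical(round(sigmoid(z)), as.numeric(z > 0))  # TRUE
```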

Formulating the Problem

We next formulate the optimization problem using CVXR. The logistic atom in CVXR computes $f(z) = \log(1 + e^z)$, which we use to express the log-likelihood.

beta <- Variable(n)
log_likelihood <- sum_entries(
    Y * (X %*% beta) - logistic(X %*% beta)
)
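For intuition, the same objective can be evaluated for any fixed candidate $\beta$ in base R, without CVXR. A minimal sketch (the `loglik` helper is our own, and it uses a numerically stable softplus in place of a naive `log(1 + exp(eta))`, which overflows for large `eta`):

```r
# Log-likelihood of a fixed beta, computed directly in base R
loglik <- function(beta, X, Y) {
    eta <- drop(X %*% beta)
    softplus <- pmax(eta, 0) + log1p(exp(-abs(eta)))  # stable log(1 + exp(eta))
    sum(Y * eta - softplus)
}

# Tiny worked example: two observations, one feature, beta = 1
loglik(1, matrix(c(1, -1), ncol = 1), c(1, 0))
# equals 1 - log(1 + e) - log(1 + e^-1), about -0.6265
```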

Fitting the Model

We solve the optimization problem for a range of λ to compute a trade-off curve. We then plot the train and test error over the trade-off curve. A reasonable choice of λ is the value that minimizes the test error.

## Classification error function
calc_error <- function(scores, labels) {
    preds <- ifelse(scores > 0, 1, 0)
    sum(abs(preds - labels)) / length(labels)
}
trials <- 100
train_error <- numeric(trials)
test_error  <- numeric(trials)
lambda_vals <- 10^seq(-2, 0, length.out = trials)
beta_vals   <- vector("list", trials)

for (i in seq_len(trials)) {
    problem <- Problem(Maximize(log_likelihood / m -
                                lambda_vals[i] * p_norm(beta, 1)))
    result <- solve(problem)
    if (!result$status %in% c("optimal", "optimal_inaccurate"))
        warning("Solver did not reach an optimal solution for lambda = ",
                lambda_vals[i])
    beta_hat <- result$getValue(beta)
    train_error[i] <- calc_error(X %*% beta_hat, Y)
    test_error[i]  <- calc_error(X_test %*% beta_hat, Y_test)
    beta_vals[[i]] <- beta_hat
}
Warning: Solution may be inaccurate. Try another solver, adjusting the solver settings,
or solve with `verbose = TRUE` for more information.
df_error <- data.frame(lambda = lambda_vals,
                       Train = train_error,
                       Test = test_error) |>
    pivot_longer(-lambda, names_to = "Set", values_to = "Error")
ggplot(df_error, aes(x = lambda, y = Error, color = Set)) +
    geom_line() +
    scale_x_log10() +
    scale_color_manual(values = c(Train = "blue", Test = "red")) +
    labs(x = expression(lambda), y = "Classification Error",
         title = "Train vs. Test Error") +
    theme_minimal()

Regularization Path

We also plot the regularization path, i.e., the coefficients $\beta_i$ versus $\lambda$. Notice that a few coefficients remain non-zero for larger $\lambda$ than the rest, which suggests that these features are the most important.

beta_mat <- do.call(cbind, beta_vals)
df_path <- data.frame(lambda = lambda_vals, t(beta_mat)) |>
    pivot_longer(-lambda, names_to = "Coefficient", values_to = "Value")
ggplot(df_path, aes(x = lambda, y = Value, color = Coefficient)) +
    geom_line() +
    scale_x_log10() +
    labs(x = expression(lambda), y = expression(beta[i]),
         title = "Regularization Path") +
    theme_minimal() +
    guides(color = "none")
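A compact numeric companion to the path plot counts how many coefficients are effectively non-zero at each λ. A self-contained sketch on a toy coefficient matrix (the `1e-4` cutoff is an arbitrary choice; with the actual fit, the `beta_mat` assembled above would take its place):

```r
# Columns are solutions at increasing lambda; entries shrink toward zero
beta_path <- cbind(c(1.0, 0.5, 0.2),
                   c(0.8, 0.0, 0.0))
nnz <- colSums(abs(beta_path) > 1e-4)
nnz  # 3 non-zeros at the smaller lambda, 1 at the larger
```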

Comparing True and Reconstructed Coefficients

We plot the true $\beta$ against the $\beta$ reconstructed at the $\lambda$ that minimizes test error. The non-zero coefficients are recovered with good accuracy, though a few reconstructed values are non-zero when they should be zero.

idx <- which.min(test_error)
df_coef <- data.frame(i = seq_along(beta_true),
                      True = beta_true,
                      Reconstructed = drop(beta_vals[[idx]])) |>
    pivot_longer(-i, names_to = "Type", values_to = "Value")
ggplot(df_coef, aes(x = i, y = Value, color = Type)) +
    geom_line() +
    scale_color_manual(values = c(True = "blue", Reconstructed = "red")) +
    labs(x = "i", y = expression(beta[i]),
         title = "True vs. Reconstructed Coefficients") +
    theme_minimal()

Session Info

R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_1.3.2   ggplot2_4.0.2 CVXR_1.8.1   

loaded via a namespace (and not attached):
 [1] gmp_0.7-5.1        generics_0.1.4     clarabel_0.11.2    slam_0.1-55       
 [5] lattice_0.22-9     digest_0.6.39      magrittr_2.0.4     evaluate_1.0.5    
 [9] grid_4.5.2         RColorBrewer_1.1-3 fastmap_1.2.0      rprojroot_2.1.1   
[13] jsonlite_2.0.0     Matrix_1.7-4       ECOSolveR_0.6.1    backports_1.5.0   
[17] scs_3.2.7          purrr_1.2.1        Rmosek_11.1.1      scales_1.4.0      
[21] codetools_0.2-20   cli_3.6.5          rlang_1.1.7        Rglpk_0.6-5.1     
[25] withr_3.0.2        yaml_2.3.12        otel_0.2.0         tools_4.5.2       
[29] osqp_1.0.0         Rcplex_0.3-8       checkmate_2.3.4    dplyr_1.2.0       
[33] here_1.0.2         gurobi_13.0-1      vctrs_0.7.1        R6_2.6.1          
[37] lifecycle_1.0.5    htmlwidgets_1.6.4  pkgconfig_2.0.3    cccp_0.3-3        
[41] pillar_1.11.1      gtable_0.3.6       glue_1.8.0         Rcpp_1.1.1        
[45] xfun_0.56          tibble_3.3.1       tidyselect_1.2.1   knitr_1.51        
[49] dichromat_2.0-0.1  highs_1.12.0-3     farver_2.1.2       htmltools_0.5.9   
[53] labeling_0.4.3     rmarkdown_2.30     piqp_0.6.2         compiler_4.5.2    
[57] S7_0.2.1          
