Logistic Regression with \(\ell_1\) Regularization

Author

CVXPY Developers and Balasubramanian Narasimhan

Introduction

In this example, we use CVXR to train a logistic regression classifier with \(\ell_1\) regularization. We are given data \((x_i, y_i)\), \(i = 1, \ldots, m\). The \(x_i \in \mathbf{R}^n\) are feature vectors, while the \(y_i \in \{0, 1\}\) are associated boolean classes.

Our goal is to construct a linear classifier \(\hat{y} = \mathbb{1}[\beta^T x > 0]\), which is \(1\) when \(\beta^T x\) is positive and \(0\) otherwise. We model the posterior probabilities of the classes given the data linearly, with

\[ \log \frac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} = \beta^T x. \]

Exponentiating both sides and using the fact that the two conditional probabilities sum to one, we find

\[ \Pr(Y = 1 \mid X = x) = \frac{\exp(\beta^T x)}{1 + \exp(\beta^T x)}, \quad \Pr(Y = 0 \mid X = x) = \frac{1}{1 + \exp(\beta^T x)}. \]

We fit \(\beta\) by maximizing the log-likelihood of the data minus a regularization term \(\lambda \|\beta\|_1\) with \(\lambda > 0\):

\[ \ell(\beta) = \sum_{i=1}^{m} y_i \beta^T x_i - \log(1 + \exp(\beta^T x_i)) - \lambda \|\beta\|_1. \]

The log-likelihood is concave in \(\beta\), and subtracting the convex penalty \(\lambda \|\beta\|_1\) preserves concavity, so \(\ell\) is concave and maximizing it is a convex optimization problem.
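As a quick numerical sanity check (with an illustrative value of \(\beta^T x\), not data from this example), the per-observation summand \(y\,\beta^T x - \log(1 + \exp(\beta^T x))\) equals \(\log \Pr(Y = y \mid X = x)\) under the model above:

```r
## Illustrative check: y * b'x - log(1 + exp(b'x)) is log Pr(Y = y | X = x).
bx <- 0.7                                # an example value of beta^T x
p1 <- exp(bx) / (1 + exp(bx))            # Pr(Y = 1 | X = x)
stopifnot(isTRUE(all.equal(1 * bx - log(1 + exp(bx)), log(p1))))
stopifnot(isTRUE(all.equal(0 * bx - log(1 + exp(bx)), log(1 - p1))))
```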

Generating Data

In the following code we generate data with \(n = 50\) features by randomly choosing \(x_i\) and supplying a sparse \(\beta_{\text{true}} \in \mathbf{R}^n\). We then set \(y_i = \mathbb{1}[\beta_{\text{true}}^T x_i + z_i > 0]\), where the \(z_i\) are i.i.d. normal random variables. The training set has \(m = 50\) examples; the test set has \(2m = 100\).

library(CVXR)
library(tidyr)
library(ggplot2)

set.seed(1)
n <- 50
m <- 50

sigmoid <- function(z) 1 / (1 + exp(-z))

beta_true <- c(1, 0.5, -0.5, rep(0, n - 3))
X <- (matrix(runif(m * n), nrow = m, ncol = n) - 0.5) * 10
Y <- round(sigmoid(X %*% beta_true + rnorm(m, sd = 0.5)))

X_test <- (matrix(runif(2 * m * n), nrow = 2 * m, ncol = n) - 0.5) * 10
Y_test <- round(sigmoid(X_test %*% beta_true + rnorm(2 * m, sd = 0.5)))

Formulating the Problem

We next formulate the optimization problem using CVXR. The logistic atom in CVXR computes \(f(z) = \log(1 + e^z)\), which we use to express the log-likelihood.

beta <- Variable(n)
log_likelihood <- sum_entries(
    Y * (X %*% beta) - logistic(X %*% beta)
)
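One reason to prefer the logistic atom over writing \(\log(1 + e^z)\) directly is numerical stability. A base-R sketch of the issue (not CVXR code):

```r
## Naive log(1 + exp(z)) overflows for large z, while the algebraically
## equivalent form max(z, 0) + log1p(exp(-|z|)) stays finite.
z <- 800
naive  <- log(1 + exp(z))                    # Inf: exp(800) overflows
stable <- max(z, 0) + log1p(exp(-abs(z)))    # 800
```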

Fitting the Model

We solve the optimization problem for a range of \(\lambda\) to compute a trade-off curve. We then plot the train and test error over the trade-off curve. A reasonable choice of \(\lambda\) is the value that minimizes the test error.

## Classification error function
calc_error <- function(scores, labels) {
    preds <- ifelse(scores > 0, 1, 0)
    sum(abs(preds - labels)) / length(labels)
}
trials <- 100
train_error <- numeric(trials)
test_error  <- numeric(trials)
lambda_vals <- 10^seq(-2, 0, length.out = trials)
beta_vals   <- vector("list", trials)

for (i in seq_len(trials)) {
    problem <- Problem(Maximize(log_likelihood / m -
                                lambda_vals[i] * p_norm(beta, 1)))
    result <- solve(problem)
    stopifnot(result$status %in% c("optimal", "optimal_inaccurate"))
    beta_hat <- result$getValue(beta)
    train_error[i] <- calc_error(X %*% beta_hat, Y)
    test_error[i]  <- calc_error(X_test %*% beta_hat, Y_test)
    beta_vals[[i]] <- beta_hat
}
Warning: Solution may be inaccurate. Try another solver, adjusting the solver settings,
or solve with `verbose = TRUE` for more information.
df_error <- data.frame(lambda = lambda_vals,
                       Train = train_error,
                       Test = test_error) |>
    pivot_longer(-lambda, names_to = "Set", values_to = "Error")
ggplot(df_error, aes(x = lambda, y = Error, color = Set)) +
    geom_line() +
    scale_x_log10() +
    scale_color_manual(values = c(Train = "blue", Test = "red")) +
    labs(x = expression(lambda), y = "Classification Error",
         title = "Train vs. Test Error") +
    theme_minimal()

Regularization Path

We also plot the regularization path, i.e., the \(\beta_i\) versus \(\lambda\). Notice that a few coefficients remain non-zero as \(\lambda\) grows while the rest are driven to zero, which suggests that these features are the most important.

beta_mat <- do.call(cbind, beta_vals)
df_path <- data.frame(lambda = lambda_vals, t(beta_mat)) |>
    pivot_longer(-lambda, names_to = "Coefficient", values_to = "Value")
ggplot(df_path, aes(x = lambda, y = Value, color = Coefficient)) +
    geom_line() +
    scale_x_log10() +
    labs(x = expression(lambda), y = expression(beta[i]),
         title = "Regularization Path") +
    theme_minimal() +
    guides(color = "none")
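To make the sparsity pattern concrete, one can count the coefficients whose magnitude exceeds a small tolerance at each \(\lambda\). A minimal standalone sketch with synthetic coefficient columns (in practice the columns of `beta_mat` would be used):

```r
## Synthetic coefficient paths for three lambda values (illustrative only):
## each column holds the fitted coefficients at one lambda, and counting
## entries above a tolerance shows sparsity increasing with lambda.
beta_demo <- cbind(c(1.0, 0.5, 0.0), c(0.9, 0.0, 0.0), c(0.0, 0.0, 0.0))
tol <- 1e-4
nnz <- colSums(abs(beta_demo) > tol)   # 2 1 0
```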

Comparing True and Reconstructed Coefficients

We plot the true \(\beta\) against the reconstructed \(\beta\), with \(\lambda\) chosen to minimize the test error. The non-zero coefficients are recovered with good accuracy, though a few reconstructed coefficients are non-zero where the true values are zero.

idx <- which.min(test_error)
df_coef <- data.frame(i = seq_along(beta_true),
                      True = beta_true,
                      Reconstructed = drop(beta_vals[[idx]])) |>
    pivot_longer(-i, names_to = "Type", values_to = "Value")
ggplot(df_coef, aes(x = i, y = Value, color = Type)) +
    geom_line() +
    scale_color_manual(values = c(True = "blue", Reconstructed = "red")) +
    labs(x = "i", y = expression(beta[i]),
         title = "True vs. Reconstructed Coefficients") +
    theme_minimal()
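The small spurious non-zeros are consistent with how the \(\ell_1\) penalty acts: in the idealized orthogonal-design case it soft-thresholds each coefficient, shrinking it toward zero by roughly \(\lambda\) and zeroing only those smaller than the shrinkage. A standalone sketch of this intuition with illustrative numbers (not the fitted values from this example):

```r
## Soft-thresholding operator: sign(b) * max(|b| - lam, 0).
soft_threshold <- function(b, lam) sign(b) * pmax(abs(b) - lam, 0)

## Coefficients larger than lam survive (shrunk); tiny ones are zeroed.
soft_threshold(c(1, 0.5, -0.5, 0.03, -0.02), 0.05)   # 0.95 0.45 -0.45 0 0
```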

Session Info

R version 4.5.3 (2026-03-11)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_1.3.2   ggplot2_4.0.2 CVXR_1.8.1   

loaded via a namespace (and not attached):
 [1] gmp_0.7-5.1        generics_0.1.4     clarabel_0.11.2    slam_0.1-55       
 [5] lattice_0.22-9     digest_0.6.39      magrittr_2.0.4     evaluate_1.0.5    
 [9] grid_4.5.3         RColorBrewer_1.1-3 fastmap_1.2.0      rprojroot_2.1.1   
[13] jsonlite_2.0.0     Matrix_1.7-4       ECOSolveR_0.6.1    backports_1.5.0   
[17] scs_3.2.7          purrr_1.2.1        Rmosek_11.1.1      scales_1.4.0      
[21] codetools_0.2-20   cli_3.6.5          rlang_1.1.7        Rglpk_0.6-5.1     
[25] withr_3.0.2        yaml_2.3.12        otel_0.2.0         tools_4.5.3       
[29] osqp_1.0.0         checkmate_2.3.4    dplyr_1.2.0        here_1.0.2        
[33] gurobi_13.0-1      vctrs_0.7.1        R6_2.6.1           lifecycle_1.0.5   
[37] htmlwidgets_1.6.4  pkgconfig_2.0.3    cccp_0.3-3         pillar_1.11.1     
[41] gtable_0.3.6       glue_1.8.0         Rcpp_1.1.1         xfun_0.56         
[45] tibble_3.3.1       tidyselect_1.2.1   knitr_1.51         dichromat_2.0-0.1 
[49] highs_1.12.0-3     farver_2.1.2       htmltools_0.5.9    labeling_0.4.3    
[53] rmarkdown_2.30     piqp_0.6.2         compiler_4.5.3     S7_0.2.1          
