library(CVXR)
library(ggplot2)
library(tidyr)
set.seed(1)
n <- 20
m <- 1000
TEST <- m
DENSITY <- 0.2
beta_true <- rnorm(n)
idxs <- sample.int(n, size = as.integer((1 - DENSITY) * n), replace = FALSE)
beta_true[idxs] <- 0
offset <- 0
sigma <- 45
X <- matrix(rnorm(m * n, sd = 5), nrow = m, ncol = n)
Y <- as.vector(sign(X %*% beta_true + offset + rnorm(m, sd = sigma)))
X_test <- matrix(rnorm(TEST * n, sd = 5), nrow = TEST, ncol = n)
Y_test <- as.vector(sign(X_test %*% beta_true + offset + rnorm(TEST, sd = sigma)))
Support Vector Machine with l1 Regularization
Introduction
In this example we use CVXR to train an SVM classifier with \(\ell_1\)-regularization. We are given data \((x_i, y_i)\), \(i = 1, \ldots, m\). The \(x_i \in \mathbf{R}^n\) are feature vectors, while the \(y_i \in \{\pm 1\}\) are associated boolean outcomes. Our goal is to construct a good linear classifier \(\hat{y} = \text{sign}(\beta^T x - v)\). We find the parameters \(\beta, v\) by minimizing the (convex) function
\[ f(\beta, v) = \frac{1}{m} \sum_i \left(1 - y_i (\beta^T x_i - v)\right)_+ + \lambda \|\beta\|_1. \]
The first term is the average hinge loss. The second term shrinks the coefficients in \(\beta\) and encourages sparsity. The scalar \(\lambda \geq 0\) is a (regularization) parameter. Minimizing \(f(\beta, v)\) simultaneously selects features and fits the classifier.
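To make the objective concrete, the following base R sketch evaluates \(f(\beta, v)\) directly for a given \(\beta\) and \(v\). The helper name `svm_objective` and the toy numbers are assumptions made for illustration; they are not part of CVXR or of the example that follows.

```r
# Evaluate f(beta, v) = mean hinge loss + lambda * ||beta||_1 in base R.
# svm_objective is a hypothetical helper, not a CVXR function.
svm_objective <- function(beta, v, X, y, lambda) {
  margins <- y * (drop(X %*% beta) - v)  # y_i * (beta^T x_i - v)
  hinge <- pmax(0, 1 - margins)          # (1 - margin)_+
  mean(hinge) + lambda * sum(abs(beta))
}

# Tiny made-up data set: three points, two features.
X_toy <- rbind(c(1, 0), c(0, 1), c(-1, 0))
y_toy <- c(1, -1, -1)

# All margins are at least 1 here, so only the penalty term contributes.
obj_fit <- svm_objective(c(2, -1), v = 0, X = X_toy, y = y_toy, lambda = 0.5)

# With beta = 0 every hinge term equals 1 and the penalty vanishes.
obj_zero <- svm_objective(c(0, 0), v = 0, X = X_toy, y = y_toy, lambda = 0.5)
```

In this toy case the zero vector has the lower objective value, which illustrates how a large enough \(\lambda\) pushes \(\beta\) toward zero even when a non-zero \(\beta\) classifies every point with margin.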
Example
In the following code we generate data with \(n = 20\) features by randomly choosing \(x_i\) and a sparse \(\beta_{\text{true}} \in \mathbf{R}^n\). We then set \(y_i = \text{sign}(\beta_{\text{true}}^T x_i - v_{\text{true}} - z_i)\), where the \(z_i\) are i.i.d. normal random variables. The code sets \(v_{\text{true}} = 0\) (the `offset` variable) and draws the \(z_i\) with standard deviation \(\sigma = 45\). We generate a training set and a test set with \(m = 1000\) examples each.
We next formulate the optimization problem using CVXR.
beta <- Variable(n)
v <- Variable()
loss <- sum_entries(pos(1 - Y * (X %*% beta - v)))
reg <- p_norm(beta, 1)
We solve the optimization problem for a range of \(\lambda\) to compute a trade-off curve. We then plot the train and test error over the trade-off curve. A reasonable choice of \(\lambda\) is the value that minimizes the test error.
TRIALS <- 100
train_error <- numeric(TRIALS)
test_error <- numeric(TRIALS)
lambda_vals <- 10^seq(-2, 0, length.out = TRIALS)
beta_vals <- vector("list", TRIALS)
for (i in seq_len(TRIALS)) {
  prob <- Problem(Minimize(loss / m + lambda_vals[i] * reg))
  result <- psolve(prob)
  check_solver_status(prob)
  beta_hat <- drop(value(beta))
  v_hat <- drop(value(v))
  # Errors are measured against the noiseless labels sign(X %*% beta_true + offset),
  # not against the noisy training labels Y.
  train_error[i] <- sum(sign(X %*% beta_true + offset) !=
                          sign(X %*% beta_hat - v_hat)) / m
  test_error[i] <- sum(sign(X_test %*% beta_true + offset) !=
                         sign(X_test %*% beta_hat - v_hat)) / TEST
  beta_vals[[i]] <- beta_hat
}
# Reshape to long format so both error curves share one plot.
df_error <- data.frame(lambda = lambda_vals,
                       Train = train_error,
                       Test = test_error) |>
  pivot_longer(-lambda, names_to = "Set", values_to = "Error")
ggplot(df_error, aes(x = lambda, y = Error, color = Set)) +
geom_line() +
scale_x_log10() +
scale_color_manual(values = c(Train = "blue", Test = "red")) +
labs(x = expression(lambda), y = "Classification Error",
title = "Train vs. Test Error") +
theme_minimal()
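The choice of \(\lambda\) described above, the value that minimizes the test error, can be read off the error vector with `which.min`. The sketch below uses a small made-up grid and hypothetical error values purely for illustration; with the vectors from the loop above, the analogous expression would be `lambda_vals[which.min(test_error)]`.

```r
# Hypothetical test errors on a five-point lambda grid (illustrative numbers only).
lambda_grid <- 10^seq(-2, 0, length.out = 5)
test_err_toy <- c(0.30, 0.24, 0.21, 0.26, 0.45)

# which.min returns the index of the first minimum, so ties go to the smaller lambda.
best <- which.min(test_err_toy)
lambda_best <- lambda_grid[best]
```

Because `which.min` breaks ties in favor of the first index, a flat region of the error curve resolves to the least regularized model; taking the last minimizing index instead would favor the sparser one.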
Regularization Path
We also plot the regularization path, that is, each coefficient \(\beta_i\) as a function of \(\lambda\). Notice that the \(\beta_i\) do not necessarily decrease monotonically as \(\lambda\) increases. Four features remain non-zero for larger values of \(\lambda\) than the rest, which suggests that these features are the most important. Indeed, \(\beta_{\text{true}}\) has exactly 4 non-zero entries (a density of 0.2 with \(n = 20\)).
beta_mat <- do.call(cbind, lapply(beta_vals, as.vector))
df_path <- data.frame(lambda = lambda_vals, t(beta_mat)) |>
pivot_longer(-lambda, names_to = "Coefficient", values_to = "Value")
ggplot(df_path, aes(x = lambda, y = Value, color = Coefficient)) +
geom_line() +
scale_x_log10() +
labs(x = expression(lambda), y = expression(beta[i]),
title = "Regularization Path") +
theme_minimal() +
guides(color = "none")
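The number of surviving features at each \(\lambda\) can also be counted numerically rather than read from the plot. The sketch below assumes a coefficient matrix with one column per \(\lambda\) value, the same layout as `beta_mat` above, and an arbitrary tolerance `tol`, since solver output is rarely exactly zero. The toy matrix is made up for illustration.

```r
# Toy coefficient matrix: rows are coefficients, columns are increasing lambda values.
beta_toy <- cbind(c(0.9, -0.4, 1e-9, 0.2),
                  c(0.5, -1e-10, 0, 0.1),
                  c(0.1, 0, 0, 0))

# Assumed threshold below which a coefficient is treated as zero.
tol <- 1e-6

# Count the coefficients that are numerically non-zero in each column.
n_nonzero <- colSums(abs(beta_toy) > tol)
```

Applied to the example, `colSums(abs(beta_mat) > tol)` would give the support size for each entry of `lambda_vals`; the appropriate value of `tol` depends on the solver's accuracy.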
Session Info
R version 4.5.3 (2026-03-11)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.3.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_1.3.2 ggplot2_4.0.2 CVXR_1.8.1
loaded via a namespace (and not attached):
[1] gmp_0.7-5.1 generics_0.1.4 clarabel_0.11.2 slam_0.1-55
[5] lattice_0.22-9 digest_0.6.39 magrittr_2.0.4 evaluate_1.0.5
[9] grid_4.5.3 RColorBrewer_1.1-3 fastmap_1.2.0 rprojroot_2.1.1
[13] jsonlite_2.0.0 Matrix_1.7-4 ECOSolveR_0.6.1 backports_1.5.0
[17] scs_3.2.7 purrr_1.2.1 Rmosek_11.1.1 scales_1.4.0
[21] codetools_0.2-20 cli_3.6.5 rlang_1.1.7 Rglpk_0.6-5.1
[25] withr_3.0.2 yaml_2.3.12 otel_0.2.0 tools_4.5.3
[29] osqp_1.0.0 checkmate_2.3.4 dplyr_1.2.0 here_1.0.2
[33] gurobi_13.0-1 vctrs_0.7.1 R6_2.6.1 lifecycle_1.0.5
[37] htmlwidgets_1.6.4 pkgconfig_2.0.3 cccp_0.3-3 pillar_1.11.1
[41] gtable_0.3.6 glue_1.8.0 Rcpp_1.1.1 xfun_0.56
[45] tibble_3.3.1 tidyselect_1.2.1 knitr_1.51 dichromat_2.0-0.1
[49] highs_1.12.0-3 farver_2.1.2 htmltools_0.5.9 labeling_0.4.3
[53] rmarkdown_2.30 piqp_0.6.2 compiler_4.5.3 S7_0.2.1