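Ridge Regression
Introduction
Ridge regression is a regression technique that is quite similar to unadorned least squares linear regression: simply adding an $\ell_2$ penalty on the parameters $\beta$ to the least squares objective yields the objective function for ridge regression.
Our goal is to find an assignment to $\beta$ that minimizes the function
$$f(\beta) = \|X\beta - Y\|_2^2 + \lambda \|\beta\|_2^2,$$
where $\lambda \geq 0$ is a hyperparameter and, as usual, $X$ is the data matrix and $Y$ is the vector of observations.
Ridge regression is an example of a shrinkage method: compared to least squares, it shrinks the parameter estimates in the hopes of reducing variance, improving prediction accuracy, and aiding interpretation.
In this example, we show how to fit a ridge regression model using CVXR, how to evaluate the model, and how to tune the hyperparameter $\lambda$.
Writing the Objective Function
We can decompose the objective function as the sum of a least squares loss function and an $\ell_2$ regularizer.
library(CVXR)
library(ggplot2)
library(tidyr)
## Least squares loss: squared l2 norm of the residuals
loss_fn <- function(X, Y, beta) p_norm(X %*% beta - Y, 2)^2
## Ridge penalty: squared l2 norm of the coefficients
regularizer <- function(beta) p_norm(beta, 2)^2
objective_fn <- function(X, Y, beta, lambd) {
  loss_fn(X, Y, beta) + lambd * regularizer(beta)
}
## Mean squared error, evaluated at the current value of beta
mse <- function(X, Y, beta) {
  (1.0 / nrow(X)) * value(loss_fn(X, Y, beta))
}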
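Because this objective is a strictly convex quadratic for $\lambda > 0$, it also has a well-known closed-form minimizer, $\hat\beta = (X^\top X + \lambda I)^{-1} X^\top Y$, obtained by setting the gradient $2X^\top(X\beta - Y) + 2\lambda\beta$ to zero. The following is a minimal base R sketch of that formula, which can serve as a solver-independent check. The helper name `ridge_closed_form` and the toy data are our own assumptions and are not part of the CVXR example.

```r
## Closed-form ridge estimate: solve (X'X + lambda * I) beta = X'Y.
## `ridge_closed_form` is a hypothetical helper name used for illustration.
ridge_closed_form <- function(X, Y, lambd) {
  n <- ncol(X)
  A <- crossprod(X) + lambd * diag(n)   # X'X + lambda * I
  b <- crossprod(X, Y)                  # X'Y
  drop(solve(A, b))
}

## Toy data (assumed values, chosen only to exercise the function)
set.seed(42)
X_toy <- matrix(rnorm(30 * 4), nrow = 30, ncol = 4)
Y_toy <- drop(X_toy %*% c(1, -2, 0.5, 0) + rnorm(30))

lambd_toy <- 1
beta_hat <- ridge_closed_form(X_toy, Y_toy, lambd_toy)

## The gradient of the ridge objective should vanish at the minimizer
grad <- 2 * crossprod(X_toy, X_toy %*% beta_hat - Y_toy) +
  2 * lambd_toy * beta_hat
max(abs(grad))  # numerically close to zero
```

Solving the linear system with `solve(A, b)` avoids forming an explicit inverse, which is the usual choice for numerical stability.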
Generating Data
Because ridge regression encourages the parameter estimates to be small, it tends to lead to models with less variance than those fit with vanilla linear regression. We generate a small dataset that will illustrate this.
set.seed(1)
m <- 100
n <- 20
sigma <- 5
## Generate true coefficients
beta_star <- rnorm(n)
## Generate a random Gaussian data matrix
X <- matrix(rnorm(m * n), nrow = m, ncol = n)
## Corrupt the observations with additive Gaussian noise
Y <- X %*% beta_star + rnorm(m, sd = sigma)
## Split into train and test sets
X_train <- X[1:50, ]
Y_train <- Y[1:50]
X_test <- X[51:100, ]
Y_test <- Y[51:100]
Fitting the Model
To fit the model, we create a CVXR problem whose objective is to minimize the objective function defined above. We solve this problem for many values of $\lambda$, recording the training error, the test error, and the fitted coefficients for each.
beta <- Variable(n)
lambd_values <- 10^seq(-2, 2, length.out = 50)
train_errors <- numeric(length(lambd_values))
test_errors <- numeric(length(lambd_values))
beta_values <- vector("list", length(lambd_values))
for (i in seq_along(lambd_values)) {
  problem <- Problem(Minimize(objective_fn(X_train, Y_train, beta,
                                           lambd_values[i])))
  result <- psolve(problem, solver = "CLARABEL")
  check_solver_status(problem)
  train_errors[i] <- mse(X_train, Y_train, beta)
  test_errors[i] <- mse(X_test, Y_test, beta)
  beta_values[[i]] <- value(beta)
}
Warning: Solution may be inaccurate. Try another solver, adjusting the solver settings,
or solve with `verbose = TRUE` for more information.
Solution may be inaccurate. Try another solver, adjusting the solver settings,
or solve with `verbose = TRUE` for more information.
Solution may be inaccurate. Try another solver, adjusting the solver settings,
or solve with `verbose = TRUE` for more information.
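For context, it can help to have the unregularized least squares fit on the same split as a baseline. The sketch below regenerates the data with the same seed so that it runs on its own, and it uses base R's `qr.solve` rather than CVXR. The names `mse_base`, `beta_ols`, `ols_train_mse`, and `ols_test_mse` are illustrative and do not appear in the original example.

```r
## Regenerate the data exactly as in the example so this block is self-contained
set.seed(1)
m <- 100
n <- 20
sigma <- 5
beta_star <- rnorm(n)
X <- matrix(rnorm(m * n), nrow = m, ncol = n)
Y <- X %*% beta_star + rnorm(m, sd = sigma)
X_train <- X[1:50, ]
Y_train <- Y[1:50]
X_test <- X[51:100, ]
Y_test <- Y[51:100]

## Plain-R mean squared error (hypothetical helper; no CVXR objects involved)
mse_base <- function(X, Y, beta) mean((X %*% beta - Y)^2)

## Ordinary least squares fit (the lambda = 0 case) via a QR decomposition
beta_ols <- qr.solve(X_train, Y_train)

ols_train_mse <- mse_base(X_train, Y_train, beta_ols)
ols_test_mse <- mse_base(X_test, Y_test, beta_ols)
c(train = ols_train_mse, test = ols_test_mse)
```

Since the ridge solution approaches the least squares solution as $\lambda \to 0$ when the training matrix has full column rank, these two numbers should sit close to the left edge of the training and test error curves.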
Evaluating the Model
The plot produced by the code below shows that, up to a point, penalizing the size of the parameters reduces the test error at the cost of a higher training error, trading higher bias for lower variance. For our example, this indicates that a properly tuned ridge regression generalizes better than least squares linear regression.
df_error <- data.frame(lambda = lambd_values,
                       Train = train_errors,
                       Test = test_errors) |>
  pivot_longer(-lambda, names_to = "Set", values_to = "MSE")
ggplot(df_error, aes(x = lambda, y = MSE, color = Set)) +
  geom_line() +
  scale_x_log10() +
  scale_color_manual(values = c(Train = "blue", Test = "red")) +
  labs(x = expression(lambda), y = "Mean Squared Error (MSE)",
       title = "Mean Squared Error (MSE)") +
  theme_minimal()
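Tuning the hyperparameter then amounts to picking the grid value with the lowest held-out error. A minimal sketch of that step follows; `pick_lambda` is a hypothetical helper, and the error values are made up so that the block runs on its own.

```r
## Return the grid value with the smallest error, plus its position.
## `pick_lambda` is an illustrative helper, not part of CVXR.
pick_lambda <- function(lambd_values, errors) {
  stopifnot(length(lambd_values) == length(errors))
  idx <- which.min(errors)
  list(lambda = lambd_values[idx], mse = errors[idx], index = idx)
}

## Toy grid and made-up errors with a minimum in the interior
lambd_grid <- 10^seq(-2, 2, length.out = 5)
toy_errors <- c(30, 28, 26, 27, 35)

best <- pick_lambda(lambd_grid, toy_errors)
best$lambda
```

With the objects from this example, the call would be `pick_lambda(lambd_values, test_errors)`. Note that choosing $\lambda$ on the test set makes the reported test error optimistic; a separate validation split or cross-validation is the more common practice.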
Regularization Path
As expected, increasing $\lambda$ shrinks the coefficient estimates toward zero.
beta_mat <- do.call(cbind, beta_values)
df_path <- data.frame(lambda = lambd_values, t(beta_mat)) |>
  pivot_longer(-lambda, names_to = "Coefficient", values_to = "Value")
ggplot(df_path, aes(x = lambda, y = Value, color = Coefficient)) +
  geom_line() +
  scale_x_log10() +
  labs(x = expression(lambda), y = expression(beta[i]),
       title = "Regularization Path") +
  theme_minimal() +
  guides(color = "none")
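One way to see why the coefficients shrink overall is through the singular value decomposition $X = U D V^\top$, under which the ridge solution is $\hat\beta(\lambda) = V \, \mathrm{diag}\big(d_j / (d_j^2 + \lambda)\big) \, U^\top Y$. Each component is damped more strongly as $\lambda$ grows, so the norm of $\hat\beta(\lambda)$ decreases along the path, even though an individual coefficient need not move monotonically. The base R sketch below illustrates this on toy data; `ridge_svd`, `lambd_grid`, and the data are assumptions made for illustration.

```r
## Toy data (assumed values, used only for illustration)
set.seed(7)
X_toy <- matrix(rnorm(40 * 5), nrow = 40, ncol = 5)
Y_toy <- drop(X_toy %*% c(2, -1, 0.5, 0, 3) + rnorm(40))

s <- svd(X_toy)                      # X = U diag(d) V'
uty <- drop(crossprod(s$u, Y_toy))   # U'Y

## Ridge solution in the SVD basis: V diag(d / (d^2 + lambda)) U'Y
ridge_svd <- function(lambd) {
  drop(s$v %*% (s$d / (s$d^2 + lambd) * uty))
}

## Euclidean norm of the coefficients along a grid of lambda values
lambd_grid <- 10^seq(-2, 2, length.out = 9)
coef_norms <- sapply(lambd_grid, function(l) sqrt(sum(ridge_svd(l)^2)))
data.frame(lambda = lambd_grid, norm = coef_norms)
```

Computing the decomposition once and reusing it for every $\lambda$ is also a cheap way to trace a whole path without calling a solver at each grid point.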
Session Info
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_1.3.2 ggplot2_4.0.2 CVXR_1.8.1
loaded via a namespace (and not attached):
[1] gmp_0.7-5.1 generics_0.1.4 clarabel_0.11.2 slam_0.1-55
[5] lattice_0.22-9 digest_0.6.39 magrittr_2.0.4 evaluate_1.0.5
[9] grid_4.5.2 RColorBrewer_1.1-3 fastmap_1.2.0 rprojroot_2.1.1
[13] jsonlite_2.0.0 Matrix_1.7-4 ECOSolveR_0.6.1 backports_1.5.0
[17] scs_3.2.7 purrr_1.2.1 Rmosek_11.1.1 scales_1.4.0
[21] codetools_0.2-20 cli_3.6.5 rlang_1.1.7 Rglpk_0.6-5.1
[25] withr_3.0.2 yaml_2.3.12 otel_0.2.0 tools_4.5.2
[29] osqp_1.0.0 Rcplex_0.3-8 checkmate_2.3.4 dplyr_1.2.0
[33] here_1.0.2 gurobi_13.0-1 vctrs_0.7.1 R6_2.6.1
[37] lifecycle_1.0.5 htmlwidgets_1.6.4 pkgconfig_2.0.3 cccp_0.3-3
[41] pillar_1.11.1 gtable_0.3.6 glue_1.8.0 Rcpp_1.1.1
[45] xfun_0.56 tibble_3.3.1 tidyselect_1.2.1 knitr_1.51
[49] dichromat_2.0-0.1 highs_1.12.0-3 farver_2.1.2 htmltools_0.5.9
[53] labeling_0.4.3 rmarkdown_2.30 piqp_0.6.2 compiler_4.5.2
[57] S7_0.2.1