# A Gentle Introduction to `CVXR`

## Introduction

Welcome to `CVXR`

: a modeling language for describing and solving
convex optimization problems that follows the natural, mathematical
notation of convex optimization rather than the requirements of any
particular solver. The purpose of this document is both to introduce
the reader to `CVXR`

and to generate excitement for its possibilities in
the field of statistics.

Convex optimization is a powerful and very general tool. As a
practical matter, the set of convex optimization problems includes
almost every optimization problem that can be solved exactly and
efficiently (i.e. without requiring an exhaustive search). If an
optimization problem can be solved, it is probably convex. This family
of problems becomes even larger if you include those that can be
solved *approximately* and efficiently. To learn more about the
mathematics and applications of convex optimization,
see Boyd and Vandenberghe 2009.

Convex optimization systems written in other languages are already
widely used in practical applications. These
include
YALMIP and
CVX
(Matlab), CVXPY (Python),
and Convex.jl
(Julia). `CVXR`

Shares a lot of its code base
with
CVXcanon and
CVXPY. As far as we know, this is the first full-featured general
convex optimization package for R.

One of the great headaches of conventional numerical optimization is
the process of deciding which algorithm to use and how to set its
parameters. In convex optimization, the particular algorithm matters
much less. So while a user of `CVXR`

is still free to choose from a
number of different algorithms and to set algorithm parameters as they
please, the vast majority of users will not need to do this. `CVXR`

will just work.

The uses for convex optimization in statistics are many and varied. A large number of parameter-fitting methods are convex, including least-squares, ridge, lasso, and isotonic regression, as well as many other kinds of problems such as maximum entropy or minimum Kullback-Leibler divergence over a finite set.

All of these examples, at least in their most basic forms, are
established enough that they already have well-crafted R packages
devoted to them. If you use `CVXR`

to solve these problems, it will
work. It will probably be slower than a custom-built algorithm—for
example, glmnet for fitting lasso or ridge regression models—but it
will work. However, this is not the true purpose of `CVXR`

. If you
want to build a well-established model, you should use one of the
well-established packages for doing so. If you want to build your
*own* model—one that is a refinement of an existing method, or
perhaps even something that has never been tried before—then `CVXR`

is the package to do it. The advantage of `CVXR`

over glmnet and the
like comes from its flexibility: A few lines of code can transform a
problem from commonplace to state-of-the-art, and can often do the
work of an entire package in the process.

This document is meant to familiarize you with the `CVXR`

package. It
assumes basic knowledge of convex optimization and statistics as well
as proficiency with R. A potential user of `CVXR`

should read this entire
document and then refer to the tutorial examples.

Happy optimizing!

## Convex Optimization

A convex optimization problem has the following form:

\[ \begin{array}{ll} \mbox{minimize} & f_0(x)\\ \mbox{subject to} & f_i(x) \leq 0, \quad i=1,\ldots,m\\ & g_i(x) = 0, \quad i=1,\ldots,p \end{array} \]

where \(x\) is the variable, \(f_0\) and \(f_1,...,f_m\) are convex, and \(g_1,...,g_p\) are affine. \(f_0\) is called the objective function, \(f_i \leq 0\) are called the inequality constraints, and \(g_i = 0\) are called the equality constraints.

In `CVXR`

, you will specify convex optimization problems in a more
convenient format than the one above.

A convex function is one that is upward curving. A concave function is downward curving. An affine function is flat, and is thus both convex and concave.

A convex optimization problem is one that attempts to minimize a convex function (or maximize a concave function) over a convex set of input points.

You can learn much more about convex optimization via Boyd and Vandenberghe (2004) as well as the CVX101 MOOC.

## ‘Hello World’

We begin with one of the simplest possible problems that presents all three of these features:

\[ \begin{array}{ll} \mbox{minimize} & x^2 + y^2 \\ \mbox{subject to} & x \geq 0, \quad 2x + y = 1 \end{array} \]

with scalar variables \(x\) and \(y\). This is a convex optimization problem with objective \(f_0(x,y) = x^2 + y^2\) and constraint functions \(f_1(x,y) = -x\) and \(g_1(x,y) = 2x - y - 1\).

Note that this problem is simple enough to be solved analytically, so
we can confirm that `CVXR`

has produced the correct answer. Here’s how
we formulate the problem in `CVXR`

.

```
# Variables minimized over
x <- Variable(1)
y <- Variable(1)
# Problem definition
objective <- Minimize(x^2 + y^2)
constraints <- list(x >= 0, 2*x + y == 1)
prob2.1 <- Problem(objective, constraints)
# Problem solution
solution2.1 <- solve(prob2.1)
solution2.1$status
```

`## [1] "optimal"`

`solution2.1$value`

`## [1] 0.2`

`solution2.1$getValue(x)`

`## [1] 0.4`

`solution2.1$getValue(y)`

`## [1] 0.2`

`# The world says 'hi' back.`

We now turn to a careful explanation of the code. The first lines
create two Variable objects, `x`

and `y`

, both of length 1
(i.e. scalar variables).

```
x <- Variable(1)
y <- Variable(1)
```

`x`

and `y`

represent what we are allowed to adjust in our problem in
order to obtain the optimal solution. They don’t have values yet, and
they won’t until after we solve the problem. For now, they are just
placeholders.

Next, we define the problem objective.

`objective <- Minimize(x^2 + y^2)`

This call to `Minimize()`

does *not* return the minimum value of the
expression `x^2 + y^2`

the way a call to the native R function `min()`

would do (after all, `x`

and `y`

don’t have values yet). Instead,
`Minimize()`

creates an Objective object, which defines the goal of
the optimization we will perform, namely to find values for `x`

and
`y`

which produce the smallest possible value of `x^2 + y^2`

.

The next line defines two constraints—an inequality constraint and an equality constraint, respectively.

`constraints <- list(x >= 0, 2*x + y == 1)`

Again, counter to what you might ordinarily expect, the expression `x >= 0`

does not return `TRUE`

or `FALSE`

the way `1.3 >= 0`

would. Instead, the `==`

and `>=`

operators have been overloaded to
return Constraint objects, which will be used by the solver to enforce
the problem’s constraints. (Without them, the solution to our problem
would simply be \(x = y = 0\).)

Next, we define our Problem object, which takes our Objective object and our two Constraint objects as inputs.

`prob2.1 <- Problem(objective, constraints)`

Problem objects are very flexible in that they can have 0 or more
constraints, and their objective can be to `Minimize()`

a convex
expression (as shown above) *or* to `Maximize()`

a concave
expression.

The call to `Problem()`

still does not actually *solve* our
optimization problem. That only happens with the call to `solve()`

.

`solution2.1 <- solve(prob2.1)`

Behind the scenes, this call translates the problem into a format that
a convex solver can understand, feeds the problem to the solver, and
then returns the results in a list. For this problem, the
list will contain among other things the optimal value of
the objective function `x^2 + y^2`

, values for `x`

and `y`

that
achieve that optimal objective value (along with a function
`solution2.1$getValue`

to retrieve them conveniently), and some
accompanying metadata such as `solution2.1$status`

, which confirms
that the solution was indeed `"optimal"`

.

`solution2.1$status`

`## [1] "optimal"`

`solution2.1$value`

`## [1] 0.2`

`solution2.1$getValue(x)`

`## [1] 0.4`

`solution2.1$getValue(y)`

`## [1] 0.2`

In general, when you apply the `solve()`

method to a Problem, several
things can happen:

`solution$status == "optimal"`

: The problem was solved. Values for the optimization variables were found, which satisfy all of the constraints and minimize/maximize the objective.`solution$status == "infeasible"`

: The problem was*not*solved because no combination of input variables exists that can satisfy all of the constraints. For a trivial example of when this might happen, consider a problem with optimization variable`x`

, and constraints`x >= 1`

and`x <= 0`

. Obviously, no value of`x`

exists that can satisfy both constraints. In this case,`solution$value`

is`+Inf`

for a minimization problem and`-Inf`

for a maximization problem, indicating infinite dissatisfaction with the result. The values for the input variables are`NA`

.`solution$status == "unbounded"`

: The problem was*not*solved because the objective can be made arbitrarily small for a minimization problem or arbitrarily large for a maximization problem. Hence there is no optimal solution because for any given solution, it is always possible to find something even more optimal. In this case,`solution$opt.val`

is`-Inf`

for a minimization problem and`+Inf`

for a maximization problem, indicating infinite satisfaction with the result. Again, the values of the the input variables are`NA`

.

### Modifying a CVXR Problem

Like any normal R object, the `Problem`

, `Minimize`

, `Maximize`

, and
`Constraint`

objects can all be modified and computed upon after
creation. Here is an example where we modify the problem we created
above by changing its objective and adding a constraint, print the
modified problem, check whether it is still convex, and then solve the
modified problem:

```
# Modify the problem from example 1
prob2.2 <- prob2.1
objective(prob2.2) <- Minimize(x^2 + y^2 + abs(x-y))
constraints(prob2.2) <- c(constraints(prob2.1), y <= 1)
# Solve the modified problem
solution2.2 <- solve(prob2.2)
# Examine the solution
solution2.2$status
```

`## [1] "optimal"`

`solution2.2$value`

`## [1] 0.2222222`

`solution2.2$getValue(x)`

`## [1] 0.3333333`

`solution2.2$getValue(y)`

`## [1] 0.3333333`

### An Invalid Problem

Unfortunately, you can’t just type any arbitrary problem you like into
`CVXR`

. There are restrictions on what kinds of problems can be
handled. For example, if we tried to `Maximize()’ the objective from
example 2.1, we get an error:

```
prob2.3 <- prob2.1
objective(prob2.3) <- Maximize(x^2 + y^2)
solve(prob2.3)
```

`## Error in construct_intermediate_chain(object, candidate_solvers, gp = gp): Problem does not follow DCP rules.`

We would get a similar error if we tried to add the constraint
`p_norm(x,2) == 1`

. This is because `CVXR`

uses a strict set of rules
called Disciplined Convex Programming (DCP) to evaluate the convexity
of any given problem. If you follow these rules, you are guaranteed
that your problem is convex. If you don’t follow these rules, `CVXR`

will throw an exception. See the last section for further information
on DCP.

## Simple Examples

We begin by showing what a standard linear regression problem looks
like in `CVXR`

:

### Ordinary Least Squares

For illustration, we generate some synthetic data for use in this example.

```
set.seed(1)
s <- 1
n <- 10
m <- 300
mu <- rep(0, 9)
Sigma <- cbind(c(1.6484, -0.2096, -0.0771, -0.4088, 0.0678, -0.6337, 0.9720, -1.2158, -1.3219),
c(-0.2096, 1.9274, 0.7059, 1.3051, 0.4479, 0.7384, -0.6342, 1.4291, -0.4723),
c(-0.0771, 0.7059, 2.5503, 0.9047, 0.9280, 0.0566, -2.5292, 0.4776, -0.4552),
c(-0.4088, 1.3051, 0.9047, 2.7638, 0.7607, 1.2465, -1.8116, 2.0076, -0.3377),
c(0.0678, 0.4479, 0.9280, 0.7607, 3.8453, -0.2098, -2.0078, -0.1715, -0.3952),
c(-0.6337, 0.7384, 0.0566, 1.2465, -0.2098, 2.0432, -1.0666, 1.7536, -0.1845),
c(0.9720, -0.6342, -2.5292, -1.8116, -2.0078, -1.0666, 4.0882, -1.3587, 0.7287),
c(-1.2158, 1.4291, 0.4776, 2.0076, -0.1715, 1.7536, -1.3587, 2.8789, 0.4094),
c(-1.3219, -0.4723, -0.4552, -0.3377, -0.3952, -0.1845, 0.7287, 0.4094, 4.8406))
X <- MASS::mvrnorm(m, mu, Sigma)
X <- cbind(rep(1, m), X)
trueBeta <- c(0, 0.8, 0, 1, 0.2, 0, 0.4, 1, 0, 0.7)
y <- X %*% trueBeta + rnorm(m, 0, s)
```

```
beta <- Variable(n)
objective <- Minimize(sum_squares(y - X %*% beta))
prob3.1 <- Problem(objective)
```

Here, `y`

is the response, `X`

is the matrix of predictors, `n`

is the
number of predictors, and `beta`

is a vector of coefficients on the
predictors. The Ordinary Least-Squares (OLS) solution for `beta`

minimizes the \(l_2\)-norm of the residuals (i.e. the
root-mean-squared error). As we can see below, `CVXR`

’s solution
matches the solution obtained by using `lm`

.

```
CVXR_solution3.1 <- solve(prob3.1)
lm_solution3.1 <- lm(y ~ 0 + X)
```

Obviously, if all you want to do is least-squares linear regression,
you should simply use `lm`

. The chief advantage of `CVXR`

is its
flexibility, as we will demonstrate in the rest of this section.

### Non-Negative Least Squares

Looking at example 3.1, you may notice that the OLS regression problem
has an objective, but no constraints. In many contexts, we can greatly
improve our model by constraining the solution to reflect our prior
knowledge. For example, we may know that the coefficients `beta`

must
be non-negative.

```
prob3.2 <- prob3.1
constraints(prob3.2) <- list(beta >= 0)
solution3.2 <- solve(prob3.2)
```

As we can see in the figure above, adding that one constraint produced a massive improvement in the accuracy of the estimates. Not only are the non-negative least-squares estimates much closer to the true coefficients than the OLS estimates, they have even managed to recover the correct sparsity structure in this case.

Like with OLS, there are already R packages available which implement
non-negative least squares, such as `nnls`

. But that is
actually an excellent demonstration of the power of `CVXR`

: A single
line of code here, namely `prob3.2$constraints <- list(beta >= 0)`

, is
doing the work of an entire package.

### Support Vector Classifiers

Another common statistical tool is the support vector classifier (SVC). The SVC is an affine function (hyperplane) that separates two sets of points by the widest margin. When the sets are not linearly separable, the SVC is determined by a trade-off between the width of the margin and the number of points that are misclassified.

For the binary case, where the response \(y_i \in \{-1,1\}\), the SVC is obtained by solving

\[ \begin{array}{ll} \mbox{minimize} & \frac{1}{2}\Vert\beta\Vert^2 + C\sum_{i=1}^m \xi_i \\ \mbox{subject to} & \xi_i \geq 0, \quad y_i(x_i^T\beta + \beta_0) \geq 1 - \xi_i, \quad i = 1,\ldots,m \end{array} \]

with variables \((\beta,\xi)\). Here, \(\xi\) is the amount by which a point can violate the separating hyperplane, and \(C > 0\) is a user-chosen penalty on the total violation. As \(C\) increases, fewer misclassifications will be allowed.

Below, we fit a SVC in `CVXR`

with \(C = 10\).

```
## Generate data
set.seed(10)
n <- 2
m <- 50
X <- matrix(rnorm(m*n), nrow = m, ncol = n)
y <- c(rep(-1, m/2), rep(1, m/2))
X[y == 1,] = X[y == 1,] + 1
```

```
## Define variables
cost <- 10
beta0 <- Variable()
beta <- Variable(n)
slack <- Variable(m)
# Form problem
objective <- (1/2) * sum_squares(vstack(beta, beta0)) + cost * sum(slack)
constraints <- list(y * (X %*% beta + beta0) >= 1 - slack, slack >= 0)
prob3.3 <- Problem(Minimize(objective), constraints)
solution3.3 <- solve(prob3.3)
```

## Disciplined Convex Programming (DCP)

Disciplined convex programming (DCP) is a system for constructing
mathematical expressions with known sign and curvature from a given library of
base functions. `CVXR`

uses DCP to ensure that the specified
optimization problems are convex.

The user may find it helpful to read about how the DCP rules are applied in other languages such as Python, Matlab, and Julia.

`CVXR`

implements the same rules, and a short introduction
is available here. The set of DCP functions are
described in Function Reference.

## Session Info

`sessionInfo()`

```
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin19.5.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS/LAPACK: /usr/local/Cellar/openblas/0.3.10_1/lib/libopenblasp-r0.3.10.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] MASS_7.3-51.6 tidyr_1.1.0 ggplot2_3.3.2 CVXR_1.0-9
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 xfun_0.15 slam_0.1-47 purrr_0.3.4
## [5] lattice_0.20-41 Rmosek_9.2.3 colorspace_1.4-1 vctrs_0.3.2
## [9] generics_0.0.2 htmltools_0.5.0 yaml_2.2.1 gmp_0.6-0
## [13] rlang_0.4.7 pillar_1.4.6 glue_1.4.1 Rmpfr_0.8-1
## [17] withr_2.2.0 Rcplex_0.3-3 bit64_0.9-7 lifecycle_0.2.0
## [21] stringr_1.4.0 munsell_0.5.0 blogdown_0.19 gtable_0.3.0
## [25] gurobi_9.0.3.1 codetools_0.2-16 evaluate_0.14 labeling_0.3
## [29] knitr_1.28 cccp_0.2-4 Rcpp_1.0.5 scales_1.1.1
## [33] osqp_0.6.0.3 farver_2.0.3 bit_1.1-15.2 digest_0.6.25
## [37] stringi_1.4.6 bookdown_0.19 dplyr_1.0.0 grid_4.0.2
## [41] Rglpk_0.6-4 tools_4.0.2 magrittr_1.5 tibble_3.0.3
## [45] crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.1 rcbc_0.1.0.9001
## [49] Matrix_1.2-18 assertthat_0.2.1 rmarkdown_2.3 R6_2.4.1
## [53] compiler_4.0.2
```

## Source

## References

Boyd, S., and L. Vandenberghe. 2004. *Convex Optimization*. Cambridge University Press.