Sensitivity Analysis and Gradients

Author

CVXPY Developers and Balasubramanian Narasimhan

Introduction

An optimization problem can be viewed as a function mapping parameters to solutions. This solution map is sometimes differentiable. CVXR has built-in support for computing the derivative of the optimal variable values of a problem with respect to small perturbations of the parameters (i.e., the Parameter instances appearing in a problem).

The Problem class exposes two methods related to computing the derivative:

derivative() evaluates the derivative given perturbations to the parameters. This lets you calculate how the solution to a problem would change given small changes to the parameters, without re-solving the problem.
backward() evaluates the adjoint of the derivative, computing the gradient of the solution with respect to the parameters. This can be useful when combined with automatic differentiation software.

The derivative() and backward() methods are only meaningful when the problem contains parameters. In order for a problem to be differentiable, it must be DPP-compliant. CVXR can compute the derivative of any DPP-compliant DCP or DGP problem. At non-differentiable points, CVXR computes a heuristic quantity.

Example: A Trivial Quadratic

As a first example, we solve a trivial problem with an analytical solution, to illustrate the usage of backward() and derivative(). We construct a problem with a scalar variable x and a scalar parameter p. The problem is to minimize the quadratic \((x - 2p)^2\).

x <- Variable()
p <- Parameter()
quadratic <- power(x - 2 * p, 2)
problem <- Problem(Minimize(quadratic))

Next, we solve the problem for the particular value of \(p = 3\). Note that when solving the problem, we supply the keyword argument requires_grad = TRUE to psolve().

value(p) <- 3.0
result <- psolve(problem, requires_grad = TRUE)
cat("Optimal value:", result, "\n")

Optimal value: 7.888609e-31

cat("x:", value(x), "\n")

x: 6

Using `backward()`

Having solved the problem with requires_grad = TRUE, we can now use backward() to differentiate through the problem. We compute the gradient of the solution with respect to its parameter by calling backward(). As a side-effect, backward() populates the gradient() attribute on all parameters with the gradient of the solution with respect to that parameter.

backward(problem)
cat("The gradient is", gradient(p), "\n")

The gradient is 2

In this case, the problem has the trivial analytical solution \(x^* = 2p\), and the gradient \(dx^*/dp\) is therefore just \(2\). So, as expected, the gradient is 2.0.

Using `derivative()`

Next, we use derivative() to see how a small change in p would affect the solution x. We perturb p by \(10^{-5}\) by setting delta(p) <- 1e-5, and calling derivative() will populate the delta() attribute of x with the change in x predicted by a first-order approximation (which is \(\frac{dx}{dp} \cdot \Delta p\)).

delta(p) <- 1e-5
derivative(problem)
cat("x delta is", delta(x), "\n")

x delta is 2e-05

In this case the solution is trivial and its derivative is just \(2p\), so we know that the delta in \(x\) should be \(2 \times 10^{-5}\). As expected, the output confirms this.

We emphasize that this example is trivial, because it has a trivial analytical solution with a trivial derivative. The backward() and derivative() methods are useful because the vast majority of convex optimization problems do not have analytical solutions: in these cases, CVXR can compute solutions and their derivatives, even though it would be impossible to derive them by hand.

When to Use `backward()` vs. `derivative()`

backward() should be used when you need the gradient of (a scalar-valued function of) the solution with respect to the parameters.
derivative() should be used for sensitivity analysis, i.e., when you want to know how the solution would change if one or more parameters were changed.

When there are multiple variables, it is much more efficient to compute sensitivities using derivative() than it would be to compute the entire Jacobian (which can be done by calling backward() multiple times, once for each standard basis vector).

A Note on `backward()` with Multiple Variables

In this simple example, the variable x was a scalar, so backward() computed the gradient of x with respect to p. When there is more than one scalar variable, by default, backward() computes the gradient of the sum of the optimal variable values with respect to the parameters.

More generally, backward() can be used to compute the gradient of a scalar-valued function \(f\) of the optimal variables, with respect to the parameters. If \(x(p)\) denotes the optimal value of the variable (which might be a vector or a matrix) for a particular value of the parameter \(p\) and \(f(x(p))\) is a scalar, then backward() can be used to compute the gradient of \(f\) with respect to \(p\). Let \(x^* = x(p)\), and say the derivative of \(f\) with respect to \(x^*\) is \(dx\). To compute the derivative of \(f\) with respect to \(p\), before calling backward(problem), just set gradient(x) <- dx.

Example: Least Squares with Regularization

Here we demonstrate sensitivity analysis on a more practical problem: a regularized least-squares problem where we want to understand how the solution changes with respect to the regularization parameter.

set.seed(42)
n <- 5
m <- 3
A <- matrix(rnorm(n * m), n, m)
b <- rnorm(n)

x <- Variable(m)
lam <- Parameter(nonneg = TRUE)

objective <- sum_squares(A %*% x - b) + lam * sum_squares(x)
problem <- Problem(Minimize(objective))

value(lam) <- 1.0
result <- psolve(problem, requires_grad = TRUE)
cat("Optimal value:", result, "\n")

Optimal value: 6.729336

cat("x:", value(x), "\n")

x: -0.1424264 -0.9913704 0.7241533

Now compute the gradient of the solution with respect to the regularization parameter:

backward(problem)
cat("Gradient of solution w.r.t. lambda:", gradient(lam), "\n")

Gradient of solution w.r.t. lambda: 0.07611712

And use derivative() to predict the effect of a small perturbation in \(\lambda\):

delta(lam) <- 0.01
derivative(problem)
cat("Predicted change in x:", delta(x), "\n")

Predicted change in x: 0.0001866804 0.001829381 -0.00125489

Session Info

R version 4.6.0 (2026-04-24)
Platform: aarch64-apple-darwin23
Running under: macOS Tahoe 26.5.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] CVXR_1.9.1

loaded via a namespace (and not attached):
 [1] backports_1.5.1   digest_0.6.39     fastmap_1.2.0     xfun_0.58        
 [5] Matrix_1.7-5      lattice_0.22-9    osqp_1.0.0        knitr_1.51       
 [9] gmp_0.7-5.1       htmltools_0.5.9   rmarkdown_2.31    cli_3.6.6        
[13] S7_0.2.2          clarabel_0.11.2   grid_4.6.0        scs_3.2.7        
[17] compiler_4.6.0    highs_1.14.0-2    tools_4.6.0       checkmate_2.3.4  
[21] evaluate_1.0.5    diffcp_0.1.1      Rcpp_1.1.1-1.1    yaml_2.3.12      
[25] otel_0.2.0        rlang_1.2.0       jsonlite_2.0.0    htmlwidgets_1.6.4

References

Agrawal, A., Barratt, S., Boyd, S., Busseti, E., Moursi, W. M. (2019). Differentiating through a cone program. Journal of Applied and Numerical Optimization, 1(2), 107–115.
Agrawal, A., Verschueren, R., Diamond, S., Boyd, S. (2018). A rewriting system for convex optimization problems. Journal of Control and Decision, 5(1), 42–60.

Introduction

Example: A Trivial Quadratic

Using backward()

Using derivative()

When to Use backward() vs. derivative()

A Note on backward() with Multiple Variables