Derivatives Fundamentals

Author

CVXPY Developers and Balasubramanian Narasimhan

Introduction

Warning

The derivative functionality (backward, derivative, gradient, delta) is not implemented in CVXR at this time. The code below illustrates the intended API based on CVXPY but is not executable.

This example introduces the fundamentals of computing the derivative of the solution map to optimization problems. The derivative can be used for sensitivity analysis, to see how a solution would change given small changes to the parameters, and to compute gradients of scalar-valued functions of the solution.

We will consider a simple disciplined geometric program (DGP). The geometric program under consideration is

\[
\begin{array}{ll}
\mbox{minimize} & 1/(xyz) \\
\mbox{subject to} & a(xy + xz + yz) \leq b \\
& x \geq y^c,
\end{array}
\]

where $x \in \mathbf{R}_{++}$, $y \in \mathbf{R}_{++}$, and $z \in \mathbf{R}_{++}$ are the variables, and $a \in \mathbf{R}_{++}$, $b \in \mathbf{R}_{++}$, and $c \in \mathbf{R}$ are the parameters. The vector

\[
\alpha = \begin{bmatrix} a \\ b \\ c \end{bmatrix}
\]

is the vector of parameters.

Setting Up the Problem

x <- Variable(pos = TRUE)
y <- Variable(pos = TRUE)
z <- Variable(pos = TRUE)

a <- Parameter(pos = TRUE)
b <- Parameter(pos = TRUE)
cc <- Parameter()  # 'c' is reserved in R

objective_fn <- 1 / (x * y * z)
objective <- Minimize(objective_fn)
constraints <- list(a * (x * y + x * z + y * z) <= b, x >= y^cc)
problem <- Problem(objective, constraints)

We verify that the problem is a DGP.

is_dgp(problem)
[1] TRUE

Solving the Problem

Next, we solve the problem, setting the parameters a, b, and c to 2.0, 1.0, and 0.5, respectively.

value(a) <- 2.0
value(b) <- 1.0
value(cc) <- 0.5
result <- psolve(problem, gp = TRUE, requires_grad = TRUE)

cat(sprintf("x = %f\n", value(x)))
cat(sprintf("y = %f\n", value(y)))
cat(sprintf("z = %f\n", value(z)))

Notice the keyword argument requires_grad = TRUE; this is necessary to subsequently compute derivatives.

Solution Map

The solution map of the above problem is a function

\[
S : \mathbf{R}^2_{++} \times \mathbf{R} \to \mathbf{R}^3_{++}
\]

which maps the parameter vector to the vector of optimal solutions

\[
S(\alpha) = \begin{bmatrix} x^\star(\alpha) \\ y^\star(\alpha) \\ z^\star(\alpha) \end{bmatrix}.
\]

Here, $x^\star(\alpha)$, $y^\star(\alpha)$, and $z^\star(\alpha)$ are the optimal values of the variables corresponding to the parameter vector $\alpha$.

As an example, we just saw that

S((2.0,1.0,0.5))[0.56120.31500.3690].

Sensitivity Analysis

When the solution map is differentiable, we can use its derivative

\[
\mathsf{D}S(\alpha) \in \mathbf{R}^{3 \times 3}
\]

to perform a sensitivity analysis, which studies how the solution would change given small changes to the parameters.

Suppose we perturb the parameters by a vector $d\alpha \in \mathbf{R}^3$ of small magnitude. We can approximate the change $\Delta$ in the solution due to the perturbation using the derivative, as

\[
\Delta := S(\alpha + d\alpha) - S(\alpha) \approx \mathsf{D}S(\alpha)\, d\alpha.
\]

We can compute this in CVXR as follows. Partition the perturbation as

\[
d\alpha = \begin{bmatrix} da \\ db \\ dc \end{bmatrix}.
\]

We set the delta attributes of the parameters to their perturbations, and then call the derivative method.

da <- 1e-2
db <- 1e-2
dc <- 1e-2

delta(a) <- da
delta(b) <- db
delta(cc) <- dc
derivative(problem)

The derivative method populates the delta attributes of the variables as a side-effect, with the predicted change in the variable. We can compare the predictions to the actual solution of the perturbed problem.

x_hat <- value(x) + delta(x)
y_hat <- value(y) + delta(y)
z_hat <- value(z) + delta(z)

value(a) <- value(a) + da
value(b) <- value(b) + db
value(cc) <- value(cc) + dc

psolve(problem, gp = TRUE)
cat(sprintf("x: predicted %.5f  actual %.5f\n", x_hat, value(x)))
cat(sprintf("y: predicted %.5f  actual %.5f\n", y_hat, value(y)))
cat(sprintf("z: predicted %.5f  actual %.5f\n", z_hat, value(z)))

## Restore original parameter values
value(a) <- value(a) - da
value(b) <- value(b) - db
value(cc) <- value(cc) - dc

In this case, the predictions and the actual solutions are fairly close.
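
Since the CVXR calls above are not yet executable, the first-order prediction can also be checked numerically on a toy solution map with a closed form. The sketch below is plain Python, not CVXR or CVXPY; the one-parameter problem (minimize $x + \alpha/x$ over $x > 0$) is an assumption chosen purely for illustration because its solution map $S(\alpha) = \sqrt{\alpha}$ is known analytically.

```python
import math

# Toy problem: minimize_{x > 0}  x + alpha/x.
# Its solution map has the closed form S(alpha) = sqrt(alpha),
# with derivative DS(alpha) = 1 / (2 sqrt(alpha)).
def S(alpha):
    return math.sqrt(alpha)

def DS(alpha):
    return 0.5 / math.sqrt(alpha)

alpha, dalpha = 4.0, 1e-2
predicted = S(alpha) + DS(alpha) * dalpha   # first-order prediction S(alpha) + DS(alpha) dalpha
actual = S(alpha + dalpha)                  # "re-solve" the perturbed problem
print(f"predicted {predicted:.6f}  actual {actual:.6f}")
```

As in the CVXR workflow, the prediction and the re-solved value agree closely; the gap shrinks quadratically as the perturbation shrinks.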

Gradient

We can compute the gradient of a scalar-valued function of the solution with respect to the parameters. Let $f : \mathbf{R}^3 \to \mathbf{R}$, and suppose we want to compute the gradient of the composition $f \circ S$. By the chain rule,

\[
\nabla f(S(\alpha)) = \mathsf{D}^T S(\alpha) \begin{bmatrix} dx \\ dy \\ dz \end{bmatrix},
\]

where $\mathsf{D}^T S$ is the adjoint (or transpose) of the derivative operator, and $dx$, $dy$, and $dz$ are the partial derivatives of $f$ with respect to its arguments.
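
The adjoint is just a transposed-Jacobian (vector-Jacobian) product. A minimal numeric sketch, using a made-up $2 \times 2$ Jacobian standing in for $\mathsf{D}S(\alpha)$:

```python
# backward-style computation: map the gradient dz of f with respect to the
# solution variables to the gradient with respect to the parameters, J^T dz.
J = [[1.0, 2.0],   # hypothetical Jacobian DS(alpha), for illustration only
     [0.0, 3.0]]
dz = [0.5, 1.0]    # gradient of f at the solution
grad_alpha = [sum(J[i][j] * dz[i] for i in range(2)) for j in range(2)]
print(grad_alpha)  # [0.5, 4.0]
```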

We can compute the gradient in CVXR, using the backward method. As an example, suppose

\[
f(x, y, z) = \frac{1}{2}\left(x^2 + y^2 + z^2\right),
\]

so that $dx = x$, $dy = y$, and $dz = z$. Let $d\alpha = \nabla f(S(\alpha))$, and suppose we subtract $\eta\, d\alpha$ from the parameter, where $\eta$ is a positive constant. Using the following code, we can compare $f(S(\alpha - \eta\, d\alpha))$ with the value predicted by the gradient,

\[
f(S(\alpha - \eta\, d\alpha)) \approx f(S(\alpha)) - \eta\, d\alpha^T d\alpha.
\]

result <- psolve(problem, gp = TRUE, requires_grad = TRUE)

f_val <- function(xv, yv, zv) 0.5 * (xv^2 + yv^2 + zv^2)

original <- f_val(value(x), value(y), value(z))

gradient(x) <- value(x)
gradient(y) <- value(y)
gradient(z) <- value(z)
backward(problem)

eta <- 0.5
da_grad <- gradient(a)
db_grad <- gradient(b)
dc_grad <- gradient(cc)
dalpha <- c(da_grad, db_grad, dc_grad)
predicted <- original - eta * sum(dalpha^2)

value(a) <- value(a) - eta * da_grad
value(b) <- value(b) - eta * db_grad
value(cc) <- value(cc) - eta * dc_grad
psolve(problem, gp = TRUE)
actual <- f_val(value(x), value(y), value(z))

cat(sprintf("original %.5f  predicted %.5f  actual %.5f\n",
            original, predicted, actual))
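
As with the derivative, the gradient computation can be checked numerically on a toy problem with a closed-form solution map. The sketch below is plain Python, not CVXR; the one-parameter problem (minimize $x + \alpha/x$ over $x > 0$, with $S(\alpha) = \sqrt{\alpha}$) is an assumption chosen for illustration, and the scalar function is $f(x) = x^2/2$.

```python
import math

# Toy solution map: S(alpha) = sqrt(alpha) solves minimize_{x > 0} x + alpha/x,
# with derivative DS(alpha) = 1 / (2 sqrt(alpha)).
S = lambda alpha: math.sqrt(alpha)
DS = lambda alpha: 0.5 / math.sqrt(alpha)
f = lambda x: 0.5 * x**2

alpha, eta = 4.0, 0.5
g = DS(alpha) * S(alpha)           # chain rule: gradient of f o S, D^T S(alpha) f'(S(alpha))
original = f(S(alpha))
predicted = original - eta * g**2  # value predicted by the gradient
actual = f(S(alpha - eta * g))     # "re-solve" after the gradient step
print(f"original {original:.5f}  predicted {predicted:.5f}  actual {actual:.5f}")
```

In this toy, the prediction is exact because $f(S(\alpha)) = \alpha/2$ is linear in $\alpha$; in general the approximation holds to first order in $\eta$.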

Session Info

R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] CVXR_1.8.0.9213

loaded via a namespace (and not attached):
 [1] backports_1.5.0   digest_0.6.39     fastmap_1.2.0     xfun_0.56        
 [5] Matrix_1.7-4      lattice_0.22-9    osqp_1.0.0        knitr_1.51       
 [9] gmp_0.7-5.1       htmltools_0.5.9   rmarkdown_2.30    cli_3.6.5        
[13] S7_0.2.1          clarabel_0.11.2   grid_4.5.2        scs_3.2.7        
[17] compiler_4.5.2    highs_1.12.0-3    tools_4.5.2       checkmate_2.3.4  
[21] evaluate_1.0.5    Rcpp_1.1.1        yaml_2.3.12       otel_0.2.0       
[25] rlang_1.1.7       jsonlite_2.0.0    htmlwidgets_1.6.4

References