Getting Faster Results

Warning

The approach described below is useful when you know mathematically that a problem is DCP-compliant and that none of your data inputs will change the nature of the problem. We recommend that users check the DCP-compliance of a problem (via a call to is_dcp(prob), for example) at least once to ensure this is the case. Skipping this check may result in garbage!
Note also that the large speed gains seen in previous versions are no longer evident in version 1.x, because the new reduction framework has made CVXR much faster.
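
A one-time compliance check can be written as a guard that stops before any direct solver call is attempted. The sketch below uses a small illustrative problem; the names target, x and prob_check are assumptions made up for this example and are not part of the analysis that follows.

```r
library(CVXR)

# Illustrative data and variable; hypothetical names for this sketch only.
target <- c(1, 2, 3)
x <- Variable(3)

# Least squares plus a convex penalty on the positive part of x.
prob_check <- Problem(Minimize(sum((target - x)^2) + sum(pos(x))))

# Halt with an error unless the problem passes the DCP rules.
stopifnot(is_dcp(prob_check))
```

Running the guard once, before any loop that bypasses the check, keeps the safety net without paying for it on every solve.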

Introduction

As was remarked in the introduction to CVXR, its chief advantage is flexibility: you can specify a problem in close to mathematical form and CVXR solves it for you, if it can. Behind the scenes, CVXR compiles the domain-specific language and verifies the convexity of the problem before sending it off to solvers. If the problem violates the rules of Disciplined Convex Programming, it is rejected.

This compilation and verification adds overhead, so CVXR is generally slower than a tailor-made solution to a given problem.

An Example

To understand the speed issues, let us consider the global warming data from the Carbon Dioxide Information Analysis Center (CDIAC) again. The data points are the annual temperature anomalies relative to the 1961–1990 mean. We will fit the nearly-isotonic approximation \(\beta \in {\mathbf R}^m\) by solving

\[ \begin{array}{ll} \underset{\beta}{\mbox{Minimize}} & \frac{1}{2}\sum_{i=1}^m (y_i - \beta_i)^2 + \lambda \sum_{i=1}^{m-1}(\beta_i - \beta_{i+1})_+, \end{array} \]

where \(\lambda \geq 0\) is a penalty parameter and \(x_+ =\max(x,0)\).
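
To make the objective concrete, here is a minimal base R sketch that evaluates it for a given candidate vector. The function name neariso_objective is hypothetical and the inputs are toy numbers, not the CDIAC data; it only evaluates the formula as written and does not solve anything.

```r
# Evaluate the nearly-isotonic objective exactly as written in the formula.
# Hypothetical helper for illustration; not part of CVXR.
neariso_objective <- function(beta, y, lambda) {
    stopifnot(length(beta) == length(y), lambda >= 0)
    m <- length(y)
    # beta[i] - beta[i + 1] for i = 1, ..., m - 1 (this equals -diff(beta))
    drops <- pmax(beta[-m] - beta[-1], 0)
    0.5 * sum((y - beta)^2) + lambda * sum(drops)
}

# Toy inputs: squared-error term is 0.5 * 2 = 1, penalty term is 0.5 * 1 = 0.5.
neariso_objective(beta = c(1, 3, 2), y = c(1, 2, 3), lambda = 0.5)
## [1] 1.5
```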

This can be solved as follows.

suppressMessages(suppressWarnings(library(CVXR)))
data(cdiac)
y <- cdiac$annual
m <- length(y)
lambda <- 0.44
beta <- Variable(m)
obj <- 0.5 * sum((y - beta)^2) + lambda * sum(pos(diff(beta)))
prob <- Problem(Minimize(obj))
soln <- solve(prob, solver = "ECOS")
betaHat <- soln$getValue(beta)

This is the recommended way to solve a problem.

However, suppose we wished to construct bootstrap confidence intervals for the estimate using 100 resamples. Each resample requires a fresh solve, so the computation time can quickly become limiting.
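
To see why the per-fit cost matters, consider one possible shape for such a bootstrap. The text does not say which resampling scheme is intended, so the sketch below assumes a residual bootstrap with percentile intervals; bootstrap_ci and fit_fun are hypothetical names, and base R's isoreg is used only as a fast stand-in for the fitting function.

```r
# Percentile bootstrap intervals for fitted values (residual bootstrap assumed).
# fit_fun takes a numeric vector y and returns fitted values of the same length.
bootstrap_ci <- function(y, fit_fun, n_boot = 100, level = 0.95) {
    fitted <- as.numeric(fit_fun(y))
    resid <- y - fitted
    m <- length(y)
    # One column per resample: refit on fitted values plus resampled residuals.
    boot_fits <- replicate(n_boot, {
        y_star <- fitted + sample(resid, m, replace = TRUE)
        as.numeric(fit_fun(y_star))
    })
    alpha <- 1 - level
    # Rows are data points; columns are the lower and upper limits.
    t(apply(boot_fits, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
}

# Stand-in fit (plain isotonic regression), used here only to exercise the code.
set.seed(1)
y_demo <- cumsum(rnorm(20))
ci <- bootstrap_ci(y_demo, function(v) isoreg(v)$yf, n_boot = 100)
dim(ci)
```

The fitting function is called n_boot + 1 times, so any fixed overhead per fit is multiplied by roughly the number of resamples.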

Below, we show how one can get at the problem data and directly call a solver to get faster results.

Profile the code

Profiling a single fit to the model is useful to figure out where most of the time is spent.

library(profvis)
y <- cdiac$annual
profvis({
    beta <- Variable(m)
    obj <- Minimize(0.5 * sum((y - beta)^2) + lambda * sum(pos(diff(beta))))
    prob <- Problem(obj)
    soln <- solve(prob, solver = "ECOS")
    betaHat <- soln$getValue(beta)
})

It is especially instructive to click on the data tab and open up the tree for solve to see the sequence of calls and cumulative time used.

The profile shows that most of the total time (2400ms for one of our runs) is spent in the call to the is_dcp generic (about 2000ms). This generic is responsible for ensuring that the whole problem is DCP-compliant by checking the nature of each of the components that make up the problem. The actual solving took a much smaller fraction of the time.

Directly Calling the Solver

We are mathematically certain that the above problem is convex, so we can avoid the is_dcp hit. We can obtain the problem data for a particular solver (such as OSQP, ECOS, or SCS) using the function get_problem_data and hand that data directly to the solver to get the solution.

prob_data <- get_problem_data(prob, solver = "ECOS")

ASIDE: How did we know ECOS was the solver to use? Future versions will provide a function to match a solver to a problem. (Actually, it is available already, but not exported yet!) For now, a single call to solve with the verbose option set to TRUE can provide that information.

soln <- solve(prob, verbose = TRUE)

Now that we have the problem data and know which solver to use, we can call the ECOS solver with the right arguments. (The ECOS solver is provided by the package ECOSolveR, which CVXR imports.)

if (packageVersion("CVXR") > "0.99-7") {
    ECOS_dims <- ECOS.dims_to_solver_dict(prob_data$data[["dims"]])
} else {
    ECOS_dims <- prob_data$data[["dims"]]
}
solver_output <- ECOSolveR::ECOS_csolve(c = prob_data$data[["c"]],
                                        G = prob_data$data[["G"]],
                                        h = prob_data$data[["h"]],
                                        dims = ECOS_dims,
                                        A = prob_data$data[["A"]],
                                        b = prob_data$data[["b"]])

Finally, we can obtain the results by asking CVXR to unpack the solver results for us. (See ?unpack_results for further examples.)

if (packageVersion("CVXR") > "0.99-7") {
    direct_soln <- unpack_results(prob, solver_output, prob_data$chain, prob_data$inverse_data)
} else {
    direct_soln <- unpack_results(prob, "ECOS", solver_output)
}
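
When the direct call has to be repeated many times, the three steps above can be bundled into one function. This is a minimal sketch, not part of CVXR: solve_direct_ecos is a hypothetical name, and it assumes a CVXR version newer than 0.99-7 and a problem for which ECOS is an appropriate solver.

```r
library(CVXR)

# Hypothetical helper: get the problem data, call ECOS, unpack the results.
# Assumes CVXR newer than 0.99-7; no separate DCP guard is added here.
solve_direct_ecos <- function(prob) {
    prob_data <- get_problem_data(prob, solver = "ECOS")
    ecos_dims <- ECOS.dims_to_solver_dict(prob_data$data[["dims"]])
    solver_output <- ECOSolveR::ECOS_csolve(c = prob_data$data[["c"]],
                                            G = prob_data$data[["G"]],
                                            h = prob_data$data[["h"]],
                                            dims = ecos_dims,
                                            A = prob_data$data[["A"]],
                                            b = prob_data$data[["b"]])
    unpack_results(prob, solver_output, prob_data$chain, prob_data$inverse_data)
}
```

With the objects defined earlier, solve_direct_ecos(prob)$getValue(beta) would play the role of direct_soln$getValue(beta).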

Profile the Direct Call

We can profile this direct call now.

profvis({
    beta <- Variable(m)
    obj <- Minimize(0.5 * sum((y - beta)^2) + lambda * sum(pos(diff(beta))))
    prob <- Problem(obj)
    prob_data <- get_problem_data(prob, solver = "ECOS")
    if (packageVersion("CVXR") > "0.99-7") {
        ECOS_dims <- ECOS.dims_to_solver_dict(prob_data$data[["dims"]])
    } else {
        ECOS_dims <- prob_data$data[["dims"]]
    }
    solver_output <- ECOSolveR::ECOS_csolve(c = prob_data$data[["c"]],
                                            G = prob_data$data[["G"]],
                                            h = prob_data$data[["h"]],
                                            dims = ECOS_dims,
                                            A = prob_data$data[["A"]],
                                            b = prob_data$data[["b"]])
    if (packageVersion("CVXR") > "0.99-7") {
        direct_soln <- unpack_results(prob, solver_output, prob_data$chain, prob_data$inverse_data)
    } else {
        direct_soln <- unpack_results(prob, "ECOS", solver_output)
    }
})

For one of our runs, the total time went down from 2400ms to 690ms, more than a 3-fold speedup! In cases where the objective function and constraints are more complex, the speedup can be more than 10-fold.
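
Timings from a single run can vary, so it may help to repeat each approach a few times and compare medians. The helper below is a base R sketch; median_elapsed_ms is a hypothetical name, and the commented usage lines assume the objects prob and prob_data from the earlier sections.

```r
# Median wall-clock time, in milliseconds, of calling f() several times.
median_elapsed_ms <- function(f, reps = 5) {
    times <- vapply(seq_len(reps), function(i) {
        system.time(f())[["elapsed"]]
    }, numeric(1))
    1000 * median(times)
}

# Runnable demonstration with a trivial stand-in workload.
median_elapsed_ms(function() sum(sqrt(seq_len(1e5))), reps = 3)

# Possible usage with the objects above (not run here):
# median_elapsed_ms(function() solve(prob, solver = "ECOS"))
# median_elapsed_ms(function() get_problem_data(prob, solver = "ECOS"))
```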

Same Answer?

Of course, we should also verify that the results obtained in both cases are the same.

identical(betaHat, direct_soln$getValue(beta))

## [1] TRUE
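
Note that identical demands exact, bit-for-bit equality. If the two code paths ever differ by floating-point noise (for example, with different solver settings or package versions), a tolerance-based comparison with all.equal is a more forgiving check. The toy vectors below are made up purely to show the difference between the two functions.

```r
# Two vectors that differ only by a tiny numerical perturbation.
a <- c(0.1, 0.2, 0.3)
b <- a + 1e-10

identical(a, b)          # exact comparison
## [1] FALSE
isTRUE(all.equal(a, b))  # comparison up to a small numerical tolerance
## [1] TRUE

# The same idea applied to the objects above (not run here):
# isTRUE(all.equal(betaHat, direct_soln$getValue(beta)))
```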

Session Info

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
## [1] profvis_0.3.8 CVXR_1.0-15  
## 
## loaded via a namespace (and not attached):
##  [1] Matrix_1.7-0      bit_4.0.5         jsonlite_1.8.8    Rmpfr_0.9-5      
##  [5] compiler_4.4.1    Rcpp_1.0.12       slam_0.1-50       stringr_1.5.1    
##  [9] rcbc_0.1.0.9001   assertthat_0.2.1  cccp_0.3-1        jquerylib_0.1.4  
## [13] yaml_2.3.9        fastmap_1.2.0     clarabel_0.9.0    lattice_0.22-6   
## [17] R6_2.5.1          knitr_1.48        htmlwidgets_1.6.4 Rcplex_0.3-6     
## [21] gurobi_11.0-0     bookdown_0.40     bslib_0.7.0       rlang_1.1.4      
## [25] stringi_1.8.4     cachem_1.1.0      xfun_0.45         sass_0.4.9       
## [29] bit64_4.0.5       cli_3.6.3         magrittr_2.0.3    Rglpk_0.6-5.1    
## [33] digest_0.6.36     grid_4.4.1        gmp_0.7-4         lifecycle_1.0.4  
## [37] ECOSolveR_0.5.5   vctrs_0.6.5       glue_1.7.0        evaluate_0.24.0  
## [41] blogdown_1.19     codetools_0.2-20  Rmosek_10.2.0     purrr_1.0.2      
## [45] rmarkdown_2.27    tools_4.4.1       htmltools_0.5.8.1
