Isotonic Regression
Introduction
Isotonic regression is
regression with monotonic constraints. There are several packages in R
to fit isotonic regression models. In this example, we
consider isotone
which
uses a pooled-adjacent-violators algorithm (PAVA) and active set
methods to perform the fit.
Pituitary Data Example
We will use data from the isotone
package (see de Leeuw, Hornik, and Mair (2009)) on the size
of pituitary fissures for 11 subjects between 8 and 14 years of age.
data("pituitary")
str(pituitary)
## 'data.frame': 11 obs. of 2 variables:
## $ age : num 8 8 8 10 10 10 12 12 12 14 ...
## $ size: num 21 23.5 23 24 21 25 21.5 22 19 23.5 ...
Since the size is expected to increase with age, an isotonic fit is suggested.
res_p <- with(pituitary, gpava(age, size))
The CVXR
formulation expresses this pretty much in the mathematical form.
x_p <- with(pituitary, {
n <- length(size)
x <- Variable(n)
objective <- Minimize(p_norm(size - x, 2))
constraint <- list(diff(x) >= 0)
problem <- Problem(objective, constraint)
result <- solve(problem)
result$getValue(x)
})
We define a variable x
of size n
, the number of observations. The objective to be minimized is the least-squares error (p_norm
). The constraint specifies monotonicity and it uses the diff
function familiar to R users (see ?diff
), except that it now refers to elements of the x
. A problem is created and called via solve
. The resulting optimal value for x
is retrieved using getValue(x)
.
As the output below shows, the results are very close.
cbind(res_p$x, x_p)
## [,1] [,2]
## [1,] 21.000 20.99998
## [2,] 22.375 22.37499
## [3,] 22.375 22.37499
## [4,] 22.375 22.37499
## [5,] 22.375 22.37499
## [6,] 22.375 22.37499
## [7,] 22.375 22.37499
## [8,] 22.375 22.37499
## [9,] 22.375 22.37499
## [10,] 23.500 23.50003
## [11,] 25.000 25.00004
Handling Ties
Package isotone
provides additional methods for handling tied data besides the default ties = "primary"
method; ties = "secondary"
enforces equality within ties, and ties = "tertiary"
enforces monotonicity on the means. (The latter may cause individual fits to be non-monotonic.)
res_s <- with(pituitary, gpava(age, size, ties = "secondary"))
res_t <- with(pituitary, gpava(age, size, ties = "tertiary"))
In CVXR
, the secondary method just requires an additional constraint to enforce
equality within tied values; no other modification is necessary. We do that below by figuring out the tied observation indices using base::split
and force those x
values to be equal (i.e. diff
== 0).
x_s <- with(pituitary, {
n <- length(size)
x <- Variable(n)
objective <- Minimize(p_norm(size - x, 2))
secondary_constraints <- lapply(base::split(x = seq_len(n),
f = age),
function(i) diff(x[i]) == 0)
constraint <- c(diff(x) >= 0,
secondary_constraints)
problem <- Problem(objective, constraint)
solve(problem)$getValue(x)
})
Finally, the tertiary method requires computing the block means to be computed for use in enforcing monotonicity.
x_t <- with(pituitary, {
n <- length(size)
x <- Variable(n)
objective <- Minimize(p_norm(size - x, 2))
blocks <- base::split(x = seq_len(n),
f = pituitary$age)
block_means <- lapply(blocks, function(i) {
v <- numeric(n)
v[i] <- 1.0 / length(i)
matrix(v, nrow = 1) %*% x
})
block_mean_vector <- do.call(vstack, block_means)
constraint <- list(diff(block_mean_vector) >= 0)
problem <- Problem(objective, constraint)
solve(problem)$getValue(x)
})
In the above, we make use of the vstack
function to create a single vector of the block means.
We can now compare all three solutions.
Isotone (P) | CVXR (P) | Isotone (S) | CVXR(S) | Isotone (T) | CVXR (T) |
---|---|---|---|---|---|
21.00 | 21.00 | 22.22 | 22.22 | 20.72 | 20.72 |
22.38 | 22.37 | 22.22 | 22.22 | 23.22 | 23.22 |
22.38 | 22.37 | 22.22 | 22.22 | 22.72 | 22.72 |
22.38 | 22.37 | 22.22 | 22.22 | 22.89 | 22.89 |
22.38 | 22.37 | 22.22 | 22.22 | 19.89 | 19.89 |
22.38 | 22.37 | 22.22 | 22.22 | 23.89 | 23.89 |
22.38 | 22.37 | 22.22 | 22.22 | 22.89 | 22.89 |
22.38 | 22.37 | 22.22 | 22.22 | 23.39 | 23.39 |
22.38 | 22.37 | 22.22 | 22.22 | 20.39 | 20.39 |
23.50 | 23.50 | 24.25 | 24.25 | 23.50 | 23.50 |
25.00 | 25.00 | 24.25 | 24.25 | 25.00 | 25.00 |
## Testthat Results: No output is good
Notes
Other arbitrary desired behavior can be specified for tied observations using the above as a guide.
Session Info
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] kableExtra_1.4.0 isotone_1.1-1 CVXR_1.0-15 testthat_3.2.1.1
## [5] here_1.0.1
##
## loaded via a namespace (and not attached):
## [1] gmp_0.7-5 utf8_1.2.4 clarabel_0.9.0.1 sass_0.4.9
## [5] xml2_1.3.6 slam_0.1-54 blogdown_1.19 stringi_1.8.4
## [9] lattice_0.22-6 digest_0.6.37 magrittr_2.0.3 evaluate_1.0.1
## [13] grid_4.4.2 bookdown_0.41 pkgload_1.4.0 fastmap_1.2.0
## [17] rprojroot_2.0.4 jsonlite_1.8.9 Matrix_1.7-1 ECOSolveR_0.5.5
## [21] brio_1.1.5 fansi_1.0.6 Rmosek_10.2.0 viridisLite_0.4.2
## [25] scales_1.3.0 codetools_0.2-20 jquerylib_0.1.4 cli_3.6.3
## [29] Rmpfr_0.9-5 rlang_1.1.4 Rglpk_0.6-5.1 bit64_4.5.2
## [33] munsell_0.5.1 cachem_1.1.0 yaml_2.3.10 tools_4.4.2
## [37] Rcplex_0.3-6 rcbc_0.1.0.9001 colorspace_2.1-1 gurobi_11.0-0
## [41] assertthat_0.2.1 vctrs_0.6.5 R6_2.5.1 lifecycle_1.0.4
## [45] stringr_1.5.1 bit_4.5.0 desc_1.4.3 cccp_0.3-1
## [49] pillar_1.9.0 bslib_0.8.0 glue_1.8.0 Rcpp_1.0.13-1
## [53] systemfonts_1.1.0 highr_0.11 xfun_0.49 rstudioapi_0.17.1
## [57] knitr_1.48 htmltools_0.5.8.1 rmarkdown_2.29 svglite_2.1.3
## [61] compiler_4.4.2 nnls_1.6