Isotonic Regression

Introduction

Isotonic regression is regression with monotonic constraints. There are several packages in R to fit isotonic regression models. In this example, we consider isotone which uses a pooled-adjacent-violators algorithm (PAVA) and active set methods to perform the fit.

Pituitary Data Example

We will use data from the isotone package (see de Leeuw, Hornik, and Mair (2009)) on the size of pituitary fissures for 11 subjects between 8 and 14 years of age.

library(isotone)
data("pituitary")
str(pituitary)
## 'data.frame':    11 obs. of  2 variables:
##  $age : num 8 8 8 10 10 10 12 12 12 14 ... ##$ size: num  21 23.5 23 24 21 25 21.5 22 19 23.5 ...

Since the size is expected to increase with age, an isotonic fit is suggested.

res_p <- with(pituitary, gpava(age, size))

The CVXR formulation expresses this pretty much in the mathematical form.

suppressMessages(suppressWarnings(library(CVXR)))
x_p <- with(pituitary, {
n <- length(size)
x <- Variable(n)
objective <- Minimize(p_norm(size - x, 2))
constraint <- list(diff(x) >= 0)
problem <- Problem(objective, constraint)
result <- solve(problem)
result$getValue(x) }) We define a variable x of size n, the number of observations. The objective to be minimized is the least-squares error (p_norm). The constraint specifies monotonicity and it uses the diff function familiar to R users (see ?diff), except that it now refers to elements of the x. A problem is created and called via solve. The resulting optimal value for x is retrieved using getValue(x). As the output below shows, the results are very close. cbind(res_p$x, x_p)
##         [,1]     [,2]
##  [1,] 21.000 20.99998
##  [2,] 22.375 22.37499
##  [3,] 22.375 22.37499
##  [4,] 22.375 22.37499
##  [5,] 22.375 22.37499
##  [6,] 22.375 22.37499
##  [7,] 22.375 22.37499
##  [8,] 22.375 22.37499
##  [9,] 22.375 22.37499
## [10,] 23.500 23.50003
## [11,] 25.000 25.00004

Handling Ties

Package isotone provides additional methods for handling tied data besides the default ties = "primary" method; ties = "secondary" enforces equality within ties, and ties = "tertiary" enforces monotonicity on the means. (The latter may cause individual fits to be non-monotonic.)

res_s <- with(pituitary, gpava(age, size, ties = "secondary"))
res_t <- with(pituitary, gpava(age, size, ties = "tertiary"))

In CVXR, the secondary method just requires an additional constraint to enforce equality within tied values; no other modification is necessary. We do that below by figuring out the tied observation indices using base::split and force those x values to be equal (i.e. diff == 0).

x_s <- with(pituitary, {
n <- length(size)
x <- Variable(n)
objective <- Minimize(p_norm(size - x, 2))
secondary_constraints <- lapply(base::split(x = seq_len(n),
f = age),
function(i) diff(x[i]) == 0)
constraint <- c(diff(x) >= 0,
secondary_constraints)
problem <- Problem(objective, constraint)
solve(problem)$getValue(x) }) Finally, the tertiary method requires computing the block means to be computed for use in enforcing monotonicity. x_t <- with(pituitary, { n <- length(size) x <- Variable(n) objective <- Minimize(p_norm(size - x, 2)) blocks <- base::split(x = seq_len(n), f = pituitary$age)
block_means <- lapply(blocks, function(i) {
v <- numeric(n)
v[i] <- 1.0 / length(i)
matrix(v, nrow = 1) %*% x
})
block_mean_vector <- do.call(vstack, block_means)
constraint <- list(diff(block_mean_vector) >= 0)
problem <- Problem(objective, constraint)
solve(problem)\$getValue(x)
})

In the above, we make use of the vstack function to create a single vector of the block means.

We can now compare all three solutions.

Table 1: P = primary, S = Secondary, T = Tertiary
Isotone (P) CVXR (P) Isotone (S) CVXR(S) Isotone (T) CVXR (T)
21.00 21.00 22.22 22.22 20.72 20.72
22.38 22.37 22.22 22.22 23.22 23.22
22.38 22.37 22.22 22.22 22.72 22.72
22.38 22.37 22.22 22.22 22.89 22.89
22.38 22.37 22.22 22.22 19.89 19.89
22.38 22.37 22.22 22.22 23.89 23.89
22.38 22.37 22.22 22.22 22.89 22.89
22.38 22.37 22.22 22.22 23.39 23.39
22.38 22.37 22.22 22.22 20.39 20.39
23.50 23.50 24.25 24.25 23.50 23.50
25.00 25.00 24.25 24.25 25.00 25.00

Notes

Other arbitrary desired behavior can be specified for tied observations using the above as a guide.

Session Info

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C
##
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base
##
## other attached packages:
## [1] kableExtra_0.9.0 CVXR_0.99        isotone_1.1-0
##
## loaded via a namespace (and not attached):
##  [1] gmp_0.5-13.1      Rcpp_0.12.17      highr_0.6
##  [4] plyr_1.8.4        compiler_3.5.0    pillar_1.2.2
##  [7] R.methodsS3_1.7.1 R.utils_2.6.0     tools_3.5.0
## [10] digest_0.6.15     bit_1.1-13        viridisLite_0.3.0
## [13] evaluate_0.10.1   tibble_1.4.2      lattice_0.20-35
## [16] pkgconfig_2.0.1   rlang_0.2.0       Matrix_1.2-14
## [19] rstudioapi_0.7    yaml_2.1.19       blogdown_0.6.3
## [22] xfun_0.1          xml2_1.2.0        httr_1.3.1
## [25] Rmpfr_0.7-0       ECOSolveR_0.4     stringr_1.3.1
## [28] knitr_1.20        hms_0.4.2         rprojroot_1.3-2
## [31] bit64_0.9-7       grid_3.5.0        R6_2.2.2
## [34] rmarkdown_1.9.14  bookdown_0.7      readr_1.1.1
## [37] magrittr_1.5      scales_0.5.0      backports_1.1.2
## [40] htmltools_0.3.6   scs_1.1-1         nnls_1.4
## [43] rvest_0.3.2       colorspace_1.3-2  stringi_1.2.2
## [46] munsell_0.4.3     R.oo_1.22.0

R Markdown

References

de Leeuw, J., K. Hornik, and P. Mair. 2009. “Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods.” Journal of Statistical Software 32 (5): 1–24.