Quantile Regression

Introduction

Quantile regression is another variation on least squares . The loss is the tilted \(l_1\) function,

\[ \phi(u) = \tau\max(u,0) - (1-\tau)\max(-u,0) = \frac{1}{2}|u| + \left(\tau - \frac{1}{2}\right)u, \]

where \(\tau \in (0,1)\) specifies the quantile. The problem as before is to minimize the total residual loss. This model is commonly used in ecology, healthcare, and other fields where the mean alone is not enough to capture complex relationships between variables. CVXR allows us to create a function to represent the loss and integrate it seamlessly into the problem definition, as illustrated below.

Example

We will use an example from the quantreg package. The vignette provides an example of the estimation and plot.

suppressMessages(suppressWarnings(library(quantreg)))
data(engel)
library(ggplot2)
p <- ggplot(data = engel) +
    geom_point(mapping = aes(x = income, y = foodexp), color = "blue")
taus <- c(0.1, 0.25, 0.5, 0.75, 0.90, 0.95)
fits <- data.frame(
    coef(lm(foodexp ~ income, data = engel)),
    sapply(taus, function(x) coef(rq(formula = foodexp ~ income, data = engel, tau = x))))
names(fits) <- c("OLS", sprintf("$\\tau_{%0.2f}$", taus))

nf <- ncol(fits)
colors <- colorRampPalette(colors = c("black", "red"))(nf)
p <- p + geom_abline(intercept = fits[1, 1], slope = fits[2, 1], color = colors[1], size = 1.5)
for (i in seq_len(nf)[-1]) {
    p <- p + geom_abline(intercept = fits[1, i], slope = fits[2, i], color = colors[i])
}
p

The above plot shows the quantile regression fits for \(\tau = (0.1, 0.25, 0.5, 0.75, 0.90, 0.95)\). The OLS fit is the thick black line.

The following is a table of the estimates.

library(kableExtra)
knitr::kable(fits, format = "html", caption = "Fits from OLS and `quantreg`") %>%
    kable_styling("striped") %>%
    column_spec(1:8, background = "#ececec")
Table 1: Fits from OLS and quantreg
OLS \(\tau_{0.10}\) \(\tau_{0.25}\) \(\tau_{0.50}\) \(\tau_{0.75}\) \(\tau_{0.90}\) \(\tau_{0.95}\)
(Intercept) 147.4753885 110.1415742 95.4835396 81.4822474 62.3965855 67.3508721 64.1039632
income 0.4851784 0.4017658 0.4741032 0.5601806 0.6440141 0.6862995 0.7090685

The CVXR formulation follows. Note we make use of model.matrix to get the intercept column painlessly.

suppressMessages(suppressWarnings(library(CVXR)))
X <- model.matrix(foodexp ~ income, data = engel)
y <- matrix(engel[, "foodexp"], ncol = 1)
beta <- Variable(2)
quant_loss <- function(u, tau) { 0.5 * abs(u) + (tau - 0.5) * u }
solutions <- sapply(taus, function(tau) {
    obj <- sum(quant_loss(y - X %*% beta, t = tau))
    prob <- Problem(Minimize(obj))
    solve(prob)$getValue(beta)
})
solutions <- data.frame(coef(lm(foodexp ~ income, data = engel)),
                        solutions)
names(fits) <- c("OLS", sprintf("$\\tau_{%0.2f}$", taus))

Here is a table similar to the above with the OLS estimate added in for easy comparison.

knitr::kable(fits, format = "html", caption = "Fits from OLS and `CVXR`") %>%
    kable_styling("striped") %>%
    column_spec(1:8, background = "#ececec")
Table 2: Fits from OLS and CVXR
OLS \(\tau_{0.10}\) \(\tau_{0.25}\) \(\tau_{0.50}\) \(\tau_{0.75}\) \(\tau_{0.90}\) \(\tau_{0.95}\)
(Intercept) 147.4753885 110.1415742 95.4835396 81.4822474 62.3965855 67.3508721 64.1039632
income 0.4851784 0.4017658 0.4741032 0.5601806 0.6440141 0.6862995 0.7090685

The results match.

Session Info

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.2
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices datasets  utils     base     
## 
## other attached packages:
## [1] CVXR_0.94-4      kableExtra_0.6.1 ggplot2_2.2.1    quantreg_5.34   
## [5] SparseM_1.77    
## 
## loaded via a namespace (and not attached):
##  [1] gmp_0.5-13.1       Rcpp_0.12.14       highr_0.6         
##  [4] compiler_3.4.2     plyr_1.8.4         R.methodsS3_1.7.1 
##  [7] R.utils_2.6.0      tools_3.4.2        bit_1.1-12        
## [10] digest_0.6.12      viridisLite_0.2.0  evaluate_0.10.1   
## [13] tibble_1.3.4       gtable_0.2.0       lattice_0.20-35   
## [16] pkgconfig_2.0.1    rlang_0.1.4        Matrix_1.2-12     
## [19] yaml_2.1.14        blogdown_0.3       Rmpfr_0.6-1       
## [22] ECOSolveR_0.3-2    stringr_1.2.0      httr_1.3.1        
## [25] knitr_1.17         xml2_1.1.1         MatrixModels_0.4-1
## [28] hms_0.4.0          bit64_0.9-7        rprojroot_1.2     
## [31] grid_3.4.2         R6_2.2.2           rmarkdown_1.8     
## [34] bookdown_0.5       readr_1.1.1        magrittr_1.5      
## [37] scs_1.1-1          backports_1.1.1    scales_0.5.0      
## [40] htmltools_0.3.6    rvest_0.3.2        colorspace_1.3-2  
## [43] labeling_0.3       stringi_1.1.6      lazyeval_0.2.1    
## [46] munsell_0.4.3      R.oo_1.21.0

Source

R Markdown

References

comments powered by Disqus