# Huber Regression

## Introduction

Huber regression (Huber 1964) is a regression technique that is robust to outliers. The idea is to use a different loss function rather than the traditional least-squares; we solve

$\begin{array}{ll} \underset{\beta}{\mbox{minimize}} & \sum_{i=1}^m \phi(y_i - x_i^T\beta) \end{array}$

for variable $$\beta \in {\mathbf R}^n$$, where the loss $$\phi$$ is the Huber function with threshold $$M > 0$$, $\phi(u) = \begin{cases} u^2 & \mbox{if } |u| \leq M \\ 2Mu - M^2 & \mbox{if } |u| > M. \end{cases}$

This function is identical to the least squares penalty for small residuals, but on large residuals, its penalty is lower and increases linearly rather than quadratically. It is thus more forgiving of outliers.

## Example

We generate some problem data.

n <- 1
m <- 450
M <- 1      ## Huber threshold
p <- 0.1    ## Fraction of responses with sign flipped

## Generate problem data
set.seed(1289)
beta_true <- 5 * matrix(stats::rnorm(n), nrow = n)
X <- matrix(stats::rnorm(m * n), nrow = m, ncol = n)
y_true <- X %*% beta_true
eps <- matrix(stats::rnorm(m), nrow = m)

We will randomly flip the sign of some responses to illustrate the robustness.

factor <- 2*stats::rbinom(m, size = 1, prob = 1-p) - 1
y <- factor * y_true + eps

We can solve this problem both using ordinary least squares and huber regression to compare.

suppressMessages(suppressWarnings(library(CVXR)))
beta <- Variable(n)
rel_err <- norm(beta - beta_true, "F") / norm(beta_true, "F")

## OLS
obj <- sum((y - X %*% beta)^2)
prob <- Problem(Minimize(obj))
result <- solve(prob)
beta_ols <- result$getValue(beta) err_ols <- result$getValue(rel_err)

## Solve Huber regression problem
obj <- sum(CVXR::huber(y - X %*% beta, M))
prob <- Problem(Minimize(obj))
result <- solve(prob)
beta_hub <- result$getValue(beta) err_hub <- result$getValue(rel_err)

Finally, we also solve the OLS problem assuming we know the flipped signs.

## Solve ordinary least squares assuming sign flips known
obj <- sum((y - factor*(X %*% beta))^2)
prob <- Problem(Minimize(obj))
result <- solve(prob)
beta_prs <- result$getValue(beta) err_prs <- result$getValue(rel_err)

We can now plot the fit against the measured responses.

library(ggplot2)
d1 <- data.frame(X = X, y = y, sign = as.factor(factor))
d2 <- data.frame(X = rbind(X, X, X),
yHat = rbind(X %*% beta_ols,
X %*% beta_hub,
X %*% beta_prs),
Estimate = c(rep("OLS", m),
rep("Huber", m),
rep("Prescient", m)))
ggplot() +
geom_point(data = d1, mapping = aes(x = X, y = y, color = sign)) +
geom_line(data = d2, mapping = aes(x = X, y = yHat, color = Estimate))

As can be seen, the Huber line is closer to the prescient line.

## Session Info

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C
##
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base
##
## other attached packages:
## [1] ggplot2_2.2.1 CVXR_0.99
##
## loaded via a namespace (and not attached):
##  [1] gmp_0.5-13.1      Rcpp_0.12.17      pillar_1.2.2
##  [4] compiler_3.5.0    plyr_1.8.4        R.methodsS3_1.7.1
##  [7] R.utils_2.6.0     tools_3.5.0       digest_0.6.15
## [10] bit_1.1-13        evaluate_0.10.1   tibble_1.4.2
## [13] gtable_0.2.0      lattice_0.20-35   rlang_0.2.0
## [16] Matrix_1.2-14     yaml_2.1.19       blogdown_0.6.3
## [19] xfun_0.1          Rmpfr_0.7-0       ECOSolveR_0.4
## [22] stringr_1.3.1     knitr_1.20        rprojroot_1.3-2
## [25] bit64_0.9-7       grid_3.5.0        R6_2.2.2
## [28] rmarkdown_1.9.14  bookdown_0.7      magrittr_1.5
## [31] backports_1.1.2   scales_0.5.0      htmltools_0.3.6
## [34] scs_1.1-1         colorspace_1.3-2  labeling_0.3
## [37] stringi_1.2.2     lazyeval_0.2.1    munsell_0.4.3
## [40] R.oo_1.22.0

R Markdown

## References

Huber, P. J. 1964. “Robust Estimation of a Location Parameter.” Annals of Mathematical Statistics 35 (1): 73–101.