Given a set of data points \(y \in {\mathbf R}^m\), Tibshirani, Hoefling, and Tibshirani (2011) fit a nearly-isotonic approximation \(\beta
\in {\mathbf R}^m\) by solving
where \(\lambda \geq 0\) is a penalty parameter and \(x_+
=\max(x,0)\). This can be directly formulated in CVXR. As an example, we use global warming data from the Carbon Dioxide Information Analysis Center (CDIAC). The data points are the annual temperature anomalies relative to the 1961–1990 mean.
data(cdiac)str(cdiac)
'data.frame': 166 obs. of 14 variables:
$ year : int 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 ...
$ jan : num -0.702 -0.303 -0.308 -0.177 -0.36 -0.176 -0.119 -0.512 -0.532 -0.307 ...
$ feb : num -0.284 -0.362 -0.477 -0.33 -0.28 -0.4 -0.373 -0.344 -0.707 -0.192 ...
$ mar : num -0.732 -0.485 -0.505 -0.318 -0.284 -0.303 -0.513 -0.434 -0.55 -0.334 ...
$ apr : num -0.57 -0.445 -0.559 -0.352 -0.349 -0.217 -0.371 -0.646 -0.517 -0.203 ...
$ may : num -0.325 -0.302 -0.209 -0.268 -0.23 -0.336 -0.119 -0.567 -0.651 -0.31 ...
$ jun : num -0.213 -0.189 -0.038 -0.179 -0.215 -0.16 -0.288 -0.31 -0.58 -0.25 ...
$ jul : num -0.128 -0.215 -0.016 -0.059 -0.228 -0.268 -0.297 -0.544 -0.324 -0.285 ...
$ aug : num -0.233 -0.153 -0.195 -0.148 -0.163 -0.159 -0.305 -0.327 -0.28 -0.104 ...
$ sep : num -0.444 -0.108 -0.125 -0.409 -0.115 -0.339 -0.459 -0.393 -0.339 -0.575 ...
$ oct : num -0.452 -0.063 -0.216 -0.359 -0.188 -0.211 -0.384 -0.467 -0.2 -0.255 ...
$ nov : num -0.19 -0.03 -0.187 -0.256 -0.369 -0.212 -0.608 -0.665 -0.644 -0.316 ...
$ dec : num -0.268 -0.067 0.083 -0.444 -0.232 -0.51 -0.44 -0.356 -0.3 -0.363 ...
$ annual: num -0.375 -0.223 -0.224 -0.271 -0.246 -0.271 -0.352 -0.46 -0.466 -0.286 ...
Since we plan to fit the regression and also get some idea of the standard errors, we write a function that computes the fit for use in bootstrapping. We directly call the solver in this instance, to save computational time in bootstrapping. For more on this, see Getting Faster Results.
The pos atom evaluates \(x_+\) elementwise on the input expression.
The boot library provides all the tools for bootstrapping but requires a statistic function that takes particular arguments: a data frame, followed by the bootstrap indices and any other arguments (\(\lambda\) for instance). This is shown below.
NOTE In what follows, we use a very small number of bootstrap samples as the fits are time consuming.
neariso_fit_stat <-function(data, index, lambda) { sample <- data[index,] # Bootstrap sample of rows sample <- sample[order(sample$year),] # Order ascending by yearneariso_fit(sample$annual, lambda)}
This replaces the first difference term with an approximation to the second derivative at \(\beta_{i+1}\). In CVXR, the only change necessary is the penalty line: replacing diff(x) by diff(x, differences = 2).
The resulting curve for the near convex fit is depicted below with 95% confidence bands generated from \(R = 5\) samples. Note the jagged staircase pattern has been smoothed out.
(plot.nearconvex <-ggplot(data = data.nearconvex) +geom_point(mapping =aes(year, annual), color ="red") +geom_line(mapping =aes(year, est), color ="blue") +geom_ribbon(mapping =aes(x = year, ymin = lower,ymax = upper),alpha=0.3) +labs(x ="Year", y ="Temperature Anomalies"))
Notes
We can easily extend this example to higher-order differences or lags. To make this easy, the function diff takes an argument differences that is 1 by default; a third-order difference is specified as diff(x, differences = 3).