data(api)
design_api <- svydesign(id = ~dnum, weights = ~pw, data = apisrs)
formula <- ~stype + sch.wide
T <- apply(model.matrix(object = formula, data = apipop),
2,
sum)
cal_api <- calibrate(design_api, formula, population = T, calfun = cal.raking)
w_survey <- weights(cal_api)Survey Calibration
Introduction
Calibration is a widely used technique in survey sampling. Suppose
Let
with respect to
Raking
A common calibration technique is raking, which uses the penalty function
We illustrate with the California Academic Performance Index data in the survey package (T. Lumley (2018)) which also supplies facilities for calibration via the function calibrate. Both the population dataset (apipop) and a simple random sample of apisrs) are provided. Suppose that we wish to reweight the observations in the sample using known totals for two variables from the population: stype, the school type (elementary, middle or high) and sch.wide, whether the school met the yearly target or not. This reweighting would make the sample more representative of the general population.
The code below estimates the weights using survey::calibrate.
The CVXR formulation follows.
di <- apisrs$pw
X <- model.matrix(object = formula, data = apisrs)
A <- di * X
n <- nrow(apisrs)
g <- Variable(n)
constraints <- list(t(A) %*% g == T)
## Raking
Phi_R <- Minimize(sum(di * (-entr(g) - g + 1)))
p <- Problem(Phi_R, constraints)
res <- psolve(p)
check_solver_status(p)
w_cvxr <- di * value(g)We compare the results below in a table which show them to be identical.
## Using functions in the *un echoed* preamble of this document...
build_table(d1 = build_df(apisrs, "Survey", w_survey),
d2 = build_df(apisrs, "CVXR", w_cvxr),
title = "Calibration weights from Raking")| stype | sch.wide | Survey wts. | Frequency | CVXR wts. | Frequency |
|---|---|---|---|---|---|
| E | No | 28.911 | 15 | 28.911 | 15 |
| E | Yes | 31.396 | 127 | 31.396 | 127 |
| H | No | 29.003 | 13 | 29.003 | 13 |
| H | Yes | 31.497 | 12 | 31.497 | 12 |
| M | No | 29.033 | 9 | 29.033 | 9 |
| M | Yes | 31.529 | 24 | 31.529 | 24 |
Other Calibration Metrics
Two other penalty functions commonly used are:
Quadratic
Logit
It is again easy to incorporate these in our example and compare to survey results.
Quadratic
The survey function for this calibration is invoked as cal.linear.
## Quadratic
Phi_Q <- Minimize(sum_squares(g - 1) / 2)
p <- Problem(Phi_Q, constraints)
res <- psolve(p)
check_solver_status(p)
w_cvxr_q <- di * value(g)
w_survey_q <- weights(calibrate(design_api, formula, population = T, calfun = cal.linear))N.B. Different solvers may produce slightly different numbers of unique weights. Such differences are not unheard of among solvers! One can try different solvers (e.g., solver = "SCS") to compare results.
| stype | sch.wide | Survey wts. | Frequency | CVXR wts. | Frequency |
|---|---|---|---|---|---|
| E | No | 28.907 | 15 | 28.907 | 15 |
| E | Yes | 31.397 | 127 | 31.397 | 127 |
| H | No | 29.005 | 13 | 29.005 | 13 |
| H | Yes | 31.495 | 12 | 31.495 | 12 |
| M | No | 29.037 | 9 | 29.037 | 9 |
| M | Yes | 31.528 | 24 | 31.528 | 24 |
Logistic
Finally, the logistic, which requires bounds
u <- 1.10; l <- 0.90
w_survey_l <- weights(calibrate(design_api, formula, population = T, calfun = cal.linear,
bounds = c(l, u)))
Phi_L <- Minimize(sum(-entr((g - l) / (u - l)) -
entr((u - g) / (u - l))))
p <- Problem(Phi_L, c(constraints, list(l <= g, g <= u)))
res <- psolve(p)
check_solver_status(p)
w_cvxr_l <- di * value(g)| stype | sch.wide | Survey wts. | Frequency | CVXR wts. | Frequency |
|---|---|---|---|---|---|
| E | No | 28.907 | 15 | 28.929 | 15 |
| E | Yes | 31.397 | 127 | 31.394 | 127 |
| H | No | 29.005 | 13 | 28.995 | 13 |
| H | Yes | 31.495 | 12 | 31.505 | 12 |
| M | No | 29.037 | 9 | 29.014 | 9 |
| M | Yes | 31.528 | 24 | 31.536 | 24 |
Further Metrics
Following examples in survey::calibrate, we can try a few other metrics.
First, the hellinger distance.
hellinger <- make.calfun(Fm1 = function(u, bounds) ((1 - u / 2)^-2) - 1,
dF= function(u, bounds) (1 -u / 2)^-3 ,
name = "Hellinger distance")
w_survey_h <- weights(calibrate(design_api, formula, population = T, calfun = hellinger))
Phi_h <- Minimize(sum((1 - g / 2)^(-2)))
p <- Problem(Phi_h, constraints)
res <- psolve(p)
check_solver_status(p)
w_cvxr_h <- di * value(g)| stype | sch.wide | Survey wts. | Frequency | CVXR wts. | Frequency |
|---|---|---|---|---|---|
| E | No | 28.913 | 15 | 28.890 | 15 |
| E | Yes | 31.396 | 127 | 31.399 | 127 |
| H | No | 29.002 | 13 | 29.011 | 13 |
| H | Yes | 31.498 | 12 | 31.488 | 12 |
| M | No | 29.031 | 9 | 29.056 | 9 |
| M | Yes | 31.530 | 24 | 31.521 | 24 |
Next, the derivative of the inverse hyperbolic sine.
w_survey_s <- weights(calibrate(design_api, formula, population = T, calfun = cal.sinh,
bounds = c(l, u)))
Phi_s <- Minimize(sum( 0.5 * (exp(g) + exp(-g))))
p <- Problem(Phi_s, c(constraints, list(l <= g, g <= u)))
res <- psolve(p)
check_solver_status(p)
w_cvxr_s <- di * value(g)| stype | sch.wide | Survey wts. | Frequency | CVXR wts. | Frequency |
|---|---|---|---|---|---|
| E | No | 28.911 | 15 | 28.904 | 15 |
| E | Yes | 31.396 | 127 | 31.397 | 127 |
| H | No | 29.003 | 13 | 29.006 | 13 |
| H | Yes | 31.497 | 12 | 31.494 | 12 |
| M | No | 29.033 | 9 | 29.040 | 9 |
| M | Yes | 31.529 | 24 | 31.527 | 24 |
Session Info
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] survey_4.5 survival_3.8-6 Matrix_1.7-4 magrittr_2.0.4
[5] dplyr_1.2.0 kableExtra_1.4.0 CVXR_1.8.0.9215
loaded via a namespace (and not attached):
[1] xfun_0.56 htmlwidgets_1.6.4 lattice_0.22-9 vctrs_0.7.1
[5] tools_4.5.2 Rmosek_11.1.1 generics_0.1.4 tibble_3.3.1
[9] highs_1.12.0-3 pkgconfig_2.0.3 piqp_0.6.2 checkmate_2.3.4
[13] RColorBrewer_1.1-3 S7_0.2.1 lifecycle_1.0.5 compiler_4.5.2
[17] farver_2.1.2 stringr_1.6.0 textshaping_1.0.4 gurobi_13.0-1
[21] mitools_2.4 codetools_0.2-20 ECOSolveR_0.6.1 htmltools_0.5.9
[25] cccp_0.3-3 yaml_2.3.12 gmp_0.7-5.1 pillar_1.11.1
[29] MASS_7.3-65 Rcplex_0.3-8 clarabel_0.11.2 tidyselect_1.2.1
[33] digest_0.6.39 stringi_1.8.7 slam_0.1-55 splines_4.5.2
[37] rprojroot_2.1.1 fastmap_1.2.0 here_1.0.2 cli_3.6.5
[41] dichromat_2.0-0.1 scales_1.4.0 backports_1.5.0 rmarkdown_2.30
[45] scs_3.2.7 otel_0.2.0 evaluate_1.0.5 knitr_1.51
[49] Rglpk_0.6-5.1 viridisLite_0.4.3 rlang_1.1.7 Rcpp_1.1.1
[53] glue_1.8.0 DBI_1.3.0 xml2_1.5.2 osqp_1.0.0
[57] svglite_2.2.2 rstudioapi_0.18.0 jsonlite_2.0.0 R6_2.6.1
[61] systemfonts_1.3.1