Survey Calibration

Author

Anqi Fu and Balasubramanian Narasimhan

Introduction

Calibration is a widely used technique in survey sampling. Suppose \(m\) sampling units in a survey have been assigned initial weights \(d_i\) for \(i = 1,\ldots,m\), and furthermore, there are \(n\) auxiliary variables whose values in the sample are known. Calibration seeks to improve the initial weights \(d_i\) by finding new weights \(w_i\) that incorporate this auxiliary information while perturbing the initial weights as little as possible, , the ratio \(g_i = w_i/d_i\) must be close to one. Such reweighting improves precision of estimates (Chapter 7, Lumley (2010)).

Let \(X \in {\mathbf R}^{m \times n}\) be the matrix of survey samples, with each column corresponding to an auxiliary variable. Reweighting can be expressed as the optimization problem (see Davies et al. (2016)):

\[ \begin{array}{ll} \mbox{minimize} & \sum_{i=1}^m d_i\phi(g_i) \\ \mbox{subject to} & A^Tg = r \end{array} \]

with respect to \(g \in {\mathbf R}^m\), where \(\phi:{\mathbf R} \rightarrow {\mathbf R}\) is a strictly convex function with \(\phi(1) = 0\), \(r \in {\mathbf R}^n\) are the known population totals of the auxiliary variables, and \(A \in {\mathbf R}^{m \times n}\) is related to \(X\) by \(A_{ij} = d_iX_{ij}\) for \(i = 1,\ldots,m\) and \(j = 1,\ldots,n\).

Raking

A common calibration technique is raking, which uses the penalty function \(\phi(g_i) = g_i\log(g_i) - g_i + 1\) as the calibration metric.

We illustrate with the California Academic Performance Index data in the survey package (Lumley (2018)) which also supplies facilities for calibration via the function calibrate. Both the population dataset (apipop) and a simple random sample of \(m = 200\) (apisrs) are provided. Suppose that we wish to reweight the observations in the sample using known totals for two variables from the population: stype, the school type (elementary, middle or high) and sch.wide, whether the school met the yearly target or not. This reweighting would make the sample more representative of the general population.

The code below estimates the weights using survey::calibrate.

data(api)
design_api <- svydesign(id = ~dnum, weights = ~pw, data = apisrs)
formula <- ~stype + sch.wide
T <- apply(model.matrix(object = formula, data = apipop),
           2,
           sum)

cal_api <- calibrate(design_api, formula, population = T, calfun = cal.raking)
w_survey <- weights(cal_api)

The CVXR formulation follows.

di <- apisrs$pw
X <- model.matrix(object = formula, data = apisrs)
A <- di * X
n <- nrow(apisrs)
g <- Variable(n)
constraints <- list(t(A) %*% g == T)

## Raking
Phi_R <- Minimize(sum(di * (-entr(g) - g + 1)))
p <- Problem(Phi_R, constraints)
res <- psolve(p)
check_solver_status(p)
w_cvxr <- di * value(g)

We compare the results below in a table which show them to be identical.

## Using functions in the *un echoed* preamble of this document...
build_table(d1 = build_df(apisrs, "Survey", w_survey),
            d2 = build_df(apisrs, "CVXR", w_cvxr),
            title = "Calibration weights from Raking")

Calibration weights from Raking
stype	sch.wide	Survey wts.	Frequency	CVXR wts.	Frequency
E	No	28.911	15	28.911	15
E	Yes	31.396	127	31.396	127
H	No	29.003	13	29.003	13
H	Yes	31.497	12	31.497	12
M	No	29.033	9	29.033	9
M	Yes	31.529	24	31.529	24

Other Calibration Metrics

Two other penalty functions commonly used are:

Quadratic

\[ \phi^{Q}(g) = \frac{1}{2}(g-1)^2; \]

Logit

\[ \phi^{L}(g; l, u) = \frac{1}{C}\biggl[ (g-l)\log\left(\frac{g-l}{1-l}\right) + (u-g)\log\left(\frac{u-g}{u-1}\right) \biggr] \mbox{ for } C = \frac{u-l}{(1-l)(u-1)}. \]

It is again easy to incorporate these in our example and compare to survey results.

Quadratic

The survey function for this calibration is invoked as cal.linear.

## Quadratic
Phi_Q  <- Minimize(sum_squares(g - 1) / 2)
p <- Problem(Phi_Q, constraints)
res <- psolve(p)
check_solver_status(p)
w_cvxr_q <- di * value(g)
w_survey_q <- weights(calibrate(design_api, formula, population = T, calfun = cal.linear))

N.B. Different solvers may produce slightly different numbers of unique weights. Such differences are not unheard of among solvers! One can try different solvers (e.g., solver = "SCS") to compare results.

Calibration weights from Quadratic metric
stype	sch.wide	Survey wts.	Frequency	CVXR wts.	Frequency
E	No	28.907	15	28.907	15
E	Yes	31.397	127	31.397	127
H	No	29.005	13	29.005	13
H	Yes	31.495	12	31.495	12
M	No	29.037	9	29.037	9
M	Yes	31.528	24	31.528	24

Logistic

Finally, the logistic, which requires bounds \(l\) and \(u\) on the coefficients; we use \(l=0.9\) and \(u=1.1\).

u <- 1.10; l <- 0.90
w_survey_l <- weights(calibrate(design_api, formula, population = T, calfun = cal.linear,
                                bounds = c(l, u)))
Phi_L <- Minimize(sum(-entr((g - l) / (u - l))  -
                      entr((u - g) / (u - l)))) 
p <- Problem(Phi_L, c(constraints, list(l <= g, g <= u)))
res <- psolve(p)
check_solver_status(p)
w_cvxr_l <- di * value(g)

Calibration weights from Logit metric
stype	sch.wide	Survey wts.	Frequency	CVXR wts.	Frequency
E	No	28.907	15	28.929	15
E	Yes	31.397	127	31.394	127
H	No	29.005	13	28.995	13
H	Yes	31.495	12	31.505	12
M	No	29.037	9	29.014	9
M	Yes	31.528	24	31.536	24

Further Metrics

Following examples in survey::calibrate, we can try a few other metrics.

First, the hellinger distance.

hellinger <- make.calfun(Fm1 = function(u, bounds)  ((1 - u / 2)^-2) - 1,
                         dF= function(u, bounds) (1 -u / 2)^-3 ,
                         name = "Hellinger distance")
w_survey_h <- weights(calibrate(design_api, formula, population = T, calfun = hellinger))
Phi_h <- Minimize(sum((1 - g / 2)^(-2)))
p <- Problem(Phi_h, constraints)
res <- psolve(p)
check_solver_status(p)
w_cvxr_h <- di * value(g)

Calibration weights from Hellinger distance metric
stype	sch.wide	Survey wts.	Frequency	CVXR wts.	Frequency
E	No	28.913	15	28.890	15
E	Yes	31.396	127	31.399	127
H	No	29.002	13	29.011	13
H	Yes	31.498	12	31.488	12
M	No	29.031	9	29.056	9
M	Yes	31.530	24	31.521	24

Next, the derivative of the inverse hyperbolic sine.

w_survey_s <- weights(calibrate(design_api, formula, population = T, calfun = cal.sinh,
                                bounds = c(l, u)))
Phi_s <- Minimize(sum( 0.5 * (exp(g) + exp(-g))))
p <- Problem(Phi_s, c(constraints, list(l <= g, g <= u)))
res <- psolve(p)
check_solver_status(p)
w_cvxr_s <- di * value(g)

Calibration weights from derivative of sinh metric
stype	sch.wide	Survey wts.	Frequency	CVXR wts.	Frequency
E	No	28.911	15	28.904	15
E	Yes	31.396	127	31.397	127
H	No	29.003	13	29.006	13
H	Yes	31.497	12	31.494	12
M	No	29.033	9	29.040	9
M	Yes	31.529	24	31.527	24

Session Info

R version 4.6.0 (2026-04-24)
Platform: aarch64-apple-darwin23
Running under: macOS Tahoe 26.5.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] survey_4.5       survival_3.8-6   Matrix_1.7-5     magrittr_2.0.5  
[5] dplyr_1.2.1      kableExtra_1.4.0 CVXR_1.9.1      

loaded via a namespace (and not attached):
 [1] gmp_0.7-5.1        generics_0.1.4     clarabel_0.11.2    slam_0.1-55       
 [5] xml2_1.5.2         stringi_1.8.7      lattice_0.22-9     digest_0.6.39     
 [9] evaluate_1.0.5     RColorBrewer_1.1-3 fastmap_1.2.0      rprojroot_2.1.1   
[13] jsonlite_2.0.0     ECOSolveR_0.6.1    backports_1.5.1    DBI_1.3.0         
[17] scs_3.2.7          viridisLite_0.4.3  scales_1.4.0       codetools_0.2-20  
[21] textshaping_1.0.5  cli_3.6.6          mitools_2.4        rlang_1.2.0       
[25] Rglpk_0.6-5.1      splines_4.6.0      yaml_2.3.12        otel_0.2.0        
[29] tools_4.6.0        osqp_1.0.0         checkmate_2.3.4    here_1.0.2        
[33] vctrs_0.7.3        R6_2.6.1           lifecycle_1.0.5    stringr_1.6.0     
[37] htmlwidgets_1.6.4  MASS_7.3-65        cccp_0.3-3         pkgconfig_2.0.3   
[41] pillar_1.11.1      glue_1.8.1         Rcpp_1.1.1-1.1     systemfonts_1.3.2 
[45] xfun_0.58          tibble_3.3.1       tidyselect_1.2.1   rstudioapi_0.18.0 
[49] knitr_1.51         dichromat_2.0-0.1  highs_1.14.0-2     farver_2.1.2      
[53] htmltools_0.5.9    rmarkdown_2.31     svglite_2.2.2      piqp_0.6.2        
[57] compiler_4.6.0     S7_0.2.2

References

Davies, G., J. Gillard, and A. Zhigljavsky. 2016. “Comparative Study of Different Penalty Functions and Algorithms in Survey Calibration.” In Advances in Stochastic and Deterministic Global Optimization. Springer International Publishing. https://doi.org/10.1007/978-3-319-29975-4_6.

Lumley, Thomas. 2018. Survey: Analysis of Complex Survey Samples.

Lumley, Thomas S. 2010. Complex Surveys: A Guide to Analysis Using r. Wiley Publishing.