2.2 Regression analysis

Linear regression analysis and logistic regression analysis are commonly conducted by researchers in health sciences. Survey weighted regression analysis focuses on finite population regression coefficients and also provides valid results for the model parameters under the assumed regression model. For simplicity of notation, we assume that the covariates $\mathbf{x}$ contain $1$ as the first component and the regression model has an intercept. The finite population regression coefficients $\boldsymbol{\beta}_N$ are the solution to the so-called census estimating equations, $U_N(\boldsymbol{\beta}) = \sum_{h=1}^{H} \sum_{i_h=1}^{N_h} \mathbf{x}_{i_h} \{y_{i_h} - \mu(\mathbf{x}_{i_h}, \boldsymbol{\beta})\} = \mathbf{0},$ where $\mu(\mathbf{x}_{i_h}, \boldsymbol{\beta}) = E(y_{i_h} \mid \mathbf{x}_{i_h})$ is the mean function under the assumed regression model. For linear regression analysis, we have $\mu(\mathbf{x}_{i_h}, \boldsymbol{\beta}) = \mathbf{x}_{i_h}^{\top}\boldsymbol{\beta}$; for logistic regression analysis where $y_{i_h}$ is a binary variable, we have $\mu(\mathbf{x}_{i_h}, \boldsymbol{\beta}) = \exp(\mathbf{x}_{i_h}^{\top}\boldsymbol{\beta}) / \{1 + \exp(\mathbf{x}_{i_h}^{\top}\boldsymbol{\beta})\}$.
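As a worked illustration of the census estimating equations in the linear case, substituting the linear mean function $\mathbf{x}^{\top}\boldsymbol{\beta}$ gives the familiar normal equations, which can be solved explicitly. The sketch below assumes that the census cross-product matrix of the covariates is invertible; the transpose notation $^{\top}$ is a choice made here for illustration and may differ from the notation used elsewhere. No comparable closed form exists for the logistic mean function.

```latex
\begin{aligned}
U_N(\boldsymbol{\beta})
  &= \sum_{h=1}^{H}\sum_{i_h=1}^{N_h}
     \mathbf{x}_{i_h}\bigl(y_{i_h} - \mathbf{x}_{i_h}^{\top}\boldsymbol{\beta}\bigr)
   = \mathbf{0} \\
\Longrightarrow\quad
\boldsymbol{\beta}_N
  &= \Bigl(\sum_{h=1}^{H}\sum_{i_h=1}^{N_h}
       \mathbf{x}_{i_h}\mathbf{x}_{i_h}^{\top}\Bigr)^{-1}
     \sum_{h=1}^{H}\sum_{i_h=1}^{N_h} \mathbf{x}_{i_h}\, y_{i_h}
\end{aligned}
```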

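The survey weighted estimator of $\boldsymbol{\beta}_N$, denoted as $\hat{\boldsymbol{\beta}}$, is the solution to the survey weighted estimating equations, in which each sampled unit's term in $U_N(\boldsymbol{\beta})$ is multiplied by its survey weight and the sum is taken over the sample. Under the linear regression model, the estimator $\hat{\boldsymbol{\beta}}$ has a closed form expression. Under the logistic regression model, finding the solution $\hat{\boldsymbol{\beta}}$ requires an iterative computational procedure. The variance estimator for $\hat{\boldsymbol{\beta}}$ is derived from the theory of estimating equations and has the well-known sandwich form (Binder 1983): the two outer factors are the inverse of the matrix of partial derivatives of the survey weighted estimating functions with respect to $\boldsymbol{\beta}$, and the middle factor is the estimated variance-covariance matrix of the survey weighted estimating functions, which form a Horvitz-Thompson estimator of $U_N(\boldsymbol{\beta})$, with $\boldsymbol{\beta}$ replaced by $\hat{\boldsymbol{\beta}}$ in the computations. The variance estimator given in (2.5) is again the default option in most survey software packages for regression analysis; it is applied here to the vector-valued estimating functions to obtain the estimated variance-covariance matrix. Chapter 7 of Wu and Thompson (2020) contains detailed discussions on regression analysis using survey data.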
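The following is a minimal sketch of how the survey weighted estimating equations might be solved for the logistic model by Newton-Raphson iteration, followed by a sandwich variance computation. It uses simulated data, and every name in it (`weighted_logistic`, `sandwich_variance`, `w`, and so on) is hypothetical. The middle factor of the sandwich is computed with a simple single-stratum, with-replacement approximation that treats each unit as its own primary sampling unit; this is an assumption made for illustration and is not the stratified multi-stage formula referred to as (2.5).

```python
import numpy as np


def expit(t):
    """Logistic mean function: exp(t) / (1 + exp(t))."""
    return 1.0 / (1.0 + np.exp(-t))


def weighted_logistic(X, y, w, tol=1e-10, max_iter=50):
    """Solve sum_i w_i x_i {y_i - mu(x_i, beta)} = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = expit(X @ beta)
        score = X.T @ (w * (y - mu))  # weighted estimating functions
        # Negative derivative of the estimating functions w.r.t. beta.
        info = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def sandwich_variance(X, y, w, beta):
    """Sandwich variance: H^{-1} V H^{-1} evaluated at the fitted beta.

    ASSUMPTION: V uses a single-stratum, with-replacement approximation
    with each unit as its own PSU, not a stratified multi-stage formula.
    """
    mu = expit(X @ beta)
    # The sign of the derivative matrix cancels in the sandwich product.
    H = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
    z = X * (w * (y - mu))[:, None]  # per-unit weighted contributions
    n = X.shape[0]
    zc = z - z.mean(axis=0)
    V = n / (n - 1.0) * (zc.T @ zc)
    H_inv = np.linalg.inv(H)
    return H_inv @ V @ H_inv


# Simulated sample (illustrative values only).
rng = np.random.default_rng(12345)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept first
y = rng.binomial(1, expit(X @ np.array([-0.5, 1.0]))).astype(float)
w = rng.uniform(1.0, 5.0, size=n)  # hypothetical survey weights

beta_hat = weighted_logistic(X, y, w)
V_hat = sandwich_variance(X, y, w, beta_hat)
print("beta_hat:", beta_hat)
print("standard errors:", np.sqrt(np.diag(V_hat)))
```

In practice, dedicated survey software would replace the simple variance approximation above with a design-based formula that accounts for strata and clusters.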

In survey sampling, estimation of regression coefficients or other parameters related to a model is often referred to as analytic use of survey data. It is apparent from the estimating equation system given in (2.6) and the sandwich variance estimator specified in (2.8) that rescaling the design weights by a constant does not change the point estimator or the variance estimator. Survey agencies sometimes provide so-called analytic weights as part of the survey datasets. These weights are rescaled from the original design weights so that the sum of the analytic weights equals the sample size.
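The invariance to rescaling can be checked numerically. The sketch below, on simulated data with hypothetical names (`weighted_linear_fit`, `design_w`, `analytic_w`), fits a weighted linear regression in closed form, once with the design weights and once with analytic weights rescaled to sum to the sample size. As an assumption for illustration, the middle factor of the sandwich again uses a single-stratum, with-replacement approximation rather than a full design-based formula.

```python
import numpy as np


def weighted_linear_fit(X, y, w):
    """Closed-form weighted least squares and its sandwich variance.

    ASSUMPTION: the middle factor uses a single-stratum, with-replacement
    approximation with each unit treated as its own PSU.
    """
    XtWX = (X * w[:, None]).T @ X
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    z = X * (w * (y - X @ beta))[:, None]  # per-unit weighted contributions
    n = X.shape[0]
    zc = z - z.mean(axis=0)
    V = n / (n - 1.0) * (zc.T @ zc)
    H_inv = np.linalg.inv(XtWX)
    return beta, H_inv @ V @ H_inv


# Simulated sample (illustrative values only).
rng = np.random.default_rng(2024)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept first
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)
design_w = rng.uniform(10.0, 50.0, size=n)  # hypothetical design weights

# Analytic weights: design weights rescaled to sum to the sample size.
analytic_w = design_w * n / design_w.sum()

beta_d, var_d = weighted_linear_fit(X, y, design_w)
beta_a, var_a = weighted_linear_fit(X, y, analytic_w)

print("sum of analytic weights:", analytic_w.sum())
print("same point estimates:", np.allclose(beta_d, beta_a))
print("same variance estimates:", np.allclose(var_d, var_a))
```

A constant factor $c$ multiplies the derivative matrix by $c$ and the middle factor by $c^2$, so it cancels in the sandwich product; the two fits should therefore agree up to floating-point rounding.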

References

Binder, David A. 1983. “On the Variances of Asymptotically Normal Estimators from Complex Surveys.” International Statistical Review / Revue Internationale de Statistique 51 (3): 279. https://doi.org/10.2307/1402588.

Wu, Changbao, and Mary E. Thompson. 2020. Sampling Theory and Practice. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-44246-0.