2.1 Estimation of population means

In survey sampling, basic inferential procedures are developed for the estimation of finite population means. For the study variable $y$, the population mean under stratification is given by

$$\mu_y = \frac{1}{N}\sum_{h=1}^{H}\sum_{i_h=1}^{N_h} y_{i_h}. \tag{2.1}$$

Estimation of a disease prevalence is a special case of estimating a population proportion with a binary study variable $y$. The design-unbiased Horvitz-Thompson estimator of $\mu_y$ is given by

$$\hat\mu_{y\mathrm{HT}} = \frac{1}{N}\sum_{h=1}^{H}\sum_{i_h\in S_h} w_{i_h}\, y_{i_h}. \tag{2.2}$$

The stratum design weight $w_{i_h} = \pi_{i_h}^{-1}$ is often interpreted as the number of units in the population represented by the unit $i_h$ in the sample (Lohr 2010; Wu and Thompson 2020). The Horvitz-Thompson estimator for the population total $T_y = \sum_{h=1}^{H}\sum_{i_h=1}^{N_h} y_{i_h}$ is given by $\hat T_{y\mathrm{HT}} = \sum_{h=1}^{H}\sum_{i_h\in S_h} w_{i_h}\, y_{i_h}$, which is also called the expansion estimator. The design weight $w_{i_h}$ is also called the inflation weight. The population size $N$ is sometimes unknown to data users. An unbiased estimator of $N$ is given by $\hat N = \sum_{h=1}^{H}\sum_{i_h\in S_h} w_{i_h}$. The resulting estimator of $\mu_y$ is the so-called Hájek estimator given by $\hat\mu_{y\mathrm{H}} = \hat T_{y\mathrm{HT}}/\hat N$.
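As a concrete illustration of the estimators above, the following minimal Python sketch computes the expansion estimator, $\hat N$, the Horvitz-Thompson mean and the Hájek mean. The data layout (one list of `(weight, y)` pairs per stratum), the function names and the numerical values are assumptions made for illustration only.

```python
# Minimal sketch of the point estimators in Section 2.1 (illustrative only).
# Assumed data layout: one list per stratum of (design weight, y) pairs.

def ht_total(strata):
    """Expansion estimator of T_y: sum of w * y over all sampled units."""
    return sum(w * y for sample in strata for w, y in sample)


def estimated_population_size(strata):
    """N-hat: sum of the design weights over all sampled units."""
    return sum(w for sample in strata for w, _ in sample)


def ht_mean(strata, N):
    """Horvitz-Thompson estimator of the mean; needs the known size N."""
    return ht_total(strata) / N


def hajek_mean(strata):
    """Hajek estimator: ratio of the expansion estimator to N-hat."""
    return ht_total(strata) / estimated_population_size(strata)


# Hypothetical two-stratum sample with a binary study variable.
strata = [
    [(12.0, 1), (8.0, 0), (10.0, 0)],  # stratum 1
    [(20.0, 1), (30.0, 1)],            # stratum 2
]

print(ht_total(strata))                   # weighted count of cases
print(estimated_population_size(strata))  # sum of the design weights
print(ht_mean(strata, N=82))              # N = 82 is a made-up known size
print(hajek_mean(strata))
```

The Hájek estimator needs only the weights and the responses, which is why it remains available when $N$ is not released to data users.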

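The theoretical design-based variance of the Horvitz-Thompson estimator given in (2.2) involves both the first-order sample inclusion probabilities $\pi_{i_h}$ and the second-order (joint) sample inclusion probabilities $\pi_{i_h j_h}$ of units $i_h$ and $j_h$ in stratum $h$, with $\pi_{i_h i_h} = \pi_{i_h}$. Under stratified sampling, the stratum samples $S_h$, $h = 1, \ldots, H$, are independent. The general theoretical variance formula is given by

$$\mathrm{Var}(\hat\mu_{y\mathrm{HT}}) = N^{-2}\sum_{h=1}^{H}\mathrm{Var}\Big(\sum_{i_h\in S_h}\frac{y_{i_h}}{\pi_{i_h}}\Big) = N^{-2}\sum_{h=1}^{H}\Big[\sum_{i_h=1}^{N_h}\sum_{j_h=1}^{N_h}\big(\pi_{i_h j_h}-\pi_{i_h}\pi_{j_h}\big)\frac{y_{i_h}}{\pi_{i_h}}\frac{y_{j_h}}{\pi_{j_h}}\Big].$$

The conventional unbiased variance estimator for the Horvitz-Thompson estimator is given by

$$\widehat{\mathrm{Var}}(\hat\mu_{y\mathrm{HT}}) = N^{-2}\sum_{h=1}^{H}\Big[\sum_{i_h\in S_h}\sum_{j_h\in S_h}\frac{\pi_{i_h j_h}-\pi_{i_h}\pi_{j_h}}{\pi_{i_h j_h}}\,\frac{y_{i_h}}{\pi_{i_h}}\frac{y_{j_h}}{\pi_{j_h}}\Big]. \tag{2.3}$$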
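The double sum in the Horvitz-Thompson variance estimator can be sketched directly when the joint inclusion probabilities are available. The Python example below is a minimal illustration, not a production implementation: the function names are hypothetical, and simple random sampling without replacement within a stratum is assumed only because its joint inclusion probabilities, $n_h(n_h-1)/\{N_h(N_h-1)\}$, are known in closed form.

```python
# Sketch of the Horvitz-Thompson variance estimator with known joint
# inclusion probabilities. All names here are hypothetical.

def srs_inclusion_probabilities(n, N):
    """First-order and joint inclusion probabilities for the n sampled
    units under simple random sampling without replacement from N units."""
    pi = [n / N] * n
    pij = n * (n - 1) / (N * (N - 1))
    # Diagonal entries equal the first-order probabilities.
    pi_joint = [[pi[i] if i == j else pij for j in range(n)] for i in range(n)]
    return pi, pi_joint


def ht_variance_of_total(y, pi, pi_joint):
    """Estimated variance of sum(y_i / pi_i) for one stratum sample."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            pij = pi_joint[i][j]
            total += (pij - pi[i] * pi[j]) / pij * (y[i] / pi[i]) * (y[j] / pi[j])
    return total


def ht_variance_of_mean(strata, N):
    """Strata are independent, so stratum variances add; divide by N^2.
    Each stratum is a (y, pi, pi_joint) triple."""
    return sum(ht_variance_of_total(y, pi, pj) for y, pi, pj in strata) / N**2


# Hypothetical single stratum: n = 3 units sampled from N = 10.
y = [1.0, 2.0, 4.0]
pi, pi_joint = srs_inclusion_probabilities(n=3, N=10)
print(ht_variance_of_mean([(y, pi, pi_joint)], N=10))
```

For unequal probability designs the matrix `pi_joint` would have to be supplied by the survey producer, which is exactly the information that public-use files usually omit.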

In practice, complex survey datasets, such as the CLSA datasets used in this paper, usually do not provide the joint inclusion probabilities $\pi_{i_h j_h}$, which are required for computing the variance estimator given in (2.3). Most statistical software packages for survey data analysis use approximate variance estimators to bypass this difficulty.

When sampled units are drawn with replacement, with selection probabilities $z_{i_h}$, $i_h = 1, \ldots, N_h$, for each selection, the Hansen-Hurwitz estimator (Hansen and Hurwitz 1943) of $\mu_y$ has the same algebraic form as the Horvitz-Thompson estimator if we let $\pi_{i_h} = n_h z_{i_h}$. The unbiased variance estimator for the Hansen-Hurwitz estimator is given by

$$\hat V_0 = N^{-2}\sum_{h=1}^{H}\frac{1}{n_h(n_h-1)}\sum_{i_h\in S_h}\Big(n_h w_{i_h} y_{i_h} - \sum_{j_h\in S_h} w_{j_h} y_{j_h}\Big)^2. \tag{2.4}$$

The variance estimator $\hat V_0$ does not involve second-order inclusion probabilities and provides a good approximation to the variance estimator given in (2.3) if the sampling fractions $f_h = n_h/N_h$ are small for the original without-replacement survey design. When the sampling fractions are not small, an ad hoc adjustment to (2.4) is to apply the finite population correction factor $1 - f_h$ within each stratum. The resulting variance estimator is given by

$$\hat V = N^{-2}\sum_{h=1}^{H}\frac{1-f_h}{n_h(n_h-1)}\sum_{i_h\in S_h}\Big(n_h w_{i_h} y_{i_h} - \sum_{j_h\in S_h} w_{j_h} y_{j_h}\Big)^2. \tag{2.5}$$
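The two approximate variance estimators differ only in the finite population correction, so a single routine can cover both. The following Python sketch is a minimal illustration under stated assumptions: the function name, the `(weight, y)` data layout and the optional `stratum_sizes` argument are hypothetical, and at least two sampled units per stratum are required.

```python
# Sketch of the with-replacement variance approximation, with an optional
# finite population correction (fpc). Names and data layout are assumptions.

def approx_variance_of_mean(strata, N, stratum_sizes=None):
    """strata: one list of (design weight, y) pairs per stratum.
    stratum_sizes: optional list of N_h. When omitted, no fpc is applied
    (the with-replacement estimator); when given, each stratum term is
    multiplied by 1 - n_h / N_h."""
    total = 0.0
    for h, sample in enumerate(strata):
        n_h = len(sample)
        if n_h < 2:
            raise ValueError("each stratum needs at least two sampled units")
        t_h = sum(w * y for w, y in sample)  # stratum expansion estimator
        ss = sum((n_h * w * y - t_h) ** 2 for w, y in sample)
        fpc = 1.0 if stratum_sizes is None else 1.0 - n_h / stratum_sizes[h]
        total += fpc * ss / (n_h * (n_h - 1))
    return total / N**2


# Hypothetical single stratum: 3 of 10 units sampled with equal weights 10/3.
sample = [(10 / 3, 1.0), (10 / 3, 2.0), (10 / 3, 4.0)]
print(approx_variance_of_mean([sample], N=10))                      # no fpc
print(approx_variance_of_mean([sample], N=10, stratum_sizes=[10]))  # with fpc
```

With a sampling fraction of 0.3 in this toy example the two values differ by the factor 0.7, which shows why the correction matters when $f_h$ is not small.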

The variance estimator $\hat V$ given in (2.5) is exactly design-unbiased for stratified simple random sampling. For general stratified unequal probability sampling, the performance of $\hat V$ varies depending on the original survey design. Other approximate variance formulas exist that do not involve second-order inclusion probabilities and that perform better for certain designs; see, for instance, Haziza, Mecatti, and Rao (2008) for further details. The variance estimator $\hat V$ is the default option in most software for survey data analysis, including R, SAS, SPSS and Stata.
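The exactness claim for stratified simple random sampling can be checked with a short derivation. The symbols $\bar y_h$, $s_h^2$ and $W_h$ below are introduced here for illustration only. Under simple random sampling without replacement within each stratum, $\pi_{i_h} = n_h/N_h$ and hence $w_{i_h} = N_h/n_h$, so that

$$n_h w_{i_h} y_{i_h} - \sum_{j_h\in S_h} w_{j_h} y_{j_h} = N_h\,(y_{i_h} - \bar y_h), \qquad \bar y_h = \frac{1}{n_h}\sum_{j_h\in S_h} y_{j_h}.$$

Substituting into $\hat V = N^{-2}\sum_{h=1}^{H}\frac{1-f_h}{n_h(n_h-1)}\sum_{i_h\in S_h}\big(n_h w_{i_h} y_{i_h} - \sum_{j_h\in S_h} w_{j_h} y_{j_h}\big)^2$ gives

$$\hat V = N^{-2}\sum_{h=1}^{H} N_h^2\,\frac{1-f_h}{n_h}\, s_h^2 = \sum_{h=1}^{H} W_h^2\,\frac{1-f_h}{n_h}\, s_h^2, \qquad s_h^2 = \frac{1}{n_h-1}\sum_{i_h\in S_h}(y_{i_h}-\bar y_h)^2, \quad W_h = \frac{N_h}{N},$$

which is the standard design-unbiased variance estimator for the stratified sample mean.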

Stratified multi-stage sampling can use approximate variance estimators similar to $\hat V$ if the sampling fractions for the first-stage clusters are small within each stratum. When $N$ is unknown and the Hájek estimator $\hat\mu_{y\mathrm{H}}$ is used, the variance estimator $\hat V$ given in (2.5) needs to be modified by replacing $N$ with $\hat N$ and the study variable $y_{i_h}$ with the residual variable $e_{i_h} = y_{i_h} - \hat\mu_{y\mathrm{H}}$. Further details can be found in Wu and Thompson (2020).
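The residual-based modification for the Hájek estimator can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name and data layout are hypothetical, and the stratum sizes $N_h$ are treated as optional because they may be unavailable when $N$ itself is unknown.

```python
# Sketch of the variance estimator for the Hajek estimator: replace N by
# N-hat and y by the residual e = y - mu_hat. Names are assumptions.

def hajek_mean_and_variance(strata, stratum_sizes=None):
    """strata: one list of (design weight, y) pairs per stratum.
    stratum_sizes: optional list of N_h used for the fpc 1 - n_h / N_h.
    Returns (Hajek estimate, estimated variance)."""
    n_hat = sum(w for sample in strata for w, _ in sample)
    t_hat = sum(w * y for sample in strata for w, y in sample)
    mu_hat = t_hat / n_hat

    total = 0.0
    for h, sample in enumerate(strata):
        n_h = len(sample)
        if n_h < 2:
            raise ValueError("each stratum needs at least two sampled units")
        # Weighted residuals play the role of w * y in the variance formula.
        we = [w * (y - mu_hat) for w, y in sample]
        t_e = sum(we)
        ss = sum((n_h * v - t_e) ** 2 for v in we)
        fpc = 1.0 if stratum_sizes is None else 1.0 - n_h / stratum_sizes[h]
        total += fpc * ss / (n_h * (n_h - 1))
    return mu_hat, total / n_hat**2


# Hypothetical single stratum: 3 of 10 units sampled with equal weights 10/3.
sample = [(10 / 3, 1.0), (10 / 3, 2.0), (10 / 3, 4.0)]
print(hajek_mean_and_variance([sample], stratum_sizes=[10]))
```

In this equal-weight example $\hat N$ coincides with the stratum size, so the residual-based estimator gives the same value as the estimator for the Horvitz-Thompson mean; with unequal weights the two generally differ.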

References

Hansen, Morris H., and William N. Hurwitz. 1943. “On the Theory of Sampling from Finite Populations.” The Annals of Mathematical Statistics 14 (4): 333–62. https://doi.org/10.1214/aoms/1177731356.
Haziza, D., F. Mecatti, and J. N. K. Rao. 2008. “Evaluation of Some Approximate Variance Estimators Under the Rao-Sampford Unequal Probability Sampling Design.” Metron 66 (1): 91–108. https://econpapers.repec.org/article/mtnancoec/080105.htm.
Lohr, Sharon L. 2010. Sampling: Design and Analysis. 2nd ed. Boston, MA: Brooks/Cole, Cengage Learning.
Wu, Changbao, and Mary E. Thompson. 2020. Sampling Theory and Practice. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-44246-0.