4.1 R preparation

In R, the function $read.csv$ imports datasets from CSV files:

CLSAData <- read.csv("[Path]/CLSARealExample.csv", header = TRUE, sep = ",")
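Before going further, it can help to confirm that the imported data frame contains the identifier, strata, and weight columns used later in this section. The following is a minimal sketch using a small made-up CSV written to a temporary file. The object names $toy$, $required$, $missing_cols$, and $weights_ok$, and all data values, are hypothetical and are not part of the CLSA data.

```r
# Toy CSV standing in for the real file; all values are made up.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "entity_id,GEOSTRAT_TRM,WGHTS_INFLATION_TRM",
  "1,S1,120.5",
  "2,S1,98.2",
  "3,S2,143.0"
), tmp)

toy <- read.csv(tmp, header = TRUE, sep = ",")

# Columns that the tracking-cohort analysis in this section relies on.
required <- c("entity_id", "GEOSTRAT_TRM", "WGHTS_INFLATION_TRM")
missing_cols <- setdiff(required, names(toy))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Weights should be numeric and non-missing.
weights_ok <- is.numeric(toy$WGHTS_INFLATION_TRM) &&
  !anyNA(toy$WGHTS_INFLATION_TRM)
```

The same checks can be applied to $CLSAData$ by replacing $toy$ with the imported data frame.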

Then, we can define the strata variable and declare the survey design with the package $survey$:

library(survey)
CLSAData$StraVar <- CLSAData$GEOSTRAT_TRM
CLSA.design <- svydesign(ids = ~ entity_id, strata = ~ StraVar,
                         weights = ~ WGHTS_INFLATION_TRM, data = CLSAData, nest = TRUE)

The option $weights$ specifies the sampling weights, and the option $strata$ specifies the strata variable. Here we use the inflation weights for the tracking cohort, named “ $WGHTS_INFLATION_TRM$ ,” together with the strata variable “ $GEOSTRAT_TRM$ .” Analyses of other cohorts use different variables: for the comprehensive cohort, the inflation weights are “ $WGHTS_INFLATION_COM$ ” and the strata variable is “ $GEOSTRAT_COM$ ”; for the combined cohort, the inflation weights are “ $WGHTS_INFLATION_CLSAM$ ” and the strata variable is “ $GEOSTRAT_CLSAM$ .”
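One way to keep the cohort-specific variable names in one place is a small lookup table. The sketch below is only an illustration: the list $clsa_design_vars$, its keys, and the helper $design_formulas$ are hypothetical names introduced here, while the variable names themselves are those given in the text.

```r
# Weight and strata variable names per cohort, as listed in the text.
# The list name and its keys are illustrative choices.
clsa_design_vars <- list(
  tracking      = c(weights = "WGHTS_INFLATION_TRM",   strata = "GEOSTRAT_TRM"),
  comprehensive = c(weights = "WGHTS_INFLATION_COM",   strata = "GEOSTRAT_COM"),
  combined      = c(weights = "WGHTS_INFLATION_CLSAM", strata = "GEOSTRAT_CLSAM")
)

# Build one-sided formulas (e.g. ~ GEOSTRAT_COM) for a given cohort.
design_formulas <- function(cohort) {
  vars <- clsa_design_vars[[cohort]]
  if (is.null(vars)) stop("Unknown cohort: ", cohort)
  list(
    weights = reformulate(vars[["weights"]]),
    strata  = reformulate(vars[["strata"]])
  )
}

f <- design_formulas("comprehensive")
```

The resulting formulas $f$weights$ and $f$strata$ could then be passed to the $weights$ and $strata$ arguments of $svydesign$ in place of the hard-coded tracking-cohort names.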

By default, most proprietary statistical packages treat strata that contain a single PSU (primary sampling unit) as contributing nothing to the variance (Bruin 2011). To obtain the same behavior in R, we set the following option:

options(survey.lonely.psu = "certainty")
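To illustrate what a zero contribution from a single-PSU stratum means, the sketch below computes the standard stratified with-replacement variance of an estimated total in base R. It is not the internal code of the $survey$ package. The function $strat_total_var$ and the numbers are made up for illustration, and no finite population correction is applied.

```r
# Stratified with-replacement variance of an estimated total.
# z: weighted PSU totals; stratum: stratum label of each PSU.
# A stratum with a single PSU contributes zero, mirroring the
# "certainty" treatment described above.
strat_total_var <- function(z, stratum) {
  per_stratum <- tapply(z, stratum, function(zh) {
    n_h <- length(zh)
    if (n_h < 2) return(0)  # lonely PSU: no variance contribution
    n_h / (n_h - 1) * sum((zh - mean(zh))^2)
  })
  sum(per_stratum)
}

z       <- c(10, 14, 20, 26, 7)         # made-up weighted PSU totals
stratum <- c("A", "A", "B", "B", "C")   # stratum C has a single PSU

v_all <- strat_total_var(z, stratum)    # all three strata
keep  <- stratum != "C"
v_ab  <- strat_total_var(z[keep], stratum[keep])  # strata A and B only
```

Here $v_all$ and $v_ab$ are equal: the single PSU in stratum C still counts toward the estimated total but adds nothing to its estimated variance.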

Reference

Bruin, J. 2011. “Newtest: Command to Compute New Test.” https://stats.idre.ucla.edu/stata/ado/analysis/.