4.1 R preparation

In R, the function read.csv imports a dataset from a CSV file:

CLSAData <- read.csv("[Path]/CLSARealExample.csv", header=TRUE, sep = ",")

Then, we define the strata variable and declare the survey design with the survey package:

library(survey)
CLSAData$StraVar <- CLSAData$GEOSTRAT_TRM
CLSA.design <- svydesign(ids = ~entity_id, strata = ~StraVar,
                         weights = ~WGHTS_INFLATION_TRM, data = CLSAData, nest = TRUE)

The weights option specifies the sampling weights. Here we use the inflation weights for the tracking cohort, stored in the variable “WGHTS_INFLATION_TRM.” Other cohorts use different weight and strata variables: for analyses involving the comprehensive cohort, the inflation weights are “WGHTS_INFLATION_COM” and the strata variable is “GEOSTRAT_COM”; for analyses involving the combined cohort, the inflation weights are “WGHTS_INFLATION_CLSAM” and the strata variable is “GEOSTRAT_CLSAM.”
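
Because the weight and strata variable names differ only by their cohort suffix, the choice can be kept in one place. The following is a minimal sketch, not part of the CLSA documentation: the helper clsa_design_vars and the cohort labels "tracking", "comprehensive", and "combined" are hypothetical names introduced for illustration, while the variable names are those listed above.

```r
# Hypothetical helper (an assumption, not a CLSA or survey-package function):
# return the inflation-weight and strata variable names for a given cohort.
clsa_design_vars <- function(cohort = c("tracking", "comprehensive", "combined")) {
  cohort <- match.arg(cohort)
  suffix <- switch(cohort,
                   tracking      = "TRM",
                   comprehensive = "COM",
                   combined      = "CLSAM")
  list(weights = paste0("WGHTS_INFLATION_", suffix),
       strata  = paste0("GEOSTRAT_", suffix))
}

vars <- clsa_design_vars("comprehensive")

# Turn the names into one-sided formulas of the kind svydesign() expects.
weights_formula <- reformulate(vars$weights)
strata_formula  <- reformulate(vars$strata)
```

The resulting formulas could then be passed as the weights and strata arguments of svydesign in place of the hard-coded tracking-cohort variables.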

By default, most proprietary statistical packages treat strata that contain a single PSU as contributing nothing to the variance (Bruin 2011). To obtain the same behaviour in R, we set the following option:

options(survey.lonely.psu = "certainty")  
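
To illustrate the effect of this option, the sketch below builds a small made-up dataset in which one stratum contains a single PSU. The data frame toy and its columns id, stratum, wt, and y are assumptions for illustration only and are not CLSA variables.

```r
library(survey)

# Toy data (illustrative only, not CLSA data): stratum "B" has a single PSU.
toy <- data.frame(
  id      = 1:5,
  stratum = c("A", "A", "A", "A", "B"),
  wt      = c(10, 10, 10, 10, 40),
  y       = c(1, 2, 3, 4, 5)
)

# Treat single-PSU strata as contributing nothing to the variance.
options(survey.lonely.psu = "certainty")

toy.design <- svydesign(ids = ~id, strata = ~stratum, weights = ~wt,
                        data = toy, nest = TRUE)

# The point estimate uses all five observations; the variance estimate
# receives no contribution from the single-PSU stratum "B".
est <- svymean(~y, toy.design)
print(est)
```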

Reference

Bruin, J. 2011. “Newtest: Command to Compute New Test.” https://stats.idre.ucla.edu/stata/ado/analysis/.