4.1 R preparation

In R, the function \(\texttt{read.csv}\) imports a dataset from a CSV file:

CLSAData <- read.csv("[Path]/CLSARealExample.csv", header=TRUE, sep = ",")
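Before going further, it can help to confirm that the imported data contain the variables used for the tracking cohort in this section. The following is a minimal sketch under stated assumptions: the helper \(\texttt{missing_vars}\) is hypothetical, and the small data frame is a made-up stand-in for the imported file.

```r
# Variables used in this section for the tracking cohort.
required_vars <- c("entity_id", "GEOSTRAT_TRM", "WGHTS_INFLATION_TRM")

# Hypothetical helper: names in `required` that are absent from `data`.
missing_vars <- function(data, required) {
  setdiff(required, names(data))
}

# Made-up stand-in for the imported data; the weight column is left out
# on purpose so that the check has something to report.
toy <- data.frame(entity_id = 1:3, GEOSTRAT_TRM = c("a", "a", "b"))

missing_vars(toy, required_vars)
```

With the real data, the same call would be \(\texttt{missing_vars(CLSAData, required_vars)}\); an empty result means all three variables are present.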

Then, we can define the strata variable and declare the survey design with the package \(\texttt{survey}\):

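library(survey)
# Copy the tracking-cohort strata variable under a generic name.
CLSAData$StraVar <- CLSAData$GEOSTRAT_TRM
# Declare the design: sampling units, strata, and inflation weights.
CLSA.design <- svydesign(ids = ~entity_id, strata = ~StraVar,
                         weights = ~WGHTS_INFLATION_TRM, data = CLSAData, nest = TRUE)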

The option \(\texttt{weights}\) specifies the sampling weights. Here we use the inflation weights for the tracking cohort, stored in the variable “\(\texttt{WGHTS_INFLATION_TRM}\).” Analyses of other cohorts use different weight and strata variables: for the comprehensive cohort, the inflation weights are “\(\texttt{WGHTS_INFLATION_COM}\)” and the strata variable is “\(\texttt{GEOSTRAT_COM}\)”; for the combined cohort, the inflation weights are “\(\texttt{WGHTS_INFLATION_CLSAM}\)” and the strata variable is “\(\texttt{GEOSTRAT_CLSAM}\).”
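The cohort-specific variable names above can be kept in one lookup so that switching cohorts does not require editing the design call by hand. This is only a sketch: the object \(\texttt{cohort_vars}\), the cohort labels, and the helper \(\texttt{cohort_formulas}\) are hypothetical names introduced for illustration.

```r
# Weight and strata variable names for each cohort, as listed above.
cohort_vars <- list(
  tracking      = c(weights = "WGHTS_INFLATION_TRM",   strata = "GEOSTRAT_TRM"),
  comprehensive = c(weights = "WGHTS_INFLATION_COM",   strata = "GEOSTRAT_COM"),
  combined      = c(weights = "WGHTS_INFLATION_CLSAM", strata = "GEOSTRAT_CLSAM")
)

# Hypothetical helper: build the one-sided formulas that svydesign() expects.
cohort_formulas <- function(cohort) {
  vars <- cohort_vars[[cohort]]
  if (is.null(vars)) stop("Unknown cohort: ", cohort)
  list(weights = reformulate(vars[["weights"]]),
       strata  = reformulate(vars[["strata"]]))
}

f <- cohort_formulas("comprehensive")
f$strata   # a one-sided formula naming GEOSTRAT_COM
```

The resulting formulas could then be passed to the design declaration, for example \(\texttt{svydesign(ids = ~entity_id, strata = f\$strata, weights = f\$weights, data = CLSAData, nest = TRUE)}\).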

By default, most proprietary statistical packages treat strata that contain a single PSU as contributing nothing to the variance (Bruin 2011). To reproduce this treatment in R, we set the following option:

options(survey.lonely.psu = "certainty")  
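To illustrate what the option does, the sketch below builds a small made-up dataset in which one stratum contains a single PSU and then estimates a mean. All variable names and values here are hypothetical and unrelated to the CLSA data.

```r
library(survey)

# Treat single-PSU strata as contributing nothing to the variance.
options(survey.lonely.psu = "certainty")

# Made-up data: stratum "C" contains only one PSU (id 5).
toy <- data.frame(
  id    = 1:5,
  strat = c("A", "A", "B", "B", "C"),
  w     = c(10, 10, 20, 20, 30),
  y     = c(1, 3, 2, 6, 4)
)

toy.design <- svydesign(ids = ~id, strata = ~strat, weights = ~w,
                        data = toy, nest = TRUE)

# Weighted mean of y; the standard error reflects strata A and B only.
est <- svymean(~y, toy.design)
print(est)
```

Under this setting the variance estimate comes from strata A and B, and stratum C adds nothing to it.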

Reference

Bruin, J. 2011. “Newtest: Command to Compute New Test.” https://stats.idre.ucla.edu/stata/ado/analysis/.