6.2 Linear regression analysis

Linear regression is designed for continuous responses. However, most of the variables in the CLSA Tracking cohort are categorical, ordinal, or counts. The only two numerical variables suitable for this analysis are self-reported height and weight, so we use them here for illustrative purposes only. We consider a regression model with self-reported height as the response variable and weight as the key predictor, and we include province, sex, age group, and education level as additional covariates in the initial model.
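
For reference, the initial model can be written as below. The notation is ours and is introduced only for illustration: \(\beta_0, \beta_1, \beta_2, \gamma_a, \delta_e, \theta_p\) denote regression coefficients, each sum runs over the non-reference levels of the corresponding factor, and \(\varepsilon_i\) is the error term for respondent \(i\).

\[
\text{height}_i = \beta_0 + \beta_1\,\text{weight}_i + \beta_2\,\mathbf{1}[\text{sex}_i = \text{M}] + \sum_{a} \gamma_a\,\mathbf{1}[\text{age group}_i = a] + \sum_{e} \delta_e\,\mathbf{1}[\text{education}_i = e] + \sum_{p} \theta_p\,\mathbf{1}[\text{province}_i = p] + \varepsilon_i
\]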

R

LinearReg <- svyglm(HWT_DHT_M_TRM ~ HWT_WGHT_KG_TRM + SEX_ASK_TRM +
                      Age_group_5 + Education + WGHTS_PROV_TRM,
                    family = "gaussian", design = CLSA.design.anly)
summary(LinearReg)

SAS

PROC SURVEYREG data = CLSAData order = formatted;
CLASS  WGHTS_PROV_TRM(ref = "AB") Age_group_5(ref = "45-48") 
       SEX_ASK_TRM(ref ="F") Education(ref = "Low Education");
STRATA GEOSTRAT_TRM ;
MODEL  HWT_DHT_M_TRM = HWT_WGHT_KG_TRM  SEX_ASK_TRM  Age_group_5 
       Education WGHTS_PROV_TRM / solution ;
WEIGHT WGHTS_ANALYTIC_TRM;
STORE out = LinearReg; 
RUN;

SPSS

Analyze \(\rightarrow\) Complex Samples \(\rightarrow\) General Linear Model… \(\rightarrow\) select the file “\(\texttt{CLSADesignAnyl.csaplan}\)” in the Plan panel \(\rightarrow\) click “Continue” \(\rightarrow\) assign the corresponding variables to the “Dependent Variable”, “Factor” and “Covariate” panels \(\rightarrow\) click “Statistics…” \(\rightarrow\) select “Estimate” and “Standard error” \(\rightarrow\) click “Continue” \(\rightarrow\) click “Save” \(\rightarrow\) enter the path and file name “\(\texttt{LinearReg.xml}\)” under “Export Model as XML” \(\rightarrow\) click “Continue” \(\rightarrow\) click “OK”.

Stata

svy linearized : regress HWT_DHT_M_TRM HWT_WGHT_KG_TRM SEX_ASK_TRM ///
                  i.Age_group_5 i.Education i.WGHTS_PROV_TRM
estimates save "[Path]\LinearReg.ster", replace

Result comparison

	R		SAS		SPSS		Stata
Population Est.	Coeff.	SE	Coeff.	SE	Coeff.	SE	Coeff.	SE
(Intercept)	1.5427	0.0210	1.5427	0.0212	1.5427	0.0210	1.5427	0.0210
HWT_WGHT_KG_TRM	0.0010	0.0002	0.0010	0.0002	0.0010	0.0002	0.0010	0.0002
SEX_ASK_TRM=“M”	0.0838	0.0062	0.0838	0.0063	0.0838	0.0062	0.0838	0.0062
Age Groups: relative to Age_Gpr0: Age 45-48
Age_Gpr1:Age 49-54	0.0107	0.0097	0.0107	0.0098	0.0107	0.0097	0.0107	0.0097
Age_Gpr2:Age 55-64	0.0144	0.0092	0.0144	0.0093	0.0144	0.0092	0.0144	0.0092
Age_Gpr3:Age 65-74	0.0068	0.0098	0.0068	0.0099	0.0068	0.0098	0.0068	0.0098
Age_Gpr4:Age 75+	0.0131	0.0101	0.0131	0.0102	0.0131	0.0101	0.0131	0.0101
Education Levels: relative to Lower Education
Medium Education	0.0075	0.0083	0.0075	0.0084	0.0075	0.0083	0.0075	0.0083
Higher Education lower	0.0156	0.0094	0.0156	0.0095	0.0156	0.0094	0.0156	0.0094
Higher Education upper	0.0128	0.0085	0.0128	0.0085	0.0128	0.0085	0.0128	0.0085
Provinces: relative to Alberta
British Columbia	-0.0054	0.0107	-0.0054	0.0108	-0.0054	0.0107	-0.0054	0.0107
Manitoba	0.0132	0.0147	0.0132	0.0149	0.0132	0.0147	0.0132	0.0147
New Brunswick	0.0027	0.0135	0.0027	0.0136	0.0027	0.0135	0.0027	0.0135
Newfoundland & Labrador	0.0085	0.0119	0.0085	0.0120	0.0085	0.0119	0.0085	0.0119
Nova Scotia	-0.0114	0.0125	-0.0114	0.0126	-0.0114	0.0125	-0.0114	0.0125
Ontario	0.0001	0.0100	0.0001	0.0101	0.0001	0.0100	0.0001	0.0100
Prince Edward Island	-0.0080	0.0146	-0.0080	0.0147	-0.0080	0.0146	-0.0080	0.0146
Quebec	-0.0069	0.0105	-0.0069	0.0106	-0.0069	0.0105	-0.0069	0.0105
Saskatchewan	-0.0059	0.0119	-0.0059	0.0121	-0.0059	0.0119	-0.0059	0.0119
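
To make the table concrete, the sketch below plugs the rounded coefficients from the R column into the linear predictor for two hypothetical respondents. The profiles, the vector `coefs`, and the helper `predict_height` are illustrative assumptions and are not part of the CLSA analysis. We also assume, from the variable names, that height is in metres and weight in kilograms. Because the inputs are rounded to four decimals, the results only approximate what the fitted model itself would return.

```r
# Rounded coefficients copied from the R column of the table above.
coefs <- c(intercept = 1.5427, weight_kg = 0.0010, male = 0.0838,
           age_55_64 = 0.0144, educ_higher_upper = 0.0128, ontario = 0.0001)

# Hypothetical helper: linear predictor for one respondent.
# Indicator arguments are 0/1; an indicator left at 0 means the reference
# level (female, age 45-48, lower education, Alberta).
predict_height <- function(weight_kg, male = 0, age_55_64 = 0,
                           educ_higher_upper = 0, ontario = 0) {
  x <- c(1, weight_kg, male, age_55_64, educ_higher_upper, ontario)
  sum(coefs * x)
}

# Woman at all reference levels, weighing 70 kg.
h_ref <- predict_height(70)

# Man weighing 80 kg, aged 55-64, higher education (upper), living in Ontario.
h_alt <- predict_height(80, male = 1, age_55_64 = 1,
                        educ_higher_upper = 1, ontario = 1)

# Wald statistic for sex: coefficient divided by its standard error.
t_sex <- 0.0838 / 0.0062

print(round(c(h_ref = h_ref, h_alt = h_alt, t_sex = t_sex), 4))
```

With these rounded inputs the two profiles come out at roughly 1.61 m and 1.73 m. The weight coefficient of 0.0010 means that a 10 kg difference in weight corresponds to about 0.01 m of predicted height, with the other covariates held fixed.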

The fitted model can be used to predict the response for a new observation. As an illustration, we take the first record of the dataset and compare the predicted height, its standard error, and the 95% confidence limits across the four packages.

R

Pred1 <- predict(LinearReg, newdata = CLSAData[1, ], se.fit = TRUE)
Pred1; confint(Pred1)

SAS

PROC PLM source = LinearReg ALPHA = 0.05 ;
SCORE data = CLSAData(obs = 1) out = testout predicted STDERR LCLM UCLM / ilink;
RUN;    

PROC PRINT data =testout; 
VAR    predicted STDERR LCLM UCLM ;
FORMAT predicted 10.9 STDERR 10.9 LCLM 10.9 UCLM 10.9;
RUN;

SPSS

Open a new dataset \(\rightarrow\) Utilities \(\rightarrow\) Scoring Wizard \(\rightarrow\) click “Browse”, navigate to the corresponding path and select “\(\texttt{LinearReg.xml}\)” \(\rightarrow\) click “Next >” \(\rightarrow\) match the dataset variables to the model fields \(\rightarrow\) click “Next >” \(\rightarrow\) click “Finish”.

Stata

estimates use "[Path]\LinearReg.ster"
predict  p1  if entity_id == 724976,  xb
predict  se1 if entity_id == 724976,  stdp 
display  p1
display  se1
display  p1 - invnormal(0.975)*se1
display  p1 + invnormal(0.975)*se1

Result comparison

	R	SAS	SPSS	Stata
Predicted value	1.5827	1.5827	1.5827	1.5827
Standard error	0.0133	0.0134	0.0133	0.0133
95% lower confidence limit	1.5567	1.5564	Not provided	1.5567
95% upper confidence limit	1.6087	1.6090	Not provided	1.6087
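
The confidence limits in the table can be checked by hand as estimate ± z × SE, which is what the Stata commands above compute with `invnormal(0.975)`. The following is a minimal sketch in base R using the rounded values from the R column; the names `pred`, `se`, and `ci` are ours. Since the inputs are rounded to four decimals, the result matches the tabulated limits only to within about 0.0001.

```r
# Rounded values copied from the R column of the table above.
pred <- 1.5827
se   <- 0.0133

# Normal-approximation 95% interval, mirroring invnormal(0.975) in the Stata code.
z  <- qnorm(0.975)
ci <- c(lower = pred - z * se, upper = pred + z * se)

print(round(ci, 4))
```

The slightly wider SAS limits in the table go together with its slightly larger reported standard error (0.0134 rather than 0.0133).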