6.2 Linear regression analysis
The linear regression model is generally used with continuous responses. However, most variables in the CLSA Tracking cohort are categorical, ordinal, or counts. The only two numerical variables suitable for such an analysis are self-reported height and weight, so we use them here for illustrative purposes only. Consider a regression model with self-reported height as the response variable and weight as the key predictor. The initial model also includes province, sex, age group, and education level as covariates.
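As a sketch, the model described above can be written in symbols. The notation (\(\beta\), \(\gamma_a\), \(\delta_e\), \(\theta_p\), \(\varepsilon_i\)) is introduced here for illustration and is not taken from the source. Each sum runs over the non-reference levels of the corresponding factor, and height is assumed to be recorded in metres and weight in kilograms, as the variable names suggest:

\[
\text{height}_i = \beta_0 + \beta_1\,\text{weight}_i + \beta_2\,\mathbb{1}(\text{sex}_i = \text{M}) + \sum_{a}\gamma_a\,\mathbb{1}(\text{age group}_i = a) + \sum_{e}\delta_e\,\mathbb{1}(\text{education}_i = e) + \sum_{p}\theta_p\,\mathbb{1}(\text{province}_i = p) + \varepsilon_i
\]

The coefficients are estimated using the analytic survey weights and the sampling strata, so \(\beta_1\) is read as the expected difference in height per additional kilogram of weight, holding the other covariates fixed.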
R
SAS
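/* Design-based linear regression of height on weight and covariates */
PROC SURVEYREG data = CLSAData order = formatted;
  /* Categorical predictors with their reference levels */
  CLASS WGHTS_PROV_TRM(ref = "AB") Age_group_5(ref = "45-48")
        SEX_ASK_TRM(ref = "F") Education(ref = "Low Education");
  STRATA GEOSTRAT_TRM;
  MODEL HWT_DHT_M_TRM = HWT_WGHT_KG_TRM SEX_ASK_TRM Age_group_5
        Education WGHTS_PROV_TRM / solution;
  WEIGHT WGHTS_ANALYTIC_TRM;
  /* Save the fitted model so it can be scored later with PROC PLM */
  STORE out = LinearReg;
RUN;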
SPSS
Analyze \(\rightarrow\) Complex Samples \(\rightarrow\) General Linear Model… \(\rightarrow\) select the file “\(\texttt{CLSADesignAnyl.csaplan}\)” in the Plan panel \(\rightarrow\) click “Continue” \(\rightarrow\) assign the corresponding variables to the “Dependent Variable”, “Factor” and “Covariate” panels \(\rightarrow\) click “Statistics…” \(\rightarrow\) select “Estimate” and “Standard error” \(\rightarrow\) click “Continue” \(\rightarrow\) click “Save” \(\rightarrow\) enter the path and file name “\(\texttt{LinearReg.xml}\)” under “Export Model as XML” \(\rightarrow\) click “Continue” \(\rightarrow\) click “OK”.
Stata
Result comparison
Population Est. | R Coeff. | R SE | SAS Coeff. | SAS SE | SPSS Coeff. | SPSS SE | Stata Coeff. | Stata SE |
---|---|---|---|---|---|---|---|---|
(Intercept) | 1.5427 | 0.0210 | 1.5427 | 0.0212 | 1.5427 | 0.0210 | 1.5427 | 0.0210 |
HWT_WGHT_KG_TRM | 0.0010 | 0.0002 | 0.0010 | 0.0002 | 0.0010 | 0.0002 | 0.0010 | 0.0002 |
SEX_ASK_TRM=“M” | 0.0838 | 0.0062 | 0.0838 | 0.0063 | 0.0838 | 0.0062 | 0.0838 | 0.0062 |
Age groups (reference: Age_Gpr0, Age 45-48) | | | | | | | | |
Age_Gpr1:Age 49-54 | 0.0107 | 0.0097 | 0.0107 | 0.0098 | 0.0107 | 0.0097 | 0.0107 | 0.0097 |
Age_Gpr2:Age 55-64 | 0.0144 | 0.0092 | 0.0144 | 0.0093 | 0.0144 | 0.0092 | 0.0144 | 0.0092 |
Age_Gpr3:Age 65-74 | 0.0068 | 0.0098 | 0.0068 | 0.0099 | 0.0068 | 0.0098 | 0.0068 | 0.0098 |
Age_Gpr4:Age 75+ | 0.0131 | 0.0101 | 0.0131 | 0.0102 | 0.0131 | 0.0101 | 0.0131 | 0.0101 |
Education levels (reference: Lower Education) | | | | | | | | |
Medium Education | 0.0075 | 0.0083 | 0.0075 | 0.0084 | 0.0075 | 0.0083 | 0.0075 | 0.0083 |
Higher Education lower | 0.0156 | 0.0094 | 0.0156 | 0.0095 | 0.0156 | 0.0094 | 0.0156 | 0.0094 |
Higher Education upper | 0.0128 | 0.0085 | 0.0128 | 0.0085 | 0.0128 | 0.0085 | 0.0128 | 0.0085 |
Provinces (reference: Alberta) | | | | | | | | |
British Columbia | -0.0054 | 0.0107 | -0.0054 | 0.0108 | -0.0054 | 0.0107 | -0.0054 | 0.0107 |
Manitoba | 0.0132 | 0.0147 | 0.0132 | 0.0149 | 0.0132 | 0.0147 | 0.0132 | 0.0147 |
New Brunswick | 0.0027 | 0.0135 | 0.0027 | 0.0136 | 0.0027 | 0.0135 | 0.0027 | 0.0135 |
Newfoundland & Labrador | 0.0085 | 0.0119 | 0.0085 | 0.0120 | 0.0085 | 0.0119 | 0.0085 | 0.0119 |
Nova Scotia | -0.0114 | 0.0125 | -0.0114 | 0.0126 | -0.0114 | 0.0125 | -0.0114 | 0.0125 |
Ontario | 0.0001 | 0.0100 | 0.0001 | 0.0101 | 0.0001 | 0.0100 | 0.0001 | 0.0100 |
Prince Edward Island | -0.0080 | 0.0146 | -0.0080 | 0.0147 | -0.0080 | 0.0146 | -0.0080 | 0.0146 |
Quebec | -0.0069 | 0.0105 | -0.0069 | 0.0106 | -0.0069 | 0.0105 | -0.0069 | 0.0105 |
Saskatchewan | -0.0059 | 0.0119 | -0.0059 | 0.0121 | -0.0059 | 0.0119 | -0.0059 | 0.0119 |
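One way to read the table is through a rough Wald ratio, the coefficient divided by its standard error. The sketch below uses a few rounded values copied from the R columns above. The names `ESTIMATES`, `wald_z`, and `two_sided_p` are illustrative only, and the normal reference distribution is an assumption, since survey software typically uses a t distribution with design-based degrees of freedom. Because the inputs are rounded to four decimals, the ratios are approximate.

```python
from statistics import NormalDist

# Rounded (coefficient, standard error) pairs copied from the R columns of the table.
ESTIMATES = {
    "HWT_WGHT_KG_TRM": (0.0010, 0.0002),
    "SEX_ASK_TRM=M": (0.0838, 0.0062),
    "Ontario": (0.0001, 0.0100),
}


def wald_z(coef, se):
    """Wald ratio: the estimate divided by its standard error."""
    return coef / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference (an assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


for name, (coef, se) in ESTIMATES.items():
    z = wald_z(coef, se)
    print(f"{name}: z = {z:.2f}, p = {two_sided_p(z):.4f}")
```

With these rounded inputs, the weight and sex coefficients are several standard errors away from zero, whereas the Ontario contrast with Alberta is far smaller than its standard error.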
The fitted model can be used to predict the response for a new observation. As an illustration, we take the first record of the dataset and compare the predicted height, its standard error, and the 95% confidence limits across the software packages.
SAS
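/* Score the first record with the stored model; ALPHA = 0.05 gives 95% limits */
PROC PLM source = LinearReg ALPHA = 0.05;
  SCORE data = CLSAData(obs = 1) out = testout predicted STDERR LCLM UCLM / ilink;
RUN;
/* Print the prediction, its standard error, and the confidence limits */
PROC PRINT data = testout;
  VAR predicted STDERR LCLM UCLM;
  FORMAT predicted 10.9 STDERR 10.9 LCLM 10.9 UCLM 10.9;
RUN;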
SPSS
Open a new dataset \(\rightarrow\) Utilities \(\rightarrow\) Scoring Wizard \(\rightarrow\) click “Browse”, enter the corresponding path and select “\(\texttt{LinearReg.xml}\)” \(\rightarrow\) click “Next >” \(\rightarrow\) match the dataset variables to the model fields \(\rightarrow\) click “Next >” \(\rightarrow\) click “Finish”.
Stata
Quantity | R | SAS | SPSS | Stata |
---|---|---|---|---|
Predicted value | 1.5827 | 1.5827 | 1.5827 | 1.5827 |
Standard error | 0.0133 | 0.0134 | 0.0133 | 0.0133 |
95% lower confidence limit | 1.5567 | 1.5564 | Not provided | 1.5567 |
95% upper confidence limit | 1.6087 | 1.6090 | Not provided | 1.6087 |
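The confidence limits in the table can be checked approximately from the predicted value and its standard error. The sketch below assumes a normal critical value (about 1.96 at the 95% level); the packages may instead use a t quantile with design-based degrees of freedom. The function name `normal_ci` is illustrative, and the inputs are the rounded figures from the table.

```python
from statistics import NormalDist


def normal_ci(estimate, std_error, level=0.95):
    """Symmetric interval: estimate +/- z * SE, with z from the standard normal."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# Rounded predicted height (metres) and standard errors copied from the table.
for label, pred, se in [("R / Stata", 1.5827, 0.0133), ("SAS", 1.5827, 0.0134)]:
    lower, upper = normal_ci(pred, se)
    print(f"{label}: [{lower:.4f}, {upper:.4f}]")
```

With the rounded SAS inputs this reproduces the tabulated SAS limits to four decimals. With the standard error of 0.0133 the result differs from the tabulated R and Stata limits only in the fourth decimal, which is consistent with rounding of the inputs.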