6.5 Ordinal logistic regression analysis

Many variables in complex surveys are ordinal, that is, the response categories have a well-defined order. Ordinal logistic regression models the relationship between the cumulative probabilities of the response categories and the covariates of interest. In the CLSA Tracking cohort, participants are asked the following question:

How do you feel about your local area, that is, everywhere within a 20 minute walk or about a kilometer from your home? Please tell me how strongly you agree or disagree with the following statements: People would be afraid to walk alone after dark in local area.

There are four response levels: Strongly Agree (\(d=1\)), Agree (\(d=2\)), Disagree (\(d=3\)), and Strongly Disagree (\(d=4\)). The results are stored in the variable \(\texttt{ENV_AFRDWLK_MCQ}\). Suppose we are interested in the relationship between the response categories of \(\texttt{ENV_AFRDWLK_MCQ}\) and sex and age, adjusting for education level and province. The cumulative logistic regression model has the form

\[\begin{align*} g(\Pr(Y<d \mid \pmb x)) = \alpha_d + \pmb{x}' \pmb \beta, & \;\;\; \mbox{(for } \texttt{SAS}\mbox{)}\\ g(\Pr(Y<d \mid \pmb x)) = \alpha_d - \pmb{x}' \pmb \eta, & \;\;\; \mbox{(for }\texttt{R}, \texttt{SPSS} \mbox{ and }\texttt{Stata} \mbox{)} \end{align*}\]

with \(d=2, 3, 4\), where \(g(\cdot)\) is a link function. We usually specify \(g(\cdot)\) as the logit function, in which case the model is the cumulative logit model (also known as the proportional odds model). The two forms differ only in the sign of the slope coefficients (\(\pmb \beta = -\pmb \eta\)), so for comparison purposes we multiply the \(\texttt{SAS}\) coefficient estimates for \(\pmb \beta\) by \(-1\).
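
As a worked reading of the model above (a sketch that assumes the logit link and uses the \(\texttt{R}\)/\(\texttt{SPSS}\)/\(\texttt{Stata}\) parameterization), the cumulative probabilities, the individual category probabilities, and the effect of a one-unit increase in a single covariate \(x_j\) can be written as

\[\begin{align*}
\Pr(Y<d \mid \pmb x) &= \frac{\exp(\alpha_d - \pmb{x}' \pmb \eta)}{1+\exp(\alpha_d - \pmb{x}' \pmb \eta)}, & d &= 2, 3, 4,\\
\Pr(Y=d \mid \pmb x) &= \Pr(Y<d+1 \mid \pmb x) - \Pr(Y<d \mid \pmb x), & d &= 1, \ldots, 4,\\
\frac{\mbox{odds}(Y<d \mid x_j+1)}{\mbox{odds}(Y<d \mid x_j)} &= \exp(-\eta_j) = \exp(\beta_j), & d &= 2, 3, 4,
\end{align*}\]

with the conventions \(\Pr(Y<1 \mid \pmb x)=0\) and \(\Pr(Y<5 \mid \pmb x)=1\). The odds ratio in the last line does not depend on \(d\), which is the proportional odds property. A negative \(\eta_j\) therefore raises the odds of falling in a lower category, here a category closer to Strongly Agree.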

R

library(survey)  # provides svyolr()
summary(svyolr(ENV_AFRDWLK_MCQ ~ SEX_ASK_TRM + Age_group_5 +
                 Education + WGHTS_PROV_TRM,
               design = CLSA.design.anly,
               na.action = na.omit, method = "logistic"))
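
Like \(\texttt{polr}\) in the \(\texttt{MASS}\) package, \(\texttt{svyolr}\) expects the response to be a factor, and the order of the factor levels defines the order of the cumulative categories. The following is a minimal sketch of that preparation step on a small, made-up vector (the object names \(\texttt{resp}\), \(\texttt{lev}\) and \(\texttt{resp_ord}\) are hypothetical and not part of the CLSA data); it plays the same role as the recoding step shown for SPSS.

```r
# Hypothetical toy responses; the real values come from ENV_AFRDWLK_MCQ.
resp <- c("Agree", "Strongly Disagree", "Disagree", "Strongly Agree", "Agree")

# The level order sets d = 1, 2, 3, 4 as in the text.
lev <- c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree")
resp_ord <- factor(resp, levels = lev, ordered = TRUE)

levels(resp_ord)      # check the category order before fitting
as.integer(resp_ord)  # integer codes d: 2 4 3 1 2
table(resp_ord)       # category counts in the intended order
```

Without an explicit \(\texttt{levels}\) argument, R would order the categories alphabetically, which would not match the intended scale.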

SAS

PROC SURVEYLOGISTIC data = CLSAData;
CLASS ENV_AFRDWLK_MCQ WGHTS_PROV_TRM(ref = 'AB') Age_group_5(ref = '45-48') 
      SEX_ASK_TRM(ref ='F') Education(ref = 'Low Education')/param = ref;
MODEL ENV_AFRDWLK_MCQ = SEX_ASK_TRM Age_group_5  Education WGHTS_PROV_TRM /clodds ;
STRATA GEOSTRAT_TRM;                           
WEIGHT WGHTS_ANALYTIC_TRM;  
RUN; 

SPSS

Transform \(\rightarrow\) Recode into Different Variables \(\rightarrow\) select \(\texttt{ENV_AFRDWLK_MCQ}\) \(\rightarrow\) under “Output Variable”, enter \(\texttt{ENV_AFRDWLK_Num}\) in the “Name” field and click “Change” \(\rightarrow\) click “Old and New Values” \(\rightarrow\) enter “Strongly Agree” under “Old Value” and “1” in the “Value” field under “New Value”, then click “Add” \(\rightarrow\) repeat the process to convert “Agree” to “2”, “Disagree” to “3” and “Strongly Disagree” to “4” \(\rightarrow\) click “Continue” \(\rightarrow\) click “OK”.

Analyze \(\rightarrow\) Complex Samples \(\rightarrow\) Ordinal Regression… \(\rightarrow\) in the “Plan” panel, select the file “\(\texttt{CLSADesignAnyl.csaplan}\)” \(\rightarrow\) click “Continue” \(\rightarrow\) assign the corresponding variables to the “Dependent Variable”, “Factor” and “Covariate” panels \(\rightarrow\) click “Response Probabilities” and select “Accumulate from lowest value of dependent variable to highest value” \(\rightarrow\) click “Statistics…” \(\rightarrow\) select “Estimate” and “Standard error” \(\rightarrow\) click “Continue” \(\rightarrow\) click “OK”.

Stata

svy linearized : ologit ENV_AFRDWLK_MCQ i.SEX_ASK_TRM i.Age_group_5 ///
                 ib3.Education i.WGHTS_PROV_TRM

Result comparison

| Population Estimates | R Coeff. | R SE | SAS Coeff. | SAS SE | SPSS Coeff. | SPSS SE | Stata Coeff. | Stata SE |
|---|---|---|---|---|---|---|---|---|
| SEX_ASK_TRM=“M” | -0.2834 | 0.1991 | -0.2834 | 0.1914 | -0.2833 | 0.1991 | -0.2833 | 0.1991 |
| Age Groups: relative to Age_Gpr0: Age 45-48 | | | | | | | | |
| Age_Gpr1:Age 49-54 | -1.1538 | 0.3958 | -1.1538 | 0.4419 | -1.1538 | 0.3958 | -1.1538 | 0.3958 |
| Age_Gpr2:Age 55-64 | -2.3273 | 0.3670 | -2.3273 | 0.4091 | -2.3273 | 0.3670 | -2.3273 | 0.3670 |
| Age_Gpr3:Age 65-74 | -3.0533 | 0.4539 | -3.0533 | 0.4509 | -3.0533 | 0.4539 | -3.0533 | 0.4539 |
| Age_Gpr4:Age 75+ | -2.6253 | 0.5430 | -2.6249 | 0.4754 | -2.6253 | 0.5430 | -2.6253 | 0.5430 |
| Education Levels: relative to Lower Education | | | | | | | | |
| Medium Education | -0.0299 | 0.3011 | -0.4574 | 0.3484 | -0.0298 | 0.3011 | -0.0298 | 0.3011 |
| Higher Education lower | -0.4574 | 0.3498 | -0.1075 | 0.3101 | -0.4574 | 0.3498 | -0.4574 | 0.3498 |
| Higher Education upper | -0.1075 | 0.3200 | -0.0299 | 0.2956 | -0.1075 | 0.3200 | -0.1075 | 0.3200 |
| Provinces: relative to Alberta | | | | | | | | |
| British Columbia | 0.3047 | 0.4696 | 0.3047 | 0.4341 | 0.3048 | 0.4696 | 0.3048 | 0.4696 |
| Manitoba | -1.0332 | 0.3975 | -1.0332 | 0.4040 | -1.0332 | 0.3975 | -1.0332 | 0.3975 |
| New Brunswick | -0.1808 | 0.6405 | -0.1807 | 0.5937 | -0.1807 | 0.6405 | -0.1807 | 0.6405 |
| Newfoundland & Labrador | -0.9933 | 0.5100 | -0.9932 | 0.5213 | -0.9931 | 0.5100 | -0.9931 | 0.5100 |
| Nova Scotia | 0.2867 | 0.5028 | 0.2868 | 0.4749 | 0.2868 | 0.5028 | 0.2868 | 0.5028 |
| Ontario | 0.2651 | 0.3650 | 0.2652 | 0.3427 | 0.2652 | 0.3650 | 0.2652 | 0.3650 |
| Prince Edward Island | -0.1440 | 0.4912 | -0.1439 | 0.4795 | -0.1439 | 0.4912 | -0.1439 | 0.4912 |
| Quebec | -0.4611 | 0.3718 | -0.4611 | 0.3604 | -0.4610 | 0.3718 | -0.4610 | 0.3718 |
| Saskatchewan | -0.6360 | 0.4435 | -0.6359 | 0.4366 | -0.6359 | 0.4435 | -0.6359 | 0.4435 |
| (Intercepts) | | | | | | | | |
| Strongly Agree\|Agree | -4.8973 | 0.5194 | -4.8972 | 0.5330 | -4.8971 | 0.5194 | -4.8971 | 0.5194 |
| Agree\|Disagree | -3.7190 | 0.4981 | -3.7189 | 0.5064 | -3.7188 | 0.4981 | -3.7188 | 0.4981 |
| Disagree\|Strongly Disagree | -2.1126 | 0.5161 | -2.1125 | 0.5287 | -2.1124 | 0.5161 | -2.1124 | 0.5161 |
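
To make the intercepts and coefficients more concrete, they can be converted into fitted category probabilities. The sketch below uses base R only and the R estimates reported above. It compares the reference profile (all covariates at their reference levels) with a participant who differs only in belonging to the 55-64 age group. The helper \(\texttt{category_probs}\) and the object names are hypothetical, introduced here for illustration, and the calculation assumes the \(\texttt{R}\) parameterization \(g(\Pr(Y<d \mid \pmb x)) = \alpha_d - \pmb{x}' \pmb \eta\) with the logit link.

```r
# Intercepts and one age coefficient copied from the R estimates above.
alpha <- c(-4.8973, -3.7190, -2.1126)  # cut-points for d = 2, 3, 4
eta_age_55_64 <- -2.3273

# Category probabilities from logit Pr(Y < d) = alpha_d - lin_pred.
category_probs <- function(alpha, lin_pred = 0) {
  cum <- c(plogis(alpha - lin_pred), 1)  # Pr(Y < 2), Pr(Y < 3), Pr(Y < 4), 1
  probs <- diff(c(0, cum))               # successive differences
  names(probs) <- c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree")
  probs
}

p_ref   <- category_probs(alpha)                 # reference profile
p_55_64 <- category_probs(alpha, eta_age_55_64)  # same profile, age 55-64
round(rbind(reference = p_ref, age_55_64 = p_55_64), 3)
```

These are model-based probabilities for two illustrative covariate profiles, not population estimates of the response distribution.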

The coefficient for the variable “\(\texttt{SEX_ASK_TRM}\)” is negative, but its standard error is large relative to the estimate (in R, \(-0.2834/0.1991 \approx -1.42\)), so there is no clear evidence that males and females differ in how they perceive the safety of walking alone after dark in the local area. The negative coefficients of “\(\texttt{Age_Gpr1:Age 49-54}\),” “\(\texttt{Age_Gpr2:Age 55-64}\),” “\(\texttt{Age_Gpr3:Age 65-74}\)” and “\(\texttt{Age_Gpr4:Age 75+}\)” indicate that, compared with the reference age group “\(\texttt{Age_Gpr0:Age 45-48}\),” participants in the older age groups are more likely to agree that people would be afraid to walk alone after dark. The results are in line with common sense.
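
The interpretation above can be checked with simple arithmetic on the reported estimates. The following base R sketch computes Wald ratios, normal-approximation p-values and odds ratios for the sex and age coefficients from the R results. The object names are hypothetical, and the normal reference distribution is an assumption made here for simplicity; survey software may instead use a t distribution based on the design degrees of freedom.

```r
# Estimates and standard errors copied from the R results above.
est <- c(sex_male = -0.2834, age_49_54 = -1.1538, age_55_64 = -2.3273,
         age_65_74 = -3.0533, age_75_plus = -2.6253)
se  <- c(0.1991, 0.3958, 0.3670, 0.4539, 0.5430)

z <- est / se            # Wald ratios
p <- 2 * pnorm(-abs(z))  # two-sided p-values (normal approximation)

# Under logit Pr(Y < d) = alpha_d - x'eta, exp(-eta) is the odds ratio
# for falling in a lower category (closer to "Strongly Agree").
or_lower <- exp(-est)
or_lo    <- exp(-(est + 1.96 * se))  # lower 95% Wald limit
or_hi    <- exp(-(est - 1.96 * se))  # upper 95% Wald limit

round(cbind(z, p, OR = or_lower, lower = or_lo, upper = or_hi), 3)
```

Under these assumptions the ratio for sex stays inside \(\pm 1.96\), while the ratios for all four age groups fall well outside it.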