Introduction

This paper further contextualizes recent findings of a policy analysis of Salt Lake Community College (SLCC) concurrent enrollment (CE) courses. In the previous analyses, we modeled the behavior of Math CE students compared to traditional students over two distinct sets of years using a difference in differences modelling approach, which measured aggregate differences irrespective of student age in the sample.

The following paper eschews the difference in differences framework, expands the set of years in the sample, broadens the sample to CE courses outside of math and uses a logistic mixed effects model and a linear mixed effects model.

Executive Summary

Data

To construct the data set used for this analysis, we gathered all student course records for CE and traditional enrollments from courses that were offered as concurrent enrollments at SLCC between Spring semester 2010 and Summer semester 2018. We also pulled information about the subsequent course that each student enrolled in that was in the same course subject at SLCC.

For example, if a student enrolled in a Math course offered at SLCC (such as Math 1010) in Spring 2015, we pulled the academic record for whichever Math class that student enrolled in after Spring 2015. Similarly, if a student enrolled in English 1010 in Fall 2010, we pulled the record for that student’s next English class after Fall 2010. If a student did not have a subsequent course, their record was dropped from the data set.
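The lookup described above can be sketched in Python. This is a minimal illustration, not the query we ran: the field names (`student_id`, `subject`, `course`, `term`) and the encoding of a term as a `(year, season)` tuple with Spring = 1, Summer = 2, Fall = 3 are assumptions made for the example.

```python
from collections import defaultdict

def next_course_in_subject(enrollments):
    """Pair each enrollment with the student's next enrollment in the same subject.

    Enrollments with no later course in the same subject produce no pair,
    which mirrors dropping those records from the data set.
    """
    by_key = defaultdict(list)
    for e in enrollments:
        by_key[(e["student_id"], e["subject"])].append(e)

    pairs = []
    for rows in by_key.values():
        rows.sort(key=lambda e: e["term"])
        for i, cur in enumerate(rows):
            # First enrollment in a strictly later term, if any.
            nxt = next((r for r in rows[i + 1:] if r["term"] > cur["term"]), None)
            if nxt is not None:
                pairs.append((cur, nxt))
    return pairs

# Hypothetical records; term = (year, season) with Spring=1, Summer=2, Fall=3.
enrollments = [
    {"student_id": 1, "subject": "MATH", "course": "MATH 1010", "term": (2015, 1)},
    {"student_id": 1, "subject": "MATH", "course": "MATH 1050", "term": (2015, 3)},
    {"student_id": 1, "subject": "ENGL", "course": "ENGL 1010", "term": (2010, 3)},
    {"student_id": 2, "subject": "MATH", "course": "MATH 1010", "term": (2015, 1)},
]
pairs = next_course_in_subject(enrollments)
```

In this toy input only student 1's Math 1010 enrollment has a later Math course, so it is the only record that survives.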

The fields we collected for the sample are shown in Table 1.

Table 1
Student ID | Initial Course | Semester Enrolled in Subsequent Course
Student Last Name | Subsequent Course | GPA-Equivalent Final Grade in Initial Course
Student Age in Initial Course | Student Age in Subsequent Course | GPA-Equivalent Final Grade in Subsequent Course
Former CE Indicator | High School GPA Quintile | Passed Initial Course Indicator
Student Ethnicity | Cumulative GPA Quintile in Subsequent Course | Passed Subsequent Course Indicator
Student Gender | Cumulative GPA Quintile in Initial Course | GPA-Equivalent Grade Change from Initial to Subsequent Course
First-Generation Indicator | Year Enrolled in Initial Course | Semesters Between Initial and Subsequent Courses
Ever Pell-Eligible Indicator | Year Enrolled in Subsequent Course | General Studies Indicator

Methods

We specified two separate multi-level models. The first was a logistic mixed effects model, and the second was a linear mixed effects model. For each of these models, we used the same sample of former concurrent students matched to students who were never concurrently enrolled at SLCC.

Sample Selection

To reduce potential bias introduced by self-selection into control and treatment groups, we used two sample selection methodologies: hard filters and propensity score matching.

Hard Filters

To measure any causal effect that exists, we selected treatment and control samples that are similar to one another, which reduces bias from unaccounted sources of variation. Applying selection criteria through hard filters is a common practice in causal analysis. From the larger sample of data, we selected only students who had passed their initial course, regardless of whether they were former concurrent students. Because the linear model uses the GPA difference between the first and second course, we selected only students who passed their initial course on their first attempt, as second attempts would bias the estimates. Similarly, we did not want repeated attempts of the subsequent course to bias the estimates, so we looked only at first attempts at the subsequent class.

The second hard filter we applied sought to eliminate bias resulting from the age of former concurrent students. It is not difficult to imagine that more mature students will be better equipped or more driven to do well in college. Rather than worry about the influence of age on the models, we selected only students who were 18 or 19 years old at the time of enrolling in their initial course.

Finally, we selected only courses that occur in a sequence. Though there are many CE courses offered at SLCC, the enrollments that met these criteria left us with only Math, English, and Communication courses.
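Taken together, the hard filters amount to a single predicate applied to each record. The sketch below is illustrative only: the field names and the subject codes (`MATH`, `ENGL`, `COMM`) are hypothetical stand-ins, since the actual column names are not listed here.

```python
# Hypothetical subject codes for the three sequenced subjects that remained.
SEQUENCE_SUBJECTS = {"MATH", "ENGL", "COMM"}

def passes_hard_filters(rec):
    """Return True if a record meets every hard filter described above."""
    return (
        rec["passed_initial"]                    # passed the initial course
        and rec["initial_attempt"] == 1          # ... on the first attempt
        and rec["subsequent_attempt"] == 1       # first attempt at the subsequent course
        and rec["age_initial"] in (18, 19)       # 18 or 19 at initial enrollment
        and rec["subject"] in SEQUENCE_SUBJECTS  # course belongs to a sequence
    )

# Hypothetical records, each failing at most one filter.
records = [
    {"student_id": "A", "passed_initial": True, "initial_attempt": 1,
     "subsequent_attempt": 1, "age_initial": 18, "subject": "MATH"},
    {"student_id": "B", "passed_initial": True, "initial_attempt": 2,
     "subsequent_attempt": 1, "age_initial": 19, "subject": "ENGL"},
    {"student_id": "C", "passed_initial": True, "initial_attempt": 1,
     "subsequent_attempt": 1, "age_initial": 24, "subject": "COMM"},
    {"student_id": "D", "passed_initial": False, "initial_attempt": 1,
     "subsequent_attempt": 1, "age_initial": 18, "subject": "MATH"},
]
kept = [r["student_id"] for r in records if passes_hard_filters(r)]
```

Only student A is retained: B repeated the initial course, C is outside the age range, and D did not pass the initial course.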

Propensity Score Matching

In this analysis, we compared former CE students to traditional students who were never CE. For each student, we calculated the likelihood of being a CE student based on the interaction of gender, ethnicity, Pell status, and first-generation status. We then created a one-to-one matched sample based on the closest propensity score matches in each category of former CE status. We plot the distribution of the calculated likelihood in each category in Figure 1.

Figure 1

The final matched sample is 561 former CE students and 561 traditional students, all of whom were 18 or 19 years old at the time of their initial enrollment, never repeated their initial course or their subsequent course, and passed their initial course. The two groups are matched via propensity scores based on gender, ethnicity, Pell status, and first-generation status.
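The matching step can be illustrated with a small Python sketch. Two assumptions are worth flagging. First, because the propensity model uses only the full interaction of four categorical variables, we compute the score as the share of CE students within each covariate cell, which is what a saturated logistic model would fit. Second, the pairing rule shown here (greedy nearest neighbor without replacement) is one common choice and is not necessarily the algorithm used for the sample above. All field names and category labels are hypothetical.

```python
from collections import defaultdict

def cell(s):
    """Covariate cell: the full interaction of the four matching variables."""
    return (s["gender"], s["ethnicity"], s["pell"], s["first_gen"])

def propensity_scores(students):
    """Share of CE students in each student's covariate cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_ce, n_total]
    for s in students:
        counts[cell(s)][0] += s["ce"]
        counts[cell(s)][1] += 1
    return [counts[cell(s)][0] / counts[cell(s)][1] for s in students]

def match_one_to_one(students, scores):
    """Greedy nearest-neighbor matching on the score, without replacement."""
    treated = [i for i, s in enumerate(students) if s["ce"] == 1]
    available = {i for i, s in enumerate(students) if s["ce"] == 0}
    pairs = []
    for t in treated:
        if not available:
            break
        # Closest remaining control; ties are broken by the lower index.
        best = min(available, key=lambda c: (abs(scores[c] - scores[t]), c))
        pairs.append((t, best))
        available.remove(best)
    return pairs

def make(gender, ethnicity, pell, first_gen, ce):
    return {"gender": gender, "ethnicity": ethnicity, "pell": pell,
            "first_gen": first_gen, "ce": ce}

# Hypothetical students in two covariate cells.
students = [
    make("F", "A", True, False, 1),
    make("F", "A", True, False, 0),
    make("F", "A", True, False, 0),
    make("M", "B", False, True, 1),
    make("M", "B", False, True, 1),
    make("M", "B", False, True, 0),
]
scores = propensity_scores(students)
pairs = match_one_to_one(students, scores)
```

Each pair holds the indices of one former CE student and the traditional student matched to them. Once the controls in a student's own cell run out, the match falls back to the closest remaining score.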

Statistical Modelling

Because of the nested structure of academic data, straightforward logistic or linear modelling, which assumes no systematic difference between courses, years, or semesters, will tend to bias model estimates in the opposite direction of any effect that exists within those nested structures. For example, if the year 2013 had a new determination for passing that caused uncommonly high pass rates, a straightforward logistic regression would show a slightly less positive effect for 2013 than actually existed, and slightly more negative effects for all years except 2013. Multi-level, or hierarchical, models account for these sorts of systematic differences through what are called random and fixed effects.

Hierarchical Logistic Model

The first model we evaluated is a hierarchical model that regresses a binary response variable, whether or not the student passed their subsequent class, on the fixed effects of the student’s CE status, age at the time of the second course enrollment, high school GPA, general studies indicator, and cumulative GPA at the time of enrollment in the second course.
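One way to write this specification down is given below. It is a sketch rather than the exact fitted model: the symbols are our own labels, and the three random intercepts (subsequent course, year, and semester of subsequent enrollment) are taken from the random effects reported in the Results section.

\[\operatorname{logit}\Pr(Pass_{i}=1) = \beta_0 + \beta_1 CE_i + \beta_2 Age_i + \boldsymbol{\beta}_3^{\top}\mathbf{HSGPA}_i + \beta_4 GenStudies_i + \boldsymbol{\beta}_5^{\top}\mathbf{CumGPA}_i + u_{course(i)} + v_{year(i)} + w_{semester(i)}\]

Here \(\mathbf{HSGPA}_i\) and \(\mathbf{CumGPA}_i\) are vectors of quintile indicators, as in the Appendix tables, and each random intercept is assumed to be normally distributed with mean zero, for example \(u_{course} \sim N(0, \sigma^2_{course})\).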

Hierarchical Linear Model

The second model we evaluated is a hierarchical model that regresses a continuous response variable, the GPA-equivalent grade difference between the second class and the first class, on the fixed effects of the student’s CE status, age at the time of the second course enrollment, high school GPA, general studies indicator, and cumulative GPA at the time of enrollment in the second course.
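As a sketch under the same caveats (our own symbols, with random intercepts inferred from the random effects reported in the Results section), the linear model can be written as:

\[GPA_{Change,i} = \beta_0 + \beta_1 CE_i + \beta_2 Age_i + \boldsymbol{\beta}_3^{\top}\mathbf{HSGPA}_i + \beta_4 GenStudies_i + \boldsymbol{\beta}_5^{\top}\mathbf{CumGPA}_i + u_{course(i)} + v_{year(i)} + w_{semester(i)} + \varepsilon_i\]

where \(\mathbf{HSGPA}_i\) and \(\mathbf{CumGPA}_i\) are vectors of quintile indicators, the random intercepts \(u\), \(v\), and \(w\) are assumed to be normally distributed with mean zero, and \(\varepsilon_i \sim N(0, \sigma^2)\) is the student-level residual.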

Results

Passing

The logistic multi-level model evaluates the random and fixed effects on a binary response variable indicating whether or not the student passed their subsequent course. Table 2 reports the estimate and 95% confidence interval for former concurrent students, along with the statistically significant fixed effects of the model. See the Appendix for the full specification of the fixed effects and the estimates of each variable.

Table 2: Fixed Effects on the Probability of Passing Subsequent Course
Name | Probability | Lower Bound | Upper Bound
Concurrent Student Indicator | 8.810 | -0.020 | 17.100
High School GPA between 3.5990 and 3.7946 | 15.970 | 1.060 | 28.260
High School GPA greater than 3.7946 | 35.840 | 17.500 | 44.650

The multi-level model estimates the probability of passing the subsequent course for each student in the sample. Table 2 reports these probabilities relative to a coin toss. In other words, since the null model assumes no difference between CE students and traditional students, we would expect both groups to have an equal probability of passing, i.e. 50%. We can say that students in the fourth quintile of high school GPA in the sample (see the Appendix for an explanation of quintiles) on average have a probability of passing their subsequent class that is between 1.06 and 28.26 percentage points higher than students with no record of high school GPA. Note that, because it is the focus of the analysis, we include the estimate for former concurrent students even though it is not statistically significant: its confidence interval, -0.02 to 17.10, includes zero. The multi-level model of the aggregate does not indicate a statistical difference in terms of passing between traditional and former CE students when we account for the subsequent course. We assume that these effects are consistent and conditional on the values of the random effects, shown in Figure 2.
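To make the "relative to a coin toss" scale concrete, the sketch below converts a log-odds coefficient into percentage points above or below 50%. That the table values were produced with exactly this transformation is our assumption for the illustration; it is consistent with every reported value lying between -50 and 50, but the transformation is not spelled out above. The function names are hypothetical.

```python
import math

def logistic(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

def coin_toss_relative(log_odds):
    """Percentage points above (+) or below (-) a 50% baseline."""
    return 100.0 * (logistic(log_odds) - 0.5)

def log_odds_from_relative(pct_points):
    """Inverse of coin_toss_relative; valid for values strictly between -50 and 50."""
    p = 0.5 + pct_points / 100.0
    return math.log(p / (1.0 - p))

# A coefficient of zero means no shift from the 50% baseline.
no_effect = coin_toss_relative(0.0)

# Log-odds coefficient implied by the 8.81 point estimate in Table 2,
# under the assumed transformation.
implied_log_odds = log_odds_from_relative(8.81)
```

Under this reading, an estimate of 8.81 corresponds to a log-odds coefficient of roughly 0.36, and an interval that includes zero on one scale includes zero on the other.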

Figure 2

We can see in Figure 2 that the random effects of the subsequent course each student enrolled in are not statistically significant for any course other than Math 1050, Math 1030, and English 2010. Recall that we model these estimates independent of the fixed effects included in the model, i.e. these effects apply to traditional and former CE students alike. All courses for which the error bars do not cross the line y = 0 have effects that are statistically different from zero. In the left-most points on the plot, we can see this is the case for Math 1050 and Math 1030, for which students have a statistically lower than average probability of passing (-24.23% and -20.45% respectively). At the point second from the right, we can see that students whose subsequent course was English 2010 have a statistically higher than average probability of passing (19.42%). Recall that the previous analysis focused specifically on students in Math 1030, 1040, and 1050; with the exception of Math 1040, these classes have a statistically significantly lower probability of passing.

Figure 3

Figure 3 shows the random effect that the year of each student’s subsequent course enrollment has on passing. In Figure 2, the course the student enrolled in captures a great deal of the variation we see in the probability of passing. If course is the primary driver of pass rates, it makes sense that the year estimates would remain consistent over the course of the sample. We can’t say that any one year is statistically different from the rest in terms of the probability a student will pass their subsequent course, because the estimates are all close to zero and their confidence intervals overlap almost perfectly. What’s more, since we included the year and semester of subsequent enrollment among the model’s random effects, we can say the effect of the subsequent course seen in Figure 2 is independent of the year the student enrolled. In other words, we have isolated the random effects in the data and shown evidence that the year of the subsequent enrollment has little effect on the probability of passing a course.

Figure 4

Similar to the random effects for year, when we account for the variation explained by the fixed effects and other random effects, we cannot claim the semester a student enrolls in a subsequent course in a sequence has any effect on the probability of passing that subsequent course either positively or negatively.

GPA Change

The second model uses a measure we will refer to as GPA Change. The GPA change is the difference between the student’s GPA-equivalent final grade in the first course, \(GEFG_{First}\), and the GPA-equivalent final grade they earned in their second course, \(GEFG_{Second}\). \[GPA_{Change} = GEFG_{Second}-GEFG_{First}\] For example, if a student earned a letter grade of “B” in their first course, their GPA-equivalent final grade would be 3.0, \(GEFG_{First}=3.0\). If they then earned a letter grade of “A” in their second course, \(GEFG_{Second}=4.0\), the GPA change for that student is \(GPA_{Change} = 4.0-3.0 = 1.0\).
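The same calculation as a short Python sketch. Only the A = 4.0 and B = 3.0 equivalents come from the example above; the remaining entries in the grade table, and the omission of plus and minus grades, are assumptions made for illustration.

```python
# GPA-equivalent points per letter grade. A and B follow the worked example;
# C, D, and E are assumed values on a conventional 4-point scale.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.0}

def gpa_change(first_letter, second_letter):
    """GPA change: second-course grade points minus first-course grade points."""
    return GRADE_POINTS[second_letter] - GRADE_POINTS[first_letter]

improved = gpa_change("B", "A")   # the worked example above: 4.0 - 3.0
declined = gpa_change("A", "C")   # a drop of two letter grades
```

A positive value means the student did better in the second course than in the first, and a negative value means they did worse.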

Table 3: Fixed Effects on the GPA-Equivalent Grade Change
Name | Estimate | Lower Bound | Upper Bound
Former Concurrent Indicator | 0.167 | 0.014 | 0.319
High School GPA between 3.599 and 3.7946 | 0.330 | 0.090 | 0.570
High School GPA greater than 3.7946 | 0.434 | 0.188 | 0.681

Similar to Table 2, Table 3 only reports estimates that are statistically different from 0. We can see that, in the aggregate, the GPA change of former CE students in their subsequent course in the same subject is between 0.014 and 0.319 GPA-equivalent points higher than that of their traditional student counterparts. It’s not as large an effect as that for the fourth and fifth high school GPA quintiles, but the effect is statistically distinguishable from zero. As in the logistic hierarchical model, the estimates are assumed to be independent of the random effects for the course, semester, and year of the subsequent enrollment, which we include in the following plots.

Figure 5

Figure 5 shows the estimated random effect of the subsequent course students take. Random effects whose error bars do not cross the line y = 0 are statistically different from zero. We can see in Figure 5 that only Math 1050, English 2010, English 2100, and Communication 2150 have random effects statistically different from zero. This result is consistent with the findings from previous models, wherein Math courses appear to be the primary drivers of lower achievement. Recall that these estimates do not include an interaction term for concurrent students as they relate to these Math classes; any such difference is captured in the fixed effects terms reported in Table 3. However, the estimated negative effect of Math 1050 is large enough to offset the positive effect of being a former CE student. Even though CE students have a more positive grade change than non-CE students, we still expect former CE students’ grades in Math 1050 to be 0.27 letter grades lower than the grades they earned in their prior Math courses.

Figure 6

Based on the error bars in the above plot, we can’t say that there is a statistically significant systematic difference between the years present in the random effects. We can, however, provide a bit of context for these data. In Figure 7, we illustrate separate time series of both components of the GPA change response variable for traditional and CE students.

Point estimates and standard errors for the random effects of the semester of subsequent enrollment are all zero, so we did not include that plot.

Figure 7

Figure 7 illustrates that the continuous response variable in this multi-level model is not consistent over the course of the sample. We can think of the blue series in the top portion of the plot as the average grades assigned to CE students enrolled at high schools, and the yellow line beneath it as the average grades for those students when they enroll in a college course at SLCC.

The blue series in the bottom part of Figure 7 shows the average grades of non-CE students taking their first class at SLCC, and the yellow series beneath it shows the average grades for traditional students taking their second course in a sequence. If there were no difference between CE students and traditional students, we would expect both yellow lines to follow the same path. Since this is not the case, we have evidence that the differences between CE and traditional students tend to shift over time and appear more erratic for former CE students. On average, we see that SLCC’s instructors had a dramatic shift in assigning grades to former CE students from 2010 to 2012, whereas the opposite effect shows up for traditional students over the same time period. The highest grade differences occur between 2010 and 2012, driven first by CE students and then by traditional students. Note that the estimates in the plot are independent of the random and fixed effects in the multi-level model, and only report the aggregates of these measures across each year and student type.

Conclusion

In the previous policy analysis, which was restricted to a small selection of Math courses and years, our difference in differences estimates of the effect of a policy change in 2016 showed evidence that CE students in Math 1030, 1040, and 1050 on average perform worse in their subsequent classes than their traditional counterparts, driven primarily by the age of students in the sample. We also saw that CE students experienced a smaller drop in their subsequent course grade differences over time than did the traditional students. Based on the results from this analysis, we can say with confidence that this finding does not persist in other course subjects. A more focused analysis within individual courses might illuminate more nuance, as we saw in the previous Math analysis.

Appendix

Full Logistic Model Fixed Effects

Table 4: Fixed Effects on the Probability of Passing Subsequent Course
Name | PROB | PROB_LB | PROB_UB
(Intercept) | 9.260 | -47.460 | 48.780
EVER_CONCURRENT_INDY | 8.810 | -0.020 | 17.100
fquint.HS_GPA2First | 4.260 | -8.260 | 16.270
fquint.HS_GPA2Second | 2.130 | -10.440 | 14.440
fquint.HS_GPA2Third | 12.400 | -2.250 | 25.080
High School GPA between 3.599 and 3.7946 | 15.970 | 1.060 | 28.260
High School GPA greater than 3.7946 | 35.840 | 17.500 | 44.650
d.GEN | 5.220 | -3.380 | 13.530
fquint.NEXT_PRIOR_GPA2First | -22.010 | -47.770 | 36.880
fquint.NEXT_PRIOR_GPA2Second | -8.620 | -46.030 | 42.350
fquint.NEXT_PRIOR_GPA2Third | 6.760 | -42.890 | 45.750
fquint.NEXT_PRIOR_GPA2Fourth | 20.560 | -37.820 | 47.640
fquint.NEXT_PRIOR_GPA2Fifth | 25.010 | -35.320 | 48.130
NEXT_COURSE_AGE | 0.300 | -3.400 | 4.000

Full Linear Model Fixed Effects

Table 5: Fixed Effects on the GPA-Equivalent Grade Change
Name | Estimate | LB | UB
(Intercept) | -1.948 | -4.034 | 0.137
EVER_CONCURRENT_INDY | 0.167 | 0.014 | 0.319
fquint.HS_GPA2First | 0.118 | -0.134 | 0.371
fquint.HS_GPA2Second | 0.173 | -0.067 | 0.413
fquint.HS_GPA2Third | 0.187 | -0.065 | 0.438
High School GPA between 3.599 and 3.7946 | 0.330 | 0.090 | 0.570
High School GPA greater than 3.7946 | 0.434 | 0.188 | 0.681
d.GEN | 0.138 | -0.009 | 0.284
fquint.NEXT_PRIOR_GPA2First | 0.009 | -1.666 | 1.685
fquint.NEXT_PRIOR_GPA2Second | 0.137 | -1.540 | 1.814
fquint.NEXT_PRIOR_GPA2Third | 0.213 | -1.464 | 1.889
fquint.NEXT_PRIOR_GPA2Fourth | 0.326 | -1.349 | 2.002
fquint.NEXT_PRIOR_GPA2Fifth | 0.241 | -1.435 | 1.918
NEXT_COURSE_AGE | 0.044 | -0.019 | 0.108

Figure 8

Figure 9

Quintile

Any of five equal groups into which a population can be divided according to the distribution of values of a particular variable. For example, put all values of high school GPA in ascending order; the lowest 20% of GPAs are in the first GPA quintile. The GPAs that are higher than the bottom 20% and lower than the top 60% of high school GPAs are in the second quintile, and so on.
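A small Python sketch of assigning quintiles is shown below. The GPA values are made up for illustration, and the interpolation rule for the cut points (the `inclusive` method of the standard library's `statistics.quantiles`) is an assumption; other software may place the boundaries slightly differently.

```python
import bisect
import statistics

def quintile_labels(values):
    """Label each value 1 (lowest fifth) through 5 (highest fifth)."""
    # Four cut points split the ordered values into five equal-sized groups.
    cuts = statistics.quantiles(values, n=5, method="inclusive")
    return [bisect.bisect_left(cuts, v) + 1 for v in values]

# Hypothetical high school GPAs, already in ascending order for readability.
gpas = [2.1, 2.4, 2.7, 3.0, 3.2, 3.4, 3.6, 3.7, 3.8, 4.0]
labels = quintile_labels(gpas)
```

With ten values, each quintile holds two of them: the two lowest GPAs are labelled 1 and the two highest are labelled 5.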