In this Q&A session with Dr. Steve Smith, Lead Data Scientist at Hoonuit, we explore several aspects of predictive analytics during COVID-19. Dr. Smith explains how Hoonuit has adjusted its predictive models amid widespread distance learning and school closures, describes Hoonuit’s innovative work using Student Growth Percentiles and its applications for educators, and discusses the different questions educators are exploring around potential learning loss.
Steve Smith: The amount of data available in education today is staggering and can be overwhelming. Predictive modeling uses data and outcomes from the past to make predictions about future outcomes for current students. Displaying the data this way puts it to work for educators, allowing them to make instructional decisions today based on the likelihood of future outcomes. Hoonuit is leveraging predictive analytics and data modeling to provide educators with deeper insights into their students, faster and more easily than they could otherwise. We’re seeing amazing results through our Early Warning and Student Success solutions based on client feedback.
Predictive analytics can be used to make instructional decisions and engage in goal setting for individual students. Retrospectively, I think it’s also useful to see if our instructional modifications are working to influence a student’s growth or the probability of a specific outcome, so we can ask and learn “how much did they miss their predicted value?” and “is the student making progress over time toward reaching an outcome?”
Besides being used to make decisions about individual students, predictive analytics also can be utilized to provide useful tools for resource planning at the classroom, grade, school, and district levels. For example, “what percent of the students in our grade are likely to be proficient?” It is beneficial for student goal setting and resource planning at all levels, from the organization down to the individual.
Steve Smith: For Early Warning and Student Success, the predictions are represented within our dashboards as a probability. For example, the likelihood or probability that a student will graduate on time. We then use a subsequent ROC analysis on “test” data to determine what level of probability constitutes high risk for the outcome. The dashboards display the probability and are color-coded to highlight the risk levels: red for high risk, yellow for moderate risk, and green for low risk.
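The workflow described here can be sketched in a few lines. This is an illustrative example only, not Hoonuit’s actual implementation: it picks a probability cutoff from held-out test data by maximizing Youden’s J statistic (one common way to choose a threshold from an ROC analysis; the specific statistic is an assumption), then maps probabilities to the red/yellow/green bands. All names, data, and thresholds are hypothetical.

```python
def youden_threshold(probs, outcomes):
    """Pick the probability cutoff that maximizes TPR - FPR (Youden's J).

    probs    -- predicted probability of the adverse outcome (e.g. not
                graduating on time) for each student in the test set
    outcomes -- 1 if the adverse outcome occurred, else 0
    """
    pos = sum(outcomes)
    neg = len(outcomes) - pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, outcomes) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, outcomes) if p >= t and y == 0)
        j = tp / pos - fp / neg  # TPR - FPR at this cutoff
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def risk_color(prob_adverse, high_cut, moderate_cut):
    """Map a probability of the adverse outcome to a dashboard color."""
    if prob_adverse >= high_cut:
        return "red"      # high risk
    if prob_adverse >= moderate_cut:
        return "yellow"   # moderate risk
    return "green"        # low risk


# Toy "test" data: probability of NOT graduating on time, observed outcome.
test_probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
test_outcomes = [1, 1, 0, 1, 0, 0, 0]
high_cut = youden_threshold(test_probs, test_outcomes)
```

In practice a second, lower cutoff for the yellow band would be chosen by a similar analysis or by policy; here it is simply passed in as `moderate_cut`.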
For assessments, we do something a little different. We use student test score data from the past to “predict” their most recent score. This isn’t truly a prediction, as in making an inference about a future outcome. Rather, this analysis yields student growth in the context of other students with similar test score histories. In addition, we also use the data to make a true prediction on test scores: next interval, year, or multiple years forward. The dashboard display shows student growth as measured by student growth percentiles (1st–99th percentile), as well as future projected test scores. This is done for both state and interim assessments, such as NWEA’s MAP and STAR. It can also be used for local assessments with appropriate psychometric characteristics.
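The core idea, ranking a student’s current score against peers with similar test score histories, can be illustrated with a deliberately simplified sketch. Real student growth percentile methodology fits quantile regressions over the full score history; the version below just matches peers whose single prior score falls within a window, which is an assumption made for brevity.

```python
def simple_growth_percentile(prior, current, cohort, window=5):
    """Percentile (1-99) of `current` among peers with similar histories.

    prior, current -- the student's prior and most recent scores
    cohort         -- list of (prior_score, current_score) for all students
    window         -- peers are students whose prior score is within
                      `window` points of this student's prior score
    """
    peers = [c for p, c in cohort if abs(p - prior) <= window]
    below = sum(1 for c in peers if c < current)
    # mid-rank percentile, clamped to the conventional 1-99 range
    pct = round(100 * (below + 0.5) / len(peers))
    return min(99, max(1, pct))
```

A student who outscores nearly every similar-history peer lands near the 99th percentile of growth; one who matches the typical peer lands near the 50th, regardless of whether the scores themselves are high or low.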
Steve Smith: Everything about education was upended, and that includes the education data we use in our predictive models. Assessments didn’t really happen. We’re unsure how attendance was recorded; attendance may have been recorded differently within and between various districts. We also leverage discipline events in many of our models, and if you’re not in school, you’re also not getting in trouble on school grounds. There are a lot of unknowns and things that we, like everyone else, are not able to account for. With the way we pipe the data, we are able to include or exclude certain data from specific periods of time. We’ve done work using both, but observe that the data relate best to prior cohorts when we use the 2019-2020 data through the first semester as a proxy for the entire school year.
Steve Smith: I don’t believe they have lost value. In fact, predictive analytics are a key tool to help educators form a richer and fuller picture of their students, and will be especially useful during the 2020-21 academic year when face-to-face learning is limited. So we have put a lot of time and effort into exploring ways that we can modify our models to account for these data challenges, and we’ve adjusted accordingly. In all likelihood, this may very well continue into the next school year or two. We will continue to do our work, as well as obtain feedback from clients as to what works best.
Steve Smith: We built a new predictive model using a different matrix of covariates to offset student data anomalies from the spring of 2020. For our Early Warning and Student Success models … we’re treating the second semester of the 19-20 school year, the “COVID year,” as a “Black Swan,” and we did not include that semester. We did, however, include an additional lagged year of data … two years of prior data … to maintain, and in some cases improve, the accuracy as measured by AUC.
Now the underlying assumption is that the prediction is “as if everything were normal” and there was no instructional disruption. We can’t know what the true effect of distance learning is going to be just yet. We haven’t observed that in relation to the outcome at all grade levels. It is still reasonable to interpret the probabilities from Early Warning and Student Success as relative risk. As an example, before, we could say confidently that a 90% probability means that if you have a hundred kids, 90 of them will ultimately graduate on time. I don’t know if they’re going to calibrate exactly that way, because we can’t yet model the effect of the 2019-2020 school year. So we’re saying that although the calibration might not hold up … the students’ relative risks do hold up. For example, a student with a 90% probability of graduating on time is at lower risk than a student with a 60% probability.
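The distinction between calibration and relative risk can be made concrete with a tiny, purely illustrative example: if the true probabilities are distorted by any monotone function (here, squaring, chosen arbitrarily), the numbers no longer calibrate, but the rank ordering of students by risk is unchanged.

```python
def rank_order(probs):
    """Indices of students, sorted from lowest to highest probability."""
    return sorted(range(len(probs)), key=lambda i: probs[i])


calibrated = [0.90, 0.60, 0.75, 0.30]     # P(graduate on time), well calibrated
distorted = [p ** 2 for p in calibrated]  # a monotone miscalibration

# The probabilities differ, but both orderings identify the same
# lowest-risk and highest-risk students.
```

This is why the dashboards can still rank students by risk even in a year whose effect on calibration cannot yet be modeled.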
Steve Smith: We have been designing a new product using the Student Growth Percentile methodology developed by Damian Betebenner of the National Center for the Improvement of Educational Assessment. We’ve defined it like this … compared to other students with similar test score histories … how much did a specific student grow? Or in other words, how much did the student grow relative to other students who are like that particular student?
And that’s going to be relative to other students who also experienced this semester of distance learning. We’re also going to look at that growth measure and compare how students grew throughout this last school year (the COVID-19 year) versus how students grew in the past. This would be a separate SGP using growth from prior cohorts to obtain the various quantiles. It can address the question of how a student’s current growth compares to students who did not experience distance learning.
One adjustment we’ve made is for the fact that state assessments are going to have a year without data. So the growth will be measured over a two-year period, for example, from third grade to fifth grade or fourth grade to sixth grade. We’ll see how their growth compares to students over a two-year period that didn’t have that disruption. We’ve also made the same adjustment to benchmark, or interim, assessments missing the spring 2020 administration.
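The two-year pairing described here amounts to skipping the missing year when lining up score histories. A minimal sketch, with a hypothetical data shape (a dict of yearly scores per student):

```python
def two_year_pairs(histories, missing_year=2020):
    """Build (prior, current) score pairs that span a missing test year.

    histories -- {student_id: {year: score}}
    Returns {student_id: (score_before_gap, score_after_gap)} for every
    student with scores on both sides of the missing year, e.g. a spring
    2019 (grade 3) score paired with a spring 2021 (grade 5) score.
    """
    pairs = {}
    prior_year, current_year = missing_year - 1, missing_year + 1
    for sid, scores in histories.items():
        if prior_year in scores and current_year in scores:
            pairs[sid] = (scores[prior_year], scores[current_year])
    return pairs
```

Students missing either endpoint simply drop out of the pairing; the resulting two-year growth can then be ranked against two-year growth from undisrupted prior cohorts.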
Steve Smith: I think it depends on the level at which we aggregate that growth measure. I think it would be useful to look and see if there are any subgroups that experienced learning loss to a greater degree than others, so we can break it down by race, ethnicity, or any subgroup. We can also examine it by grade level and look across a school or a school district. Are there pockets of grade levels or schools that tended to have less learning loss than others? And we can use those results to ask … if it didn’t happen here, what did we do to make that learning loss less evident? And can we replicate that in other places?
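Rolling individual growth percentiles up to any grouping level is a straightforward aggregation. The sketch below uses hypothetical record fields; the median is the conventional summary for growth percentiles, since they are ordinal.

```python
from statistics import median


def median_sgp_by(records, key):
    """Median student growth percentile for each value of a grouping field.

    records -- list of dicts, each with an 'sgp' value plus descriptive
               fields (e.g. 'school', 'grade', 'ethnicity')
    key     -- which field to group by
    """
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r["sgp"])
    return {g: median(vals) for g, vals in groups.items()}
```

The same function can be reused with `key="grade"` or any subgroup field to hunt for the pockets of greater or lesser learning loss described above.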
Steve Smith: Yes, it will be a strong tool. One question we can’t answer just yet is how long learning loss might be evident. By tracking it over time, we can see if gaps are closing. Hypothetically, maybe a student or cohort experienced learning loss from third to fifth grade, but somewhere in sixth, seventh, and eighth grade they caught up and sustained their growth. All indications suggest that there will be learning loss based on the overall disruption that occurred to the ‘normal’ instructional process. It will be critical to identify where it happened, form hypotheses about why it happened, and measure instructional modifications to establish best practices moving forward.
In the upcoming school year (2020-21), we’re going to see different sorts of instructional modalities in different pockets throughout the entire country. So we will have a good mix to look at and measure growth in a variety of instructional contexts. We will be examining how 100 percent online instruction compares with 100 percent in-person instruction, and whether hybrid learning differs substantially from either of the two extremes. I think moving forward, those will be good questions to address.
Dr. Steve Smith leads Hoonuit’s data science team. He has a rich background in statistical modeling, research design, program evaluation, educational psychology, and psychological assessment. Prior to joining Hoonuit, he was a researcher and associate scientist in the University of Wisconsin system (UW-Madison and UW-Milwaukee, respectively). Steve began his career as a school psychologist and served in that capacity for over a decade in Milwaukee Public Schools.