SAMSI Researchers Among Team Helping to Predict 2013 Boston Marathon Completion Times

After experiencing a tragic and truncated end to the 2013 Boston Marathon, race organizers were faced not only with grief but with hundreds of administrative decisions, including plans for the 2014 race – an event beloved by Bostonians and people around the world.

One of the issues they faced was what to do about the nearly 6,000 runners who were unable to complete the 2013 race. The Boston Athletic Association, the event’s organizers, quickly pledged to provide official finish times for these runners. Thinking ahead, they also had to consider how to provide these runners with an opportunity to qualify for the 2014 race.

To seek advice on these issues, they contacted Richard Smith, director of the Statistical and Applied Mathematics Sciences Institute (SAMSI) and professor of statistics at the University of North Carolina at Chapel Hill, who also happens to be an avid marathon runner. They asked Smith to come up with a statistical procedure for predicting each runner’s likely finish time based on their pace up to the last checkpoint before they had to stop.

Smith quickly assembled a team of fellow analysts. It included Francesca Dominici and Giovanni Parmigiani of the Harvard School of Public Health and Dorit Hammerling, a postdoctoral fellow at SAMSI, all three of whom had run the 2013 race and finished uninjured. The team also included Matthew Cefalu, Harvard School of Public Health; Jessi Cisewski, Carnegie Mellon University; and Charles Paulson, Puffinware LLC.

The results, and the method the researchers developed, were published on April 11 in PLOS ONE.

With the help of the Boston Athletic Association, the researchers created a dataset. It contained all the runners in the 2013 race who reached the halfway point but failed to finish, together with all the runners from the 2010 and 2011 Boston Marathons. The data consist of “split times” for each 5 km section of the course (from the start to 40 km) and for the final 2.2 km. The research team’s task was to predict the missing split times for the runners who did not finish in 2013.
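To make the data layout concrete, here is a minimal Python sketch. It assumes each runner’s record is stored as cumulative elapsed times (in minutes) at each checkpoint, with None for checkpoints the runner never reached. The names `CHECKPOINTS_KM` and `segment_splits` and the sample times are hypothetical, for illustration only. The article does not describe how the team actually stored its data.

```python
from typing import List, Optional

# Checkpoint distances in km: every 5 km up to 40 km, then the finish (42.195 km),
# so the last segment is the final ~2.2 km.
CHECKPOINTS_KM = [5, 10, 15, 20, 25, 30, 35, 40, 42.195]


def segment_splits(cumulative: List[Optional[float]]) -> List[Optional[float]]:
    """Convert cumulative checkpoint times (minutes) into per-segment split times.

    None marks a checkpoint the runner never reached; once a checkpoint is
    missing, every later split is also None because it cannot be computed.
    """
    splits: List[Optional[float]] = []
    previous: Optional[float] = 0.0
    for t in cumulative:
        if t is None or previous is None:
            splits.append(None)
            previous = None
        else:
            splits.append(t - previous)
            previous = t
    return splits


# Hypothetical runner who stopped after the 35 km checkpoint.
dnf_runner = [25.0, 50.5, 76.0, 102.0, 128.5, 155.5, 183.0, None, None]
print(segment_splits(dnf_runner))
# [25.0, 25.5, 25.5, 26.0, 26.5, 27.0, 27.5, None, None]
```

The missing entries at the end of such a record are exactly what the prediction methods have to fill in.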

The researchers adapted techniques used in such contexts as imputing missing data in DNA microarray experiments and estimating the ratings Netflix subscribers would have given to movies they had not seen. They proposed five prediction methods and built a validation dataset to compare the methods’ performance using mean squared error and other measures. Of the five, the method that worked best used local regression based on a K-nearest-neighbors algorithm (the KNN method), though several other methods produced results of similar quality.
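The following Python sketch shows one plausible way to build such a validation set, under stated assumptions. It takes runners who did finish, hides their times after a chosen checkpoint, predicts their finish, and scores the predictions by mean squared error. The `even_pace_predictor` baseline, which assumes a runner keeps their average pace so far, is a hypothetical stand-in and not one of the team’s five methods. `validate`, `COURSE_KM`, and the toy data are likewise illustrative.

```python
from typing import Callable, List, Sequence

COURSE_KM = 42.195


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between true and predicted finish times."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def even_pace_predictor(observed: Sequence[float], km_done: float) -> float:
    """Hypothetical baseline: assume the runner holds their average pace so far."""
    return observed[-1] * COURSE_KM / km_done


def validate(finishers: List[List[float]], stop_index: int, km_done: float,
             predictor: Callable[[Sequence[float], float], float]) -> float:
    """Pretend each finisher stopped at checkpoint `stop_index` (at `km_done` km),
    predict their finish from the times they had by then, and score with MSE.

    Each record is a list of cumulative times whose last entry is the finish time.
    """
    actual = [times[-1] for times in finishers]
    predicted = [predictor(times[: stop_index + 1], km_done) for times in finishers]
    return mean_squared_error(actual, predicted)


# Two hypothetical finishers: cumulative minutes at 20 km and at the finish.
toy_finishers = [[100.0, 211.0], [120.0, 260.0]]
print(validate(toy_finishers, stop_index=0, km_done=20.0, predictor=even_pace_predictor))
```

Because the true finish times of the validation runners are known, any candidate predictor can be plugged into `validate` and compared on the same footing.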

The KNN method looks at each runner who did not complete the race (a DNF, for “did not finish”) and finds a set of comparison runners who finished the 2010 or 2011 race and whose split times were similar to the DNF runner’s up to the point where he or she left the race. These comparison runners are called “nearest neighbors.”

“We had to come up with a method to compare the runners based on the split points up to a certain point of the race and then had to decide how many of the nearest neighbors to examine in order to develop a prediction for the DNF runner that would be based on the different finishing times of these nearest neighbors,” said Smith, who has run the Boston Marathon in the past and will run this year’s race. “We decided to choose 200 nearest neighbors. We also tried 100 and 300 nearest neighbors, but the results changed only slightly and didn’t make them better.” A PowerPoint presentation of the work is also available.
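The neighbor search and prediction can be sketched in Python as follows. This is a simplified illustration, not the team’s published procedure. It assumes Euclidean distance on cumulative checkpoint times. It also assumes a simple linear fit of finish time on the time at the last checkpoint reached, computed only among the k neighbors. The function name `knn_predict_finish` and the data layout are hypothetical. Only the choice of k = 200 comes from the article.

```python
import heapq
import math
from typing import List, Sequence, Tuple

# Each past finisher: (cumulative checkpoint times in minutes, official finish time).
Finisher = Tuple[List[float], float]


def knn_predict_finish(observed: Sequence[float], finishers: List[Finisher],
                       k: int = 200) -> float:
    """Predict a DNF runner's finish time from their k nearest past finishers.

    observed -- the DNF runner's cumulative times at the checkpoints reached
    k        -- number of neighbors (the article reports using 200)
    """
    if not observed or not finishers:
        raise ValueError("need at least one observed time and one finisher")
    m = len(observed)
    # Compare only on the checkpoints the DNF runner actually reached.
    neighbors = heapq.nsmallest(
        k, finishers, key=lambda rec: math.dist(observed, rec[0][:m]))
    # Local linear fit among neighbors: finish = a + b * (time at last reached checkpoint).
    xs = [rec[0][m - 1] for rec in neighbors]
    ys = [rec[1] for rec in neighbors]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return y_bar  # neighbors all identical at that checkpoint: use their mean finish
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return a + b * observed[m - 1]


# Hypothetical finishers with two checkpoints, where finish = 10 + 2 * (last checkpoint time).
past = [([float(t), 2.0 * t], 10.0 + 4.0 * t) for t in range(40, 61)]
print(knn_predict_finish([50.0, 100.0], past, k=5))  # 210.0
```

Because the regression uses only the neighbors, the prediction adapts to runners of similar ability and pacing. Changing k trades locality for stability, which fits the team’s observation that 100 or 300 neighbors changed the results only slightly.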

The Boston Athletic Association decided to grant entry to the 2014 race to anyone who was stopped from completing the 2013 event, so they will have a chance to complete the Boston Marathon after all. But in the course of developing the method, Smith and his colleagues realized there were other uses for the technique.

“We have found that using the KNN method looking at a runner’s intermediate split-time will also be useful in predicting the person’s completion time while the race is in progress,” said Smith. “This can be helpful for relatives and friends to be able to meet the person at the finish line.”

The local television station WRAL also ran a story about the team’s work.

Boston Marathon 2013

The following are remarks from Richard Smith, Director of SAMSI.

Three people closely associated with SAMSI took part in yesterday’s Boston Marathon. One of our postdocs, Dorit Hammerling, completed the race in an outstanding time of 3 hours, 21 minutes. Even faster (by a few seconds) was Francesca Dominici of Harvard University, who will be our Simons Foundation Public Lecturer next week. Francesca’s husband, Giovanni Parmigiani, formerly a professor at Duke, also finished the race in a very good time. Congratulations to all three.

Talk of performances and finish times is, however, overshadowed by the tragic events that unfolded after the race. Dorit, Francesca and Giovanni all finished well in front of the bombs and escaped without harm. We at SAMSI express our gratitude that our friends and colleagues are safe, and our sadness and condolences for those who were not so fortunate.

These events resonated particularly with me because I have run the Boston Marathon multiple times, and I even provided some statistical input when the organizers revised their qualifying times a couple of years back. The Boston Marathon is the greatest of all running events, and I am sure it will continue. I am determined to go back and run it again myself.

Richard Smith, Director of SAMSI