Brian Caffo Shares His Impression of the Massive Datasets Opening Workshop

The following post is from Brian Caffo, Associate Professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. He recently attended the opening workshop for the Massive Datasets Program.

Brian Caffo, Johns Hopkins University. Photo by Jay VanRensselaer.

SAMSI’s opening workshop for its Massive Datasets program was a resounding success. It brought together applied mathematicians, computer scientists and statisticians of various stripes to self-organize into what are likely to be fruitful collaborations. The conference included many talks and panels on methodological developments and applications for analyzing large data sets. It was absolutely top notch, and the speakers and organizers should be applauded.

To stop this blog post from being an entirely congratulatory exercise, however, I’d like to raise some points of discussion for the relevant fields. It was hard not to notice from the talks at SAMSI that a prevailing style of methodology research exists. In particular, researchers search for the core elements common to massive data set problems and go after them with general solutions that would apply across settings. The goal is then new methodology that applies broadly. This has been the model for statistics, applied mathematics and computer science for quite some time.

My question is, “Is this model sufficient for the challenge of big data?” An alternative strategy would be to focus on specific big data problems. The concern is that, when focusing on the general, the interesting specific aspects of a unique large data set get lost. So, for example, instead of worrying about general theory, such as computational orders of magnitude, asymptotics and optimality, researchers would worry about successfully fitting meaningful models on giant benchmark data sets that serve as rallying points. My guess is that this strategy might serve better for bringing researchers together to work collaboratively, or even competitively, towards common goals.

These points notwithstanding, it was a great conference. My experience probably mirrored that of others: I learned quite a bit, got a chance to catch up with friends, and met great researchers, meetings that will likely result in collaborations.