The following is from Rollin Thomas, Lawrence Berkeley National Laboratory, as he shares a little bit of what he talked about at the opening workshop of the Massive Datasets program at SAMSI.
It was my pleasure to tell the audience at the Opening Workshop for the Massive Datasets Program how astrophysicists are using wide-field time-domain sky surveys to make the discovery of short-lived “transient” objects routine rather than a matter of serendipity. It was fun to talk about how Palomar Transient Factory (PTF) wide-field imaging data are moved, stored, processed, subtracted, and then scrutinized by robots and humans to identify new targets for triggered follow-up with other specialized instruments and bigger telescopes. PTF is an instructive case study in how instrumentation, high-performance networking and computing, machine learning, and scientific workflows combine to help us do better science faster. The tale of SN 2011fe shows what can be accomplished when all of those pieces are lined up.
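The “subtracted” step above refers to difference imaging: a new science exposure is compared against a reference image of the same field, and anything that stands out above the combined noise is a transient candidate. Here is a minimal sketch of that idea on synthetic data; the image sizes, noise levels, and 5-sigma threshold are illustrative assumptions, not PTF's actual pipeline parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy field: a static "true" sky, imaged twice with independent noise.
# (All numbers here are made up for illustration.)
ny, nx = 64, 64
truth = np.full((ny, nx), 100.0)                 # flat sky background
reference = truth + rng.normal(0.0, 5.0, (ny, nx))
science = truth + rng.normal(0.0, 5.0, (ny, nx))

# Inject a transient into the science image only.
science[30, 40] += 80.0

# Difference imaging: subtract the reference, then flag pixels that
# exceed 5 sigma, where the two images' noise adds in quadrature.
diff = science - reference
sigma = np.sqrt(5.0**2 + 5.0**2)
candidates = np.argwhere(diff > 5 * sigma)

print(candidates)   # should recover the injected transient at (30, 40)
```

In a real survey the reference must first be astrometrically aligned and convolved to match the science image's point-spread function before subtraction; this sketch skips those steps.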
The whole 4-year PTF imaging data set takes up more than 100 TB of disk space; not that big a data set compared to some found in other domains. The challenge is to plow through it all as it streams in, without delay, to find faint signatures of real objects on top of huge backgrounds. Some backgrounds are low-level and easily dealt with in hardware and software: statistical noise, detector artifacts, cosmic rays, sky brightness, and so on. But at a higher level, one person’s background transient is another person’s signal, so you can’t just throw away asteroids if you personally only care about supernovae. Hence, another challenge is helping a (probably) geographically distributed team of researchers keep track of both discoveries and each other’s follow-up observations.
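One simple way the asteroid-versus-supernova distinction plays out in practice: asteroids move between exposures, while a supernova stays put in its host galaxy. A toy sketch of cross-matching candidate positions across two epochs; the detection lists, one-hour cadence, and 2-pixel match radius are hypothetical, chosen only to illustrate the idea.

```python
import math

# Hypothetical candidate positions (x, y in pixels) from two exposures
# of the same field, taken roughly an hour apart.
MATCH_RADIUS = 2.0  # assumed matching tolerance in pixels

epoch1 = [(120.1, 45.3), (300.0, 210.5)]
epoch2 = [(120.3, 45.1), (306.4, 213.2)]  # the second source has moved

def nearest(p, detections):
    """Return the detection closest to p, and its distance."""
    best = min(detections, key=lambda q: math.dist(p, q))
    return best, math.dist(p, best)

for p in epoch1:
    match, d = nearest(p, epoch2)
    if d <= MATCH_RADIUS:
        print(p, "-> stationary: supernova-like candidate")
    else:
        print(p, "-> moved", round(d, 1), "px: asteroid-like candidate")
```

A real pipeline would of course keep the moving objects too, routing them to the solar-system researchers rather than discarding them, which is exactly the point about one person's background being another's signal.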
PTF has met these challenges, but what about the future? It seems very likely that scaling up to the Large Synoptic Survey Telescope in 10 years will require breakthroughs in networking, computing architectures, software paradigms, artificial intelligence, and more automated, distributed scientific workflows. I encourage participants at the Massive Datasets workshop to think now about how their interests match up with those challenges.