Former Postdoc Kenneth Lopiano Speaks at RTP180

Dr. Kenneth Lopiano, co-founder of Roundtable Analytics and former postdoctoral fellow at SAMSI, spoke to a sold-out crowd last night at the RTP180 event. RTP180 is a monthly after-hours get-together where speakers spend about 5 minutes talking about a topic they are passionate about, highlighting some of the research happening in the Triangle region. It’s kind of like a mini TED talk meets Pecha Kucha.

Kenneth Lopiano on stage

Kenneth Lopiano talking at RTP180.

Lopiano spoke about the simulation model he and others developed to help ER departments become more efficient. You can read more about it here.

Some of the comments on Twitter included:

@nxtstop1 “‘Round table analytics’ ~ does work in ERs using simulation models to determine best practice for that particular dept~”

@Jnewbay “Emergency departments moving more efficiently? I’m in! Shorter wait times in the ER?”

@bentanthony01 “ pitching at – Are you tired of waiting at Emergency Department? ED simulation models

@HealthView “We need actionable insights to healthcare data says Roundtable Analytics <Hear! Hear!”

You can watch the full video, including Kenneth Lopiano’s presentation here.

Predicting number of landfalls of hurricanes — Undergraduate Modeling Workshop produces forecasts for 2013

group shot of undergraduates attending May 2013 workshop

Undergraduate workshop from May 2013.

Thirty-four undergraduate students from around the U.S. came to SAMSI and NC State University the week of May 13-17. During the week, the students interacted with an atmospheric scientist who works on hurricane research, and applied mathematicians and statisticians who work on climate research.  Students used the same database as used at NCSU to forecast various aspects of future hurricane seasons, and built Poisson regression models within R to produce their own forecasts of the 2013 hurricane season in the US. Below are some comments from participants:

three students with signs

Corey Raphael, U. Florida, Jonathan Skantz, U. Florida and Gwen Tian, U. British Columbia.

Corey Raphael, University of Florida
“I had a great time during my week at SAMSI! I learned all about climate science and hurricane predictions, and met a lot of great people. Thanks for all the advice and free food! I enjoyed getting to know the Raleigh area, and I learned a lot about R that I didn’t know previously. I hope the program enjoyed having me as much as I enjoyed being here!”

Group 3 shot

Evan Bittner, Penn State, Kasey Palmquist, UNC Wilmington, and Daria Drozdova, Pomona College.

Kasey Palmquist, University of North Carolina at Wilmington
“The workshop was an excellent experience; I truly feel that I am not leaving empty-handed. I not only learned new methods of statistical analysis, but how to collaborate with a group of people on a research topic. I found this workshop beneficial because it allows undergraduates to get a “feel” of mathematical/statistical research in order to see if it is right for them. I found the workshop to also be a great way to network and meet people that share the same interests as you. Overall, great experience!”

Group 6 SAMSI undergraduate modeling workshop May 2013

Brandon Sherman, U. Pitt, Kehao Zhu, Purdue, and Vinicius Taguchi, NCSU

Vinicius Taguchi, North Carolina State University
“This workshop was a wonderful experience.  I gained a better appreciation for statistics and applied mathematics, made lasting friendships, and got to see a new side of NC State University.  When I first got here, I was a little concerned about being one of the few non-math/stats majors, as well as one of the very few underclassmen.  Nevertheless, this never became an issue and I felt like part of the group right from the get-go.  Thank you, SAMSI.”

Group 2 photo SAMSI undergraduate modeling workshop May 2013

Lee Richardson, U. Washington-Seattle, Charles Ho, Rice and Anna Peris, Marquette.

Lee Richardson, University of Washington at Seattle
From his Twitter feed – “Predicted a Poisson Distribution with a mean 3.96. AKA 56% chance of greater than 4 hurricanes!!!!!”
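The figure in Lee's tweet can be checked with a few lines of Python. This is a minimal sketch, assuming the forecast is a plain Poisson distribution with mean 3.96; the function names are illustrative and do not come from the workshop materials (the students worked in R).

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k), summed directly from the probability mass function."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))


mean = 3.96  # forecast mean quoted in the tweet

p_at_least_4 = 1.0 - poisson_cdf(3, mean)   # P(X >= 4)
p_more_than_4 = 1.0 - poisson_cdf(4, mean)  # P(X > 4)

print(f"P(X >= 4) = {p_at_least_4:.3f}")
print(f"P(X > 4)  = {p_more_than_4:.3f}")
```

Under these assumptions the first probability comes out near 0.56 and the second near 0.36, so the tweeted 56% appears to correspond to four or more hurricanes rather than strictly more than four.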

Here are some of the presentations that the students gave the last day of the workshop.

Whither Environmental Statistics: where we’ve been, where we are, and some places we need to go

Photo of Walt Piegorsch

Walt Piegorsch

Early in March (of 2013), I had the honor and the pleasure of attending the SAMSI-SAVI Workshop on Environmental Statistics — an area of interest I’ve had for many years.  We convened in SAMSI’s HQ in RTP, NC, just up the street from the EPA (environmental statistics has sooooo many acronyms, doesn’t it?).  It was good timing: the weather was starting to turn nice in North Carolina.  (Well, actually, it was a bit cool for me — I’m in Arizona — but a number of my co-attendees from the frozen north were thrilled at how *warm* it was!  Global climate change at work…)  The workshop only lasted a few days, but I was enlivened by the energy it possessed.  Besides hearing some cutting-edge material presented during the talks, all attendees had a chance to interact and cogitate on the endeavor that is environmental statistics, during coffee breaks, on-site lunches, and a valuable set of breakout sessions one afternoon.  Well-designed workshop!  Indeed, in what was essentially only a two-day period I was able to give a talk on my own area of interest (environmental risk assessment), discuss the issue with many interested co-attendees, and then develop ideas with four attending co-authors for three different follow-up papers.  (Well, hopefully:  we came up with some great outlines — now all we have to do is write the manuscripts!)

room of people at tables

During a talk at the SAMSI-SAVI workshop focusing on environmental statistics

One theme I took from the workshop was, broadly speaking, ‘Whither Environmental Statistics’?  This is just my own opinion, of course, but the sense I got was that (1) we’re further along than we’ve ever been, but (2) there’s lots farther to go.  (Hmmm, maybe that’s why SAMSI held the workshop…)

This theme emerged during a lunch break, when SAMSI director Richard Smith and I had a chance to reflect on a paper we wrote (back in, cough, cough, 1998), which aimed to (start to) bring the broad diversity of problems in environmental statistics into a cohesive light.  In retrospect, we both agreed that it was a good beginning: environmental science, and with it environmental statistics, had opened up in the early 1990s and was starting to get some traction by then.  Despite the advances made since, however, there’s still so much more to do (and so little time, sigh…).  Stimulating but unanswered statistical questions abound in:

  • climate change (which these days seems to always lead the list)
  • spatio-temporal modeling (which seems to always follow second)
  • environmental security
  • third-world challenges, including agricultural advancement, large-scale ecological damage, pesticide exposure (and not just in the third world…)
  • informatics/“big” data (There’s lots of it. With more on the way.)
  • educating the next generation of environmetricians (and, getting more folks interested in working on these problems)
  • environmental sensing/sensor networks
  • incorporating prior knowledge into these problems via Bayesian methods
  • new, efficient computer algorithms (for addressing *all* of the above)

to name just a few…  (Add your own favorite here: ____________________________ )

A decidedly mixed list, which seems daunting at first blush.  But, the good news is that along with us ‘seasoned veterans,’ there were many younger minds among the attendees, and we all seemed up to the challenge.  As I said, the energy was infectious, and fun too.  So, let’s get started!  (Indeed, I should probably stop blogging and get to those papers.  My co-authors are waiting…)

group shot of the attendees

SAMSI-SAVI workshop on environmental statistics.

Dr Qiu Presents “Jump Regression Analysis and Imaging Processing”

The following blog entry is from Jiayang Sun, Professor of Statistics and Professor of Epidemiology and Biostatistics at Case Western Reserve University. Dr. Sun is leading the imaging working group together with Dr. Dani Dushizima, as part of the Statistical and Computational Methodology for Massive Datasets Program.

As part of SAMSI’s imaging working group activity, on Oct 29, Professor Peihua Qiu from the University of Minnesota gave a special talk on “Jump Regression Analysis and Imaging Processing” as an imaging tutorial from a statistician’s perspective, based on his book published by Wiley, in addition to his recent research on blind image deblurring (BID) and 3D image denoising and registration.

Cover of Peihua Qiu's book

The talk sparked interesting discussions on challenges and needs from a high level to the specifics that may motivate further research and better formulation of the various research problems.

Andreas Artemiou from Michigan Technological University said,

“It was insightful. I did not know the jump regression analysis and its application to imaging. Could I have a copy of the slides?”

SAMSI Postdoctoral Fellow Yi Grace Wang (whose research is in imaging from the mathematical side) said,

“I liked the tutorial very much. It included the big picture of image processing from statistical perspective as well as details from the Jump Regression Analysis in particular. It provided inspiring insights and also enlightened interesting thoughts and debates.”

SAMSI Postdoctoral Fellow Dan Yang (who has identified imaging from a statistical perspective as an area of research she would like to pursue) said,

“For me who has little experience in imaging, I enjoyed the tutorial a lot. It is neither too general nor too technical, giving me a big picture as well as the key ideas. I especially appreciate Prof. Qiu’s presentation for his careful organization, approachable explanation and interesting illustration.”

SAMSI Postdoctoral Fellow Garvesh Raskutti (who was interested to find out more about imaging) noted,

“I enjoyed the talk. I was unfamiliar with jump regression prior to the talk. It seems very applicable to imaging and other areas and the talk has encouraged me to read up more on jump regression.”

The Undergraduate Workshop Focusing on SAMSI Computational Methodology for Massive Datasets

This blog entry was written by James Anderson, undergraduate student double majoring in statistics-mathematics and economics from the University of Connecticut.

The undergraduate workshop attendees

Attendees and some presenters from the SAMSI undergraduate workshop held October 26-27, 2012.

This undergraduate workshop was notably different from my previous experience, though in no way inferior.  In fact, I would argue the content of this workshop was better for my current position. Massive datasets are surprisingly common, and the topics covered included astronomy, high-dimensional regression, climate change, and image rescaling. In these contexts, we mainly discussed how to manage large datasets without crashing an individual computer.

The other aspect of the workshop, which I really enjoyed, was the discussion panels. The students got a chance to talk to people working in academia and industry, as well as graduate students and postdocs. The professionals talked about their respective occupations and how they got to where they are, which was very interesting. On the other hand, the younger group talked about their transitions out of their respective undergraduate programs. This was particularly useful as I will be going through this phase over the next few months. One thing I was once more impressed with was SAMSI’s concern for the attendees. The presenters were happy to go into great detail about their presentations and field any general discipline-related questions they could with interested attendees (the presentations had to be kept pretty short). This really impressed me; it didn’t matter if it was in the context of a presentation or not, the mentality seemed to be that the workshop was happening all the time. There was a great opportunity during panels or breaks to ask questions and get information that was quite personalized and would have been hard to find in another way. The workshop gave me a lot of information and resources that will be valuable going forward.

Nuala’s Impressions from the Astrostatistics Workshop

The following post was written by Nuala McCullagh who is a graduate student in the Physics & Astronomy department at Johns Hopkins.

Nuala McCullagh sitting on a bench

Nuala McCullagh

I was thrilled to have the opportunity to visit SAMSI for the Massive Datasets program for three weeks in September. One of the most positive aspects of my visit was my exposure to several flavors of diversity, the most salient of which was diversity of expertise and discipline. As a graduate student in the Physics & Astronomy department at Johns Hopkins, I have been active in promoting diversity within my department. Physics and astronomy, along with most math and science fields, have traditionally lacked racial and gender diversity, and while the benefits of diversity are well established and generally accepted, it can still be difficult to convince scientists that it is an issue they should care about. The benefits of the diversity I observed at SAMSI were very clear, and my experience there really reinforced my belief that diversity can inspire creativity and productivity.

At the opening workshop, we heard talks from experts in statistics, computer science, applied math, neuroscience, environment & climate science, high energy physics, and astronomy. While the conference covered a wide range of disciplines, there was a common thread of having to deal with massive datasets. I was surprised to learn about the similarities between my work in cosmology and work in other fields such as climate studies and neuroscience. Hearing about the problems and solutions in those fields has helped me think about my own problems in a different way.

At the astrostatistics workshop, we heard about large galaxy surveys, computer simulations, multi-dimensional datasets, time-domain astronomy, and more. It was helpful to hear about the different statistical problems with massive datasets in the context of astronomy, and interesting to see the similarities and differences between them. For example, just within cosmology, the statistical problems that arise when working with large dark matter simulations are different from those that arise in detecting weak lensing in galaxy surveys. Meanwhile, people who study exoplanets work with large simulations with many parameters, much like the simulations in cosmology. Hearing about the various statistical problems astronomers have encountered allowed me to make connections between different areas in astronomy that I would not have noticed otherwise.

I appreciated the opportunity to learn about a wide variety of problems concerning massive data. It was interesting to note the statistical similarities in seemingly disparate scientific problems. It was also reaffirming to see the positive impact that diversity can have in inspiring creativity and productivity in science.

Ilse Ipsen Speaks at the Science Communicators of North Carolina and the RTP Chapter of Sigma Xi Pizza Lunch

Ilse Ipsen speaking to the SCONC and RTP chapter of Sigma Xi

Ilse Ipsen spoke to the SCONC and RTP chapter of Sigma Xi on October 9.

Associate Director of SAMSI and professor of mathematics at North Carolina State University, Ilse Ipsen, recently spoke at the Sigma Xi pizza lunch. The lunch is a monthly gathering co-sponsored by the Science Communicators of North Carolina (SCONC) and the RTP chapter of Sigma Xi.

Ilse’s talk, “Rolling the Dice on Big Data,” focused on how big data is permeating all aspects of our daily lives, from the grocery store, where supermarkets gather data on our personal buying habits, to the analysis of images from space, to the Internet, where Google receives 2 million inquiries a minute and 347 blog posts are published every minute of the day. Facebook processes 500 terabytes of information each day, and 30 billion pieces of information are shared on Facebook each month.

To give her audience an understanding of how applied mathematicians approach this enormous problem of sifting through data, she used an example of trying to match an e-mail that comes from an unknown source to a series of e-mails that were received from known authors. The e-mail from the unknown source (the query) has three key words in it. In her example, she looks at three e-mails from known authors and counts the number of times the key words were used. Then, the length of the sentence is measured to see how many words were used in each e-mail and in the query. Each word in each e-mail is counted and multiplied by the query to get a number. The words that are found in each e-mail and the query are squared and then divided by the sum. This method helps determine which of the known e-mails most closely matches the query, and therefore who its likely author is.
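The matching procedure described above resembles a cosine-similarity comparison of keyword counts. The sketch below assumes that reading; it is not taken from the talk itself, and the author labels and counts are hypothetical values chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical data: how often each of three key words appears
# in three e-mails from known authors.
known_emails = {
    "author_a": [3, 0, 1],
    "author_b": [1, 2, 2],
    "author_c": [0, 4, 0],
}
query = [1, 1, 1]  # the unknown e-mail uses each key word once

scores = {name: cosine_similarity(counts, query)
          for name, counts in known_emails.items()}
best_match = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("closest match:", best_match)
```

Dividing by the vector lengths keeps a long e-mail from winning simply because it contains more words, which may be why the talk's example measured the length of each e-mail.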

If one were to look at every e-mail written each day, there would be about 294 billion e-mails to sort through, and there are about 250,000 words in the English language, so it would be an enormous task to accomplish. Many mathematicians and statisticians therefore use the Monte Carlo method to sample and narrow down the search.

She explained that a randomized algorithmic approach is fast, easy to implement and simple to use, and is as good as, or perhaps even better than, a deterministic approach.
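To make the idea of sampling concrete, here is a minimal sketch of a Monte Carlo estimate of an inner product, the kind of quantity that underlies the e-mail comparison. It is not necessarily the algorithm presented in the talk; the vectors, the sample size and the function name `sampled_dot` are assumptions made for illustration.

```python
import random


def sampled_dot(x, y, num_samples, rng):
    """Estimate sum(x[i] * y[i]) from uniformly sampled positions.

    Each sampled product is scaled by n / num_samples, which makes the
    estimate unbiased for the exact inner product.
    """
    n = len(x)
    total = 0.0
    for _ in range(num_samples):
        i = rng.randrange(n)
        total += x[i] * y[i]
    return n * total / num_samples


# Hypothetical vectors with 100,000 entries each.
n = 100_000
x = [1 + (i % 10) / 100 for i in range(n)]
y = [1 + (i % 7) / 100 for i in range(n)]

exact = sum(a * b for a, b in zip(x, y))
estimate = sampled_dot(x, y, 1_000, random.Random(0))  # looks at 1% of the entries
relative_error = abs(estimate - exact) / exact

print(f"exact    = {exact:.1f}")
print(f"estimate = {estimate:.1f}")
print(f"relative error = {relative_error:.4f}")
```

The estimate touches only a small fraction of the entries, which is the trade the randomized approach makes: a little accuracy for a large saving in work. Uniform sampling is the simplest choice; practical methods often sample in a data-dependent way.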

Room full of people listening to Ilse Ipsen's talk

The room was at full capacity for Ilse Ipsen’s talk “Rolling the Dice on Big Data.”

Ilse spoke to a packed room, including many science writers from the Triangle region, members of Sigma Xi and a high school physics class from Kestrel Heights, a local charter school.

Astronomy and Big Data

The following is from G. Jogesh Babu, Professor of Statistics and Director of the Center for Astrostatistics at Penn State University, and one of the organizers of the astrostatistics workshop.

Jogesh Babu working at the astrostatistics workshop

Jogesh Babu at the recent astrostatistics workshop held at SAMSI.

Astronomers are among the first researchers to encounter Big Data. Until a few decades ago astronomers would typically compete for observation time on telescopes, spending cold nights on distant mountain top observatories to collect data on a few stars and galaxies. This has changed substantially. Today, they pore over massive data through high-speed internet connections to their office computers, thinking of automated procedures to identify objects. They have become computer scientists, developing algorithms to read through massive data and make sense of it by inventing algorithms specific to their task.  In addition, some astrophysicists make massive simulations under assumptions dictated by physical models; for example, the Millennium Simulation calculates the formation of galaxies in an expanding Universe dominated by Dark Matter and Dark Energy. The simulations must then be compared to the massive datasets to see if the assumed model explains the data.

The Sloan Digital Sky Survey (SDSS), designed in the 1990s and still active today, really brought astronomy into the massive data era. The SDSS data rate for imaging was 17 GB/hour, and much less for spectroscopy. Thus, SDSS produces about 200 GB of data every night, adding to a database that stands at around 50 TB today. The Sloan project has produced several thousand research papers, revolutionizing many fields of astronomy. Massive data in astronomy is thus producing a paradigm shift in the way astronomy research is done. It is bringing information scientists, statisticians and astronomers together to collaborate on scientific investigations. For more on massive data in astronomy, see the recent article ‘Big data in astronomy’ by Eric D. Feigelson and the author in the August 2012 issue of ‘Significance’.

While astronomical data traditionally consists of images and spectra,  the time domain is adding a new dimension to the astronomical imaging (http://www.cambridge.org/us/knowledge/isbn/item6852606/?site_locale=en_US).

Repeated images of the sky reveal a wealth of information about our ever changing universe: dozens of species of variable stars; thousands of moving asteroids in the Solar System;  tens of thousands of quasars, supermassive black holes in distant galaxies; and hundreds of supernova explosions from dying stars. Type Ia supernovae are particularly important, as their numbers shed light on Dark Matter and Dark Energy.

An alphabet soup of time-domain surveys in visible light is underway: SDSS III, PTF, CRTS, SNF, Pan-STARRS, VISTA, and more. The largest of the planned projects based on multi-epoch imaging is the Large Synoptic Survey Telescope (LSST), recently approved by the National Science Foundation as the largest U.S. ground-based project in astronomy.  It is expected to start around 2020. The LSST will image half of the sky every 3 nights, producing a video of the sky with hundreds of millions of variable objects.  The data flow from this project will be several terabytes each night. The challenges of this project have hundreds of astronomers, engineers and computer scientists thinking about difficult problems, both in the management of massive data streams and in the data mining needed to emerge with strong scientific findings.

Recently, I co-organized a workshop on ‘Astrostatistics’ with Prajval Shastri of the Indian Institute of Astrophysics, as part of the 2012-13 SAMSI Program on Statistical and Computational Methodology for Massive Datasets.

Prajval Shastri and Ann Lee

Ann Lee (L) and son, and Prajval Shastri (R).

The three-day workshop was held at SAMSI during September 19-21. Though it had a mix of talks by statisticians like Jim Berger and David Donoho, the majority of the talks were by astronomers who have ongoing collaborations with statisticians. Each talk ended with a lively, stimulating discussion. The audience consisted of a good mix of statisticians and astronomers. Presentations concentrated on Bayesian methods, faint source detection, learning from massive multidimensional data, exoplanets, time-domain astronomy, sparsity, reproducible research, etc. The workshop concluded with a good discussion on future directions.

It is nice to get back to SAMSI and interact with friends and collaborators; I had organized a very stimulating semester-long Astrostatistics program at SAMSI in Spring 2006. Astronomers were already familiar with the concept of large-scale electronic integration of astronomy data, tools, and services on a global scale in a manner that provides easy access by individuals around the world, via the Virtual Observatory (VO). They are thus enabling science on massive data. Even in 2006, astronomers were grappling with massive data, long before the terms ‘Big data’ or ‘megadatasets’ came into vogue.

Andrea G. Campos Bianchi’s Impressions of the Massive Datasets Opening Workshop

The following remarks are from Andrea G. Campos Bianchi, Visiting Researcher, Lawrence Berkeley National Laboratory, who attended the Massive Datasets Opening Workshop in September.

Andrea Bianchi, visiting professor, Lawrence Berkeley National Laboratory.

What a delight to attend the 2012 SAMSI Workshop on Massive Datasets. The tutorials and talks were impressive, and they exposed me to different approaches to massive datasets, since the lectures covered a broad spectrum of cutting-edge topics, ranging from randomized methods in statistics to complex high energy physics problems.

As a visiting researcher at Lawrence Berkeley National Laboratory, and, originally, a professor at Federal University of Ouro Preto-Brazil, I want to express my sincere appreciation to SAMSI for sponsoring my trip, which made my attendance possible.
Certainly, I will benefit from this workshop and its ideas for years to come, especially regarding large data analysis, theory and applications. I look forward to participating in the Imaging Working Group and establishing collaborations with researchers from SAMSI.
Big Thanks!

Thoughts on Bayesian Statistical Inference for Regional Climate Projections in North America

The following blog entry is from Noel Cressie, Professor of Statistics, University of Wollongong, and Director, Program in Spatial Statistics and Environmental Statistics, The Ohio State University.

Noel Cressie

Noel Cressie, Professor of Statistics, University of Wollongong and Director, Program in Spatial Statistics and Environmental Statistics, The Ohio State University

Two weeks ago I attended and spoke at the opening workshop for Statistical and Computational Methodology for Massive Datasets. It’s always good to get back to SAMSI and see old friends. (I was a visitor there in spring 2010 for a program on Space-Time Analysis for Environmental Mapping, Epidemiology, and Climate Change.) This time I spoke in a session on Environment and Climate, and I had the opportunity to present recent work on statistical inference for regional climate projections in North America.

A few features of the problem: The data are outputs from several regional climate models, and hence they are deterministic. To carry out inference on important questions, like, “Where and in which season will the temperature increase be most severe?”, we (Emily Kang, U. Cincinnati, and I) used a Bayesian hierarchical modeling approach. The data are spatial, at a 50 km resolution over North America. The dataset is large, about 100,000 in size, even after summarization! The problem is important: it involves projecting temperature change in 50 km x 50 km regions of North America by 2070, for the four seasons and over the whole year.

Our results are quite sobering…the website: http://www.stat.osu.edu/~sses/collab_warming.html can be consulted for more details.

image of North America with a wide swath of red to indicate probability of temperature increase

In this image, the color red indicates regions of North America for which our Bayesian statistical analysis gives a 97.5 percent posterior probability that average temperatures will rise by at least 2 degrees Celsius (3.6 degrees Fahrenheit) by 2070. Image by Noel Cressie and Emily Kang, courtesy of Ohio State University.

In a follow-up to the Bayesian inference we did, I was asked the following questions by a reporter for a popular science magazine, Science et Vie:

“I would like to know how central was the role of Bayesian statistics in your work. That is : What is the improvement brought by the use of these statistics when compared to “classical” statistics?

More generally, I’d like to know if Bayesian statistics have emerged only recently in climatology, and if yes, why now ? How would you qualify the (current or potential) contribution of Bayesian statistics to climatology?”

I thought that readers of the SAMSI blog might be interested in my responses (lightly edited for the context of this blog):

In our statistical analysis, we are considering output from different climate models at a very regional scale and seasonally. The outputs are deterministic and complete over the North American region, at a 50 km x 50 km scale. No continental-scale or global-scale averaging is being done. At this level, communities and even individuals can see the impact of climate change on their lives. Moreover, because the output can be presented seasonally, the impact of climate change on water storage, agriculture, pest control, and so forth can be considered.

There is model-to-model variability and spatial variability in the climate-model output, but the output is deterministic. That is, if the models were run again with the same boundary conditions, the same values would be obtained. That is where a Bayesian analysis is essential, because without it we could just summarize the data but not do any inference on it. In a paper (Kang and Cressie, 2012) published this year in the International Journal of Applied Earth Observation and Geoinformation, we give examples of inference based on samples from the posterior distribution.

There is a generic approach to climate modeling that has emerged relatively recently (last 15 years) based on hierarchical statistical modeling. This approach uses Bayes’ Theorem and modern computing technology (e.g., Markov chain Monte Carlo algorithms) to allow us to answer climate-related questions (e.g., “Will temperatures increase by 2070 beyond a sustainability threshold of 2 deg. C?”), in the presence of data uncertainty and scientific-model uncertainty. There is a version of this called Bayesian hierarchical modeling (BHM) that we used.
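For readers unfamiliar with how posterior samples turn into a statement like “a 97.5 percent posterior probability of at least 2 degrees of warming,” here is a minimal sketch. It assumes the samples have already been produced (for example by a Markov chain Monte Carlo run); the numbers below are hypothetical draws for a single grid cell, not output from the analysis described in this post, and the function name is illustrative.

```python
def exceedance_probability(samples, threshold):
    """Fraction of posterior draws at or above the threshold.

    With draws from the posterior distribution, this fraction is a
    Monte Carlo estimate of P(quantity >= threshold | data).
    """
    if not samples:
        raise ValueError("need at least one posterior draw")
    return sum(1 for s in samples if s >= threshold) / len(samples)


# Hypothetical posterior draws of temperature change (deg. C) for one
# 50 km x 50 km cell; a real analysis would use thousands of draws.
posterior_draws_deg_c = [2.4, 3.1, 1.8, 2.9, 3.5, 2.2, 2.7, 1.9, 3.0, 2.6]

p_exceeds_2c = exceedance_probability(posterior_draws_deg_c, 2.0)
print(f"Estimated P(warming >= 2 deg. C) = {p_exceeds_2c:.2f}")
```

Repeating this calculation for every grid cell and coloring the cells whose probability reaches a chosen level is one plausible way a map like the one above could be drawn.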

I have a particular interest in remote sensing data, which can be massive in size. The last 5 years of my research program have been directed towards dimension reduction in hierarchical statistical models, with a particular emphasis on climate questions. The dataset analyzed in the paper above is large, about 100,000, and we had to use dimension-reduction techniques to solve the problem. Bayesian statistics is computationally intensive, and usually problems of only moderate data sizes can be solved. Our work and that of others in dimension reduction has been a breakthrough that has allowed BHMs to be used in very complex models with massive datasets.

The Bayesian “movement” is growing in science in general. Good scientists are honest about what they know, and they are aware of the uncertainties in their work. There has been a general trend in science towards “Uncertainty Quantification” or “UQ,” and the Bayesian approach allows uncertainties to be expressed through conditional probabilities. Bayes Theorem is a coherent way to combine all these sources of uncertainty.

Many climatologists will have difficulty with fitting BHMs because there’s a big statistical investment involved. The savvy ones are partnering with statisticians in research teams to answer parts of grand-challenge questions in the presence of uncertainties. This movement is small but growing, and I expect it will be accepted in 5-10 (hopefully 5) years by the climate community as being essential.