*The following blog entry is from Noel Cressie, Professor of Statistics, University of Wollongong, and Director, Program in Spatial Statistics and Environmental Statistics, The Ohio State University.*

Two weeks ago, I attended and spoke at the opening workshop of the program on Statistical and Computational Methodology for Massive Datasets. It’s always good to get back to SAMSI and see old friends. (I was a visitor there in spring 2010 for a program on Space-Time Analysis for Environmental Mapping, Epidemiology, and Climate Change.) This time I spoke in a session on Environment and Climate, and I had the opportunity to present recent work on statistical inference for regional climate projections in North America.

A few features of the problem: the data are outputs from several regional climate models and hence are deterministic. To carry out inference on important questions, such as “Where, and in which season, will the temperature increase be most severe?”, Emily Kang (University of Cincinnati) and I used a Bayesian hierarchical modeling approach. The data are spatial, at a 50 km resolution over North America, and the dataset is large: about 100,000 values, even after summarization! The problem is important: it involves projecting temperature change in 50 km x 50 km regions of North America by 2070, for each of the four seasons and for the whole year.

Our results are quite sobering; more details are available at http://www.stat.osu.edu/~sses/collab_warming.html.

In this image, the color red indicates regions of North America for which our Bayesian statistical analysis gives a 97.5 percent posterior probability that average temperatures will rise by at least 2 degrees Celsius (3.6 degrees Fahrenheit) by 2070. Image by Noel Cressie and Emily Kang, courtesy of Ohio State University.

As a follow-up to our Bayesian inference, I was asked the following questions by a reporter for the popular science magazine *Science et Vie*:

“I would like to know how central the role of Bayesian statistics was in your work. That is: what is the improvement brought by the use of these statistics when compared to ‘classical’ statistics?

More generally, I’d like to know whether Bayesian statistics has emerged only recently in climatology, and if so, why now? How would you characterize the (current or potential) contribution of Bayesian statistics to climatology?”

I thought that readers of the SAMSI blog might be interested in my responses (lightly edited for the context of this blog):

In our statistical analysis, we consider output from different climate models at a fine regional scale and season by season. The outputs are deterministic and complete over the North American region, at a 50 km x 50 km scale. No continental- or global-scale averaging is done. At this level, communities and even individuals can see the impact of climate change on their lives. Moreover, because the output can be presented seasonally, the impact of climate change on water storage, agriculture, pest control, and so forth can be considered.

There is model-to-model variability and spatial variability in the climate-model output, but the output is deterministic. That is, if the models were run again with the same boundary conditions, the same values would be obtained. That is where a Bayesian analysis is essential: without it, we could only summarize the data, not carry out inference on them. In a paper published this year in the *International Journal of Applied Earth Observation and Geoinformation* (Kang and Cressie, 2012), we give examples of inference based on samples from the posterior distribution.
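
To make “inference based on samples from the posterior distribution” concrete, here is a minimal Python sketch (not the code used in Kang and Cressie, 2012) of how a map like the one described in the caption above could be derived. Given posterior draws of temperature change for each grid cell, the posterior probability of a rise of at least 2 degrees Celsius is estimated by the fraction of draws at or above 2, and a cell is flagged when that probability is at least 0.975. The array layout, the function names `exceedance_probability` and `flag_cells`, and the synthetic draws are all hypothetical; real draws would come from an MCMC run of the fitted BHM.

```python
import numpy as np

def exceedance_probability(samples: np.ndarray, threshold: float) -> np.ndarray:
    """Monte Carlo estimate of P(change >= threshold | data) for each grid cell.

    samples has shape (n_draws, n_cells): posterior draws of the temperature
    change in each cell (hypothetical layout, e.g. saved from an MCMC run).
    """
    # The posterior probability is approximated by the fraction of draws
    # at or above the threshold.
    return (samples >= threshold).mean(axis=0)

def flag_cells(samples: np.ndarray, threshold: float = 2.0,
               level: float = 0.975) -> np.ndarray:
    """Boolean mask of cells whose exceedance probability is at least `level`."""
    return exceedance_probability(samples, threshold) >= level

# Synthetic draws for three hypothetical cells (not real results).
rng = np.random.default_rng(0)
draws = np.column_stack([
    rng.normal(3.5, 0.5, 4000),  # posterior mass well above 2 degrees C
    rng.normal(2.0, 0.5, 4000),  # posterior centred on the threshold
    rng.normal(1.0, 0.5, 4000),  # posterior mass mostly below 2 degrees C
])
print(exceedance_probability(draws, 2.0))
print(flag_cells(draws))  # only the first cell is flagged
```

For independent draws, the Monte Carlo error of each estimated probability is roughly sqrt(p(1 - p)/n_draws); correlated MCMC draws call for an effective sample size in place of n_draws, and cells whose probability sits near 0.975 need more draws to classify reliably.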

A generic approach to climate modeling, based on hierarchical statistical modeling, has emerged relatively recently (in the last 15 years). This approach uses Bayes’ theorem and modern computing technology (e.g., Markov chain Monte Carlo algorithms) to answer climate-related questions (e.g., “Will temperatures increase by 2070 beyond a sustainability threshold of 2 deg. C?”) in the presence of data uncertainty and scientific-model uncertainty. The version of this approach that we used is called Bayesian hierarchical modeling (BHM).
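
As a hedged illustration of how Bayes’ theorem and MCMC fit together in a hierarchical model, the Python sketch below runs a Gibbs sampler (one kind of MCMC algorithm) on a deliberately simple two-level normal model: each of M climate models reports a noisy version of a true regional value y[i], and the y[i] vary around a common mean theta. This toy model, its known variances `sigma2` and `tau2`, and the function name `gibbs_normal_hierarchy` are assumptions made for illustration; the BHM used in the work described here is considerably more elaborate.

```python
import numpy as np

def gibbs_normal_hierarchy(z, sigma2, tau2, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for a toy two-level normal hierarchy (hypothetical model).

    z: array (M, n) of outputs from M climate models at n regions.
    Data model:      z[m, i] ~ N(y[i], sigma2)
    Process model:   y[i]    ~ N(theta, tau2)
    Parameter model: flat prior on theta; sigma2 and tau2 treated as known.
    Returns posterior draws of y with shape (n_iter - burn_in, n).
    """
    rng = np.random.default_rng(seed)
    M, n = z.shape
    z_sum = z.sum(axis=0)
    theta = z.mean()                      # starting value for theta
    prec_y = M / sigma2 + 1.0 / tau2      # full-conditional precision of each y[i]
    draws = []
    for it in range(n_iter):
        # y | theta, z: conjugate normal update, independent across regions
        mean_y = (z_sum / sigma2 + theta / tau2) / prec_y
        y = rng.normal(mean_y, np.sqrt(1.0 / prec_y))
        # theta | y: normal centred at the average of the current y
        theta = rng.normal(y.mean(), np.sqrt(tau2 / n))
        if it >= burn_in:
            draws.append(y)
    return np.array(draws)

# Usage with synthetic data: 5 hypothetical model runs at 3 regions.
rng = np.random.default_rng(1)
true_y = np.array([1.0, 2.0, 3.0])
z = true_y + rng.normal(0.0, 0.5, size=(5, 3))
draws = gibbs_normal_hierarchy(z, sigma2=0.25, tau2=1.0)
print(draws.mean(axis=0))  # posterior means, shrunk slightly toward theta
```

Each sweep draws every y[i] from its normal full conditional and then draws theta given the current y; because both conditionals are conjugate, no proposal tuning is needed, which is why Gibbs sampling is a common first choice for models of this shape.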

I have a particular interest in remote-sensing data, which can be massive in size. For the last 5 years, my research program has been directed towards dimension reduction in hierarchical statistical models, with a particular emphasis on climate questions. The dataset analyzed in the paper above is large, about 100,000 values, and we had to use dimension-reduction techniques to solve the problem. Bayesian statistics is computationally intensive, and usually only problems with moderate data sizes can be solved. Dimension reduction, through our work and that of others, has been a breakthrough that allows BHMs to be used with very complex models and massive datasets.
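
One widely used form of dimension reduction in spatial statistics is a low-rank spatial random effects model, as in Fixed Rank Kriging, where the n x n covariance matrix is written as Sigma = S K S^T + sigma2 I with only r << n basis functions in S. The Python sketch below is a generic illustration, not necessarily the technique used in Kang and Cressie (2012): the Sherman-Morrison-Woodbury identity lets us compute Sigma^{-1} z with a single r x r solve, at a cost of O(n r^2) instead of the O(n^3) of a direct solve. The function name `low_rank_solve` and the random test matrices are hypothetical.

```python
import numpy as np

def low_rank_solve(S, K, sigma2, z):
    """Compute Sigma^{-1} z for Sigma = S K S^T + sigma2 * I without forming Sigma.

    S: (n, r) basis-function matrix with r << n; K: (r, r) positive definite.
    By Sherman-Morrison-Woodbury,
      Sigma^{-1} = I/sigma2 - S (K^{-1} + S^T S / sigma2)^{-1} S^T / sigma2**2,
    so the only dense solve involves an r x r matrix.
    """
    inner = np.linalg.inv(K) + (S.T @ S) / sigma2   # r x r
    correction = S @ np.linalg.solve(inner, S.T @ z)  # n-vector
    return z / sigma2 - correction / sigma2**2

# Check against a direct dense solve on a small random problem.
rng = np.random.default_rng(2)
n, r = 200, 10
S = rng.normal(size=(n, r))
A = rng.normal(size=(r, r))
K = A @ A.T + r * np.eye(r)          # positive-definite r x r matrix
sigma2 = 0.5
z = rng.normal(size=n)
direct = np.linalg.solve(S @ K @ S.T + sigma2 * np.eye(n), z)
print(np.allclose(low_rank_solve(S, K, sigma2, z), direct))  # True
```

For n around 100,000, forming and factorizing the dense Sigma would be impractical on typical hardware, whereas the low-rank form only ever handles n x r and r x r matrices.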

The Bayesian “movement” is growing in science in general. Good scientists are honest about what they know, and they are aware of the uncertainties in their work. There has been a general trend in science towards “Uncertainty Quantification,” or “UQ,” and the Bayesian approach allows uncertainties to be expressed through conditional probabilities. Bayes’ theorem is a coherent way to combine all these sources of uncertainty.
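
In the hierarchical formulation common in this literature, the different sources of uncertainty enter at separate levels, and Bayes’ theorem combines them into a single posterior distribution. Writing [A | B] for the conditional distribution of A given B, Z for the data (here, climate-model output), Y for the underlying process of interest (for example, the field of temperature change), and θ for the model parameters, a sketch of the decomposition is:

```latex
[Y, \theta \mid Z]
  \;=\; \frac{[Z \mid Y, \theta]\,[Y \mid \theta]\,[\theta]}{[Z]}
  \;\propto\;
  \underbrace{[Z \mid Y, \theta]}_{\text{data model}}\;
  \underbrace{[Y \mid \theta]}_{\text{process model}}\;
  \underbrace{[\theta]}_{\text{parameter model}}
```

The posterior [Y, θ | Z] is what MCMC algorithms sample from, and quantities such as the probability that the temperature change exceeds 2 degrees Celsius are computed from those samples.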

Many climatologists will have difficulty fitting BHMs because there is a big statistical investment involved. The savvy ones are partnering with statisticians in research teams to answer parts of grand-challenge questions in the presence of uncertainty. This movement is small but growing, and I expect that within 5 to 10 (hopefully 5) years the climate community will accept it as essential.