It is Hard to Define What is Beyond Bioinformatics

The following blog entry was written by ClarLynda Williams-DeVane, Assistant Professor of Bioinformatics/Biostatistics, Department of Biology, and Director of the Bioinformatics Genomics and Computational Chemistry Core (BGCCC) at the Biotechnology Biomedical Research Institute (BBRI), North Carolina Central University; Building Interdisciplinary Careers in Women’s Health (BIRCWH), Duke University.


ClarLynda Williams-Devane

Dr. ClarLynda Williams-DeVane

Two weeks ago I participated in SAMSI’s Opening Workshop for the 2014-15 Program on Beyond Bioinformatics: Statistical and Mathematical Challenges. I was particularly interested in this program because of its focus on data integration and large-scale data methodology. My research focuses on large-scale data integration for complex women’s diseases. As an assistant professor at a smaller university, I found it an amazing opportunity to spend a week thinking about and discussing current and developing methodology in my research area. The discussion of exploratory data analysis (EDA) methods in comparison or complement to Bayesian model-based methods was insightful and of great benefit, as I often have these discussions with my K-award mentorship team. The thought leaders in these areas all made well-defined and well-supported arguments about which methodology is best for specific research questions.


Terry Speed talk at SAMSI

Dr. Terry Speed, UC-Berkeley, and Walter and Eliza Hall Institute of Medical Research


Throughout the meeting, it was difficult for most speakers and attendees to define what it means to move beyond bioinformatics. Many of the talks and the discussions that followed them exemplified moving beyond bioinformatics by addressing how to move from exploratory data analysis methods to more model-based analysis methods, which for me defines the need to move beyond bioinformatics. I appreciated the focus on mathematical and statistical approaches to problems. As a junior faculty member, I found the discussion about publishing in this area and developing clinically relevant methodologies very helpful. At the end of the workshop, as we broke into working groups, we continued our discussions of data integration. The process of finding the appropriate working group was a bit overwhelming. Through the various discussions on data integration, however, I was able to find a working group that complemented my current research and to which I could be a major contributor. I am eagerly anticipating the next face-to-face meeting of my working group and seeing the outcomes of the other working groups.

Former Postdoc Kenneth Lopiano Speaks at RTP180

Dr. Kenneth Lopiano, co-founder of Roundtable Analytics and former postdoctoral fellow at SAMSI, spoke to a sold-out crowd last night at the RTP180 event. RTP180 is a monthly after-hours get-together where speakers spend about 5 minutes talking about a topic they are passionate about, highlighting some of the research happening in the Triangle region. It’s kind of like a mini TED talk meets Pecha Kucha.

Kenneth Lopiano on stage

Kenneth Lopiano talking at RTP180.

Lopiano spoke about the simulation model he and others developed to help ER departments become more efficient. You can read more about it here.

Some of the comments on Twitter included: @nxtstop1 “”Round table analytics” ~ does work in ERs using simulation models to determine best practice for that particular dept~

@Jnewbay “Emergency departments moving more efficiently? I’m in! Shorter wait times in the ER?

@bentanthony01 “ pitching at – Are you tired of waiting at Emergency Department? ED simulation models

@HealthView “We need actionable insights to healthcare data says Roundtable Analytics <Hear! Hear!”

You can watch the full video, including Kenneth Lopiano’s presentation here.

Predicting number of landfalls of hurricanes — Undergraduate Modeling Workshop produces forecasts for 2013

group shot of undergraduates attending May 2013 workshop

Undergraduate workshop from May 2013.

Thirty-four undergraduate students from around the U.S. came to SAMSI and NC State University the week of May 13-17. During the week, the students interacted with an atmospheric scientist who works on hurricane research, and with applied mathematicians and statisticians who work on climate research. Students used the same database used at NCSU to forecast various aspects of future hurricane seasons, and built Poisson regression models in R to produce their own forecasts of the 2013 hurricane season in the US. Below are some comments from participants:

three students with signs

Corey Raphael, U. Florida, Jonathan Skantz, U. Florida and Gwen Tian, U. British Columbia.

Corey Raphael, University of Florida
“I had a great time during my week at SAMSI! I learned all about climate science and hurricane predictions, and met a lot of great people. Thanks for all the advice and free food! I enjoyed getting to know the Raleigh area, and I learned a lot about R that I didn’t know previously. I hope the program enjoyed having me as much as I enjoyed being here!”

Group 3 shot

Evan Bittner, Penn State, Kasey Palmquist, UNC Wilmington, and Daria Drozdova, Pomona College.

Kasey Palmquist, University of North Carolina at Wilmington
“The workshop was an excellent experience; I truly feel that I am not leaving empty-handed. I not only learned new methods of statistical analysis, but how to collaborate with a group of people on a research topic. I found this workshop beneficial because it allows undergraduates to get a “feel” of mathematical/statistical research in order to see if it is right for them. I found the workshop to also be a great way to network and meet people that share the same interests as you. Overall, great experience!”

Group 6 SAMSI undergraduate modeling workshop May 2013

Brandon Sherman, U. Pitt, Kehao Zhu, Purdue, and Vinicius Taguchi, NCSU

Vinicius Taguchi, North Carolina State University
“This workshop was a wonderful experience.  I gained a better appreciation for statistics and applied mathematics, made lasting friendships, and got to see a new side of NC State University.  When I first got here, I was a little concerned about being one of the few non-math/stats majors, as well as one of the very few underclassmen.  Nevertheless, this never became an issue and I felt like part of the group right from the get-go.  Thank you, SAMSI.”

Group 2 photo SAMSI undergraduate modeling workshop May 2013

Lee Richardson, U. Washington-Seattle, Charles Ho, Rice and Anna Peris, Marquette.

Lee Richardson, University of Washington at Seattle
From his Twitter feed – “Predicted a Poisson Distribution with a mean 3.96. AKA 56% chance of greater than 4 hurricanes!!!!!”
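The figure in the tweet can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the students' R code: it assumes nothing beyond the Poisson mean of 3.96 quoted above, and the helper name `poisson_cdf` is made up for this example. Under that assumption, 56% is the chance of four or more hurricanes, while the chance of strictly more than four is about 36%.

```python
import math

def poisson_cdf(k, mean):
    """Return P(X <= k) for a Poisson random variable with the given mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))

mean = 3.96  # forecast mean quoted in the tweet

p_at_least_4 = 1 - poisson_cdf(3, mean)   # P(X >= 4)
p_more_than_4 = 1 - poisson_cdf(4, mean)  # P(X > 4)

print(f"P(X >= 4) = {p_at_least_4:.3f}")
print(f"P(X > 4)  = {p_more_than_4:.3f}")
```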

Here are some of the presentations that the students gave the last day of the workshop.

Whither Environmental Statistics: where we’ve been, where we are, and some places we need to go

Photo of Walt Piegorsch

Walt Piegorsch

Early in March (of 2013), I had the honor and the pleasure of attending the SAMSI-SAVI Workshop on Environmental Statistics — an area of interest I’ve had for many years.  We convened in SAMSI’s HQ in RTP, NC, just up the street from the EPA (environmental statistics has sooooo many acronyms, doesn’t it?).  It was good timing: the weather was starting to turn nice in North Carolina.  (Well, actually, it was a bit cool for me — I’m in Arizona — but a number of my co-attendees from the frozen north were thrilled at how *warm* it was!  Global climate change at work…)  The workshop only lasted a few days, but I was enlivened by the energy it possessed.  Besides hearing some cutting-edge material presented during the talks, all attendees had a chance to interact and cogitate on the endeavor that is environmental statistics, during coffee breaks, on-site lunches, and a valuable set of breakout sessions one afternoon.  Well-designed workshop!  Indeed, in what was essentially only a two-day period I was able to give a talk on my own area of interest (environmental risk assessment), discuss the issue with many interested co-attendees, and then develop ideas with four attending co-authors for three different follow-up papers.  (Well, hopefully:  we came up with some great outlines — now all we have to do is write the manuscripts!)

room of people at tables

During a talk at the SAMSI-SAVI workshop focusing on environmental statistics

One theme I took from the workshop was, broadly speaking, ‘Whither Environmental Statistics’?  This is just my own opinion, of course, but the sense I got was that (1) we’re further along than we’ve ever been, but (2) there’s lots farther to go.  (Hmmm, maybe that’s why SAMSI held the workshop…)

This theme emerged during a lunch break, when SAMSI director Richard Smith and I had a chance to reflect on a paper we wrote — back in, cough, cough, 1998 — which aimed to (start to) bring the broad diversity of problems in environmental statistics into a cohesive light.  In retrospect, we both agreed that it was a good beginning — environmental science and with it environmental statistics had opened up in the early 1990s and was starting to get some traction by then.  Despite the advances made since, however, there’s still so much more to do (and so little time, sigh…).  Stimulating, but unanswered statistical questions abound in:

  • climate change (which these days seems to always lead the list)
  • spatio-temporal modeling (which seems to always follow second)
  • environmental security
  • third-world challenges, including agricultural advancement, large-scale ecological damage, pesticide exposure (and not just in the third world…)
  • informatics/“big” data (There’s lots of it. With more on the way.)
  • educating the next generation of environmetricians (and, getting more folks interested in working on these problems)
  • environmental sensing/sensor networks
  • incorporating prior knowledge into these problems via Bayesian methods
  • new, efficient computer algorithms (for addressing *all* of the above)

to name just a few…  (Add your own favorite here: ____________________________ )

A decidedly mixed list, which seems daunting at first blush.  But, the good news is that along with us ‘seasoned veterans,’ there were many younger minds among the attendees, and we all seemed up to the challenge.  As I said, the energy was infectious, and fun too.  So, let’s get started!  (Indeed, I should probably stop blogging and get to those papers.  My co-authors are waiting…)

group shot of the attendees

SAMSI-SAVI workshop on environmental statistics.

Impressions from the Undergraduate Workshop on Data-Driven Decisions in Healthcare

big group of students outside SAMSI

February 2013 Undergraduate Workshop participants.

SAMSI recently held the Undergraduate Workshop on Data-Driven Decisions in Healthcare for about 30 students. Visiting professors, postdoctoral fellows and graduate fellows who are participating in this SAMSI program led the sessions, bringing cutting-edge research into the lectures. Students had a chance to work with data from the SEElab at Technion in Israel, and got an overview of personalized medicine, a tutorial in R, and a demonstration of the ARENA software. Here are a few of the students’ impressions from the workshop.

Eric Laber instructing students

Eric Laber, NCSU, giving lecture at the workshop.

Eric Kernfeld, Tufts University Class of 2014, Applied Mathematics

“I had a great time at the workshop on Data Driven Decisions in Health Care this past weekend. It was a nice opportunity to meet statisticians, something I don’t get the chance to do back at Tufts. I also met a lot of undergraduates majoring in statistics and mathematics. The food was good, the staff were welcoming, the accommodations were convenient, and the talks were well-pitched. I recommend SAMSI workshops to anyone who’s interested in the topics, especially to people considering graduate education down the road.”

Danielle Llanos, Georgetown University

“I thought the SAMSI workshop was wonderful. It was a great opportunity to learn from talented individuals, and a chance to expand my network. The lecture topics were incredibly interesting and were very relevant to my career goals. Probably the best part of the workshop was the graduate student panel. The ability to ask those burning questions and learn from the experiences of others was great. I would recommend any SAMSI workshop to students looking to learn more about opportunities in the sciences, and expanding their educational experiences.”

three students at table

Students networking at lunch.

Brittany Boribong, sophomore, biomathematics major at University of Scranton

“As a student with no background in statistics and programming, I found the workshop a bit overwhelming but no less interesting. Coming into this with no experience just allowed me to take that much more out of the workshop. I was able to explore new fields of math that I never considered before and learn about topics that I had no idea even existed. As a Biomathematics major, I found the topic of using data to derive decisions in healthcare intriguing since it is an application of my major that I was not aware of. Another wonderful aspect of the workshop was the chance to speak to people in different fields. During lunch, I had the opportunity to speak to a post-doc fellow and during dinner, I spoke to one of the professors that gave a lecture earlier in the day; these opportunities don’t come along every day. It was enjoyable hearing their stories and being able to have a casual conversation with them. The panel made up of current graduate students and post-docs was also helpful in that they were able to share their experiences about graduate school and offer advice. I found it particularly helpful since one of the speakers was currently in a biomathematics program and I was able to ask questions I had about my major.

However, the best part of the workshop, in my opinion, was being able to meet other students. Coming from a university with a smaller math department, I really enjoyed meeting students from around the country with interests similar to my own. It was great being able to make connections with students in different fields and from universities from all over. Overall, I had a wonderful time meeting new people and exploring different fields of mathematics during the workshop and found this to be a great experience.”

Dr Qiu Presents “Jump Regression Analysis and Imaging Processing”

The following blog entry is from Jiayang Sun, Professor of Statistics and Professor of Epidemiology and Biostatistics at Case Western Reserve University. Dr. Sun is leading the imaging working group together with Dr. Dani Dushizima, as part of the Statistical and Computational Methodology for Massive Datasets Program.

As part of SAMSI’s imaging working group activity, on Oct 29, Professor Peihua Qiu from U. of Minnesota gave a special talk on “Jump Regression Analysis and Imaging Processing.” The talk served as an imaging tutorial from a statistician’s perspective, based on his book published by Wiley, and also covered his recent research on blind image deblurring (BID) and 3D image denoising and registration.

Cover of Peihua Qiu's book

The talk sparked interesting discussions on challenges and needs from a high level to the specifics that may motivate further research and better formulation of the various research problems.

Andreas Artemiou from Michigan Technological University said,

“It was insightful. I did not know the jump regression analysis and its application to imaging. Could I have a copy of the slides?”

SAMSI Postdoctoral Fellow Yi Grace Wang (whose research is in imaging from the mathematical side) said,

“I liked the tutorial very much. It included the big picture of image processing from statistical perspective as well as details from the Jump Regression Analysis in particular. It provided inspiring insights and also enlightened interesting thoughts and debates.”

SAMSI Postdoctoral Fellow Dan Yang (who has identified imaging from a statistical perspective as an area of research she would like to pursue) said,

“For me who has little experience in imaging, I enjoyed the tutorial a lot. It is neither too general nor too technical, giving me a big picture as well as the key ideas. I especially appreciate Prof. Qiu’s presentation for his careful organization, approachable explanation and interesting illustration.”

SAMSI Postdoctoral Fellow Garvesh Rasketti (who was interested to find out more about imaging) noted,

“I enjoyed the talk. I was unfamiliar with jump regression prior to the talk. It seems very applicable to imaging and other areas and the talk has encouraged me to read up more on jump regression.”

The Undergraduate Workshop Focusing on SAMSI Computational Methodology for Massive Datasets

This blog entry was written by James Anderson, undergraduate student double majoring in statistics-mathematics and economics from the University of Connecticut.

The undergraduate workshop attendees

Attendees and some presenters from the SAMSI undergraduate workshop held October 26-27, 2012.

This undergraduate workshop was notably different from my previous experience, though in no way inferior. In fact, I would argue the content of this workshop was better for my current position. Massive datasets are surprisingly common, and the topics covered included astronomy, high-dimensional regression, climate change, and image rescaling. In these contexts, we mainly discussed how to manage large datasets without crashing an individual computer.

The other aspect of the workshop, which I really enjoyed, was discussion panels. The students got a chance to talk to people working in academia and industry, as well as graduate students and postdocs. The professionals talked about their respective occupations and how they got to where they are, which was very interesting. On the other hand, the younger group talked about their transitions out of their respective undergraduate programs. This was particularly useful as I will be going through this phase over the next few months. One thing I was once more impressed with was SAMSI’s concern for the attendees. The presenters were happy to go into great detail about their presentations and field any general discipline related questions they could with interested attendees (the presentations had to be kept pretty short). This really impressed me; it didn’t matter if it was in the context of a presentation or not, the mentality seemed to be that the workshop was happening all the time. There was a great opportunity during panels or breaks to ask questions and get information that was quite personalized and would have been hard to find in another way. The workshop gave me a lot of information and resources that will be valuable going forward.

Nuala’s Impressions from the Astrostatistics Workshop

The following post was written by Nuala McCullagh who is a graduate student in the Physics & Astronomy department at Johns Hopkins.

Nuala McCullagh sitting on a bench

Nuala McCullagh

I was thrilled to have the opportunity to visit SAMSI for the Massive Datasets program for three weeks in September. One of the most positive aspects of my visit was my exposure to several flavors of diversity, the most salient of which was diversity of expertise and discipline. As a graduate student in the Physics & Astronomy department at Johns Hopkins, I have been active in promoting diversity within my department. Physics and astronomy, along with most math and science fields, have traditionally lacked racial and gender diversity, and while the benefits of diversity are well established and generally accepted, it can still be difficult to convince scientists that it is an issue they should care about. The benefits of the diversity I observed at SAMSI were very clear, and my experience there really reinforced my belief that diversity can inspire creativity and productivity.

At the opening workshop, we heard talks from experts in statistics, computer science, applied math, neuroscience, environment & climate science, high energy physics, and astronomy. While the conference covered a wide range of disciplines, there was a common thread of having to deal with massive datasets. I was surprised to learn about the similarities between my work in cosmology and work in other fields such as climate studies and neuroscience. Hearing about the problems and solutions in those fields has helped me think about my own problems in a different way.

At the astrostatistics workshop, we heard about large galaxy surveys, computer simulations, multi-dimensional datasets, time-domain astronomy, and more. It was helpful to hear about the different statistical problems with massive datasets in the context of astronomy, and interesting to see the similarities and differences between them. For example, just within cosmology, the statistical problems that arise when working with large dark matter simulations are different from those that arise in detecting weak lensing in galaxy surveys. Meanwhile, people who study exoplanets work with large simulations with many parameters, much like the simulations in cosmology. Hearing about the various statistical problems astronomers have encountered allowed me to make connections between different areas in astronomy that I would not have noticed otherwise.

I appreciated the opportunity to learn about a wide variety of problems concerning massive data. It was interesting to note the statistical similarities in seemingly disparate scientific problems. It was also reaffirming to see the positive impact that diversity can have in inspiring creativity and productivity in science.

Ilse Ipsen Speaks at the Science Communicators of North Carolina and the RTP Chapter of Sigma Xi Pizza Lunch

Ilse Ipsen speaking to the SCONC and RTP chapter of Sigma Xi

Ilse Ipsen spoke to the SCONC and RTP chapter of Sigma Xi on October 9.

Associate Director of SAMSI and professor of mathematics at North Carolina State University, Ilse Ipsen, recently spoke at the Sigma Xi pizza lunch. The lunch is a monthly gathering co-sponsored by the Science Communicators of North Carolina (SCONC) and the RTP chapter of Sigma Xi.

Ilse’s talk, “Rolling the Dice on Big Data,” focused on how big data is permeating all aspects of our daily lives, from the grocery store, where supermarkets gather data on our personal buying habits, to the analysis of images from space, to the Internet, where Google receives 2 million inquiries a minute and 347 blog posts are published every minute of the day. Facebook processes 500 terabytes of information each day and 30 billion pieces of information are shared on Facebook each month.

To give her audience an understanding of how applied mathematicians approach the enormous problem of sifting through data, she used the example of matching an e-mail that comes from an unknown source to a series of e-mails received from known authors. The e-mail from the unknown source, the query, has three key words in it. In her example, she looks at three e-mails from known authors and counts the number of times the key words were used. Then, the length of the sentence is measured to see how many words were used in each e-mail and in the query. Each word in each e-mail is counted and multiplied by the query to get a number. The words that are found in each e-mail and the query are squared and then divided by the sum. This method helps determine which of the known authors most likely wrote the query.
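The procedure described above resembles a cosine-similarity comparison of keyword counts, a standard way to match a query against a set of documents. The sketch below assumes that reading. The keyword counts and author labels are invented purely for illustration and are not the numbers used in the talk.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an e-mail with none of the key words matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical counts of three key words in each known e-mail (made-up numbers).
known_emails = {
    "author_1": [3, 0, 1],
    "author_2": [1, 2, 2],
    "author_3": [0, 4, 0],
}
query = [1, 2, 1]  # key word counts in the e-mail from the unknown source

scores = {name: cosine_similarity(counts, query) for name, counts in known_emails.items()}
best_match = max(scores, key=scores.get)
print(scores)
print("closest known author:", best_match)
```

Dividing by the vector lengths keeps a long e-mail from winning simply because it contains more words.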

If one were to look at every e-mail written each day, there would be about 294 billion e-mails to sort through, and there are about 250,000 words in the English language, so it would be an enormous task. Many mathematicians and statisticians therefore use the Monte Carlo method to sample and narrow down the search.

She explained that a randomized algorithmic approach is fast, easy to implement and simple to use, and is as good as, or perhaps even better than, a deterministic approach.
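As a rough illustration of why sampling can stand in for an exhaustive computation, the sketch below estimates the inner product of two long vectors from a random subset of their entries. It is a generic Monte Carlo estimator, not necessarily the algorithm from the talk. The vectors, the sample size, and the function name are assumptions chosen for the example.

```python
import random

def sampled_inner_product(x, y, num_samples, rng):
    """Estimate sum(x[i] * y[i]) from uniformly sampled indices (with replacement)."""
    n = len(x)
    total = 0.0
    for _ in range(num_samples):
        i = rng.randrange(n)
        total += x[i] * y[i]
    # Scale the sample mean up to the full length of the vectors.
    return n * total / num_samples

# Made-up test vectors: 100,000 entries with small positive values.
n = 100_000
x = [1 + (i % 7) for i in range(n)]
y = [1 + (i % 5) for i in range(n)]

exact = sum(a * b for a, b in zip(x, y))  # deterministic: touches every entry
rng = random.Random(0)                    # fixed seed for repeatability
estimate = sampled_inner_product(x, y, 10_000, rng)  # only 10,000 lookups
relative_error = abs(estimate - exact) / exact
print(exact, round(estimate), f"{relative_error:.2%}")
```

The estimate is unbiased, and its error shrinks roughly with the square root of the number of samples, which is the usual trade-off between speed and accuracy in randomized methods.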

Room full of people listening to Ilse Ipsen's talk

The room was at full capacity for Ilse Ipsen’s talk “Rolling the Dice on Big Data.”

Ilse spoke to a packed room, including many science writers from the Triangle region, members of Sigma Xi and a high school physics class from Kestrel Heights, a local charter school.

Astronomy and Big Data

The following is from G. Jogesh Babu, Professor of Statistics and Director of the Center for Astrostatistics at Penn State University, and one of the organizers of the astrostatistics workshop.

Jogesh Babu working at the astrostatistics workshop

Jogesh Babu at the recent astrostatistics workshop held at SAMSI.

Astronomers were among the first researchers to encounter Big Data. Until a few decades ago, astronomers would typically compete for observation time on telescopes, spending cold nights at distant mountaintop observatories to collect data on a few stars and galaxies. This has changed substantially. Today, they pore over massive data through high-speed internet connections to their office computers, thinking of automated procedures to identify objects. They have become computer scientists, developing algorithms specific to their task to read through massive data and make sense of it. In addition, some astrophysicists run massive simulations under assumptions dictated by physical models; for example, the Millennium Simulation calculates the formation of galaxies in an expanding Universe dominated by Dark Matter and Dark Energy. The simulations must then be compared to the massive datasets to see if the assumed model explains the data.

The Sloan Digital Sky Survey (SDSS), designed in the 1990s and still active today, really brought astronomy into the massive data era. The SDSS data rate for imaging was 17 GB/hour, and much less for spectroscopy. Thus, SDSS produces about 200 GB of data every night, adding to a database that stands at around 50 TB today. The Sloan project has produced several thousand research papers, revolutionizing many fields of astronomy. Massive data in astronomy is thus producing a paradigm shift in the way astronomy research is done. It is bringing information scientists, statisticians and astronomers together to collaborate on scientific investigations. For more on massive data in astronomy, see the recent article ‘Big data in astronomy’ by Eric D. Feigelson and the author in the August 2012 issue of ‘Significance’.

While astronomical data traditionally consists of images and spectra, the time domain is adding a new dimension to astronomical imaging.

Repeated images of the sky reveal a wealth of information about our ever changing universe: dozens of species of variable stars; thousands of moving asteroids in the Solar System;  tens of thousands of quasars, supermassive black holes in distant galaxies; and hundreds of supernova explosions from dying stars. Type Ia supernovae are particularly important, as their numbers shed light on Dark Matter and Dark Energy.

An alphabet soup of time-domain surveys in visible light is underway: SDSS III, PTF, CRTS, SNF, Pan-STARRS, VISTA, and more. The largest of the planned projects based on multi-epoch imaging is the Large Synoptic Survey Telescope (LSST), recently approved by the National Science Foundation as the largest U.S. ground-based project in astronomy. It is expected to start around 2020. The LSST will image half of the sky every 3 nights, producing a video of the sky with hundreds of millions of variable objects. The data flow from this project will be several terabytes each night. The challenges of this project have hundreds of astronomers, engineers, and computer scientists thinking about hard problems, both in the management of massive data streams and in the data mining needed to emerge with strong scientific findings.

Recently, I co-organized a workshop on ‘Astrostatistics’ with Prajval Shastri of the Indian Institute of Astrophysics, as part of the 2012-13 SAMSI Program on Statistical and Computational Methodology for Massive Datasets.

Prajval Shastri and Ann Lee

Ann Lee (L) and son, and Prajval Shastri (R).

The three-day workshop was held at SAMSI during September 19-21. Though it included talks by statisticians like Jim Berger and David Donoho, the majority of the talks were by astronomers who have ongoing collaborations with statisticians. Each talk ended with a lively, stimulating discussion. The audience consisted of a good mix of statisticians and astronomers. Presentations concentrated on Bayesian methods, faint source detection, learning from massive multidimensional data, exoplanets, time-domain astronomy, sparsity, reproducible research, and more. The workshop concluded with a good discussion on future directions.

It is nice to get back to SAMSI and interact with friends and collaborators; I had organized a very stimulating semester-long Astrostatistics program at SAMSI in Spring 2006. Astronomers were already familiar with the concept of large-scale electronic integration of astronomy data, tools, and services on a global scale, in a manner that provides easy access by individuals around the world, via the Virtual Observatory (VO). They are thus enabling science on massive data. Even in 2006, astronomers were grappling with massive data, long before the terms ‘Big data’ or ‘megadatasets’ came into vogue.