Understanding Droughts – Part of the Undergraduate Modeling Workshop May 17-22, 2015

The following was written by Gabriel Ruiz, attendee from the University of California, Riverside.

attendees sitting listening to lecture

All of the attendees and some of the speakers on Day 1 of the workshop

The workshop attendees hard at work.

Just a few weeks ago in May, I was fortunate to be among the 26 undergraduates to attend one of many undergraduate workshops offered at the Statistical and Applied Mathematical Sciences Institute (SAMSI). This was a 5-day-long workshop on mathematical and statistical modeling. The backgrounds of students in attendance ranged from mathematics and statistics to chemical or aerospace engineering and other fields, and we came from universities all across the country. There were also current researchers from SAMSI and other universities in attendance who gave talks on very interesting topics and led the workshop sessions. Among my favorite parts of this workshop were talks in Bayesian Statistics, Discriminant Analysis, meeting some established researchers, getting to know my peers in mathematics and statistics, the great food we had, and, of course, having the opportunity to visit SAMSI in such a beautiful section of the country.

First Impressions: Raleigh, SAMSI, and NC State

students walking past sign

Attendees as they arrive at SAMSI to kick start the workshop.

cement pathway with trees

The scenic path attendees took to explore NC State and the surrounding area on the first day.

My very first impression of Raleigh and its surrounding area was how green and pretty everything was. Coming from California, and considering the current drought we are experiencing, this was quite a sight. It had such a relaxing feel.

Students in front of the James B. Hunt Library

Workshop attendees visiting the famous Hunt Library at NC State.

Later on, it was fun meeting with all of the other undergraduate attendees at North Carolina State University, where we all stayed for the next 5 days. In the evening, after some great food, we took a walk around campus and even visited the renowned James B. Hunt Jr. Library. The NC State campus is so beautiful and big! Because of this, we got a little lost but that ended up being a good thing because we were able to see some more of the surrounding area in Raleigh.

The next day, we went to SAMSI on the other side of town for the introduction to what we would be doing throughout the week. We heard from some speakers on interesting topics, and ate some more delicious food. It was nice to get a sense of all the great work that goes on there.

Building on the NC State campus

A scenic example of Raleigh and NC State beauty.

The rest of the workshop was held in SAS Hall at NC State—named after the statistical software company when it was donated by former statistics faculty and founders of SAS Institute Inc.  This building is home to the Mathematics and Statistics departments and was just a light walk from where we were staying. The place we stayed at, I should add, contained a volleyball court that held several competitive games of volleyball among the attendees. This was a fun break after a day of math and statistics.

3 postdocs

Kimberly Kaufeld, Daniel Taylor-Rodriguez and Jyotishka Datta, all postdocs at SAMSI, working together.

There were plenty of informative talks given by researchers from various universities. Some of the notable talks were given by:

Paul Brooks from Virginia Commonwealth University on “What Causes Shifts in the Human Microbiome.” This talk focused on the Community State Types (CST) of the vaginal microbiome to identify the microbiome profiles that are associated with a high risk of certain diseases as well as devising better predictions for changes in CSTs over time. Students at the workshop were able to work on a subset of this interesting project throughout the rest of the week.

Daniel Taylor-Rodriguez, a SAMSI postdoc, spoke about his approach to parameter estimation and variable selection of site-occupancy models that use presence-absence data. He presented an occupancy model with probit links and demonstrated his work on deriving more objective parameter priors as opposed to using AIC methods or other Bayesian approaches that require substantially more prior knowledge than is usually available.

Leah Jenkins of Clemson University gave a great talk titled “The Strawberries of Wrath: Farming Under the Realities of Drought”, in which she spoke about the current drought crisis in California—where 80% of the fruits and vegetables consumed in the US come from. The main focus of her talk was describing her and other mathematicians’ role in creating the “virtual farmer” software tool and the team’s use of mathematical modeling and optimization to help farmers in Pajaro Valley, CA remain profitable through current water restrictions. This challenging project was the primary motivation for the second project students were able to work on during this workshop.

Two other SAMSI postdoctoral researchers, Kimberly Kaufeld and Yize Zhao, also led hands-on workshops in R, a statistical software environment, which were very informative to those of us who had limited experience with R. Jyotishka Datta, another postdoc at SAMSI, had a session in which he went over introductory statistical and probabilistic concepts in regression and classification in addition to high-dimensional applications and their implementations in R. A fifth postdoc, Christopher Strickland, went over some very useful approaches to the modeling and data analysis of dynamical systems in Python, as an alternative or complement to R and Matlab.

Among other notable talks were those by NC State PhD student Neal Grantham and SAS Institute data scientist Yue Qi. Neal Grantham’s talk focused on an alternative approach to identifying the origin and history of a dust sample through the pollen found in it; the approach uses discriminant analysis and DNA sequencing to identify samples to within a short distance with a measurable degree of certainty, as a complement to a pollen expert’s more subjective identification. Yue Qi’s talk was about the tools he is helping to develop at SAS to more easily analyze “Big Data”; more specifically, he focused on the use of these tools in Machine Learning approaches to fight banking and insurance fraud.

These talks were all of the high quality you would expect at SAMSI, yet were accessible for all of us as undergraduates. After listening to all of these, I hope to learn some more about the research techniques that were discussed and maybe even contribute to the areas in which they have applied these techniques, such as the California drought. It was nice to get a feel for just how broad statistics and mathematics are.

The Workshop: working with a predator-prey dynamical system dataset

For the actual workshop aspect, we were split into groups of 5 that each worked on one of two very interesting topics. The first topic dealt with modeling a predator-prey dynamical system that was meant to be a simplified representation of the more complicated drought situation currently affecting California farms which account for a large portion of US vegetable and fruit supply. The second topic had to do with performing discriminant analysis to differentiate between microbiome states that are defined by the various levels of vaginal microorganisms thought to be higher or lower risk factors for certain diseases as compared to other microbiome states.

Group with mentors

One of the workshop groups alongside their mentors for the week, Daniel Taylor-Rodriguez (first on the left) and Kimberly Kaufeld (furthest on the right).

The dataset I worked with was the predator-prey dataset. We were tasked with first analyzing the time series data we were given on the abundance of three variables: water, plants, and beetles. The key here was to use some sort of time series techniques to model each variable against time. After we were able to find good models for each variable, we could plot the fitted lines of all three to see how they varied over time. The first observation we had was that the densities of each varied over time according to a sine and cosine pattern, so naturally we used a time series model with these properties. The fitted lines further demonstrated that plants had a spike (or dip) in their density whenever there was a spike (or dip) in the water supply. Of course, we know plants depend on water but it was nice to see this graphically over time. There was a very high correlation between these two variables, which helped quantify how strong the relationship was. This relationship is the key characteristic of a dynamical system. Because we had the “noisier” dataset, the same dependency of beetles on plants was not as observable, although it was present.
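As a rough illustration of the sine-and-cosine time series fit described above, here is a minimal Python sketch. It is not the code or data from the workshop (we worked in R): the "water" series is synthetic and noise-free with made-up values, the period of 12 time units is assumed to be known, and the function names are hypothetical. The fit is an ordinary least-squares regression on an intercept, a sine term, and a cosine term, solved through the normal equations.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def fit_harmonic(times, values, period):
    """Least-squares fit of values ~ a + b*sin(w*t) + c*cos(w*t)."""
    w = 2.0 * math.pi / period
    design = [[1.0, math.sin(w * t), math.cos(w * t)] for t in times]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, values))
           for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict_harmonic(coefs, t, period):
    """Evaluate the fitted curve at time t."""
    a, b, c = coefs
    w = 2.0 * math.pi / period
    return a + b * math.sin(w * t) + c * math.cos(w * t)


# Synthetic stand-in for the water series (assumed values, no noise).
PERIOD = 12.0
times = [float(t) for t in range(48)]
water = [10.0
         + 3.0 * math.sin(2.0 * math.pi * t / PERIOD)
         + 1.5 * math.cos(2.0 * math.pi * t / PERIOD)
         for t in times]

coefs = fit_harmonic(times, water, PERIOD)
print("intercept, sine and cosine coefficients:", coefs)
```

Fitting the same form to each of the three series and plotting the fitted curves together is one way to compare where the spikes and dips line up. With noisy data, as in our beetle series, the recovered coefficients would only be approximate.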

The next part of the workshop was to develop a system of differential equations that brought together all of these relationships. We used the Lotka-Volterra equations, which are also known as the predator-prey equations. The key here was that the parameters and variables needed some tweaking through ODE packages in R, further simulation, and our own intuition in order to best describe the system. This was interesting considering we had three variables to work with: a natural resource, a prey, and a predator. The transition from the statistical aspect of this to mathematical modeling was the trickiest part, to say the least, since our group had no real experience with differential equations, much less bridging math and statistics in this way. Luckily, the postdocs running this workshop, Drs. Kaufeld and Taylor-Rodriguez, walked us through the process and taught us about these equations.
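For readers who have not seen equations of this type, here is a minimal Python sketch of a three-variable, Lotka-Volterra-style chain (resource, prey, predator). It is an assumed illustrative form, not the system our group arrived at: water has a constant inflow and a loss rate, plants take up water, and beetles graze on plants. All parameter values and names are made up, and a hand-written fourth-order Runge-Kutta step stands in for the ODE packages we used in R.

```python
def food_chain_rhs(state, p):
    """Right-hand side of an assumed water -> plants -> beetles chain."""
    water, plants, beetles = state
    uptake = p["a"] * water * plants        # water consumed by plants
    grazing = p["b"] * plants * beetles     # plants eaten by beetles
    d_water = p["inflow"] - p["loss"] * water - uptake
    d_plants = p["e_plant"] * uptake - grazing - p["m_plant"] * plants
    d_beetles = p["e_beetle"] * grazing - p["m_beetle"] * beetles
    return (d_water, d_plants, d_beetles)


def rk4_step(state, dt, p):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(base, slope, h):
        return tuple(x + h * k for x, k in zip(base, slope))

    k1 = food_chain_rhs(state, p)
    k2 = food_chain_rhs(shifted(state, k1, dt / 2.0), p)
    k3 = food_chain_rhs(shifted(state, k2, dt / 2.0), p)
    k4 = food_chain_rhs(shifted(state, k3, dt), p)
    return tuple(
        x + dt / 6.0 * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
        for x, s1, s2, s3, s4 in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, p, dt=0.01, steps=5000):
    """Return the list of states visited, starting from `initial`."""
    trajectory = [tuple(initial)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt, p))
    return trajectory


# Made-up parameter values, chosen only so the example runs sensibly.
params = {
    "inflow": 1.0, "loss": 0.1,       # water supply and loss
    "a": 0.5, "e_plant": 0.5,         # uptake rate and conversion to plants
    "b": 0.5, "e_beetle": 0.5,        # grazing rate and conversion to beetles
    "m_plant": 0.2, "m_beetle": 0.3,  # mortality rates
}

trajectory = simulate((5.0, 1.0, 0.5), params)
print("final (water, plants, beetles):", trajectory[-1])
```

The "tweaking" mentioned above corresponds to adjusting the entries of `params` (and the form of the equations themselves) until the simulated curves resemble the fitted time series.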

two workshop members giving a talk

Two workshop attendees presenting their findings on the predator-prey dynamical system.

While I am still not completely comfortable with this last aspect, it was important to see the union of statistics and math modeling as a person who is mostly accustomed to the data analysis side. I have already started to look into creating a better system of differential equations this summer. And because I gained curiosity in this type of modeling after the workshop, I am also signed up for some extra math classes on ordinary and partial differential equations for next year and might even take some coursework in dynamical systems somewhere down the line.

Final thoughts: My key takeaways

Coming from California, it was interesting to see just how complicated these dynamical systems involving the seasonality of rain can be. It is important to note that our dynamical system, although still difficult to model with three variables, was much more simplified than the current drought in California. I can only imagine how many variables the analysts involved with this have to deal with, including legislation, people refusing to let their lawns go dry, and the system of aqueducts that run under farmland, which makes modeling water levels quite challenging. Although it is difficult, there are plenty of mathematicians involved in the effort to conserve water in the most efficient way possible, including Clemson University’s Dr. Leah Jenkins, who gave a great talk on the topic. Between living in a section of California affected by this drought and attending this workshop, I am curious enough to stay in the loop about what mathematicians will continue to do.

As I was finishing up my second year studying statistics at the University of California, Riverside, this opportunity was an invaluable and eye-opening experience. While I have not been in the world of Mathematics and Statistics for a long time, this workshop sparked curiosity in me about topics I had not yet been acquainted with but would now like to learn more about. For example, this summer, I will almost surely look into developing a better set of differential equations for the predator-prey dataset we were given during the workshop. I would also like to look into the other dataset to learn more about discriminant analysis. I have also come to realize that computational skills are very important. On my programming to-do list this summer are Julia, Python, and some more R.

Besides the new statistical and mathematical techniques that we learned, I feel the main theme that I have taken away from this workshop is that statistics, math, and computing can all be brought together for meaningful applications in ecology and human health. Moreover, it is refreshing to have experienced first-hand that statistics and math are more than just numbers and equations in a textbook like I had become accustomed to in some of my coursework so far.

It was great to be around a great undergraduate cohort of statisticians and mathematicians who are all at the same point in their careers in this type of environment doing what we love most. The perspective I gained from my peers here, who are all from different universities across the country, about classes to take and interesting research topics is invaluable. To have met some established applied statisticians and mathematicians and listened to their research talks was inspiring. I hope to one day achieve that same level of expertise and fun they are having.

If you are an undergraduate student considering applying to one of these workshops at SAMSI, I highly recommend that you apply and attend! You won’t regret it!

portrait of Gabriel Ruiz

Gabriel Ruiz.

Postdoc Profile – Christopher Strickland

Christopher Strickland on a hill with the ocean in the background

Christopher Strickland hiking in New Zealand.

SAMSI postdoctoral fellow Christopher Strickland was born in Houston, Texas and lived briefly there and in Dallas before he could really remember either place. He grew up in Oxford, Mississippi. His grandfather was Chair of Modern Languages at the University of Mississippi and helped to establish a study abroad program, and his grandmother was originally from France, so many summers his father and grandfather traveled to France. When Christopher went to “Ole Miss,” in the honors college, he minored in physics before switching degrees and getting a double degree in Mathematics and French.

At first he was following a more pure math route. He went to the University of Florida in Gainesville for his Master’s degree and was studying logic, but after about a year and a half, he realized this was not the area he preferred. After changing his focus to dynamical systems and defending his Master’s thesis, he stayed in Gainesville for a year as he tried to figure out what to do next and taught mathematics at Santa Fe Community College. He knew he would prefer to get into an area that involved applied math instead of pure math. He became interested in mathematical ecology and had heard that Colorado State University had a great program in ecology and the natural sciences, so he applied there to get his mathematics Ph.D.

Christopher considers Patrick Shipman, who was a new faculty member at Colorado State at the time, and Gerhard Dangelmayr, who is the Chair of the department, to be his mentors. They were also his co-advisors. Christopher and Patrick started collaborating on projects right away.

“I was headed toward dynamical systems which is really related to mathematical ecology, so I worked with Patrick and Gerhard for the next six years,” Christopher said, “I still collaborate with both of them, and we are currently applying for a research grant to work with the U.S. Fish and Wildlife Commission.” Christopher has also collaborated with Patrick Shipman and Snehal Shetye at Colorado State on a project modeling the mechanical properties of spinal cords.

“Nate Burch told me about SAMSI originally,” said Christopher. “Nate and I were colleagues at Colorado State.” So, when the Ecology program was announced, he applied and was accepted.

Christopher Strickland standing on a rocky edge

Christopher Strickland hiking in Australia.

While he’s been at SAMSI, Christopher has worked on getting various parts of his dissertation re-written into smaller parts so that he can publish each part in various journals. He has three of the four published. The manuscript of the fourth one is completed, and has been submitted as of June 2015.

Christopher has been participating in two working groups this year: the Tipping Point group and the Physical Ecology group. The Physical Ecology group, led by Laura Miller, has been particularly interesting for him. “We recently had this really great workshop at SAMSI, which was for the people participating in the working group. We invited Nadia Kristensen from the University of Queensland who brought in all this great data from parasitoid wasp release and spread. That’s been really nice because I mostly do modeling of dynamic systems and the model that she had with this data could be something I could help her improve,” he commented.

“We are also working on a review paper, which is something the working group conceived of sometime around December. The entire working group and even some other people, including some ecologists and my advisor from Colorado State, Patrick, is working on this review,” Christopher said. He believes the review will be completed by the end of this summer.

Much of Christopher’s research focuses on networks, specifically looking at spread and control of contagions on the network. One example would be to look at container shipping networks or airline networks. He is working on a grant that is looking at white nose bat syndrome that involves a network of caves. While bats could spread the disease themselves from cave to cave, there is also the concern that hikers or cavers could get the fungus on their boots and spread the disease when they hike in a different cave. By figuring out how these networks work, it may help ecologists figure out where the disease might spread next, or help them to get a disease under control.
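To make the idea of spread on a network concrete, here is a minimal Python sketch of a simple susceptible-infected simulation on a small graph. It is not Christopher's model: the five "caves" and their connections are invented, the per-step transmission probability is an assumed parameter, and the function name is hypothetical. At each step, every infected cave has a chance of passing the contagion to each uninfected neighbor.

```python
import random


def simulate_spread(network, seeds, transmit_prob, steps, rng):
    """Simulate a susceptible-infected process on an undirected network.

    network: dict mapping each node to a list of its neighbors
    seeds: nodes infected at time zero
    transmit_prob: chance per step that an infected node infects a neighbor
    Returns a list with the set of infected nodes after each step.
    """
    infected = set(seeds)
    history = [set(infected)]
    for _ in range(steps):
        newly_infected = set()
        # Sorting keeps the order of random draws reproducible.
        for node in sorted(infected):
            for neighbor in network[node]:
                if neighbor not in infected and rng.random() < transmit_prob:
                    newly_infected.add(neighbor)
        infected |= newly_infected
        history.append(set(infected))
    return history


# Invented cave network: an edge means bats (or hikers) move between caves.
caves = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

history = simulate_spread(caves, seeds=["A"], transmit_prob=0.3,
                          steps=10, rng=random.Random(2015))
for step, infected in enumerate(history):
    print(step, sorted(infected))
```

Running many such simulations and counting how often each cave is reached early is one simple way to rank where a disease might show up next. Removing an edge or a node from the dictionary mimics a control measure such as closing a cave to visitors.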

Christopher Strickland makes a kick

Christopher practicing Cuong Nhu.

When Christopher is not at work, he is either playing a game of soccer (he used to be on a math league!) or he is practicing the art of Cuong Nhu, (meaning hard/soft in Vietnamese) a type of martial arts that was brought to the United States in Gainesville, Florida. Christopher is on target to get his black belt, probably in about a year. “A lot of scholarly people actually do this type of martial arts. It has been a good way to network,” quipped Christopher. He also spends time with his girlfriend, Anne Ho, who is a theoretical mathematician. They like to travel a lot, many times to national parks or overseas.

In the fall, Christopher will be teaching at the University of North Carolina at Chapel Hill while he completes his second year as a postdoctoral fellow for SAMSI.

Why you should attend the SAMSI Forensics 2015-2016 opening workshop

The following was written by Dr. Clifford Spiegelman, Distinguished Professor of Statistics at Texas A&M and one of the program leaders for the 2015-2016 SAMSI Program on Statistics and Applied Mathematics of Forensic Science.

Cliff Spiegelman

Dr. Clifford Spiegelman

Imagine having a nightmare where nearly all evidence presented in courts was seriously misrepresented. No, not a nightmare about someone accused of being a witch, but a more current trial. Say the defendant is accused of rape or murder and all the scientific evidence presented was seriously misrepresented and biased toward the prosecution. It would not be a pleasant dream, but it is today’s reality, and that is worse than a nightmare because it is real. Within the last few months, the FBI has admitted to overrepresenting the importance of hair matches for decades. Prior to that, in 2007, the FBI admitted to overstating the importance of a match in comparative bullet lead analysis (CBLA), another procedure used for decades.

Forensic science is inherently a field that uses data (patterns, pictures, etc.) to link suspects to crimes. Unfortunately, the use of formal statistical methods or even statistical or mathematical thinking is uncommon.

That is where you can help. There is a dearth of persons, as in way too few mathematical scientists, who are aware of the issues.

What are the issues?

Well, one can read the summary of the 2009 NRC report “Strengthening Forensic Science in the United States: A Path Forward” to get a good overall view. Here are some of my recent consults: A defendant was charged with indecent contact with a minor. The minor had chlamydia but the defendant did not and was not treated for chlamydia. What is the probability? In another case, a convict has been in jail for 40 years largely based upon hair and fiber evidence. The hair evidence was inconclusive. That is, the crime lab hair examiner testified that there were both similarities and dissimilarities between the pubic hairs found at the scene and on the defendant. Subsequently, some inconclusive results (not the case in question, as the evidence has gone missing) have been investigated using DNA. What are the odds that an inconclusive microscopic hair analysis has a DNA analysis that excludes the defendant? It is more than ½.

The opening workshop will look at various forms of traditional pattern evidence. These include fingerprints, firearm/toolmarks, shoeprints, etc. Help be part of the birth of forensic science as a real science rather than an oxymoron.

The opening workshop program can be found here. Read more about the overall program here, and if you want to learn more about forensics before the opening workshop, consider attending a special tutorial a few days before the big event begins.

Please join us. You can make a difference to the legal system and make our country a more just place.

Measuring the Success of a SAMSI Program – My Experience at the Beyond Bioinformatics Transition Workshop

The following was written by Katerina Kechris, Associate Professor and Graduate Program Director, University of Colorado – Denver, School of Public Health.

Katerina Kechris

Katerina Kechris

In mid-May 2015, working groups from the Beyond Bioinformatics Program gathered during the Bioinformatics Transition Workshop. This was a culmination of eight months of progress for over 10 working groups. The workshop topics were diverse and included epigenetics, microbial communities, evolutionary models, imaging genetics, next generation sequencing errors, high-dimensional discrete data, multiple hypothesis testing and data integration. The diversity of these topics reflects the current state of research in the biomedical sciences where technologies are advancing the study of biological mechanisms, structures, populations and disease. These technologies are generating high-dimensional and complex data structures providing intriguing opportunities for statisticians, mathematicians and computer scientists to develop new models, methods and algorithms to answer important biological questions.

Group photo outside

The Beyond Bioinformatics Transition Workshop attendees.

As a leader for one of the two Data Integration working groups, I was excited to hear about the activities from the other working groups during the workshop. I found their progress impressive, considering that many of the group members did not know each other until the Opening Workshop just eight months earlier. The transition workshop gave me the opportunity to reflect: How does one measure success of a program year and a working group? There are the usual metrics of publications, conference presentations and grant proposals that will be documented in great detail for reports. But at the workshop I could see more qualitative and interpersonal measures of successes. First, new collaborations were developed among researchers who would otherwise not have had the opportunity to meet and work together.

Personally, I enjoyed getting to know and working as a team with the other Data Integration working group leaders and members. Second, I was pleased to see great attendance and presentations at the workshop by students and post-docs. I know in several cases that the working group facilitated thesis and post-doctoral research projects for these junior investigators. Finally, I observed that there are ongoing plans to continue the working group efforts beyond the formal program year, which speaks to the positive aspects of the program. As for our working groups, it was such a pleasure to make new colleagues and see the evolution of how we approached the problem of data integration with very different perspectives and methods. I look forward to learning about the continuing progress of all groups.

Classroom shot of people listening to lecture

Listening to a working group make its report.