The following was written by Gabriel Ruiz, an attendee from the University of California, Riverside.
All of the attendees and some of the speakers on Day 1 of the workshop
The workshop attendees hard at work.
Just a few weeks ago, in May, I was fortunate to be one of the 26 undergraduates who attended one of the many undergraduate workshops offered at the Statistical and Applied Mathematical Sciences Institute (SAMSI). This was a five-day workshop on mathematical and statistical modeling. The students in attendance came from universities all across the country, with backgrounds ranging from mathematics and statistics to chemical or aerospace engineering and other fields. There were also researchers from SAMSI and other universities in attendance, who gave talks on very interesting topics and led the workshop sessions. Among my favorite parts of the workshop were the talks on Bayesian statistics and discriminant analysis, meeting some established researchers, getting to know my peers in mathematics and statistics, the great food, and, of course, the opportunity to visit SAMSI in such a beautiful part of the country.
First Impressions: Raleigh, SAMSI, and NC State
Attendees arriving at SAMSI to kick off the workshop.
The scenic path attendees took to explore NC State and the surrounding area on the first day.
My very first impression of Raleigh and the surrounding area was how green and pretty everything was. Coming from California, in the middle of its current drought, I found this quite a sight. It all felt very relaxing.
Workshop attendees visiting the famous Hunt Library at NC State.
Later on, it was fun meeting all of the other undergraduate attendees at North Carolina State University, where we stayed for the next five days. In the evening, after some great food, we took a walk around campus and even visited the renowned James B. Hunt Jr. Library. The NC State campus is so beautiful and big! Because of this, we got a little lost, but that turned out to be a good thing because we got to see more of the surrounding area in Raleigh.
The next day, we went to SAMSI on the other side of town for the introduction to what we would be doing throughout the week. We heard from some speakers on interesting topics, and ate some more delicious food. It was nice to get a sense of all the great work that goes on there.
A scenic view of Raleigh and NC State.
The rest of the workshop was held in SAS Hall at NC State, which was named after the statistical software company because it was donated by former statistics faculty who founded SAS Institute Inc. The building is home to the Mathematics and Statistics departments and was just a short walk from where we were staying. The place we stayed at, I should add, had a volleyball court that hosted several competitive games among the attendees. This was a fun break after a day of math and statistics.
Kimberly Kaufeld, Daniel Taylor-Rodriguez and Jyotishka Datta, all postdocs at SAMSI, working together.
There were plenty of informative talks given by researchers from various universities. Some of the notable talks were given by:
Paul Brooks from Virginia Commonwealth University on "What Causes Shifts in the Human Microbiome." This talk focused on using the Community State Types (CSTs) of the vaginal microbiome to identify microbiome profiles associated with a high risk of certain diseases, and to better predict how CSTs change over time. Students at the workshop were able to work on a subset of this interesting project throughout the rest of the week.
Daniel Taylor-Rodriguez, a SAMSI postdoc, spoke about his approach to parameter estimation and variable selection for site-occupancy models that use presence-absence data. He presented an occupancy model with probit links and demonstrated his work on deriving more objective priors for the parameters, as an alternative to AIC-based methods or to other Bayesian approaches that require substantially more prior knowledge than is usually available.
Leah Jenkins of Clemson University gave a great talk titled "The Strawberries of Wrath: Farming Under the Realities of Drought," about the current drought crisis in California, where 80% of the fruits and vegetables consumed in the US come from. The main focus of her talk was the role she and other mathematicians played in creating the "virtual farmer" software tool, and the team's use of mathematical modeling and optimization to help farmers in Pajaro Valley, CA, stay profitable under current water restrictions. This challenging project was the primary motivation for the other project students worked on during the workshop.
Two other SAMSI postdoctoral researchers, Kimberly Kaufeld and Yize Zhao, also led hands-on sessions in R, a statistical programming language, which were very informative for those of us with limited R experience. Jyotishka Datta, another postdoc at SAMSI, led a session covering introductory statistical and probabilistic concepts in regression and classification, as well as high-dimensional applications and their implementation in R. A fifth postdoc, Christopher Strickland, covered some very useful approaches to modeling and analyzing data from dynamical systems in Python, as an alternative or complement to R and MATLAB.
Other notable talks were given by NC State PhD student Neal Grantham and SAS Institute data scientist Yue Qi. Neal Grantham's talk focused on an alternative approach to identifying the origin and history of a dust sample from the pollen found in it; the approach uses discriminant analysis and DNA sequencing to pinpoint a sample's origin to within a short distance, with a measurable degree of certainty, as a complement to a pollen expert's more subjective identification. Yue Qi's talk was about the tools he is helping to develop at SAS to make "Big Data" easier to analyze; more specifically, he focused on using these tools in machine learning approaches to fight banking and insurance fraud.
These talks were all of the high quality you would expect at SAMSI, yet were accessible to all of us as undergraduates. After listening to them, I hope to learn more about the research techniques that were discussed and maybe even contribute to the areas where they have been applied, such as the California drought. It was nice to get a feel for just how broad statistics and mathematics are.
The Workshop: working with a predator-prey dynamical system dataset
For the workshop itself, we were split into groups of five, each working on one of two very interesting topics. The first topic dealt with modeling a predator-prey dynamical system meant as a simplified representation of the much more complicated drought situation currently affecting California farms, which account for a large portion of the US fruit and vegetable supply. The second topic involved performing discriminant analysis to distinguish between microbiome states, which are defined by the levels of various vaginal microorganisms and are thought to carry a higher or lower risk of certain diseases than other microbiome states.
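To give a rough idea of what discriminant analysis does, here is a minimal sketch of two-class linear discriminant analysis (LDA) in Python with NumPy. LDA is only one common form of discriminant analysis, and the exact method used in the workshop isn't described here; the two features and all of the data below are synthetic stand-ins, not microbiome measurements. LDA assumes both classes share one covariance matrix and draws a straight-line boundary between the class means, weighted by that covariance.

```python
import numpy as np

def fit_lda(X, y):
    """Fit two-class LDA with a pooled (shared) covariance matrix.

    X: (n, p) feature matrix; y: array of 0/1 class labels.
    Returns weights w and threshold c; predict class 1 when X @ w > c.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-class covariance (atleast_2d handles the p = 1 case)
    S = ((n0 - 1) * np.atleast_2d(np.cov(X0, rowvar=False))
         + (n1 - 1) * np.atleast_2d(np.cov(X1, rowvar=False))) / (n0 + n1 - 2)
    # Discriminant direction: S^{-1} (mu1 - mu0)
    w = np.linalg.solve(S, mu1 - mu0)
    # Threshold: midpoint between projected means, shifted by the log prior ratio
    c = w @ (mu0 + mu1) / 2 - np.log(n1 / n0)
    return w, c

def predict_lda(X, w, c):
    """Assign class 1 to rows whose discriminant score exceeds the threshold."""
    return (X @ w > c).astype(int)

# Synthetic toy data: class 0 ("lower risk") and class 1 ("higher risk")
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[3.0, 3.0], scale=1.0, size=(100, 2)),
])
y = np.array([0] * 100 + [1] * 100)

w, c = fit_lda(X, y)
accuracy = (predict_lda(X, w, c) == y).mean()
```

Real microbiome data would likely need more care (many correlated features, proportions that sum to one), so this only illustrates the basic idea of separating classes with a linear boundary.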
One of the workshop groups alongside their mentors for the week, Daniel Taylor-Rodriguez (far left) and Kimberly Kaufeld (far right).
The dataset I worked with was the predator-prey dataset. We were first tasked with analyzing the time series data we were given on the abundance of three variables: water, plants, and beetles. The key here was to use time series techniques to model each variable against time. Once we had found good models for each variable, we could plot all three fitted curves to see how they varied over time. Our first observation was that each density varied over time in a sine-and-cosine pattern, so naturally we used a time series model with sine and cosine terms. The fitted curves further showed that plants had a spike (or dip) in their density whenever there was a spike (or dip) in the water supply. Of course, we know plants depend on water, but it was nice to see this graphically over time. The two variables were very highly correlated, which quantified how strong the relationship was. This kind of dependence of one variable on another is the key characteristic of a dynamical system. Because our group had the "noisier" dataset, the corresponding dependence of beetles on plants was harder to see, although it was present.
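A minimal sketch of this kind of sine-and-cosine (harmonic) regression, in Python with NumPy. The period of 12 time steps, the one-step lag of plants behind water, the noise levels, and the series themselves are all synthetic assumptions, not the workshop data. Each series is regressed on an intercept plus one sine and one cosine term by ordinary least squares, and the correlation between the two fitted curves is then computed.

```python
import numpy as np

def fit_harmonic(t, y, period):
    """Least-squares fit of y ~ b0 + b1*sin(2*pi*t/period) + b2*cos(2*pi*t/period).

    Returns the coefficients (b0, b1, b2) and the fitted values.
    """
    omega = 2 * np.pi * t / period
    # Design matrix: intercept, sine, and cosine columns
    A = np.column_stack([np.ones_like(t), np.sin(omega), np.cos(omega)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

# Synthetic stand-ins (not the workshop data); plants assumed to lag water by one step
rng = np.random.default_rng(1)
t = np.arange(120, dtype=float)
period = 12.0
water = 10 + 3 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.5, t.size)
plants = 50 + 8 * np.sin(2 * np.pi * (t - 1) / period) + rng.normal(0, 1.0, t.size)

water_coef, water_fit = fit_harmonic(t, water, period)
plant_coef, plant_fit = fit_harmonic(t, plants, period)

# Correlation between the fitted curves: how closely water and plants move together
r = np.corrcoef(water_fit, plant_fit)[0, 1]
```

With real data the period would have to be chosen from the data (for example, from the spacing between peaks), and more than one harmonic may be needed. Correlating the fitted curves rather than the raw series, as done here, filters out much of the noise, so it will look stronger than a correlation computed on the raw observations.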
The next part of the workshop was to develop a system of differential equations that tied all of these relationships together. We used the Lotka-Volterra equations, also known as the predator-prey equations. The key here was that the parameters and variables needed some tweaking, using ODE packages in R, further simulation, and our own intuition, to best describe the system. This was especially interesting because we had three variables to work with: a natural resource, a prey, and a predator. The transition from the statistical side to mathematical modeling was the trickiest part, to say the least, since our group had no real experience with differential equations, much less with bridging math and statistics in this way. Luckily, the postdocs running this workshop, Drs. Kaufeld and Taylor-Rodriguez, walked us through the process and taught us about these equations.
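The classical Lotka-Volterra system describes two species, prey x and predator y: dx/dt = αx - βxy and dy/dt = δxy - γy. The group's actual three-variable equations and fitted parameters aren't given here, so the following is only a hypothetical sketch in Python (with NumPy) of one way to extend the idea to water W, plants P, and beetles B. In this assumed model, water arrives with a seasonal (sine) inflow and is lost to drainage and plant uptake, while the plant-beetle interaction uses the same mass-action product terms as Lotka-Volterra. Every parameter name and value is made up for illustration, and a hand-written fourth-order Runge-Kutta (RK4) integrator stands in for the ODE solvers the group used in R.

```python
import numpy as np

def food_chain(t, state, p):
    """Hypothetical water-plant-beetle model in Lotka-Volterra style.

    dW/dt = a0 + a1*sin(2*pi*t/period) - loss*W - uptake*W*P
    dP/dt = growth*W*P - mort_p*P - graze*P*B
    dB/dt = convert*P*B - mort_b*B
    """
    W, P, B = state
    inflow = p["a0"] + p["a1"] * np.sin(2 * np.pi * t / p["period"])  # seasonal water input
    dW = inflow - p["loss"] * W - p["uptake"] * W * P
    dP = p["growth"] * W * P - p["mort_p"] * P - p["graze"] * P * B
    dB = p["convert"] * P * B - p["mort_b"] * B
    return np.array([dW, dP, dB])

def rk4(f, y0, t, p):
    """Integrate dy/dt = f(t, y, p) on the time grid t with classical RK4."""
    y = np.empty((len(t), len(y0)))
    y[0] = y0
    for i in range(len(t) - 1):
        h = t[i + 1] - t[i]
        k1 = f(t[i], y[i], p)
        k2 = f(t[i] + h / 2, y[i] + h / 2 * k1, p)
        k3 = f(t[i] + h / 2, y[i] + h / 2 * k2, p)
        k4 = f(t[i] + h, y[i] + h * k3, p)
        y[i + 1] = y[i] + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

# Illustrative (made-up) parameter values and initial state (W, P, B)
params = dict(a0=1.0, a1=0.5, period=12.0, loss=0.1, uptake=0.05,
              growth=0.1, mort_p=0.1, graze=0.05, convert=0.02, mort_b=0.1)
t = np.linspace(0, 200, 2001)
traj = rk4(food_chain, np.array([3.0, 4.0, 2.0]), t, params)
water, plants, beetles = traj.T
```

Adjusting these parameters by hand and re-plotting `water`, `plants`, and `beetles` mimics the simulate-and-tweak loop described above. A more systematic alternative would be to choose the parameters that minimize the squared difference between the simulated and observed series.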
Two workshop attendees presenting their findings on the predator-prey dynamical system.
While I am still not completely comfortable with this last aspect, as someone mostly accustomed to the data analysis side I found it important to see statistics and mathematical modeling come together. I have already started looking into building a better system of differential equations this summer. And because the workshop sparked my curiosity about this type of modeling, I have also signed up for some extra math classes on ordinary and partial differential equations for next year, and I might even take some coursework in dynamical systems down the line.
Final thoughts: My key takeaways
As a Californian, I found it interesting to see just how complicated dynamical systems involving the seasonality of rain can be. Our dynamical system was far simpler than the real drought situation in California, although it was still difficult to model with just three variables. I can only imagine how many variables the analysts working on the drought have to deal with, including legislation, people refusing to let their lawns go dry, and the system of aqueducts running under farmland, which makes modeling water levels quite challenging. Difficult as it is, plenty of mathematicians are involved in the effort to conserve water as efficiently as possible, including Clemson University's Dr. Leah Jenkins, who gave a great talk on the topic. Living in a part of California affected by the drought, and having attended this workshop, I am curious enough to keep following what mathematicians do next on this problem.
Coming near the end of my second year studying statistics at the University of California, Riverside, this workshop was an invaluable and eye-opening experience. While I have not been in the world of mathematics and statistics for long, this workshop sparked my curiosity about topics I had not yet encountered but would now like to learn more about. For example, this summer I will almost certainly work on developing a better set of differential equations for the predator-prey dataset we were given during the workshop. I would also like to explore the other dataset to learn more about discriminant analysis. I have also come to realize how important computational skills are; my programming to-do list for this summer includes Julia, Python, and more R.
Besides the new statistical and mathematical techniques we learned, the main theme I have taken away from this workshop is that statistics, math, and computing can all be brought together for meaningful applications in ecology and human health. Moreover, it was refreshing to experience first-hand that statistics and math are more than just numbers and equations in a textbook, which is how I had come to see them in some of my coursework so far.
It was great to be part of such a strong undergraduate cohort of statisticians and mathematicians, all at the same point in our careers and doing what we love most in this kind of environment. The perspective I gained from my peers, who come from different universities across the country, about which classes to take and which research topics are interesting is invaluable. Meeting established applied statisticians and mathematicians and listening to their research talks was inspiring. I hope to one day reach their level of expertise and have as much fun as they are having.
If you are an undergraduate student considering applying to one of these workshops at SAMSI, I highly recommend that you apply and attend! You won't regret it!
Gabriel Ruiz.