Taking a Different Road – Being a Statistics Major

The following was written by Sarah Lotspeich of the University of Florida, who attended the SAMSI Undergraduate Workshop focusing on Computational Neuroscience.

I declared my Statistics major in the eleventh grade, approximately halfway through my AP Statistics course. As everyone around me pondered medical school and the many types of engineering, I knew that my choice seemed unconventional. Now three years into my undergraduate degree, I have met only a handful of fellow Statistics majors to date. During the third week of October, however, this changed forever as I attended the SAMSI Undergraduate Workshop.

Duke Chapel.

It was a gorgeous fall day (a pleasant surprise for me, as my typical “fall” in Gainesville, Florida includes a few fallen leaves and a high temperature in the 80s) in Research Triangle Park, North Carolina. Budding statistics and mathematics students from across the country gathered to explore computational neuroscience, and to enjoy fantastic food. Always eager for an adventure, I flew in as early as possible the day before the workshop to get maximum exploring time in Durham. Perhaps a bit TOO eager, I walked over eight miles through Downtown Durham and to both edges of Duke University’s gorgeous gothic campus.

Dame's Chicken and Waffles

Excellent chicken and waffles place!

Fret not, however, as I was well fueled by Dame’s Chicken and Waffles and fondue from the Little Dipper. Needless to say, the local area surpassed my every expectation and left me excited to wear scarves and learn more about statistics the following day. The mingling began at approximately 7:30am the next morning, as over thirty of my fellow “numbers people” bonded over bagels and oatmeal. I was so excited to hear from people who care as much about significance tests and p-values as I do!

The presentations commenced with an absolute bang as Dr. Ciprian Crainiceanu of Johns Hopkins University immersed us in “Neurohacking”. He outlined the basic principles of converting MRI images from picture to a system of numbers, and by the end of the hour left us with a data set and the necessary code to explore it independently. One of my favorite components of the workshop, actually, was the interactive nature of each presentation with the integration of R or Matlab code.

Guest lecturers introduced many fascinating facets of computational neuroscience, and I especially enjoyed how my knowledge of the subject compounded with each additional lecture. As the workshop progressed, I found that I was relating information from one speaker’s presentation back to material I had learned only hours earlier, and even today I feel that I walked away with a solid foundation in the topic. It very much feels as if I went from zero to one hundred with this material, and I appreciate the challenges posed to us by the complicated subject matter.

Beyond the presentations, the field trip to the laboratory for psychiatric neuroengineering at Duke University provided a “behind-the-scenes” glimpse at the processes of data collection that create the massive data sets we dealt with during lecture. I was also just happy for any excuse to ogle the beautiful campus once more. Each new speaker and opportunity brought new questions to ask and facts to learn, so I was happy that the workshop environment kept changing, from lecture to lecture and with breaks for the field trip and the panel.

students by SAMSI sign

From left to right: Jordan Zeldin, Eion Blanchard, Sarah Lotspeich, Michelle Zamperlini.

The many bus rides also provided unexpectedly pleasant opportunities to meet new people, as I was shuffled into new groups with each trip. I thoroughly enjoyed swapping stories about my university – about the weather, everyday dress code, the statistics department – with people from other schools! And I was even lucky enough to give suggestions about things to do and places to eat in Florida, as one of my new friends is planning a trip to the Sunshine State soon. Perhaps the most unexpected bonus to this experience was the people.

This was honestly one of the most incredible groups of students, and upon learning more about each person and their involvement I am absolutely honored to have been selected among them for the 2015 SAMSI Undergraduate Workshop. Though the workshop lasted only two days, the people I met and the research I was immersed in will carry through my entire career. I cannot emphasize enough the importance of this experience and how strongly I recommend it.

There is a 100% probability that I would love to return to SAMSI sometime in the future.

Understanding Droughts – Part of the Undergraduate Modeling Workshop May 17-22, 2015

The following was written by Gabriel Ruiz, attendee from the University of California, Riverside.

attendees sitting listening to lecture

All of the attendees and some of the speakers on Day 1 of the workshop


The workshop attendees hard at work.

Just a few weeks ago in May, I was fortunate to be among the 26 undergraduates to attend one of many undergraduate workshops offered at the Statistical and Applied Mathematical Sciences Institute (SAMSI). This was a 5-day-long workshop on mathematical and statistical modeling. The backgrounds of the students in attendance ranged from mathematics and statistics to chemical or aerospace engineering and other fields, and they came from universities all across the country. There were also current researchers from SAMSI and other universities in attendance who gave talks on very interesting topics and who led the workshop sessions. Among my favorite parts of this workshop were the talks on Bayesian Statistics and Discriminant Analysis, meeting some established researchers, getting to know my peers in mathematics and statistics, the great food we had, and, of course, having the opportunity to visit SAMSI in such a beautiful section of the country.

First Impressions: Raleigh, SAMSI, and NC State

students walking past sign

Attendees as they arrive at SAMSI to kick start the workshop.

cement pathway with trees

The scenic path attendees took to explore NC State and the surrounding area on the first day.

My very first impression of Raleigh and its surrounding area was how green and pretty everything was. Coming from California, and considering the current drought we are experiencing, this was quite a sight. It had such a relaxing feel.

Students in front of the James B. Hunt Library

Workshop attendees visiting the famous Hunt Library at NC State.

Later on, it was fun meeting with all of the other undergraduate attendees at North Carolina State University, where we all stayed for the next 5 days. In the evening, after some great food, we took a walk around campus and even visited the renowned James B. Hunt Jr. Library. The NC State campus is so beautiful and big! Because of this, we got a little lost but that ended up being a good thing because we were able to see some more of the surrounding area in Raleigh.

The next day, we went to SAMSI on the other side of town for the introduction to what we would be doing throughout the week. We heard from some speakers on interesting topics, and ate some more delicious food. It was nice to get a sense of all the great work that goes on there.

Building on the NC State campus

A scenic example of Raleigh and NC State beauty.

The rest of the workshop was held in SAS Hall at NC State, named after the statistical software company when it was donated by former statistics faculty and founders of SAS Institute Inc. This building is home to the Mathematics and Statistics departments and was just a short walk from where we were staying. The place we stayed, I should add, had a volleyball court that hosted several competitive games of volleyball among the attendees. This was a fun break after a day of math and statistics.

3 postdocs

Kimberly Kaufeld, Daniel Taylor-Rodriguez and Jyotishka Datta, all postdocs at SAMSI, working together.

There were plenty of informative talks given by researchers from various universities. Some of the notable talks were given by:

Paul Brooks from Virginia Commonwealth University on “What Causes Shifts in the Human Microbiome.” This talk focused on the Community State Types (CST) of the vaginal microbiome to identify the microbiome profiles that are associated with a high risk of certain diseases as well as devising better predictions for changes in CSTs over time. Students at the workshop were able to work on a subset of this interesting project throughout the rest of the week.

Daniel Taylor-Rodriguez, a SAMSI postdoc, spoke about his approach to parameter estimation and variable selection for site-occupancy models that use presence-absence data. He presented an occupancy model with probit links and demonstrated his work on deriving more objective parameter priors as opposed to using AIC methods or other Bayesian approaches that require substantially more prior knowledge than is usually available.

Leah Jenkins of Clemson University gave a great talk titled “The Strawberries of Wrath: Farming Under the Realities of Drought”, in which she spoke about the current drought crisis in California—where 80% of the fruits and vegetables consumed in the US come from. The main focus of her talk was describing her and other mathematicians’ role in creating the “virtual farmer” software tool and the team’s use of mathematical modeling and optimization to help farmers in Pajaro Valley, CA remain profitable through current water restrictions. This challenging project was the primary motivation for the second project students were able to work on during this workshop.

Two other SAMSI postdoctoral researchers, Kimberly Kaufeld and Yize Zhao, also led hands-on workshops in R, a statistical software environment, which were very informative to those of us who had limited experience with R. Jyotishka Datta, another postdoc at SAMSI, led a session in which he went over introductory statistical and probabilistic concepts in regression and classification, in addition to high-dimensional applications and their implementations in R. A fifth postdoc, Christopher Strickland, went over some very useful approaches to the modeling and data analysis of dynamical systems in Python, as an alternative or complement to R and Matlab.

Other notable talks were those by NC State PhD student Neal Grantham and SAS Institute Data Scientist Yue Qi. Neal Grantham’s talk focused on an alternative approach to identifying the origin and history of a dust sample through the pollen found in it; the approach uses discriminant analysis and DNA sequencing to identify samples to within a short distance with a measurable degree of certainty, as a complement to a pollen expert’s more subjective identification. Yue Qi’s talk was about the tools he is helping to develop at SAS to more easily analyze “Big Data”, and more specifically he focused on the use of these tools in Machine Learning approaches to fight banking and insurance fraud.

These talks were all of the high quality you would expect at SAMSI, yet were accessible for all of us as undergraduates. After listening to all of these, I hope to learn some more about the research techniques that were discussed and maybe even contribute to the areas in which they have applied these techniques, such as the California drought. It was nice to get a feel for just how broad statistics and mathematics are.

The Workshop: working with a predator-prey dynamical system dataset

For the actual workshop aspect, we were split into groups of 5 that each worked on one of two very interesting topics. The first topic dealt with modeling a predator-prey dynamical system that was meant to be a simplified representation of the more complicated drought situation currently affecting California farms, which account for a large portion of the US vegetable and fruit supply. The second topic had to do with performing discriminant analysis to differentiate between microbiome states, which are defined by the levels of various vaginal microorganisms and are thought to carry higher or lower risk for certain diseases compared to other microbiome states.

Group with mentors

One of the workshop groups alongside their mentors for the week, Daniel Taylor-Rodriguez (first on the left) and Kimberly Kaufeld (furthest on the right).

The dataset I worked with was the predator-prey dataset. We were tasked with first analyzing the time series data we were given on the abundance of three variables: water, plants, and beetles. The key here was to use time series techniques to model each variable against time. After we were able to find good models for each variable, we could plot the fitted lines of all three to see how they varied over time. The first observation we had was that the density of each variable varied over time according to a sine and cosine pattern, so naturally we used a time series model with sine and cosine terms. The fitted lines further demonstrated that plants had a spike (or dip) in their density whenever there was a spike (or dip) in the water supply. Of course, we know plants depend on water, but it was nice to see this graphically over time. There was a very high correlation between these two variables, which helped quantify how strong the relationship was. This relationship is the key characteristic of a dynamical system. Because we had the “noisier” dataset, the same dependency of beetles on plants was not as observable, although it was present.
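
For readers who want to try the sine-and-cosine idea themselves, here is a minimal sketch in Python. It is an illustration only and not the code our group used: the data are made up and noise-free, the 12-step period is an assumption, and the helper names `solve_linear` and `fit_harmonic` are hypothetical. It fits value = a + b*sin(wt) + c*cos(wt) by ordinary least squares using only the standard library.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_harmonic(times, values, period):
    """Least-squares fit of values ~ a + b*sin(w*t) + c*cos(w*t), w = 2*pi/period."""
    w = 2 * math.pi / period
    # One design-matrix row per observation: [1, sin(wt), cos(wt)].
    rows = [[1.0, math.sin(w * t), math.cos(w * t)] for t in times]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    return solve_linear(xtx, xty)

# Made-up "water" series with an assumed 12-step cycle (not the workshop data).
PERIOD = 12
times = list(range(48))
water = [10 + 3 * math.sin(2 * math.pi * t / PERIOD)
         + 1 * math.cos(2 * math.pi * t / PERIOD) for t in times]

intercept, sin_coef, cos_coef = fit_harmonic(times, water, PERIOD)
print(intercept, sin_coef, cos_coef)
```

With real, noisy measurements the fitted coefficients would only approximate the underlying cycle, and the period itself would have to be chosen or estimated from the data.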

The next part of the workshop was to develop a system of differential equations that brought together all of these relationships. We used the Lotka-Volterra equations, which are also known as the predator-prey equations. The key here was that the parameters and variables needed some tweaking through ODE packages in R, further simulation, and our own intuition in order to best describe the system. This was interesting considering we had three variables to work with: a natural resource, a prey, and a predator. The transition from the statistical aspect of this to mathematical modeling was the trickiest part, to say the least, since our group had no real experience with differential equations, much less bridging math and statistics in this way. Luckily, the postdocs running this workshop, Drs. Kaufeld and Taylor-Rodriguez, walked us through the process and taught us about these equations.
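
To give a sense of what such a system can look like, here is a rough sketch in Python of a three-level, Lotka-Volterra-style chain (water, plants, beetles). Everything specific in it is an assumption made for illustration: the exact form of the equations (including logistic replenishment of water), the parameter values, and the starting densities are invented, and a hand-written fourth-order Runge-Kutta stepper stands in for the ODE packages in R that we actually used.

```python
def derivatives(state, p):
    """Right-hand side of an assumed resource -> prey -> predator chain."""
    water, plants, beetles = state
    # Water replenishes logistically and is consumed by plants.
    d_water = p["r"] * water * (1 - water / p["K"]) - p["a"] * water * plants
    # Plants grow by using water, are eaten by beetles, and die off.
    d_plants = (p["e_p"] * p["a"] * water * plants
                - p["b"] * plants * beetles
                - p["m_p"] * plants)
    # Beetles grow by eating plants and die off.
    d_beetles = p["e_b"] * p["b"] * plants * beetles - p["m_b"] * beetles
    return (d_water, d_plants, d_beetles)

def rk4_step(state, p, dt):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(base, slope, h):
        return tuple(x + h * k for x, k in zip(base, slope))
    k1 = derivatives(state, p)
    k2 = derivatives(shifted(state, k1, dt / 2), p)
    k3 = derivatives(shifted(state, k2, dt / 2), p)
    k4 = derivatives(shifted(state, k3, dt), p)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(initial, p, dt, steps):
    """Return the list of states visited, starting from the initial state."""
    trajectory = [tuple(initial)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], p, dt))
    return trajectory

# Hypothetical parameter values, chosen only so the example behaves sensibly.
params = {"r": 1.0, "K": 10.0, "a": 0.1, "e_p": 0.5,
          "b": 0.1, "m_p": 0.1, "e_b": 0.5, "m_b": 0.2}

trajectory = simulate((5.0, 2.0, 1.0), params, dt=0.01, steps=5000)
print(trajectory[-1])
```

With these made-up parameters, setting all three derivatives to zero gives a balance point at water = 6, plants = 4, beetles = 2. The "tweaking" described above amounts to adjusting values like those in `params` until the simulated curves resemble the observed series.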

two workshop members giving a talk

Two workshop attendees presenting their findings on the predator-prey dynamical system.

While I am still not completely comfortable with this last aspect, it was important to see the union of statistics and math modeling as a person who is mostly accustomed to the data analysis side. I have already started to look into creating a better system of differential equations this summer. And because I gained curiosity in this type of modeling after the workshop, I am also signed up for some extra math classes on ordinary and partial differential equations for next year and might even take some coursework in dynamical systems somewhere down the line.

Final thoughts: My key takeaways

Coming from California, I found it interesting to see just how complicated these dynamical systems involving the seasonality of rain can be. It is important to note that our dynamical system, although still difficult to model with three variables, was much simpler than the current drought in California. I can only imagine how many variables the analysts involved with this have to deal with, including legislation, people refusing to let their lawns go dry, and the system of aqueducts that runs under farmland, all of which make modeling water levels quite challenging. Although the problem is difficult, there are plenty of mathematicians involved in the effort to conserve water in the most efficient way possible, including Clemson University’s Dr. Leah Jenkins, who gave a great talk on the topic. Between living in a section of California affected by this drought and attending this workshop, I am curious enough to stay in the loop about what mathematicians will continue to do.

As I finish up my second year studying statistics at the University of California, Riverside, this opportunity was an invaluable and eye-opening experience. While I have not been in the world of Mathematics and Statistics for a long time, this workshop sparked curiosity in me about topics I had not yet been acquainted with but would now like to learn more about. For example, this summer, I will almost surely look into developing a better set of differential equations for the predator-prey dataset we were given during the workshop. I would also like to look into the other dataset to learn more about discriminant analysis. I have also come to realize that computational skills are very important. On my programming to-do list this summer are Julia, Python, and some more R.

Besides the new statistical and mathematical techniques that we learned, I feel the main theme that I have taken away from this workshop is that statistics, math, and computing can all be brought together for meaningful applications in ecology and human health. Moreover, it is refreshing to have experienced first-hand that statistics and math are more than just numbers and equations in a textbook like I had become accustomed to in some of my coursework so far.

It was great to be around a talented undergraduate cohort of statisticians and mathematicians who are all at the same point in their careers, in this type of environment, doing what we love most. The perspective I gained from my peers here, who are all from different universities across the country, about classes to take and interesting research topics is invaluable. To have met some established applied statisticians and mathematicians and listened to their research talks was inspiring. I hope to one day achieve that same level of expertise and have as much fun as they are having.

If you are an undergraduate student considering applying to one of these workshops at SAMSI, I highly recommend that you apply and attend! You won’t regret it!

portrait of Gabriel Ruiz

Gabriel Ruiz.

Apply Now for SAMSI Undergraduate Modeling Workshop

Undergraduate students take note! SAMSI is taking applications for a unique, week-long opportunity to explore mathematical and statistical research in data modeled using networks. Talks will be presented by statisticians and mathematicians who work with networks, particularly focusing on social networks.

Many communication mediums, from face-to-face conversations and text messaging to Facebook and Twitter, make modern social networks complex and exciting systems to study. Students will look at things such as how an individual’s attitudinal, behavioral or health characteristics are altered as a result of interacting with others.

For a good part of the week, students will work in teams to investigate a variety of questions related to the formation and evolution of social networks. They will use data from the Social Evolution experiment in the MIT Human Dynamics Lab, which followed approximately 100 students in a college dormitory during the 2008-2009 academic year.

Students will spend most of the week on the campus of North Carolina State University in Raleigh, North Carolina.

Hurry, as the deadline to apply is April 7, 2014 at 5pm. More details and the application can be found here.

Predicting number of landfalls of hurricanes — Undergraduate Modeling Workshop produces forecasts for 2013

group shot of undergraduates attending May 2013 workshop

Undergraduate workshop from May 2013.

Thirty-four undergraduate students from around the U.S. came to SAMSI and NC State University the week of May 13-17. During the week, the students interacted with an atmospheric scientist who works on hurricane research, and applied mathematicians and statisticians who work on climate research.  Students used the same database as used at NCSU to forecast various aspects of future hurricane seasons, and built Poisson regression models within R to produce their own forecasts of the 2013 hurricane season in the US. Below are some comments from participants:

three students with signs

Corey Raphael, U. Florida, Jonathan Skantz, U. Florida and Gwen Tian, U. British Columbia.

Corey Raphael, University of Florida
“I had a great time during my week at SAMSI! I learned all about climate science and hurricane predictions, and met a lot of great people. Thanks for all the advice and free food! I enjoyed getting to know the Raleigh area, and I learned a lot about R that I didn’t know previously. I hope the program enjoyed having me as much as I enjoyed being here!”

Group 3 shot

Evan Bittner, Penn State, Kasey Palmquist, UNC Wilmington, and Daria Drozdova, Pomona College.

Kasey Palmquist, University of North Carolina at Wilmington
“The workshop was an excellent experience; I truly feel that I am not leaving empty-handed. I not only learned new methods of statistical analysis, but how to collaborate with a group of people on a research topic. I found this workshop beneficial because it allows undergraduates to get a “feel” of mathematical/statistical research in order to see if it is right for them. I found the workshop to also be a great way to network and meet people that share the same interests as you. Overall, great experience!”

Group 6 SAMSI undergraduate modeling workshop May 2013

Brandon Sherman, U. Pitt, Kehao Zhu, Purdue, and Vinicius Taguchi, NCSU

Vinicius Taguchi, North Carolina State University
“This workshop was a wonderful experience.  I gained a better appreciation for statistics and applied mathematics, made lasting friendships, and got to see a new side of NC State University.  When I first got here, I was a little concerned about being one of the few non-math/stats majors, as well as one of the very few underclassmen.  Nevertheless, this never became an issue and I felt like part of the group right from the get-go.  Thank you, SAMSI.”

Group 2 photo SAMSI undergraduate modeling workshop May 2013

Lee Richardson, U. Washington-Seattle, Charles Ho, Rice and Anna Peris, Marquette.

Lee Richardson, University of Washington at Seattle
From his Twitter feed – “Predicted a Poisson Distribution with a mean 3.96. AKA 56% chance of greater than 4 hurricanes!!!!!”
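
The arithmetic behind the quoted figure can be checked with a few lines of Python. This is a small sketch that uses only the mean of 3.96 from the tweet; the function names are hypothetical, and it makes no attempt to reproduce the students' Poisson regression model. Under a Poisson distribution with that mean, the probability of four or more hurricanes is about 56%, while the probability of strictly more than four is about 36%, so the quoted 56% appears to correspond to "at least 4".

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def poisson_tail(k_min, mean):
    """P(X >= k_min), computed as 1 minus the probability of fewer than k_min events."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(k_min))

mean_landfalls = 3.96  # mean quoted in the tweet

p_at_least_4 = poisson_tail(4, mean_landfalls)   # four or more
p_more_than_4 = poisson_tail(5, mean_landfalls)  # strictly more than four

print(f"P(X >= 4) = {p_at_least_4:.4f}")
print(f"P(X > 4)  = {p_more_than_4:.4f}")
```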

Here are some of the presentations that the students gave on the last day of the workshop.

Impressions from the Undergraduate Workshop on Data-Driven Decisions in Healthcare

big group of students outside SAMSI

February 2013 Undergraduate Workshop participants.

SAMSI recently held the Undergraduate Workshop on Data-Driven Decisions in Healthcare for about 30 students. Visiting professors, postdoctoral fellows and graduate fellows who are participating in this SAMSI program led the sessions, bringing cutting-edge research into the lectures. Students had a chance to work with data from the SEElab at Technion in Israel and got an overview of personalized medicine, a tutorial in R, and a demonstration of the ARENA software. Here are a few of the students’ impressions from the workshop.

Eric Laber instructing students

Eric Laber, NCSU, giving a lecture at the workshop.

Eric Kernfeld, Tufts University Class of 2014, Applied Mathematics

“I had a great time at the workshop on Data Driven Decisions in Health Care this past weekend. It was a nice opportunity to meet statisticians, something I don’t get the chance to do back at Tufts. I also met a lot of undergraduates majoring in statistics and mathematics. The food was good, the staff were welcoming, the accommodations were convenient, and the talks were well-pitched. I recommend SAMSI workshops to anyone who’s interested in the topics, especially to people considering graduate education down the road.”

Danielle Llanos, Georgetown University

“I thought the SAMSI workshop was wonderful. It was a great opportunity to learn from talented individuals, and a chance to expand my network. The lecture topics were incredibly interesting and were very relevant to my career goals. Probably the best part of the workshop was the graduate student panel. The ability to ask those burning questions and learn from the experiences of others was great. I would recommend any SAMSI workshop to students looking to learn more about opportunities in the sciences, and expanding their educational experiences.”

three students at table

Students networking at lunch.

Brittany Boribong, sophomore, biomathematics major at University of Scranton

“As a student with no background in statistics and programming, I found the workshop a bit overwhelming but no less interesting. Coming into this with no experience just allowed me to take that much more out of the workshop.  I was able to explore new fields of math that I never considered before and learn about topics that I had no idea even existed. As a Biomathematics major, I found the topic of using data to derive decisions in healthcare intriguing since it is an application of my major that I was not aware of. Another wonderful aspect of the workshop was the chance to speak to people in different fields. During lunch, I had the opportunity to speak to a post-doc fellow and during dinner, I spoke to one of the professors that gave a lecture earlier in the day; these opportunities don’t come along every day. It was enjoyable hearing their stories and being able to have a casual conversation with them. The panel made up of current graduate students and post-docs was also helpful in that they were able to share their experiences about graduate school and offer along any advice. I found it particularly helpful since one of the speakers was currently in a biomathematics program and I was able to ask questions I had about my major.

However, the best part of the workshop, in my opinion, was being able to meet other students. Coming from a university with a smaller math department, I really enjoyed meeting students from around the country with interests similar to my own. It was great being able to make connections with students in different fields and from universities from all over. Overall, I had a wonderful time meeting new people and exploring different fields of mathematics during the workshop and found this to be a great experience.”

Apply Now for the 2-Day Undergraduate Workshop at SAMSI October 26-27

group of undergraduate students from 2011

Last year’s undergraduate workshop group.

SAMSI is accepting applications for the two-day undergraduate workshop that will focus on Statistical and Computational Methodology for Massive Datasets. The workshop will be held October 26-27 at SAMSI in Research Triangle Park, NC. The program begins at 9:30am on Friday, October 26 and ends at noon on Saturday, October 27.

Applications received by Friday, September 28 will receive full consideration. SAMSI will reimburse appropriate travel expenses, including food and lodging. Participants are urged to arrive on Thursday evening.

The Statistical and Computational Methodology for Massive Datasets program focuses on fundamental methodological questions of statistics, mathematics and computer science posed by massive datasets, with applications to astronomy, high energy physics, and the environment. Serious challenges posed by massive datasets have to do with “scalability” and “data streaming.”