What Does It Mean to be a Woman in Mathematics?

The following blog post was written by Jessica Matthews, Cooperative Institute for Climate and Satellites (CICS-NC).

room full of women watching presentation

Workshop for Women in Mathematics held April 6-8, 2016.

When offered the invitation to speak at SAMSI’s Opportunities Workshop for Women in Math Sciences, I gladly accepted. When it came time to actually prepare the presentation, I realized that I had never attended, let alone presented at, this type of workshop. I am well versed in putting together a scientific presentation, but this was different. So I myself was faced with the opportunity to consider what it meant to be a woman in mathematics. I had the opening talk time slot, which inherently carries with it the pressure of setting the tone for the entire event. I chose to draw from my personal experiences and to discuss career possibilities beyond the classroom, skill sets I have found necessary (beyond math), and a few key challenges faced by women in our field. A spirited discussion regarding the pay gap and the importance of negotiation ensued. I enjoyed the free-flowing discussion, and felt that this open and welcoming atmosphere was present for the rest of our time together.

Throughout the two and a half days of the workshop, we had the privilege of hearing from a number of women who have successful careers in academia, industry, and government. They shared their lessons learned, fielded questions, and led discussions about career opportunities and challenges experienced. I cannot possibly give a comprehensive account of all the great talks and conversations that took place in this workshop, so I provide merely a few personal highlights.

two ladies talking in the hallway

Amanda Golbeck (R) talking to a participant of the workshop.

Amanda Golbeck introduced the concept of viewing one’s career path as a jungle-gym rather than a ladder. We tend to have the ingrained view of the traditional (and linear) career path, while in reality, to maintain a healthy life–work balance, flexibility is required.  Another grain of wisdom she offered is that being a strong leader is important, but being a valuable team member is paramount. I think this is often forgotten in our power-hungry society, but the truth is that more can be accomplished via cooperation and we should value the cultivation of teamwork skills.

Panel at the women in math workshop

L-R: Ulrica Wilson, Lea Jenkins and Amanda Golbeck.

Drawing on her experiences at a historically black university, Ulrica Wilson offered a great explanation as to why having workshops such as this one is not only relevant, but important for increasing and maintaining diversity. When we take the time to create this space, we are able to stop focusing on what makes us different and just focus on the math—which is really what we were all drawn to when we chose this pursuit in the first place!

Marie Davidian gave a fascinating overview of notable women in the mathematical sciences, both in the past and the present. I was captivated by the story of the trailblazer Gertrude Cox, founding head of the (then-named) Department of Experimental Statistics at NCSU in 1941. Her recommendation for the position came by way of a footnote appended to a letter containing a list of recommended male peers: “Of course if you would consider a woman for this position, I would recommend Gertrude Cox of my staff.” This truly puts into perspective how far the community has come with regard to gender equality.

The workshop attendees were energetic and engaged, which made the panel-led discussions and breakout sessions (not to mention breaks) both stimulating and fun. The participants were largely graduate students and early career scientists, who had plenty of thoughtful questions for the expert representatives from academia, industry, and government. Even though I may have been cast as one of the experts, I found that I learned a lot and left the workshop with a to-do list of actions I am interested in taking. In particular: joining a mentor network, engaging more in professional society events, and advocating for family leave benefits.

I am glad to have had this opportunity to consider the challenges, and solutions to those challenges, faced by women and minorities in the mathematical sciences. I’d like to thank SAMSI for hosting this event and allowing us to gather and reflect on both the progress that has been made, and the issues that remain. It is only through this type of directed intention that we may continue to move towards equality.

Statistical Methods and Analysis of Environmental Health Data

The following was written by SukhDev Mishra, Ph.D., Division of Bio-Statistics, National Institute of Occupational Health, Indian Council of Medical Research, Ahmedabad (India).

group shot

Statistical Methods and Analysis of Environmental Health Data Workshop group.

I was fortunate to attend the SAMSI workshop on Statistical Methods and Analysis of Environmental Health Data last week in Mumbai. It focused on various topics related to the statistical analysis of environmental health data, some of which covered the latest methodological developments in this field, particularly the first day’s opening lecture from Professor Joel Schwartz.

Time series data have proven to be critical in assessing the systematic impact of environmental factors on human health. Professor Francesca Dominici, a researcher with significant contributions in this area, was a very dynamic and enthusiastic co-leader for this workshop. She discussed at length the statistical principles and assumptions of multi-site time series analysis, along with the careful interpretation of such data. Thanks to technological advances and the availability of regular measurements, time series data can now be accessed and readily analyzed with the techniques elaborated by Professor Dominici, which will be integral to the success of my future studies.

Working Group 5 - Gene x Environment Interactions


The Gene x Environment Analysis & Epigenetics lecture given by Professor Bhramar Mukherjee provided very useful information on interaction under additive and multiplicative models, citing practical applications in the area of environmental health that she developed. Her very creative way of teaching, blended with a great sense of humor, kept us so engaged that we wouldn’t blink for a second.

Spatial statistics is a critical part of environmental health data analysis, so it was helpful to have the basics covered well by Dr. Safraj Shahul Hameed and Dr. Brian Reich. Professor Donna Spiegelman presented a wonderful talk on measurement error, starting from statistical notation and building up to the complete logit function (being a statistician... I always love this part :) ). She put great effort into explaining the regression calibration method for MS/EVS and the associated algorithms. Interesting talk!

Working groups were engaged in different exercises, which included working on problems and real data sets contributed by various participants and coming up with new analyses and interpretations of the data. I worked on Exposure Modelling of Ambient and Household Air Pollution for Acute and Chronic Health Effects. I enjoyed working with my fellow WG colleagues: Kalpana Balakrishnan, Santu Ghosh, Donna Spiegelman, Kevin Lane, Joel Schwartz, Sourangsu Chowdhury, and Poonam Rathi. Fine scientific arguments during the process of analysis were the crux of our exercise; thanks to Joel, Kalpana, Donna and Kevin especially.

This is in no way a comprehensive description of this workshop, just my thoughts. I would also like to record here that I learned from each and every speaker and fellow participant. It was a gathering of great scientific minds and very inquisitive researchers. My understanding is that one of SAMSI’s objectives is to foster a culture of collaborative research among Indo-US researchers in the area of public health, and I could see that coming true as we collectively discussed ideas on how to continue our work in mutual scientific engagement. I hope these efforts result in great scientific endeavors on environmental health priorities in the time to come.

People drinking tea during a break

Enjoying afternoon tea.

One of the unique features of this workshop was the meticulous planning by the team of organizers, in both scientific content and overall execution: Professor Richard Smith, Professor Sujit Ghosh, Professor Francesca Dominici, and Ms. Krista Coleman, whose scientific management and interaction with participants were very encouraging.

My earlier work experience was mainly in the pharmaceutical industry as a biostatistician, and I consider myself a beginner in environmental health. This workshop has broadened my scientific perspective in this area by leaps and bounds.

Knowledge-sharing exercises of this kind may prove very helpful for researchers in statistics and epidemiology in addressing India’s most pressing public health needs. Thank you SAMSI, Harvard, ISI-Kolkata and all of the other participating organizations for such a wonderful experience!



My Experience at the Undergraduate Workshop Focusing on Forensics

The following was written by Briahnna Austin, an undergraduate student from the University of California, Riverside.

Briahnna Austin


Statistics is the interchange and communication of everyday information.

In February 2016, I was fortunate enough to attend my first SAMSI workshop. The topic was forensic science, and I was both overjoyed and anxious, excited not only about the material I was going to engage with, but also about the interesting people I was going to meet and talk to. Coming from an undergraduate biology background and aspiring to go into graduate-level biostatistics, I have a particular fondness for interdisciplinary fields. I found exactly that kind of material at SAMSI’s Forensic Science Workshop, whose purpose was to give insight into how statistics, mathematics, data, and scientific principles amalgamate to form what we call forensic science.

Upon my arrival, I met a professor from Duke at the airport. This was an amazing coincidence, since SAMSI has ties with Duke, and I took it as a sign that the workshop had something important in store for me, which it did. On the first day of the workshop, I learned about comparative bullet analysis, retail sampling, and latent fingerprinting. The speakers highlighted the importance of decision-making and the choice of techniques. In forensic science, there is a large toolkit to pull from, and this toolkit gets larger as technology grows, so it is our job as statisticians, investigators, or forensic scientists to make responsible and informed selections. During the first day, I was also able to see a forensic science lab. Movies and TV shows portray a lot of action going on in these labs, but it is different in the real world. Visiting the lab was a great opportunity to clear up assumptions and see what the real “CSI” does on a daily basis. The director of the crime lab showed my group around the facilities, and I kept hoping to see something scary or something crazy pop out of the wall, but no luck.

two lab workers

Lab workers at the Wake County Crime Lab.

During the next day of the workshop, I learned about the uniqueness fallacy, statistical reliability, and contextual/confirmation bias, as well as a Bayesian model for fingerprint statistics. This gave insight into how important reproducibility of work and professionalism are. In this field, it is essential to keep out biases, and ensuring statistical reliability can help guard against the types of bias we went over. The takeaway from both days was accountability for your work and passion for the field. Every speaker enjoyed his or her line of work. Their commitment to the field was inspiring and showed firsthand that forensic science is a collaborative effort in which open dialogue and communication are key to success.

Students listening to a lecture.


The last big takeaway from this workshop was about networking. One of my most vivid memories of the SAMSI workshop, besides the awesome food, is talking with the post-docs and the other undergraduate students. At the end of the first day, I was able to talk to post-docs, who helped steer me in the right direction for my educational future. I am glad SAMSI provided the time to network with them; they were very friendly and funny. I networked not only with the post-docs, but with the students attending the workshop as well. The SAMSI workshop gave me the opportunity to make new friends. As I move forward in my education and career, I will be calling upon others for help with different aspects of STEM. Looking around the conference room, I realized that these students will be the next set of forensic scientists, investigators, statisticians, and researchers, so it is important that we are able to network with one another. I would definitely recommend this workshop to other students, and I encourage students to seek out other SAMSI opportunities as well. Lastly, do not forget to take many pictures; looking back, I realized how scenic Durham is and wish I had more pictures.

Learning about the challenges of computational neuroscience

The following was written by Thomas Witelski, Associate Director at SAMSI and Professor at Duke University in the Mathematics Department.

At some level, everyone is aware of the pressing medical and societal
challenges of neuroscience from media coverage of the growing impact of
neurological diseases like Alzheimer’s and Parkinson’s. Understanding the
brain at a scientific level has been identified as one of the central
challenges for this century’s research, as reflected in the magnitude of
resources invested in the NIH’s BRAIN initiative and the European Union’s
Human Brain Project.

attendees sitting in the auditorium

The opening workshop for CCNS was held at the NC Biotech Center.

In August, a diverse community of researchers converged at the NC
Biotechnology Center for SAMSI’s opening workshop for the Challenges in
Computational Neuroscience (CCNS) program. The presentations by leading
researchers on clinical, cognitive, computational and theoretical aspects of
brain research yielded many very lively discussions. Some talks addressed
technical issues, but many pointed to big fundamental questions on
exploring what might be nature’s most intricate black box.

Martin Lindquist speaking at the podium

Martin Lindquist, Johns Hopkins, speaking at the opening workshop.

A long history of anatomical studies has established the general features
comprising the human brain, but great challenges lie ahead in making clear
how the structure and functions of the brain relate to each other. Many of
the talks in the CCNS workshop addressed methods in neuroimaging. Martin
Lindquist (Johns Hopkins Univ) gave a lecture overviewing the various
modalities for functional imaging of brains in vivo, including functional
magnetic resonance imaging (fMRI), positron emission tomography (PET), and
electro/magneto-encephalography (EEG/MEG). These techniques differ in the
technologies used to collect data, but more importantly, they fundamentally
differ in the physiological types of behavior they monitor — in terms of
either blood flow, metabolic activity or electrical activity in the brain.
The methods have different limitations and trade-offs in terms of spatial
and temporal resolutions, and represent the current state of the art in
clinical methods of collecting neuroimaging data.

Several talks in the meeting addressed fundamental statistical and
mathematical questions on image processing and how to use collected data
(possibly coming from multiple scans) to obtain the most accurate possible
maps of the brain’s structure. Of particular interest is the use of
neuroimaging data to infer the networks of connections among parts of the
brain, called the field of connectomics. In this direction, Max Descoteaux
(Univ of Sherbrooke) showed how diffusion in MRI images could be used to
identify structural connections within the white matter of the brain.

Another major branch of neuroscience research explored in the workshop is
based on “bottom-up” modeling of time series of neural activity in networks
of connected neurons. Physiologically-based models of chemical/electrical
activity like the Hodgkin-Huxley equations can effectively reproduce the
dynamics observed in individual neurons. Equivalent reduced models, like
the “leaky-integrate-and-fire” neuron, can then be used to give statistical
descriptions for the patterns of spikes typically recorded in EEG data.
Workshop presentations in this area included talks by Robert Kass (Carnegie
Mellon), Kenneth Miller (Columbia) and Uri Eden (Boston Univ).
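
To make the idea of a reduced model concrete, here is a minimal sketch of a leaky integrate-and-fire neuron. It is not taken from any of the talks: the function name `simulate_lif` and all parameter values (membrane time constant, resistance, threshold, reset) are illustrative assumptions. The membrane voltage relaxes toward rest, is pushed up by an input current, and is reset whenever it crosses a threshold, which counts as a spike.

```python
import numpy as np


def simulate_lif(current, dt=1e-4, tau=0.02, resistance=1e7,
                 v_rest=-0.065, v_thresh=-0.050, v_reset=-0.065):
    """Forward-Euler integration of tau * dV/dt = -(V - v_rest) + R * I(t).

    `current` is an array of input currents in amperes, one per time step.
    All default parameters are illustrative, not fitted to any data.
    Returns the voltage trace (volts) and a list of spike times (seconds).
    """
    v = np.full(len(current), v_rest, dtype=float)
    spike_times = []
    for k in range(1, len(current)):
        # Leak toward rest plus drive from the injected current.
        dv = (-(v[k - 1] - v_rest) + resistance * current[k - 1]) * dt / tau
        v[k] = v[k - 1] + dv
        if v[k] >= v_thresh:
            # Threshold crossing: record a spike and reset the voltage.
            spike_times.append(k * dt)
            v[k] = v_reset
    return v, spike_times


if __name__ == "__main__":
    steps = 5000  # 0.5 s of simulated time at dt = 0.1 ms
    _, spikes = simulate_lif(np.full(steps, 2.0e-9))
    print(f"{len(spikes)} spikes with a constant 2 nA input")
```

With a constant input the model fires at regular intervals; a weaker input that cannot lift the voltage to threshold produces no spikes at all. Statistical descriptions of spike patterns typically start from models like this with noise added to the input.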

Returning to studies at the “whole-brain” level, many speakers touched on
the computational challenges involved in analyzing the huge datasets that
have been collected in connection with some clinical studies. The importance
of using mathematical and statistical methods to interpret clinical
neuroscience was also highlighted in talks on neurodevelopment by Raquel Gur
(Univ Pennsylvania), behavioral studies by Ruben Gur (Univ Pennsylvania)
and the influence of anesthesia on brain activity by Emery Brown (Harvard).

two people looking at the poster

Looking at a poster during the CCNS Opening Workshop.

Many of the advanced topics addressed in the workshop were also introduced
in a Neuroscience Summer School that was held in connection with the CCNS
program in July. The research focuses begun in the workshop are being
carried forward in several working groups, two graduate courses and further

5 Key Takeaways from the Innovations Lab

The following was written by Ellen Eischen, Assistant Professor, Department of Mathematics at the University of Oregon.

A doctor, a mathematician, and a statistician walk into… No, this isn’t the beginning of a joke.  It’s the beginning of the formation of a research team at SAMSI’s five-day Innovations Lab on Interdisciplinary Approaches to Biomedical Data Science Challenges, in which I was fortunate to participate in mid-July at the North Carolina Biotechnology Center.  This innovative NSF-funded pilot program – which brought together 35 experts from mathematics, statistics, computer science, biology, and medicine – facilitated new collaborations in precision medicine.

Participants posted questions of interest.

Precision medicine, which concerns the use of a person’s individual characteristics (e.g. genetic profile, lifestyle, environment) to diagnose, prevent, and treat diseases, is already changing the way medicine is practiced and seems likely to revolutionize treatments, at least for certain classes of diseases.  For example, certain cancer treatments are effective only in patients whose tumors have a particular genetic profile.  Reflecting the urgency for developments in precision medicine, the White House recently announced a major precision medicine initiative.   Projects that began at the SAMSI workshop – which covered a wide range of topics, including pain intervention, psychiatry, mobile technology for public health, and geriatric care – may very well lead toward solutions to medical problems that affect all of us.

Advances in precision medicine rely on analysis of large, complex datasets (commonly referred to as “Big Data”).   Biomedical data tends to be highly heterogeneous – an issue my teams at the workshop repeatedly faced – and thus particularly challenging to handle.  We also repeatedly returned to issues of data quality and accessibility, which in turn led to refining research problems so that they would actually be feasible in the context of currently available datasets.

Structure of the workshop (or How twelve new research groups successfully got off the ground and running in four and a half days)

Due to the unconventional and highly collaborative nature of the workshop, the format differed from a typical workshop.  In collaboration with a team of mentors (experienced scientists who provided feedback), workshop organizers (additional experienced scientists), and officers from the NSF and NIH (who provided crucial guidance concerning funding opportunities), the workshop was partly led and structured by Knowinnovation, a team with vast experience facilitating creative collaborations on interdisciplinary problems.  With Knowinnovation’s website stating “we like to collect people in a room and surprise them with their own ingenuity,” I was skeptical as to whether they would actually have a significant effect in such a short period.  Immediately, though, it was apparent that they were skilled at getting us to interact, share ideas, and turn them into compelling proposals.

The workshop began with an activity designed to get participants to immediately engage with each other and share their expertise and interests.

For most of the first two days, participants met in many different groups, typically for ten to sixty minutes to design a problem, identify the key challenges, and also identify expertise and data needed to address the relevant challenges.  Each session was followed by a brief presentation to the entire group of participants and mentors, and ideas were recorded and posted for all to view throughout the workshop.  At the end of each day, the facilitators from Knowinnovation photographed and posted all these notes to a private online forum, where they also posted photographs and slides from all our presentations, links to databases, and useful articles about collaboration.

Recording sources of important data
Nirmish Shah describes his team’s plans for an app.

Midway through the week, participants formed teams – typically consisting of three to five members –  with whom they would remain for the duration of the workshop.  As we quickly learned from experience while working in groups early in the week, it was usually essential to have a doctor (or other person with medical expertise), someone from mathematics or computer science, and someone with experience in biostatistics in order to make progress.  As an example of the diversity of expertise on a typical team, I will note that my own team consists of a psychiatrist, biostatistician originally trained in mathematics, professor of information sciences specializing in visualization methods and with a background in computer science, computer and electrical engineer, professor of bioinformatics and radiology, and mathematician (me).

Each team spent much of the last two days of the workshop intensely working to refine their ideas in preparation for continued collaboration and for a grant proposal on “Quantitative Approaches to Biomedical Big Data,” due to the NSF’s Division of Mathematical Sciences just two weeks after the workshop ended.  This process was partly aided by helpful feedback.  To start, we spent an afternoon providing feedback to each other’s groups, through a four-stage feedback process consisting of “Pluses, Potentials, Concerns, and Overcoming Concerns” (“PPCO”), which encouraged participants to formulate constructive feedback.  Each group also met with groups of mentors on the last two days to get several rounds of expert feedback, both on the feasibility of the project and on aspects that might need extra work or focus in order to be part of a compelling NSF proposal.

Arianna Di Florio giving a soapbox talk that would soon lead to a new collaboration

Since new ideas about a problem often suddenly arise during the course of thinking about a different problem, there was also time each day for “soapbox talks.”  Anyone could sign up to give a one-minute slide-free talk on a half-baked idea.  Some of the research groups developed in response to ideas proposed in these brief talks.  In fact, the group with whom I ultimately ended up working (and applying for a collaborative NSF grant) initially consisted of several of us with vastly different backgrounds who came together to discuss how to tackle a problem proposed in one of the first soapbox talks.


While most of the week was spent discussing research problems in small groups, the workshop featured several talks (some remote) by experts on precision medicine and data science:

  • Joe Gray, Gordon Moore Endowed Chair in the Biomedical Engineering department and a member of the Knight Cancer Institute at Oregon Health & Science University, described the use of data science and genomics for cancer treatment.
  • Susan Murphy, a MacArthur fellow and H.E. Robbins Distinguished University Professor of Statistics and Professor of Psychiatry at the University of Michigan, discussed an app her team has designed for personalized health interventions.
  • Bill Noble, a Professor of Genome Sciences and Computer and Electrical Engineering at the University of Washington, spoke about modeling the 4D nucleome (3-dimensional modeling that also accounts for time).
  • DJ Patil, Chief Data Scientist of the United States, gave an engaging talk in which he encouraged the participants to be innovative, emphasizing that “Clever beats smart nine times out of ten” for the sorts of problems we were considering. Patil also highlighted the current administration’s commitment to promoting advances in precision medicine.

Key takeaways

This was one of the most engaging and worthwhile workshops I have attended.  To those who are unfamiliar with this sort of workshop, though, it might sound implausible that thirty-five experts in disparate areas, who had never met before, have since come together to begin functional collaborations and submit compelling grant proposals in such a short period of time.  Since this format of workshop has the potential to lead to innovative collaborations in other fields as well, I will conclude by sharing my thoughts on which aspects made the workshop at SAMSI particularly successful and enjoyable:

1. Carefully select participants not just for expertise but also for skills crucial to collaboration.

Richard Smith, the Director of SAMSI, told participants on the first day that the highly selective acceptance procedure (with only 35 accepted participants out of a pool of more than 350 applicants) involved selecting applicants who not only were accomplished in a particular discipline but also were excellent communicators.  In addition to asking about applicants’ professional background, the six-question program application required applicants to discuss their approach to working on teams and their ability to engage and work with non-experts or those with a different perspective.  This led to a highly functional working environment.

2. Involve a team of experienced facilitators.

From accelerating the process of engaging with other participants to approaching feedback effectively (see above), Knowinnovation lived up to its claim of helping “smart people have interesting conversations about complex questions, which leads to novel ideas and innovative research.”  Each person I asked at the end of the workshop said that they found the facilitators from Knowinnovation to have been particularly helpful.

3. Let the structure and schedule be partially participant-driven.

While we received a schedule at the beginning of the workshop, Knowinnovation warned us that it would likely change several times in response to participants’ progress.  This allowed us to work more productively and creatively than a rigid, traditional schedule would.  Also, during the second half of the week, there were large blocks of time open for groups to collaborate without interruption.

4. Have an abundance of mentors readily available.

While progress was participant-driven, mentors provided crucial feedback.  They asked challenging questions that sometimes helped narrow a group’s focus or send it in a more promising direction.

5. Encourage teams to quickly determine and seek whichever expertise they need.

Otherwise, teams get stuck discussing hypotheticals.  Far from the beginning of a joke, it quickly became apparent that the roles “doctor, mathematician, and statistician” from the beginning of this blog post each played an essential role in advancing most of the productive discussions and collaborations from the workshop.

Learning about Neural Spike Train Analysis and More at the SAMSI CCNS Summer School

This week’s blog entry is written by SAMSI’s Kenan Fellow, Alexandra Solender who is the Science Department Chair at Holly Springs High School and teaches AP Physics C, Honors Physics. She is also the Science Olympiad Coach and SNHS Advisor.

Alexandra Solender


During the week of July 27th, SAMSI hosted a Computational Neuroscience Summer School in which I was lucky enough to participate.  The workshop was aimed at researchers in areas such as neuroscience, computer science, applied mathematics, and biomedical engineering, as it covered research topics like pattern theory, signal processing, and functional and structural imaging.  This opportunity was extremely exciting for me, as I am a high school physics teacher, not a math or science researcher.  I have been working with Dr. Tom Witelski at SAMSI since the beginning of June through the Kenan Fellows Program.  The Kenan Fellows Program selects K-12 teachers in North Carolina to work with university or industry experts during the summer to gain real-world experience to bring back to their respective classrooms.  As a Kenan Fellow, I worked with Dr. Witelski on bringing data analysis into my classroom.  Because Dr. Witelski was in charge of organizing the Computational Neuroscience workshop, I was given the opportunity to sit in on some of the presentations that would be accessible to me given my background, and they were all quite interesting!

Tom Witelski and two others at a table

Tom Witelski at the CCNS Summer School.

The first day focused on Neural Spike Train Analysis and Compressed Sensing.  These topics were quite math heavy, but the applications were exciting and easier to follow than I was expecting.  The Neural Spike Train Analysis talk focused on statistical models that capture the structure of the signals in the brain and allow for the prediction of the timing of the spikes.  Dr. Uri Eden of Boston University walked us through the processes used to mathematically model physical systems.  Subtopics included the Poisson distribution (a useful model, but one that does not fit the data well) and the Bernoulli process (a sequence of binary random variables).  The Compressed Sensing talk was led by Justin Romberg of the Georgia Institute of Technology.  His talk focused on signals and images with applications such as radar, MRI scans, sonar, and video cameras.  They all combine pieces/signals to produce a larger picture, following an abstract formula of the form y = A x, where y is a set of observed data, A is a matrix describing a linear system, and x is the unknown.  The compressed sensing material was much more complicated, but the application was impressive.
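
The link between the two spike-train models mentioned above can be sketched in a few lines of Python. This is my own illustration, not code from the talk: the function name `bernoulli_spike_train`, the 1 ms bin width, and the 20 Hz firing rate are all assumptions. When time is cut into small bins and each bin independently contains a spike with probability rate × bin width, the resulting Bernoulli process approximates a constant-rate Poisson process.

```python
import numpy as np


def bernoulli_spike_train(rate_hz, duration_s, dt=0.001, rng=None):
    """Simulate a spike train as a sequence of binary random variables.

    Each time bin of width `dt` seconds contains a spike with probability
    rate_hz * dt, independently of every other bin. For small dt this
    approximates a homogeneous Poisson process with the same rate.
    Returns a boolean array with one entry per bin.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_spike = rate_hz * dt
    if not 0.0 <= p_spike <= 1.0:
        raise ValueError("rate_hz * dt must be a probability in [0, 1]")
    n_bins = int(round(duration_s / dt))
    return rng.random(n_bins) < p_spike


if __name__ == "__main__":
    # Hypothetical neuron firing at 20 spikes per second for 100 seconds.
    train = bernoulli_spike_train(20.0, 100.0, rng=np.random.default_rng(0))
    counts = train.reshape(100, 1000).sum(axis=1)  # spikes per 1 s window
    print("mean count per second:", counts.mean())
    print("variance / mean:", counts.var() / counts.mean())
```

For a Poisson process the variance of the spike count in a window equals its mean, so the ratio printed at the end should be close to 1. Recorded spike trains often depart from this, which is one way to see why the plain Poisson model does not fit data well.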

Person getting ready to talk

Justin Romberg, Georgia Tech, getting ready to speak at the CCNS Summer School.
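
As a rough illustration of the y = A x setup from the compressed sensing talk, the sketch below recovers a sparse unknown x from fewer measurements than unknowns. It uses orthogonal matching pursuit, a simple greedy method that I chose for brevity; it is not necessarily the method presented in the talk. The problem sizes, the random Gaussian matrix, and the function name `omp` are all assumptions made for the example.

```python
import numpy as np


def omp(A, y, sparsity):
    """Orthogonal matching pursuit for y = A x with a sparse x.

    At each step, pick the column of A most correlated with the current
    residual, then refit all chosen columns to y by least squares.
    Returns an estimate of x with at most `sparsity` nonzero entries.
    """
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick a column twice
        support.append(int(np.argmax(correlations)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_unknowns, n_measurements = 200, 80  # fewer equations than unknowns
    A = rng.standard_normal((n_measurements, n_unknowns)) / np.sqrt(n_measurements)
    x_true = np.zeros(n_unknowns)
    x_true[[12, 77, 150]] = [1.0, -2.0, 1.5]  # only three nonzero entries
    y = A @ x_true
    x_hat = omp(A, y, sparsity=3)
    print("largest recovery error:", np.max(np.abs(x_hat - x_true)))
```

Ordinarily 80 equations cannot pin down 200 unknowns. The extra knowledge that x has only a few nonzero entries is what makes recovery possible here.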

The second day of Neural Spike Train Analysis was the talk that related most closely to physics.  Dr. Mark Kramer picked up where his colleague at BU left off.  He talked about biophysical models that compare a neural membrane to a simple circuit.  He worked through Ohm’s Law, manipulating the variables based upon what information was most important to the neural spiking.  Dr. Kramer adapted the model for different elements moving through the membrane, making the model more complicated mathematically, but with the intention of better modeling the spikes.  I personally enjoyed this talk the most, as it was a topic I had background knowledge in and it was an easy way to connect biology and physics.
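
The membrane-as-circuit picture can be written down with nothing more than Ohm’s Law. The sketch below is my own classroom-style illustration, not Dr. Kramer’s model: each ion type is treated as a conductance in parallel, driving a current I = g (V − E), where E is that ion’s reversal potential. The conductance and reversal values are hypothetical round numbers chosen only to make the arithmetic visible.

```python
def ionic_current(conductance, voltage, reversal):
    """Ohm's Law for one ion type: I = g * (V - E)."""
    return conductance * (voltage - reversal)


def resting_potential(channels):
    """Voltage at which the ionic currents sum to zero.

    `channels` maps an ion name to a (conductance, reversal potential)
    pair. Setting sum(g_i * (V - E_i)) = 0 and solving for V gives the
    conductance-weighted average of the reversal potentials.
    """
    total_g = sum(g for g, _ in channels.values())
    return sum(g * e for g, e in channels.values()) / total_g


if __name__ == "__main__":
    # Hypothetical values: conductances in mS/cm^2, potentials in mV.
    channels = {"potassium": (1.0, -90.0), "sodium": (0.05, 50.0)}
    v_rest = resting_potential(channels)
    net = sum(ionic_current(g, v_rest, e) for g, e in channels.values())
    print(f"resting potential: {v_rest:.2f} mV, net current: {net:.2e}")
```

Adding another ion type to the dictionary shifts the resting potential toward that ion’s reversal potential, in proportion to its conductance. This is the sense in which accounting for more elements makes the model richer while staying within Ohm’s Law.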

One of my main goals as an educator is to provide bridges between the sciences so that students can better understand how everything works together.  They so often believe that biology is just biology and physics is just physics when in reality all of the sciences are interrelated.  The partnership between Kenan Fellows and SAMSI has given me the opportunity to see high level applications of my personal educational goals.  It was wonderful to see physics in a new way and get to interact with researchers in the many scientific fields that were represented.

Takeaways from the Bayesian Nonparametrics Workshop

The first entry is from Chetkar Jha, PhD Student at Missouri University.

Group shot at SAMSI's Bayesian Nonparametrics Workshop

Attendees at the SAMSI Bayesian Nonparametrics workshop.

A couple of weeks back, I attended a workshop on “Bayesian Nonparametrics” organized at the Statistical and Applied Mathematical Sciences Institute (SAMSI).

It was a 4-day-long workshop on nonparametric Bayesian methods. The goal of the workshop was to brainstorm on some of the pressing problems in nonparametric Bayesian statistics and discuss possible solutions as a group. Let me describe the format of the workshop to give you some flavor. Each day was divided into two halves: a morning session and an afternoon session. In the morning session there were presentations on nonparametric Bayesian topics, which led to brainstorming sessions on related problems in the afternoon session. Since we were working in smaller groups, we had a lot of latitude to discuss the topics closely, ask a lot of questions, and seek clarifications. I, for one, really enjoyed the talks and discussions on convergence/contraction, variational inference, MCMC methods and scalable models. As a graduate student, I found a lot of the content new and harder to assimilate, but the workshop gave me exposure to a lot of new material and some topical problems.

Person talking at the Bayesian Nonparametric workshop

Interesting lectures were presented.

The workshop was attended by some of the leading researchers in the field. It was sort of a ‘fanboy’ moment for me, as I had only known them by their names and their work. The workshop provided a perfect opportunity to meet the ‘real’ people behind the names. Also, I loved the energy and the passion that the group shared for nonparametric Bayesian methods. That was really motivating and, hopefully, some of it rubbed off on me.

Also, I would like to take this opportunity to thank the organizers and people at SAMSI, who did a wonderful job in organizing the entire event. Hopefully, we can have more such workshops in the future.

The second entry is from Dootika Vats, PhD Student in the School of Statistics at the University of Minnesota

My build-up to the 4th of July weekend turned out to be a rather educational experience. I was fortunate enough to attend SAMSI’s workshop on “Bayesian Nonparametrics: Synergies between Statistics, Probability and Mathematics” from June 29th to July 2nd. This was my first visit to SAMSI and to the Research Triangle area. The first thing that stands out about the area is how green it is! Calming stretches of green fields and trees make for an ideal research environment.

Driveway with grass and trees.

Driveway to SAMSI’s building in RTP.

The 4-day workshop followed the 10th Conference on Bayesian Nonparametrics held in Raleigh from June 22-26. Many participants of the workshop had attended both events, which made the workshop a great platform to discuss key points and ideas that came out of the conference.

The workshop was attended by professors, postdocs and graduate students from all over the world. We were a small group of people who came with varied research focuses to contribute to and learn about Bayesian nonparametrics. The days were packed with discussion-style seminars in the morning, followed by a delicious lunch spread, and breakout groups in the afternoon. Each day had a broad yet well-defined focus, such as multi-resolution methods, high-dimensional analysis, scalability and optimization, or theoretical developments.

Food at the SAMSI workshop

The food at the workshop was splendid!

The breakout groups really made this workshop different from other conferences and programs I had attended before. Each group was led by an expert in the field, and the audience could choose any group that appealed to them. Most groups ended up with 5-10 people at most. This made for an extremely educational experience for a graduate student such as myself. We got an insight into how experts in the field approach a problem and attempt to come up with plausible solution paths. Just observing these world-class researchers openly think about a problem and having the opportunity to ask trivial questions was worth the trip!

Apart from reading an introductory paper, I was not very familiar with Bayesian nonparametrics. My research is on Markov chain Monte Carlo (MCMC) algorithms so, of course, there were times when I did not quite understand the questions put forth in discussions or even the problem at hand. However, since there were so many young researchers, post-docs and new faculty, it was easier to ask “stupid” questions. The workshop also held a poster session for young researchers to talk about their own research. I was able to present my work on MCMC output analysis and discuss ideas and improvements over delicious food and drinks.

SAMSI Bayesian Nonparametric poster session

People talking at the poster session.

Overall, I think SAMSI put forth a wonderfully organized workshop. I came back with a better understanding of Bayesian nonparametrics and with feedback and ideas for my own research. The logistics of the workshop were also well managed with frequent communications from the staff about the schedules. And, of course, the almost endless supply of coffee was deeply appreciated! I will definitely keep a lookout for more SAMSI events and encourage other graduate students to apply for such workshops and conferences.

Understanding Droughts – Part of the Undergraduate Modeling Workshop May 17-22, 2015

The following was written by Gabriel Ruiz, attendee from the University of California, Riverside.

attendees sitting listening to lecture

All of the attendees and some of the speakers on Day 1 of the workshop


The workshop attendees hard at work.

Just a few weeks ago in May, I was fortunate to be among the 26 undergraduates to attend one of many undergraduate workshops offered at the Statistical and Applied Mathematical Sciences Institute (SAMSI). This was a 5-day-long workshop on mathematical and statistical modeling. The backgrounds of students in attendance ranged from mathematics and statistics to chemical or aerospace engineering and other fields, and the students came from universities all across the country. There were also current researchers from SAMSI and other universities in attendance who gave talks on very interesting topics and who led the workshop sessions. Among my favorite parts of this workshop were the talks on Bayesian Statistics and Discriminant Analysis, meeting some established researchers, getting to know my peers in mathematics and statistics, the great food we had, and, of course, having the opportunity to visit SAMSI in such a beautiful section of the country.

First Impressions: Raleigh, SAMSI, and NC State

students walking past sign

Attendees as they arrive at SAMSI to kick start the workshop.

cement pathway with trees

The scenic path attendees took to explore NC State and the surrounding area on the first day.

My very first impression of Raleigh and its surrounding area was how green and pretty everything was. Coming from California, and considering the current drought we are experiencing, this was quite a sight. It had such a relaxing feel.

Students in front of the James B. Hunt Library

Workshop attendees visiting the famous Hunt Library at NC State.

Later on, it was fun meeting with all of the other undergraduate attendees at North Carolina State University, where we all stayed for the next 5 days. In the evening, after some great food, we took a walk around campus and even visited the renowned James B. Hunt Jr. Library. The NC State campus is so beautiful and big! Because of this, we got a little lost but that ended up being a good thing because we were able to see some more of the surrounding area in Raleigh.

The next day, we went to SAMSI on the other side of town for the introduction to what we would be doing throughout the week. We heard from some speakers on interesting topics, and ate some more delicious food. It was nice to get a sense of all the great work that goes on there.

Building on the NC State campus

A scenic example of Raleigh and NC State beauty.

The rest of the workshop was held in SAS Hall at NC State—named after the statistical software company when it was donated by former statistics faculty and founders of SAS Institute Inc.  This building is home to the Mathematics and Statistics departments and was just a light walk from where we were staying. The place we stayed at, I should add, contained a volleyball court that held several competitive games of volleyball among the attendees. This was a fun break after a day of math and statistics.

3 postdocs

Kimberly Kaufeld, Daniel Taylor-Rodriguez and Jyotishka Datta, all postdocs at SAMSI, working together.

There were plenty of informative talks given by researchers from various universities. Some of the notable talks were given by:

Paul Brooks from Virginia Commonwealth University on “What Causes Shifts in the Human Microbiome.” This talk focused on the Community State Types (CST) of the vaginal microbiome to identify the microbiome profiles that are associated with a high risk of certain diseases as well as devising better predictions for changes in CSTs over time. Students at the workshop were able to work on a subset of this interesting project throughout the rest of week.

Daniel Taylor-Rodriguez, a SAMSI postdoc, spoke about his approach to parameter estimation and variable selection of site-occupancy models that use presence-absence data. He presented an occupancy model with probit links and demonstrated his work on deriving more objective parameter priors as opposed to using AIC methods or other Bayesian approaches that require substantially more prior knowledge than is usually available.

Leah Jenkins of Clemson University gave a great talk titled “The Strawberries of Wrath: Farming Under the Realities of Drought”, in which she spoke about the current drought crisis in California—where 80% of the fruits and vegetables consumed in the US come from. The main focus of her talk was describing her and other mathematicians’ role in creating the “virtual farmer” software tool and the team’s use of mathematical modeling and optimization to help farmers in Pajaro Valley, CA remain profitable through current water restrictions. This challenging project was the primary motivation for the second project students were able to work on during this workshop.

Two other SAMSI post-doctoral researchers, Kimberly Kaufeld and Yize Zhao, also led hands-on workshops in R, a statistical software package, which were very informative to those of us who had limited experience with R. Jyotishka Datta, another postdoc at SAMSI, had a session in which he went over introductory statistical and probabilistic concepts in regression and classification in addition to high-dimensional applications and their implementations in R. A fifth postdoc, Christopher Strickland, went over some very useful approaches to the modeling and data analysis of dynamical systems in Python, as an alternative or complement to R and Matlab.

Among other notable talks were those by NC State PhD student Neal Grantham and SAS Institute Data Scientist Yue Qi. Neal Grantham’s talk focused on an alternative approach to identifying the origin and history of a dust sample through the pollen found in it; the approach uses discriminant analysis and DNA sequencing to identify samples to within a short distance with a measurable degree of certainty, as a complement to a pollen expert’s more subjective identification. Yue Qi’s talk was about the tools he is helping to develop at SAS to more easily analyze “Big Data”, and more specifically he focused on the use of these tools in Machine Learning approaches to fight banking and insurance fraud.

These talks were all of the high quality you would expect at SAMSI, yet were accessible for all of us as undergraduates. After listening to all of these, I hope to learn some more about the research techniques that were discussed and maybe even contribute to the areas in which they have applied these techniques, such as the California drought. It was nice to get a feel for just how broad statistics and mathematics are.

The Workshop: working with a predator-prey dynamical system dataset

For the actual workshop aspect, we were split into groups of 5 that each worked on one of two very interesting topics. The first topic dealt with modeling a predator-prey dynamical system that was meant to be a simplified representation of the more complicated drought situation currently affecting California farms which account for a large portion of US vegetable and fruit supply. The second topic had to do with performing discriminant analysis to differentiate between microbiome states that are defined by the various levels of vaginal microorganisms thought to be higher or lower risk factors for certain diseases as compared to other microbiome states.

Group with mentors

One of the workshop groups alongside their mentors for the week, Daniel Taylor-Rodriguez (first on the left) and Kimberly Kaufeld (furthest on the right)

The dataset I worked with was the predator-prey dataset. We were tasked with first analyzing the time series data we were given on the abundance of three variables: water, plants, and beetles. The key here was to use some sort of time series technique to model each variable against time. After we were able to find good models for each variable, we could plot the fitted lines of all three to see how they varied over time. Our first observation was that the density of each variable varied over time according to a sine and cosine pattern, so naturally we used a time series model with these properties. The fitted lines further demonstrated that plants had a spike (or dip) in their density whenever there was a spike (or dip) in the water supply. Of course, we know plants depend on water, but it was nice to see this graphically over time. There was a very high correlation between these two variables, which helped quantify how strong the relationship was. This relationship is the key characteristic of a dynamical system. Because we had the “noisier” dataset, the same dependency of beetles on plants was not as observable, although it was present.
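A model with a sine and cosine pattern of this kind can be fit by ordinary least squares. The Python sketch below is only an illustration under stated assumptions: the data are synthetic (not the workshop dataset), the period of 12 time steps is assumed to be known, and the variable names are made up. The workshop itself used R.

```python
import numpy as np

# Synthetic stand-in for one abundance series (assumed values, not real data).
rng = np.random.default_rng(42)
period = 12.0
t = np.arange(120, dtype=float)
angle = 2.0 * np.pi * t / period
true_mean, true_sin, true_cos = 5.0, 2.0, -1.0
y = (true_mean + true_sin * np.sin(angle) + true_cos * np.cos(angle)
     + rng.normal(0.0, 0.2, size=t.size))

# Harmonic regression: y ~ b0 + b1 * sin(2*pi*t/P) + b2 * cos(2*pi*t/P).
X = np.column_stack([np.ones_like(t), np.sin(angle), np.cos(angle)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

# The sine and cosine coefficients together give the size of the seasonal swing.
amplitude = np.hypot(coef[1], coef[2])
print("estimated coefficients:", np.round(coef, 2))
print("estimated amplitude:", round(float(amplitude), 2))
```

Fitting each variable this way and plotting the fitted curves together is one way to see spikes and dips in one series line up with those in another.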

The next part of the workshop was to develop a system of differential equations that brought together all of these relationships. We used the Lotka-Volterra equations, which are also known as the predator-prey equations. The key here was that the parameters and variables needed some tweaking through ODE packages in R, further simulation, and our own intuition in order to best describe the system. This was interesting considering we had three variables to work with: a natural resource, a prey, and a predator. The transition from the statistical aspect of this to mathematical modeling was the trickiest part, to say the least, since our group had no real experience with differential equations, much less bridging math and statistics in this way. Luckily, the postdocs running this workshop, Drs. Kaufeld and Taylor-Rodriguez, walked us through the process and taught us about these equations.
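For readers who have not seen them, the classic two-species Lotka-Volterra equations are dx/dt = a x - b x y for the prey and dy/dt = d x y - c y for the predator. The Python sketch below simulates them with a hand-written fourth-order Runge-Kutta step. It is a simplified illustration, not the group's model: the workshop system had three variables and was built in R, and the parameter values and starting populations here are assumptions.

```python
import math

# Assumed illustrative parameters for the classic two-species model.
A_GROWTH, B_PREDATION, D_CONVERSION, C_DEATH = 1.0, 0.1, 0.075, 1.5


def derivatives(prey, predator):
    """Right-hand side of the Lotka-Volterra equations."""
    d_prey = A_GROWTH * prey - B_PREDATION * prey * predator
    d_predator = D_CONVERSION * prey * predator - C_DEATH * predator
    return d_prey, d_predator


def rk4_step(prey, predator, dt):
    """Advance the system by one classical Runge-Kutta (RK4) step."""
    k1 = derivatives(prey, predator)
    k2 = derivatives(prey + 0.5 * dt * k1[0], predator + 0.5 * dt * k1[1])
    k3 = derivatives(prey + 0.5 * dt * k2[0], predator + 0.5 * dt * k2[1])
    k4 = derivatives(prey + dt * k3[0], predator + dt * k3[1])
    prey += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    predator += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return prey, predator


def conserved_quantity(prey, predator):
    """Quantity that exact Lotka-Volterra trajectories keep constant."""
    return (D_CONVERSION * prey - C_DEATH * math.log(prey)
            + B_PREDATION * predator - A_GROWTH * math.log(predator))


dt, n_steps = 0.01, 2000
trajectory = [(10.0, 5.0)]  # assumed starting populations
for _ in range(n_steps):
    trajectory.append(rk4_step(*trajectory[-1], dt))

drift = abs(conserved_quantity(*trajectory[-1]) - conserved_quantity(*trajectory[0]))
print("final populations:", trajectory[-1])
print("drift in conserved quantity:", drift)
```

The populations cycle rather than settle down, and the small drift in the conserved quantity is a quick check that the numerical integration is behaving.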

two workshop members giving a talk

Two workshop attendees presenting their findings on the predator-prey dynamical system.

While I am still not completely comfortable with this last aspect, it was important to see the union of statistics and math modeling as a person who is mostly accustomed to the data analysis side. I have already started to look into creating a better system of differential equations this summer. And because I gained curiosity in this type of modeling after the workshop, I am also signed up for some extra math classes on ordinary and partial differential equations for next year and might even take some coursework in dynamical systems somewhere down the line.

Final thoughts: My key takeaways

Coming from California, it was interesting to see just how complicated these dynamical systems involving the seasonality of rain can be. It is important to note that our dynamical system, although still difficult to model with three variables, was much more simplified than the current drought in California. I can only imagine how many variables the analysts involved with this have to deal with, including legislation, people refusing to let their lawns go dry, and the system of aqueducts that run under farmland, which makes modeling water levels quite challenging. Although the problem is difficult, there are plenty of mathematicians involved in the effort to conserve water in the most efficient way possible, including Clemson University’s Dr. Leah Jenkins, who gave a great talk on the topic. Living in a section of California affected by this drought and attending this workshop have made me curious enough to stay in the loop about what mathematicians will continue to do.

Having been in the process of finishing up my second year at the University of California, Riverside studying statistics, I found this opportunity an invaluable and eye-opening experience. While I have not been in the world of Mathematics and Statistics for a long time, this workshop sparked curiosity in me about topics I had not yet been acquainted with but would now like to learn more about. For example, this summer, I will almost surely look into developing a better set of differential equations for the predator-prey dataset we were given during the workshop. I would also like to look into the other dataset to learn more about discriminant analysis. I have also come to realize that computational skills are very important. On my programming to-do list this summer are Julia, Python, and some more R.

Besides the new statistical and mathematical techniques that we learned, I feel the main theme that I have taken away from this workshop is that statistics, math, and computing can all be brought together for meaningful applications in ecology and human health. Moreover, it is refreshing to have experienced first-hand that statistics and math are more than just numbers and equations in a textbook like I had become accustomed to in some of my coursework so far.

It was great to be around a great undergraduate cohort of statisticians and mathematicians who are all at the same point in their careers in this type of environment doing what we love most. The perspective I gained from my peers here, who are all from different universities across the country, about classes to take and interesting research topics is invaluable. To have met some established applied statisticians and mathematicians and listened to their research talks was inspiring. I hope to one day achieve that same level of expertise and fun they are having.

If you are an undergraduate student considering applying to one of these workshops at SAMSI, I highly recommend that you apply and attend! You won’t regret it!

portrait of Gabriel Ruiz

Gabriel Ruiz.

Why you should attend the SAMSI Forensics 2015-2016 opening workshop

The following was written by Dr. Clifford Spiegelman, Distinguished Professor of Statistics at Texas A&M and one of the program leaders for the 2015-2016 SAMSI Program on Statistics and Applied Mathematics of Forensic Science.

Cliff Spiegelman

Dr. Clifford Spiegelman

Imagine having a nightmare where nearly all evidence presented in courts was seriously misrepresented. No, not a nightmare about someone accused of being a witch, but a more current trial. Say the defendant is accused of rape or murder and all the scientific evidence presented was seriously misrepresented and biased toward the prosecution. It would not be a pleasant dream, but it is today’s reality, and that is worse than a nightmare because it is real. Within the last few months the FBI has admitted to overstating the importance of hair matches for decades. Prior to that, in 2007, comparative bullet lead analysis (CBLA) was another procedure used for decades for which the FBI admitted to overstating the importance of a match.

Forensic science is inherently a field that uses data (patterns, pictures, etc.) to link suspects to crimes. Unfortunately, the use of formal statistical methods or even statistical or mathematical thinking is uncommon.

That is where you can help. There is a dearth of mathematical scientists who are aware of the issues; there are way too few.

What are the issues?

Well, one can read the summary of the 2009 NRC report “Strengthening Forensic Science in the United States: A Path Forward” to get a good overall view. Here are some of my recent consults: A defendant was charged with indecent contact with a minor. The minor had chlamydia, but the defendant did not and was not treated for chlamydia. What is the probability? In another case a convict has been in jail for 40 years largely based upon hair and fiber evidence. The hair evidence was inconclusive. That is, the crime lab hair examiner testified that there were both similarities and dissimilarities between the pubic hairs found at the scene and on the defendant. Subsequently some inconclusive results (not the case in question, as the evidence has gone missing) have been investigated using DNA. What are the odds that an inconclusive microscopic hair analysis has a DNA analysis that excludes the defendant? It is more than ½.

The opening workshop will look at various forms of traditional pattern evidence. These include fingerprints, firearm/toolmarks, shoeprints, etc. Come be part of the birth of forensic science as a real science rather than an oxymoron.

The opening workshop program can be found here. Read more about the overall program here, and if you want to learn more about forensics before the opening workshop, consider attending a special tutorial a few days before the big event begins.

Please join us. You can make a difference to the legal system and make our country a more just place.

Measuring the Success of a SAMSI Program – My Experience at the Beyond Bioinformatics Transition Workshop

The following was written by Katerina Kechris, Associate Professor and Graduate Program Director, University of Colorado Denver, School of Public Health.

Katerina Kechris

Katerina Kechris

In mid-May 2015, working groups from the Beyond Bioinformatics Program gathered during the Bioinformatics Transition Workshop. This was the culmination of eight months of progress for over 10 working groups. The workshop topics were diverse, including epigenetics, microbial communities, evolutionary models, imaging genetics, next generation sequencing errors, high-dimensional discrete data, multiple hypothesis testing and data integration. The diversity of these topics reflects the current state of research in the biomedical sciences, where technologies are advancing the study of biological mechanisms, structures, populations and disease. These technologies are generating high-dimensional and complex data structures, providing intriguing opportunities for statisticians, mathematicians and computer scientists to develop new models, methods and algorithms to answer important biological questions.

Group photo outside

The Beyond Bioinformatics Transition Workshop attendees.

As a leader for one of the two Data Integration working groups, I was excited to hear about the activities from the other working groups during the workshop. I found their progress impressive, considering that many of the group members did not know each other until the Opening Workshop just eight months earlier. The transition workshop gave me the opportunity to reflect: How does one measure success of a program year and a working group? There are the usual metrics of publications, conference presentations and grant proposals that will be documented in great detail for reports. But at the workshop I could see more qualitative and interpersonal measures of successes. First, new collaborations were developed among researchers who would otherwise not have had the opportunity to meet and work together.

Personally, I enjoyed getting to know and working as a team with the other Data Integration working group leaders and members. Second, I was pleased to see great attendance and presentations at the workshop by students and post-docs. I know in several cases that the working group facilitated thesis and post-doctoral research projects for these junior investigators. Finally, I observed that there are ongoing plans to continue the working group efforts beyond the formal program year, which speaks to the positive aspects of the program. As for our working groups, it was such a pleasure to make new colleagues and see the evolution of how we approached the problem of data integration with very different perspectives and methods. I look forward to learning about the continuing progress of all groups.

Classroom shot of people listening to lecture

Listening to a working group make its report.