The following is an extract from the blog of Alex Hayes. To view the entire piece, visit his blog.
I spent the last week at the Statistical and Mathematical Sciences Institute's (SAMSI) undergraduate modeling workshop. This year the workshop was hosted at North Carolina State University (NCSU) in Raleigh, NC.
About thirty students attended the workshop. Getting in involves a fairly mellow application process. SAMSI covered travel, lodging and food for the participants, and we were expected to bring laptops with R and RStudio installed. The purpose of the workshop was to give undergrads experience modeling real-world data. Each year the workshop has a different theme; in our case, it was statistical analysis of climate phenomena.
Before the workshop, we chose from a list of six projects for the week. On Sunday night, we flew in for a welcome dinner and met the other students on our project team. Each group had a SAMSI postdoc as its leader.
On Monday, Doug Nychka and Chris Jones gave us a broad overview of the statistical issues present in climate science. We spent the afternoon doing team-building activities, discussing our interests and the skills we brought to our respective groups, and developing research questions.
We spent the next three days working on our projects: roughly six hours a day modeling, an hour or so at a research presentation or R workshop, and an hour goofing off and hanging out. The talks were particularly good, presenting current research at the undergrad level in an engaging way.
In the evenings, a small group would usually explore the bars around NCSU, which was nice after a long day on campus. The workshop concluded on Friday, when each group presented its findings before we flew out in the afternoon.
My group was led by Mikael Kuusela, who did a fantastic job helping us find research questions. He gave us a ton of individual feedback and was very attentive and patient. I particularly appreciated his advice on choosing questions that scientists care about.
Personally, the workshop enabled me to make some valuable connections within the stats community. At the end of the workshop, Mikael asked me if I'd like to write up a short outreach piece based on my project with him, which I'm super excited about. Keep an eye out for an upcoming piece on a functional decomposition of ocean thermoclines during El Niño (featuring a plot we're calling The Bananafold).
Earlier this year, Maggie Johnson, another SAMSI postdoc, put me in contact with some of the bioinformatics crew at Pacific Northwest National Laboratory, and I nearly ended up taking a year off to work on omics projects with them.
I also had a blast getting to know Doug Nychka. Doug was super patient with my many newbie questions about GAMs (generalized additive models) and splines, and it was fun to chat with him about climbing and the UW-Madison statistics program.
Perspective on the Undergrad Stats Community…
Having spent a bunch of time organizing undergrad statistics activities over the last year, I found the workshop an interesting opportunity to learn about the broader community of statistics undergraduates. Here are some of the notes I took.
We have fundamental misconceptions about the purpose of modeling: When groups presented their initial research questions, it was immediately clear that many students were conflating description, prediction and causation. Throughout the week, there were many attempts to turn everything into a prediction problem or to interpret descriptive analyses as causal.
The prerequisite stack is not very deep: Most students had taken a mathematical statistics course, but very few had much coursework beyond that. Fewer than half of the participants had a background in linear regression, and people were much less comfortable with linear algebra than I would have expected. Barely anyone had a background in probability or analysis.
Programming skills are rate-determining: We dramatically overestimate our R capabilities. In particular, non-tabular data really threw people off. My group spent about three days computing what were mostly summary statistics and making basic plots.
Everybody's resume looks the same: I'll write more about this soon, but everybody advertises themselves in exactly the same way, despite having wildly varying skill sets. As a job seeker, how do you demonstrate that you are on the upper end of the competency spectrum? As a recruiter, how do you differentiate between candidates who look identical?
That’s a Wrap…
It was a great workshop, and we learned a lot. Everyone should go if they get the chance. More broadly, the statistics community should spend more time teaching beginners about the big picture: what statistics is and how we should use it.