Drexel PhD Candidate Gains Perspective on Big Data in Astronomy at International Workshop

Contributed by: Jackeline Moreno, Physics Ph.D. Candidate, Drexel University

I am a fourth-year graduate student at Drexel University. My research area is optical AGN variability and accretion physics. However, attending workshops like this one and participating in a SAMSI ASTRO working group have expanded my interest to other types of variable objects and time series signatures. I enjoy thinking critically about how these characterizations relate to the physical properties of objects grouped in the same hyperplane of parameter space.

Our community of astronomers, statisticians and physical scientists is eagerly anticipating the era of time domain astronomy and of gravitational wave detection, our new lens for probing the distant universe. The SAMSI-ICTS workshop (Time Series Analysis for Synoptic Surveys and Gravitational Wave Astronomy) made a pioneering effort to bring together experts from seemingly different research fields to find common ground and exchange techniques and insights for analyzing time series data. The workshop was hosted by the International Centre for Theoretical Sciences (ICTS) in Bengaluru, India. ICTS and SAMSI worked together to arrange the speakers, coordinate meals, handle logistics for the workshop and manage transportation for outings to explore the city. Special thanks are owed to ICTS, which went above and beyond in assisting with visas, travel and accommodations and in orchestrating the four-day workshop.

James Long, Asst. Professor of Statistics from Texas A&M University, gives a talk during the Time Series Analysis for Synoptic Surveys and Gravitational Wave Astronomy workshop in Bengaluru, India. The four-day workshop was held at the International Centre for Theoretical Sciences (ICTS) and was co-sponsored by SAMSI.

The speakers presented on various topics, such as variability statistics for classification in surveys, domain adaptation, noise modelling, and a whole slew of methodologies used to study the physics of transients, periodic and aperiodic variables, and binary candidates for GW detection and localization. Speakers emphasized critical issues that need improvement or further investigation, framing them as challenges that could seed collaborative projects. Talks were followed by panel discussions. Several participants suggested that future workshops of this kind set aside time for hacking or coding alongside the panel discussions. There was also an effort to document the challenges in an Authorea document that would serve as a discussion board after the workshop.

All of the talks were video recorded, so visitors can view the talks, the list of participants and the abstracts of the presentations; photos and links to the SAMSI webpage are also provided. SAMSI was a proud co-sponsor of this event and looks forward to supporting similar research events in an international setting in the future. Sessions between panel discussions were organized into the following broad topics:

  1. Outliers and Background
  2. EM follow-up of GW events
  3. Science of Transients
  4. Techniques for Time Domain Astronomy

A few talks that stood out to me included Rafael Martinez’s (Associate Scientist at the Harvard-Smithsonian Center for Astrophysics) talk on “Building a Training Set for an Automatic LSST Lightcurve Classifier.” He talked about combining different classifiers, the problems with miscellaneous labels that contain the largest number of objects, and problems with period-finding algorithms. Hyungsuk Tak, a SAMSI postdoc, also gave a very nice talk, “Robust and accurate inference via a mixture of Gaussian and t errors,” in which he asked why astronomers so often automatically assume Gaussian-distributed errors. He presented a very promising method he developed that combines Gaussian and heavy-tailed (t-distributed) error models, and demonstrated that it significantly improved the accuracy of the inferred parameters. Another talk I enjoyed was Kuntal Misra’s (Scientist at the Aryabhatta Research Institute of Observational Sciences [ARIES] in Nainital, India), on “Gamma Ray Bursts and Associated Supernovae.” She provided a comprehensive discussion of the lightcurve and spectral features used to classify and characterize these objects.
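Tak’s question about error models can be made concrete with a small likelihood calculation. The Python sketch below assumes one simple form of such a mixture: each residual is Gaussian with probability `theta` and Student's t (with `nu` degrees of freedom and the same scale `sigma`) otherwise. The function names, parameters and residual values are hypothetical, and this is not necessarily the exact model presented in the talk.

```python
import math

def gaussian_logpdf(r, sigma):
    # Log density of N(0, sigma^2) at residual r
    return -0.5 * math.log(2 * math.pi * sigma**2) - 0.5 * (r / sigma) ** 2

def student_t_logpdf(r, sigma, nu):
    # Log density of a zero-location Student's t with scale sigma and nu d.o.f.
    z = r / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def mixture_loglik(residuals, sigma, nu, theta):
    # Assumed model: each residual is Gaussian with probability theta,
    # heavy-tailed t with probability 1 - theta (log-sum-exp for stability)
    total = 0.0
    for r in residuals:
        a = math.log(theta) + gaussian_logpdf(r, sigma) if theta > 0 else -math.inf
        b = math.log1p(-theta) + student_t_logpdf(r, sigma, nu) if theta < 1 else -math.inf
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total

residuals = [0.1, -0.3, 0.2, 8.0]  # hypothetical residuals with one outlier
print(mixture_loglik(residuals, sigma=1.0, nu=4.0, theta=1.0))  # pure Gaussian
print(mixture_loglik(residuals, sigma=1.0, nu=4.0, theta=0.5))  # Gaussian/t mix
```

With the single outlier at 8.0, the mixture log-likelihood comes out far above the pure-Gaussian value because the t component assigns the outlier much more probability, which is the basic reason such mixtures tend to be more robust to outliers.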

Participants of the Time Series Analysis for Synoptic Surveys and Gravitational Wave Astronomy workshop pose for a group shot at the International Centre for Theoretical Sciences (ICTS) in Bengaluru, India. The group was composed of astronomers, astrophysicists and statisticians from all over the world.

From my perspective, as a fourth-year graduate student, I found the SAMSI workshops to be very eye-opening because they gave me so much context about sophisticated and efficient methodologies that work well with different data sets.  They provided a briefing on the latest and greatest techniques being applied to astronomical data in a setting conducive to discussion, cross-discipline education, and collaboration.  SAMSI workshops and working groups have helped me to understand how my thesis work fits into the larger scientific picture and to gain a better understanding of what our science priorities are as a community of observational astronomers.

I’m excited to see where these applications of machine learning take us! In the future, I’d like to see more applications of hierarchical clustering and other techniques that capture continuity between subpopulations within a broader class. As we move into this era of massive time series data, such methods might help us understand our observations not only as dynamic systems but also in an evolutionary context.

This conference was great not only because of the science and stats; the location and the people who attended made it an unforgettable experience for me! Both ICTS locals and people invited through SAMSI were genuinely welcoming and kind. In the evenings after the workshop sessions, we all had dinner together, went for bike rides and played some ping pong. After the workshop, I was invited to join a group touring the central part of Bengaluru and the archaeological sites at Hampi. The days that followed were an adventure, and I sincerely appreciated the moments I shared with the great friends I made through this workshop!

Participants take a break from the Time Series Analysis for Synoptic Surveys and Gravitational Wave Astronomy workshop to explore Bengaluru, India. The four-day workshop was held at the International Centre for Theoretical Sciences (ICTS) and featured speakers in the field of astronomy from around the world.

Rutgers Mathematics Undergraduate uses Workshop to Plan Future

Francesca Falzon

Contributed by: Francesca Falzon, Mathematics Major, Undergraduate, Rutgers University

As an aspiring mathematician, I was very excited to participate in my first-ever mathematical conference at the Statistical and Applied Mathematical Sciences Institute (SAMSI). The two-day Optimization Undergraduate Workshop took place February 27-28 in Durham, N.C. The workshop was part of SAMSI’s education and outreach initiative and included approximately 40 students from across the United States. The students convened to learn more about cutting-edge research topics relating to optimization methods for large-scale statistical analysis.

A Busy Day…

After having a hearty breakfast at the hotel, we were whisked away to SAMSI where we began our day with an introduction from the Deputy Director, Sujit Ghosh. This was closely followed by a hands-on R tutorial, led by Paul Brooks, an Associate Professor at Virginia Commonwealth University. He began his talk with an exercise in finding the optimal point in a real coordinate space. He also thoughtfully carved out some time at the end of his lecture to provide insight and guidance to those of us considering graduate school.

Fueled up with tea and coffee during the break, we delved back into optimization, this time as it relates to Bayesian linear inverse problems. Professor Alen Alexanderian of North Carolina State University explained how inverse problems governed by PDEs allow us to determine A-optimal sensor placements when the initial conditions are unknown.
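A-optimal design chooses measurements to minimize the trace of the posterior covariance, that is, the average posterior variance of the unknowns. The Python sketch below assumes a small linear-Gaussian problem in which a random matrix `F` stands in for a discretized PDE forward operator (each row is one candidate sensor), with noise variance `noise_var` and prior covariance `prior_cov`, so the posterior covariance is (F_Sᵀ F_S / noise_var + prior_cov⁻¹)⁻¹ for a sensor set S. It picks sensors greedily, one common heuristic rather than the specific method presented at the workshop; all names are hypothetical.

```python
import numpy as np

def posterior_cov(F, prior_cov, noise_var, sensors):
    # Linear-Gaussian model: y = F[sensors] @ m + e, e ~ N(0, noise_var * I),
    # prior m ~ N(0, prior_cov). Returns the posterior covariance of m.
    Fs = F[list(sensors), :]
    precision = Fs.T @ Fs / noise_var + np.linalg.inv(prior_cov)
    return np.linalg.inv(precision)

def greedy_a_optimal(F, prior_cov, noise_var, k):
    # Greedily add the candidate sensor (row of F) that most reduces the
    # A-optimality criterion, trace(posterior covariance).
    chosen = []
    for _ in range(k):
        best, best_trace = None, np.inf
        for s in range(F.shape[0]):
            if s in chosen:
                continue
            t = np.trace(posterior_cov(F, prior_cov, noise_var, chosen + [s]))
            if t < best_trace:
                best, best_trace = s, t
        chosen.append(best)
    return chosen

# Toy problem (assumed): 20 candidate sensors, 5 unknowns, identity prior.
rng = np.random.default_rng(0)
F = rng.normal(size=(20, 5))
prior = np.eye(5)
sensors = greedy_a_optimal(F, prior, noise_var=0.1, k=3)
print(sensors)
```

Greedy selection avoids searching every subset of sensors, which grows combinatorially; its first pick is always the single best sensor, but the final set is not guaranteed to be globally optimal.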

As someone with a budding interest in computer science but limited knowledge of programming, I was very excited to take part in a workshop that taught us the Fundamentals of Scientific Python. Ahmed Attia, a SAMSI Postdoc, walked us through the basics of the Python language as well as the use of various tools and functions available in the NumPy package. This segued into a lecture series on Neural Networks and Optimization in Data Analysis given by Peter Diao and Sercan Yildiz, who are also Postdocs at SAMSI. Machine learning is proving to be a popular research area, so it was great to be exposed to this rapidly growing field in mathematics and computer science. The program coordinators thoughtfully included a career panel to wrap up the day. Applying to graduate school is often a daunting process, so I welcomed the panel discussion on career opportunities, where we heard ‘tips and tricks’ from graduate school application veterans: the SAMSI Postdocs and Graduate Fellows helping to run the program.

The Wrap Up…

Day two brought a real change of pace. On Tuesday we visited the SAS Institute campus in Cary, N.C., where we heard presentations on current optimization research from Manoj Chari, Yan Xu, and other members of the Numerical Optimization team at SAS. I am not sure whether I would ultimately like to conduct research in an academic or an industry setting, so I found the exposure to both work environments very instructive.

My brief time at SAMSI was a whirlwind experience packed to the brim with fascinating lectures and engaging workshops. It also proved to be a wonderful opportunity not only for learning about optimization methods but also for networking with individuals at all academic stages, from fellow undergraduates to graduate students to associate professors. Needless to say, the experience at SAMSI exceeded my high expectations and instilled in me a newfound excitement about my pursuit of mathematics upon returning to my home institution!

Paul Brooks, Associate Professor at Virginia Commonwealth University, begins his lecture by challenging students to find an optimal point in a real coordinate space at the Optimization Undergraduate Workshop.

SAMSI Postdoctoral Fellows Ready for Next Step in Careers

Postdoctoral fellows are a big part of the SAMSI family. This year we would like to recognize previous postdoctoral fellows as they continue on in their given fields of study. Find out what’s on the horizon for these young professionals:

Lucas Mentch
Currently serving as an Assistant Professor, Department of Statistics, University of Pittsburgh
SAMSI Postdoctoral Fellow: 2015

 

Ben Risk
Accepted upcoming position as an Assistant Professor, Departments of Biostatistics & Bioinformatics, Emory University
SAMSI Postdoctoral Fellow: 2015

 

Zhengwu Zhang
Accepted tenure-track position in the Department of Biostatistics & Computational Biology, University of Rochester
SAMSI Postdoctoral Fellow: 2015

VA Tech Graduate Student uses Inverse Problems Workshop to Influence Personal Research

Contributed by: Joseph Tanner Slagel, Graduate Student, Dept. of Mathematics, Virginia Polytechnic Institute & State University

I am a graduate student at Virginia Tech, and my research interests are in large-scale numerical linear algebra. In particular, I have recently been studying stochastic approximation methods for solving very large least squares problems. These are least squares problems in which the data are terabytes (or even petabytes!) in size and thus cannot fit in a computer’s memory all at once.
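To give a flavor of why such methods suit data that cannot be held in memory, the Python sketch below implements randomized Kaczmarz (Strohmer and Vershynin), a stochastic row-action method closely related to stochastic gradient descent: apart from one initial pass to compute row norms, each update reads a single row of the system. It illustrates the general family and is not the specific method from the author's research. For simplicity the matrix is held in memory here, whereas an out-of-core solver would stream sampled rows from disk; the function name and parameters are hypothetical.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters, seed=0):
    # Randomized Kaczmarz: each update uses one row of A, sampled with
    # probability proportional to its squared norm.
    rng = np.random.default_rng(seed)
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # one pass over the data
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        # Project x onto the hyperplane {z : A[i] @ z == b[i]}
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

# Toy consistent system (assumed): 50 equations, 5 unknowns.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 5))
x_true = rng.normal(size=5)
b = A @ x_true
x_hat = randomized_kaczmarz(A, b, iters=2000)
print(np.linalg.norm(x_hat - x_true))
```

On a consistent system like this one, the iterates converge to the exact solution. For noisy, inconsistent least squares problems, plain randomized Kaczmarz only reaches a neighborhood of the least squares solution, which is one reason variants with averaging or decreasing step sizes are studied.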

I saw the Statistical Inverse Problems Workshop at the Statistical and Applied Mathematical Sciences Institute (SAMSI) as an excellent opportunity to learn about emerging topics and techniques that I could apply to my own research.

I attended the Statistical Inverse Problems Workshop from January 26-27, 2017. The friendly and open atmosphere of the SAMSI workshop made it easy for me to make new connections and to discuss topics related to my research. The experience gave me the opportunity to work with a wide range of professionals (graduate students, postdocs and junior and senior faculty), which helped me gain perspective from mathematicians at various stages of their careers.

A distinguishing feature of the workshop was that, in addition to plenary and research talks, there was a lot of time for research discussions and collaborations on current projects – small groups were encouraged to find an open room and work for multiple hours on emerging research questions.

The talks at the workshop gave me an opportunity to learn about key issues in Bayesian inverse problems. Despite the small size of the group, I heard a range of talks that provided an overview of open problems in the field, expounded on the main computational and algorithmic challenges, and described lots of cool real-life applications. The most beneficial part of the workshop for me was getting to speak to others about my research. I received a lot of helpful input and pointers to resources that have helped me see where my work fits into the larger statistical inversion community.

Ahmed Attia (instructing), a Postdoctoral Fellow at SAMSI, gives a research lecture at the Statistical Inverse Problems Workshop

Attending this SAMSI workshop was a great way for me to connect with other researchers whose interests overlap with my own. I look forward to returning to SAMSI in the future for more collaborations and discussions!

NOTE: To see what subjects were presented at this workshop visit the SAMSI website at: www.samsi.info/opt-inv-prob.

UT Mathematician Discusses Advances & Future of Supercomputing

Jack Dongarra

Jack Dongarra, Director of the Center for Information Technology Research and Innovative Computing Laboratory – University of Tennessee (photo courtesy of University of Tennessee)

The Workshop on the Interface of Statistics and Optimization (WISO) at Duke University’s Penn Pavilion wrapped up recently. After we said goodbye to all of our participants, I was reminded of an interesting talk given by one of the twelve insightful speakers: Jack Dongarra.

Jack is the Director of the Center for Information Technology Research and Innovative Computing Laboratory at the University of Tennessee. He lectured on “The Road to Exascale and Legacy Software for Dense Linear Algebra (or what we have been doing for the last 43 years).” Jack’s talk at WISO could be appreciated even by those who are not specialists in the esoteric matters of statistics and optimization. True to the title, Jack has been designing high-performance software for over 40 years. His pioneering work has earned many honors, including election to the US National Academy of Engineering.

Twice a year, Jack co-publishes a ranking of the 500 most powerful supercomputers in the world. The current number one, as of November 2016, is the Sunway TaihuLight machine in Wuxi, China. Built from over 10 million processors, it can execute more than 10¹⁶ operations in a single second. The Titan Cray XK7 at Oak Ridge National Lab, in Oak Ridge, TN, is number three.

The Sunway TaihuLight machine (above) is the number one supercomputer in the world. Its more than 10 million processors make it possible to execute more than 10¹⁶ operations in a single second!

For the many young people in the audience, Jack started his talk with a history of how hardware and software have developed over the years, and how he himself got into the business of benchmarking. A few photos from the 1970s, featuring Jack and his colleagues in bell bottoms and sideburns proudly posing on a Ford Pinto with the license plate “Linpack” (after their groundbreaking public-domain software package), were well received.

Jack Dongarra and fellow software developers pose in the 1970s with his trusty Ford Pinto, identified by the license plate “LINPACK.” LINPACK was the group’s groundbreaking public-domain software package.

Jack’s advice for building good software: keep the processors busy with arithmetic, and make sure they coordinate their work schedules efficiently rather than sitting idle, waiting for results from other processors.

Jack ended with a list of the many challenges for software design, including:

  • Efficiency (speed matters)
  • Scalability (keep up the efficiency, even in the face of growing work)
  • Reliability (with 10 million processors in use, some are bound to fail)
  • Portability (the particular hardware platform should not matter)

Jack Dongarra was one of many interesting speakers at the WISO. If you are interested in seeing the other presentations from the 3-day event, visit the WISO Video Page.

ASTRO Workshop Brings Researchers together to Discuss Exoplanet Exploration

Contributed by: Ian Czekala, Kavli Institute for Particle Astrophysics and Cosmology (KIPAC), Postdoctoral Fellow, Stanford University

My research focuses on understanding young stars and their protoplanetary disks during the planet formation epoch. For a number of reasons, I was particularly excited about attending the Statistical and Applied Mathematical Sciences Institute (SAMSI) workshop on Hierarchical Bayesian Modeling of Exoplanet Populations, October 17-28. The main reason was that my previous experiences at SAMSI had always been so positive. For example, I first learned about many of the topics and techniques that I use regularly in my research at a similar SAMSI workshop in 2013. Three years later, I was again eager to learn new analysis methods from the statistical expertise gathered at SAMSI. Although two weeks may seem like a long time for a workshop, I knew that the close-knit environment would foster collaboration, catalyze many new projects, and make the conference pass by way too quickly. The following is a brief account of the conference, highlighting aspects that I found particularly interesting; it is by no means a complete or unbiased survey of all that transpired!

First, allow me to explain the context for our workshop. In August, 80 researchers from the fields of exoplanets, gravitational waves, and statistics converged upon Research Triangle Park to kick off the year-long SAMSI program on Statistical, Mathematical and Computational Methods for Astronomy. At the Opening Workshop for this program, we explored ideas and statistical techniques common to these fields and brainstormed interesting projects to work on over the next year. We splintered into five “working groups,” each focused on a particular topic or technique. I joined Working Group IV – Astrophysical Populations, which was focused on hierarchical Bayesian inference of exoplanet populations. Each working group has maintained momentum through weekly teleconferences, and most groups will have a workshop at SAMSI at some point during the academic year. The year-long program will be capped by a “transition” workshop in May 2017.

Angie Wolfgang, a National Science Foundation Fellow at Penn State University, and Eric Ford, a professor also at Penn State, were the main organizers of the Astrophysical Populations workshop. We had about 20 participants, split equally between astrophysics and statistics. Our first morning was spent discussing our research interests and what we hoped to accomplish over the next two weeks. Two major groups emerged from this discussion. The first was centered on exploring the mass-radius relationship of exoplanets using photometric transit and radial velocity datasets. The second focused on spectroscopic techniques to characterize stars and measure their radial velocities. Although our workshop was nominally about exoplanets, it turns out that a proper understanding of stars is fundamental to detecting and understanding the exoplanets that orbit them.

Understanding the Planet Mass-Radius Relationship…
In the past decade, astronomers have gone from knowing of only a handful of exoplanets to discovering a vast collection of several thousand. Most of these planets have been discovered by the Kepler Mission, which finds planets by measuring the dip in light as a planet transits its host star; this method is most informative about a planet’s radius. For a select subset of these planets, precise radial velocity monitoring yields their masses as well. Because we are necessarily operating at the detection limit of our telescopes when studying small planets, it is very important to use proper statistical analysis lest our interpretation be biased. The fundamental unknown that links a planet’s mass and radius is the planet’s composition, so with a proper statistical framework we might hope to infer how planet composition varies among the thousands of known exoplanets, telling us something deep about the planet formation process in general.

Leslie Rogers, an Assistant Professor in the Department of Astronomy and Astrophysics at the University of Chicago, speaks about planet composition distribution.

Angie Wolfgang; Bo Ning, a Ph.D. candidate in the Department of Statistics at N.C. State University; and Sujit Ghosh, SAMSI Deputy Director, explored using Bernstein polynomials to model the planet mass-radius relationship non-parametrically, and showed promising results that included measurement uncertainties. Leslie Rogers, an Assistant Professor in the Department of Astronomy and Astrophysics at the University of Chicago, talked about the planet composition distribution. She also discussed how to link physically motivated models of planet composition to data and how to determine whether composition changes as a function of planet formation mechanism. Kaisey Mandel, a Postdoctoral Fellow at the Harvard-Smithsonian Center for Astrophysics, worked on understanding selection effects as they apply to exoplanet surveys, a natural focus given his interest in selection effects in Type Ia supernova surveys.

A sizable group of people worked on translating hierarchical sampling code into the probabilistic programming language Stan. In particular, Megan Shabram, a Postdoctoral Fellow with NASA’s Kepler Mission, and Joe Catanzarite, a SOC Scientific Programmer with NASA’s Kepler team, produced an open-source Jupyter notebook that implemented planet occurrence rate calculations in PyStan.

Central to many of the problems discussed at this workshop was the topic of “emulation” or “uncertainty quantification,” which is in fact the primary topic of Working Group I – Uncertainty Quantification and Astrophysical Emulation. Bekki Dawson, an Assistant Professor in the Penn State Department of Astronomy and Astrophysics, and Anirban Mondal, an Assistant Professor in the Department of Mathematics, Applied Mathematics and Statistics at Case Western Reserve University, worked on developing astrophysical emulators for planet formation models, so that more accurate (and computationally expensive) models could be used in hierarchical Bayesian inference to understand the formation of super-Earths and mini-Neptunes. Related to this problem, Jessi Cisewski, Assistant Professor in Yale’s Department of Statistics, gave several informative presentations on Approximate Bayesian Computation (ABC), which addresses inference problems where it is difficult to write down a likelihood function.
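The simplest version of the ABC idea, rejection ABC, can be sketched in a few lines of Python: draw parameters from the prior, simulate data, and keep only the draws whose simulated data are close to the observations. The toy problem below (inferring the mean of Gaussian data with known unit variance, comparing sample means, and the tolerance `eps`) is an assumption for illustration and is not taken from the presentations; the function name is hypothetical.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, distance, eps, n_draws, rng):
    # Rejection ABC: keep prior draws whose simulated data fall within eps of
    # the observed data. Only the ability to simulate is needed, never the likelihood.
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example (assumed): infer the mean of 100 Gaussian draws with unit variance.
rng = np.random.default_rng(42)
data = rng.normal(3.0, 1.0, size=100)
posterior = abc_rejection(
    observed=data,
    simulate=lambda mu, r: r.normal(mu, 1.0, size=100),
    prior_sample=lambda r: r.uniform(-10.0, 10.0),
    distance=lambda x, y: abs(x.mean() - y.mean()),
    eps=0.05,
    n_draws=20000,
    rng=rng,
)
print(len(posterior), posterior.mean())
```

Shrinking `eps` makes the accepted draws a better approximation of the true posterior but lowers the acceptance rate, which is the basic cost trade-off that more refined ABC schemes try to improve.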

David Stenning, one of two SAMSI Postdoctoral fellows at the workshop, presented talks on techniques using astrostatistics to improve exoplanet analysis.

Hierarchical Spectroscopic Inference with Time Series Stellar Spectroscopy…
A large group of astronomers and statisticians worked on techniques to improve radial velocity precision, in the hope of finding planets with the mass of Earth and below. Eric Ford; Jessi Cisewski; David Stenning and David Jones, Postdoctoral Fellows at SAMSI; Robert Wolpert, a Professor of Statistical Science and the Environment at Duke University; Tom Loredo, a Senior Research Associate in Astronomy at Cornell; Ben Montet, a Postdoctoral Researcher from the University of Chicago; and I worked on radial velocity fitting using mock spectral datasets with known statistical characteristics. These datasets consist of real spectra of the Sun to which planets (the signal of interest) and star spots (a confounding signal) have been added. We explored principal component analysis in the hope of isolating the signal of the orbiting planet from stellar activity. During this period, we were also treated to two presentations by the SAMSI postdocs David Stenning and David Jones about using Gaussian processes to correlate stellar activity indicators with radial velocity jitter and using diffusion mapping to understand stellar variability.

By the end of the workshop, we were all knee-deep in immersive projects that we had started just 10 days prior, and we were reluctant to leave! The collaborative working environment, with daily updates on what we had accomplished, certainly fueled an exciting work schedule, since everyone was motivated to develop new ideas to share with the group. Several of us remarked that two weeks had in fact not been long enough; we were so dedicated to the research that we wanted to stay! To cap it all off, we were treated to a tasty “special presentation” by Tom Loredo, who shared with us how chocolate is made.

These researchers will collaborate over the next several months on this continued analysis of exoplanet discovery.

Workshop participants were treated to a chocolate tasting from Tom Loredo, a Senior Research Associate in Astronomy at Cornell. Loredo is a hobbyist chocolatier and candymaker and his confections were enjoyed by the group.

E&O Undergraduate Astrostatistics Workshop: A Stellar Learning Experience

Contributed by: Rachel Matheson, Mathematics Undergraduate Student, Vassar College – Poughkeepsie, NY

As a math major at a liberal arts school, choosing my classes for the next semester always feels like a lot is at stake. I want to take physics, neuroscience, astronomy and biology, but I also want to take social sciences and humanities. Dipping into the Statistical and Applied Mathematical Sciences Institute’s (SAMSI) Undergraduate Workshops gives me a chance to experience the different flavors of applied math and statistics without the commitment of a class. I was therefore extremely delighted to be invited to come back to SAMSI this October for a two-day undergraduate workshop focused on Statistical, Mathematical and Computational Methods for Astronomy (ASTRO).

SAMSI Workshops – Full of Information…

Jessi Cisewski, an Assistant Professor in Yale University’s Department of Statistics, conducts a lecture on Approximate Bayesian Computation in Astrostatistics. This talk was one of several given during the workshop.

Though the workshop was brief, it was packed with interesting lectures and hands-on activities. After a yummy breakfast at the hotel, we shuttled off to SAMSI’s campus and were greeted by SAMSI Deputy Director Sujit Ghosh, who delivered opening remarks to our group. From there, we quickly transitioned into a lecture from Jessi Cisewski, an Assistant Professor in Yale University’s Department of Statistics, on Approximate Bayesian Computation in Astrostatistics. The lecture was very enjoyable and informative; it both reflected and extended what I have been learning in my probability class, applied to the stellar initial mass function. Bekki Dawson, an Assistant Professor in Penn State University’s Department of Astronomy and Astrophysics, then dazzled my mind with stellar facts during her lecture titled Time Domain Challenges for Exoplanets. I was surprised to learn that this is an area where the technology is good and up to date, but we still lack the statistical methods needed to interpret the noise in the data well enough to detect Earth-like exoplanets.

After a short break, we delved back into a tutorial on R led by SAMSI postdoctoral fellows David Jones, David Stenning and Hyungsuk Tak. This was a helpful overview leading up to the intensive, hands-on workshop on modeling with Gaussian processes. Line-by-line comments in the R code kept me from feeling lost as the lecture sped on, deep into the mathematics and the emulator needed to make this model run. I could easily go back and gain understanding, since postdoctoral fellows stood throughout the room ready to help at the wave of an arm. It felt like a really positive learning environment, despite the high speed at which the material was presented.
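The tutorial itself used R; the following Python/NumPy sketch shows the core of a Gaussian-process emulator: condition a zero-mean GP with a squared-exponential kernel on a few “simulator runs” and predict at new inputs, with uncertainty. The kernel choice, the hand-fixed hyperparameters `length`, `amp` and `noise`, and the stand-in simulator (`np.sin`) are assumptions for illustration; a real emulator would estimate the hyperparameters, and this is not the tutorial's code.

```python
import numpy as np

def sq_exp_kernel(x1, x2, length=1.0, amp=1.0):
    # Squared-exponential covariance between two 1-D arrays of inputs
    d = x1[:, None] - x2[None, :]
    return amp**2 * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_new, noise=1e-6, length=1.0, amp=1.0):
    # Posterior mean and variance of a zero-mean GP given noisy training outputs
    K = sq_exp_kernel(x_train, x_train, length, amp) + noise * np.eye(len(x_train))
    K_star = sq_exp_kernel(x_new, x_train, length, amp)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = amp**2 - np.sum(v**2, axis=0)  # diagonal of the predictive covariance
    return mean, var

# Pretend these are three runs of an expensive simulator (here just sin).
x_runs = np.array([0.0, 2.0, 4.0])
y_runs = np.sin(x_runs)
mean, var = gp_predict(x_runs, y_runs, np.array([1.0, 3.0, 50.0]))
print(mean, var)
```

Near the simulator runs the emulator reproduces them with almost no uncertainty, while far from any run (x = 50 here) it falls back to the prior mean of zero with the full prior variance, which is how an emulator signals where more simulator runs are needed.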

On Day 2 of the Undergraduate Workshop, students got to visit the Morehead Planetarium on the campus of UNC-Chapel Hill. The students enjoyed two presentations on the universe and the existence of black holes.

Opportunities and Guidance from those who have done it…

One of my favorite aspects of coming to SAMSI is being able to talk to the postdoctoral fellows, SAMSI faculty, and my peers about anything from career path choices to, quite literally, the stars in the sky. The panel on career opportunities, led by some of the graduate fellows, was a wealth of information, giving nervous undergraduates a chance to seek advice from those who have “made it” and to start conversations to continue later on. I ended up eating dinner with two postdoctoral fellows, who advised me on everything from which classes to take to not worrying too much.

Leaving with a new sense of purpose…

After a visit to the Morehead Planetarium, I was sad that the workshop was ending almost as quickly as it had begun. It is always so reassuring, not to mention truly inspiring and exciting, to talk to people who are pursuing what I am interested in. SAMSI serves largely as a space for me to feel motivated about my pursuit of applied math and connect with people who feel just as excited as I do about it. It forges what may be a 2-day community, but that community gets to live on through email and LinkedIn. I am so glad to have had the opportunity to experience SAMSI as a community and as a learning space – it excels at being both!

To see more about what happened at this workshop, visit: www.samsi.info/astro-undergrad. To see past and upcoming workshops in our ASTRO Program, visit: www.samsi.info/astro.


Students from the Undergraduate Workshop pose for a picture at the working sun dial located at the Morehead Planetarium on the campus of UNC-Chapel Hill. The students visited the planetarium as part of their workshop activities. The 2-day undergraduate workshop was part of the Education and Outreach for SAMSI’s Program on Statistical, Mathematical and Computational Methods for Astronomy (ASTRO).

DPDA Workshop: Reinforcing the Importance of Statistics and Applied Mathematics in Distributed Computing


Contributed by: Alexander Terenin, Statistics and Applied Mathematics PhD student, University of California – Santa Cruz

I am a PhD student in Statistics and Applied Mathematics at the University of California – Santa Cruz (UCSC). My research focuses on Bayesian statistics – specifically, Markov Chain Monte Carlo methods at scale in parallel and distributed environments for big data applications. I had heard about the workshop from a fellow graduate student in my department, and attending was a very natural choice given my area of research.

The Workshop…
On September 20 – 23, I had the privilege of attending a 4-day Workshop on Distributed and Parallel Data Analysis (DPDA) hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI) at North Carolina State University in Raleigh, N.C. In this piece, I would like to reflect on my observations from the workshop.

Upon arrival, the workshop proceeded as workshops usually do: various speakers gave talks on different topics, interspersed with breaks that gave participants the opportunity to take a moment to think about the talks, as well as time to talk with one another about ideas. I was intrigued to see that the DPDA workshop had no parallel sessions, a format I much prefer because it brings together people who might otherwise never end up in the same room.


Participants at the 2016 DPDA Workshop network during one of the scheduled breaks. Participants used these opportunities to network and collaborate on ideas.

Informative and Engaging Discussions…
A number of these talks and discussions stood out to me – I’ll highlight three of them, in order of occurrence.

Wotao Yin, a faculty member in UCLA's Mathematics Department, gave a talk on "Asynchronous Parallel Coordinate Update Algorithms." In this talk, he described a particular class of parallel optimization algorithms: asynchronous iterative algorithms.

To understand what these are, let's first back up and talk for a moment about iterative algorithms: these are algorithms in which some sequence of steps is repeated until convergence. To take the next step, we need to have completed the previous one, so how can iterative algorithms be parallelized? It turns out that one way to do so is to make them asynchronous. In an asynchronous algorithm, a set of workers performs iterative steps as fast as they can, talking to each other as much as possible, with no control over the order in which these steps occur. The natural question, then, is whether such processes can converge. Sometimes they can: if the algorithm's state space forms a box, and if individual steps shrink the box, then the algorithm will converge even if performed asynchronously. After recalling these results, Prof. Yin illustrated that certain coordinate ascent algorithms satisfy these conditions. This talk was especially relevant for me because I have written a paper about the asynchronous variant of Gibbs sampling, an algorithm for Bayesian computation whose analysis is complicated but involves the same conditions. Seeing the same ideas used in a different context got me thinking about the similarities and differences with my own work.
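To make the box condition a little more concrete, here is a minimal Python sketch that simulates asynchrony on a single machine rather than running real parallel workers. It assumes a small, symmetric, strictly diagonally dominant linear system A x = b. Exactly minimizing the quadratic 0.5 * x^T A x - b^T x along one coordinate i gives the update x_i = (b_i - sum over j != i of A_ij x_j) / A_ii, and diagonal dominance makes each such update shrink a box (in the max norm) around the solution. To mimic workers who never wait for one another, each step updates a randomly chosen coordinate using a possibly stale copy of the state, up to a hypothetical `max_delay` steps old. The function name `async_coordinate_descent` and all of its parameters are illustrative assumptions; this is not Prof. Yin's algorithm or code.

```python
import random


def async_coordinate_descent(A, b, steps=600, max_delay=3, seed=0):
    """Simulate asynchronous coordinate updates for the system A x = b.

    Each step picks a random coordinate i and sets it to the exact minimizer
    of 0.5 * x^T A x - b^T x along that coordinate, but computes the update
    from a snapshot of x that may be up to `max_delay` steps out of date,
    mimicking a worker that never waits for the others.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    recent = [x[:]]  # the last max_delay + 1 states, used for stale reads
    for _ in range(steps):
        i = rng.randrange(n)
        # Pick how out of date this worker's view of the shared state is.
        delay = rng.randint(0, len(recent) - 1)
        stale = recent[-1 - delay]
        off_diag = sum(A[i][j] * stale[j] for j in range(n) if j != i)
        x[i] = (b[i] - off_diag) / A[i][i]
        recent.append(x[:])
        if len(recent) > max_delay + 1:
            recent.pop(0)  # forget states older than the delay bound
    return x


# Symmetric and strictly diagonally dominant; the exact solution is [1, 1, 1].
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
b = [5.0, 8.0, 8.0]
print(async_coordinate_descent(A, b))
```

Setting `max_delay=0` recovers ordinary randomized coordinate descent. With larger delays the iterates still converge in this example because every update is a max-norm contraction toward the solution, which is the box-shrinking property described above; for a system that is not diagonally dominant, this guarantee need not hold.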

Eric Xing, a faculty member in Carnegie Mellon's Computer Science Department, gave a talk on "Strategies and Principles for Distributed Machine Learning." His lecture described a variety of computational software environments used in big data settings, and how different implementation choices can yield vastly different levels of performance. This topic was interesting because it bridged the theory of statistical computation with software engineering considerations, which end up affecting performance far more than one might expect. For example, in a distributed setting, having a master node that manages and coordinates workers can yield different performance characteristics than a peer-to-peer model where all of the workers talk to each other, even if the exact same algorithm is used in both cases. Similar lines of thought have been highly relevant in my own work as well. Having written papers on performing Markov Chain Monte Carlo algorithms in two different parallel settings, compute clusters and graphics cards, I have learned that software engineering considerations are an inherent part of parallel computing and that it is important to study them.

I also found the discussion panel toward the end of the workshop to be particularly memorable. My PhD advisor at UCSC, David Draper, was on the panel, along with a number of distinguished faculty members from several universities; the panel was moderated by Sujit Ghosh, Deputy Director of SAMSI. Draper made the point that for the field of statistical computation to advance, "statisticians need to become better computer scientists, and computer scientists need to become better statisticians." This point resonated with me because, as graduate students in statistics, we are largely not taught anything about high performance computing, whether on traditional supercomputers or in Silicon Valley-style hardware environments. I, however, have been fortunate to work in both settings, through an academic collaboration with Shawfeng Dong, an astrophysicist at UCSC, and through my time at eBay, Inc. Many statisticians have not had a comparable opportunity.

This makes statistical high performance computing a specialty area, which in my view has two discipline-wide consequences: (1) it's easy for non-specialists to write code and design algorithms that scale poorly, and (2) the typical software stack that statisticians are taught and use in practice is filled with out-of-date tools and programming concepts that make coding and debugging unnecessarily difficult.

It was very interesting to hear similar ideas brought up and discussed as part of the panel. The experience was valuable because the panel emphasized the implications for statistical education, a topic I have not formed many opinions about, since I am still a student. The discussion panel gave me the opportunity to think about our field as statisticians and applied mathematicians and about where our discipline is headed. This new information and insight is important for a young researcher like myself, because it helps me decide what to study and spend my time on throughout my graduate program.

Participants at the 2016 DPDA Workshop discuss various topics on distributed computing during the Workshop Reception and Poster session.


“Statisticians need to become better computer scientists, and computer scientists need to become better statisticians.”

A Good Experience Overall…
Overall, I found the workshop highly memorable. The points highlighted here merely scratch the surface of the topics I wanted to discuss. An honorable mention goes to the lecture by Han Liu, a faculty member in the Statistical Machine Learning Lab at Princeton University. Liu's talk was called "Blessing of Massive Scale," and he demonstrated that some problems become much easier when they are big. Faming Liang, a faculty member in the University of Florida's Department of Biostatistics, spoke about "Bayesian Neural Networks for High Dimensional Variable Selection." I found Liang's treatment of Bayesian asymptotics interesting.

Finally, Samuel Franklin of 360i, a digital marketing agency, presented a talk called "HDPA Growth Constraints in Digital Marketing." The subject was surprisingly interesting for a talk that involved no mathematics. His call for all of us in the room, the next generation of statisticians, engineers and applied mathematicians, to champion increased education in high performance computing foreshadowed some of what was later said on the panel.


Samuel Franklin, Vice President of Data Science at 360i, lectures on the importance of high speed computing as a resource for digital marketing strategies.

I was thankful that I had the opportunity to attend and listen to all of the wonderful perspectives that were offered on our field of study, as well as the opportunity to try North Carolina BBQ during one of the evenings. I would also like to thank SAMSI for compiling and sharing the approved lectures from this event online. For more information about the DPDA Workshop or simply to review what was presented, visit: www.samsi.info/dpda.

SAMSI Undergraduate Workshop inspires Student Growth


 Contributed by: Joanna Itzel Navarro, Statistics Undergraduate, University of California – Los Angeles

From May 22-26, 2016, I had the privilege of participating in the SAMSI (Statistical and Applied Mathematical Sciences Institute) Interdisciplinary Workshop for Undergraduate Students.

In my search for statistical research opportunities, I learned about SAMSI after coming across a paper on Markov chain Monte Carlo (MCMC) methods written by the Deputy Director of SAMSI, Sujit K. Ghosh. A statistics alumnus from UCLA had mentioned SAMSI to me before, so when I came across Dr. Ghosh's paper, I was compelled to find out more about this program that both the alumnus and Dr. Ghosh endorsed. A few months later, I found myself at SAMSI learning about random walks and the Metropolis-Hastings algorithm from Dr. Ghosh himself.

The SAMSI Experience…
The day after arriving in North Carolina, the workshop commenced with a presentation by the Director of SAMSI, Dr. Richard Smith, on statistical reasoning in public and the complexity of small and large data sets. Throughout this first day of the workshop, we heard more data talks from different speakers, each investigating a variety of questions related to several exciting and emerging areas of research. The research projects available to us ranged from the complex dynamic behavior of the brain and nervous system to measuring climate change through dolphin migration patterns. After the talks ended, the other students and I broke up into groups of 5-9 and were assigned to the research projects we had selected. Before the first day was over, we got to know our group members and learned about the many different majors among us. This miscellany of majors initially struck us as inexpedient, but throughout the week, we learned that bringing together minds from different backgrounds, qualifications, and experiences is key to effective problem-solving.

“When we found ourselves stumped, all it took was one group member to pose a provoking question or novel information to furnish the impetus that moved us forward.”

Reinforcing Effective Foundations in Statistics…
The following days were packed with R and MATLAB sessions, presentations on giving effective presentations, and panels on graduate school programs and graduate school life. Additionally, we toured neighboring research institutions in North Carolina's Research Triangle Park and reconnoitered the campus of NC State University.

Research Group Projects…
While our morning and afternoon activities varied, our evenings remained dutifully allotted for our research projects and group work.  After an eventful day, we came back every evening to find ourselves huddled around desks and ripe for our research projects.


– Joanna Itzel Navarro presents findings from her research group’s project at the SAMSI Interdisciplinary Workshop for Undergraduate Students, May 22-26, 2016. (photo provided by Navarro)

My research group was under the guidance of Duke’s newest, congenial statistics postdoctoral fellow, Dr. Adam Jaeger, and our research examined how various environmental factors predict behaviors of bottlenose dolphins in the Northern North Carolina Estuarine System (NNCES) stock in Roanoke Sound, North Carolina.  Furthermore, our research sought to discover how water temperature relates to the presence of dolphins and whether a change in the frequency of dolphins could be indicative of climate change.

Learning Through Diverse Perspectives…
The amalgam of majors in our group was certainly a recipe for a wide range of questions and approaches, and we noticed this especially in the beginning. This led us to adopt a multidisciplinary approach, and by the end of the program, we had molded ourselves into the quintessential diverse research team. When we found ourselves stumped, all it took was one group member to pose a provoking question or offer novel information to furnish the impetus that moved us forward. We were all challenged to work out our differences and use our divergent views as opportunities; we learned to anticipate alternative viewpoints and to expect that reaching a consensus would take effort and strong reasoning.

The End…


– Joanna Itzel Navarro listens to one of many lectures presented at SAMSI Interdisciplinary Workshop for Undergraduate Students, May 22-26, 2016. (photo provided by Navarro)

On the last day of the workshop, every group presented its research findings. The presentations were interactive and the questions were provoking. After a series of group photos and goodbyes, we all went our separate ways. This was not the end of things for us, though. Many of us remain connected: whether through the Facebook group we are all part of or through email, we continue to share news and let each other know about other opportunities.

Participating in this interdisciplinary workshop has highlighted the role of the mathematical sciences, particularly statistics, in solving a gamut of important problems. Through the tours, presentations, group research, and interactions with erudite people from academia and industry, this workshop has imparted an educational experience that I cannot imagine receiving elsewhere. This was an indelible experience and a worthwhile way to spend my degrees of freedom.


– Group photo of students and mentors at the SAMSI Interdisciplinary Workshop for Undergraduate Students, May 22-26, 2016. (photo provided by Navarro)

SAMSI/Harvard Workshop on Environmental Health Data: A Lasting Impression – 9 Months in the Making

Contributed by Krista Coleman, MSM; Associate Director of Research Strategy and Development, Harvard T.H. Chan School of Public Health

I facilitated the ‘Introductions’ on the morning of Day 1 of the Statistical Methods and Analysis of Environmental Health Data workshop in Mumbai, India, and I can’t express how satisfying it was to see nine months of planning come to life. Once everyone had provided a brief introduction, including their name, professional title, institution, and area of research interest, I recall saying, “Well, it sounds like we’ve gathered the right group of researchers together!” That statement held true throughout the week as I watched existing colleagues reconnect, new collaborations form, and treasured friendships develop, all because we came together around India’s pressing public health challenges related to indoor and outdoor air pollution.


Dr. Francesca Dominici, Professor of Biostatistics and Senior Associate Dean of Research at the Harvard T.H. Chan School of Public Health speaking at the workshop.

This workshop was the product of identifying a unique opportunity, pooling ideas and resources, and strategic planning and dedication from the organizers at the Harvard T.H. Chan School of Public Health, SAMSI, and ISI-Kolkata. An incredible amount of care and attention went into the identification and selection of workshop participants: each of us leveraged the networks of our colleagues in the U.S. and India to recommend participants who would get the most out of their investment in the week, while also contributing to the benefit of others. Once we had a tentative roster, we worked with precision to create a program and recruit speakers that would meet the needs of all of those in attendance and seed collaborations. We were able to leverage the Harvard T.H. Chan School of Public Health’s India Research Center in Mumbai and, with an incredible amount of communication across time zones, plan and confirm all of the logistical arrangements for the workshop. Having never planned a workshop, let alone an international event, it was quite an experience to invest so much of myself in watching the seed of an idea be nurtured along the way and blossom into a wildly successful effort!


Workshop attendees touring Mumbai.

In my role at the Harvard Chan School, I rarely get to observe learning and research in action so closely. It was such a gift to observe the lectures and watch researchers, from students to professors, engage with and learn from each other. I was amazed by how quickly the working groups began their collaborative efforts and was in awe of how much they were able to accomplish in just a few days, again confirming that we were all in the right place at the right time.


Dr. Prabhakaran Dorairaj of Public Health Foundation of India (PHFI) speaking at the workshop.

It’s my nature to set high expectations for projects I engage in, and having never done this before, I wanted it to be perfect. I can say with great confidence based on my own experience and from the feedback we received, that we exceeded our expectations in Mumbai. I’m deeply grateful for all of the contributions from the organizers, speakers, and participants. This wouldn’t have been a success without the engagement from all of those who attended. Thank you all for being part of such an incredibly rewarding experience!