The 20th Industrial Mathematical and Statistical Modeling Workshop for Graduate Students (IMSM) wrapped up last week. The students met for 10 days and broke into five teams, working with mentors from government and industry on real-world problems.
Thirty-one graduate students from 28 different institutions participated in this year’s workshop. On the first day, representatives from industry and government presented their projects, which ranged from developing a water purification system to finding where a meteor might have crashed in Russia in 2013.
The “Hunt for Red Hot Rock-tober” group, mentored by John Peach, MIT Lincoln Laboratory, and Minh Pham, SAMSI, included: Hossein Aghakhani, SUNY at Buffalo; Jingnan Fan, Rutgers; Alex Farrell, Arizona State; Ya-Ting Huang, Stony Brook; Benjamin Levy, U. Tennessee; Het Mankad, U. of Texas at Dallas; and Michael Minner, Drexel.
They tried to figure out exactly where a meteor landed after it exploded in an airburst on February 15, 2013, somewhere south of the city of Chelyabinsk, Russia. The group used Bayesian search methods to formulate as many hypotheses as they could about what happened to each object, assuming that the meteor most likely broke up into several smaller chunks as it entered the atmosphere. For each hypothesis, they constructed a probability density function for the location of each object. The alternative scenario is that the meteor stayed in one piece and hasn’t been found yet. The group used Google Earth images and created a Google Earth sensor to detect meteor-like shapes. They made a probability map of where chunks of the meteor may have landed, sorted from highest probability to lowest. They searched only the top 90% and then compared images taken before and after the event. To reduce false alarms, they converted the images to grayscale and then to binary. They then re-grayed the images and used a Gaussian blur to detect differences between the before and after images that were round, as a crater would be. This reduced the false alarms from 71 to 27. Seven of these images seemed acceptable, but none of the images they looked at ultimately showed craters. They concluded that there was a 57.2% chance that there was no crater in the area.
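The step of searching only the top of the probability map can be illustrated with a short Python sketch. This is not the group's code. It assumes that "the top 90%" means the highest-probability grid cells whose combined probability first reaches 90%, and the function name `cells_covering_mass`, the grid layout and the probabilities are all made up for illustration.

```python
def cells_covering_mass(prob_map, mass=0.90):
    """Return cells in descending probability order until their
    cumulative (normalized) probability first reaches `mass`."""
    total = sum(prob_map.values())
    if total <= 0:
        raise ValueError("probability map must have positive total mass")
    # Rank cells from most to least probable.
    ranked = sorted(prob_map.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for cell, p in ranked:
        if cumulative >= mass:
            break
        selected.append(cell)
        cumulative += p / total
    return selected, cumulative


# Hypothetical 2x3 grid keyed by (row, col); the values are invented.
toy_map = {
    (0, 0): 0.40, (0, 1): 0.25, (0, 2): 0.15,
    (1, 0): 0.12, (1, 1): 0.05, (1, 2): 0.03,
}
search_cells, covered = cells_covering_mass(toy_map, mass=0.90)
print(search_cells)
print(round(covered, 2))
```

Ranking the cells first means the search effort goes to the most likely locations, and the low-probability tail of the map is never examined.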
The group working with Matthew Farthing, U.S. Army Corps of Engineers, and Lea Jenkins, Clemson University, on the project entitled “Water Purification via Membrane Separation,” included: Fei Cao, Pennsylvania State; Caleb Class, MIT; Tyson Loudon, Colorado School of Mines; Monica Nadal-Quiros, U. of Puerto Rico; Star-Lena Quintana, Temple; Benjamin Ritz, Clarkson; and Xiangming Zeng, North Carolina State U.
They looked for a way to design the best water purification system. While filtration is typically used to remove a particular contaminant, it can also be used to retrieve valuable components, which is useful in other industries such as pharmaceuticals or polymer processing. The group used simulation-based optimization to look at how to improve membrane performance for filtration and separation processes. One important application of this project is purifying water for army personnel in the field, who need to reduce pathogens, purify water quickly and reduce the incidence of membrane clogging. Due to time constraints, the group focused on one-dimensional models, but suggested that future work use 2-D or 3-D models to better represent the dynamics of the separation process.
The “Geographic and Racial Differences of Persons Living with HIV in the Southern United States” group was mentored by Simone Gray, Centers for Disease Control and Prevention (CDC), and Howard Chang, Emory. The group included: Isabel Chen, Emory; Christina Edholm, U. Nebraska-Lincoln; Rachel Grotheer, Clemson; Tyler Massaro, U. Tennessee; and Yiqiang Zhen, Purdue.
The group was tasked with quantifying the contribution of race and socioeconomic determinants to the overall prevalence of HIV, focusing particularly on the Southeast. They used 2010 U.S. Census data and the American Community Survey, along with the CDC’s National Center for HIV/AIDS, Viral Hepatitis, STD and TB Prevention (NCHHSTP) Atlas, and looked at several county-level variables, including unemployment, education level, race, urban status, poverty and income, across 1,422 counties in 16 states. To analyze the data, they used three types of regression modeling (multiple linear, conditional autoregressive and Bayesian Poisson hierarchical models), non-metric multidimensional scaling, and two types of cluster analysis (K-Means and Besag-Newell). They concluded that non-Hispanic black ethnicity remained the most important indicator of the HIV prevalence rate in the southern United States.
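For readers unfamiliar with K-Means, one of the two clustering methods mentioned above, the following Python sketch shows the basic idea (Lloyd's algorithm) on a handful of points. It is an illustration only, not the group's analysis: the two features (poverty rate and unemployment rate, in percent), the six "counties" and the choice of the first k points as starting centroids are all assumptions made for the example.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep as is
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the algorithm has converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Invented (poverty %, unemployment %) values for six hypothetical counties.
counties = [(10.0, 4.0), (30.0, 12.0), (11.0, 5.0),
            (29.0, 11.0), (12.0, 4.5), (31.0, 12.5)]
labels, centroids = kmeans(counties, k=2)
print(labels)
print(centroids)
```

In practice the result depends on the starting centroids and on how the variables are scaled, so real analyses usually standardize the inputs and try several initializations.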
Another group worked on the “Allergy, Asthma and Exposures in the Homes of the US Population” problem. The group, mentored by Agustin Calatroni, Herman Mitchell and Russ Helms of Rho, Inc. and Sanvesh Srivastava of SAMSI, included: Alexej Gossmann, Tulane; Tamra Heberling, Montana State; Nancy Hernandez Ceron, Purdue; Yuanzhi Li, Utah State; Anastasia Wilson, Clemson; and Hongjuan Zhou, U. of Kansas.
From 1980 to 2012, cases of asthma in the U.S. increased by 171%. Allergies and asthma cost about $56 billion a year. An extensive study called the National Health and Nutrition Examination Survey (NHANES), conducted in 2005-06, surveyed about 10,000 people to determine the prevalence of major diseases and the risk factors for those diseases. Rooms in the participants’ homes were vacuumed to collect dust samples. The students set out to develop a prediction model for asthma based on allergies and exposures in the home, and used logistic regression, LASSO regression and random forest models to examine the data. They concluded that the random forest models had the highest prediction accuracy.
Another group worked on the “Analysis of Self-Reported Health Outcomes Data” project. The group, mentored by Mark Wolf, SAS, and Kenneth Lopiano, SAMSI and Duke, included: Fatena El-Masri, George Mason; Karianne Bergen, Stanford; Obeng Addai, Youngstown State; Piaomu Liu, South Carolina; Shrabanti Chowdhury, U. California at Riverside; and Xin Huang, U. Texas at Dallas.
This group looked at self-reported health outcomes data from web-based media sources. Clinical outcomes are usually derived from patient surveys and from formal reports that physicians file when, for example, a drug causes a side effect. However, many people discuss drugs and health on forums, bulletin boards and social media outlets, which gives more immediate feedback about how a drug may be performing. Text mining techniques are essential for capturing this kind of feedback. The group used SAS Enterprise Miner to parse, filter and identify topics in each document they examined. They proposed a set of methods that take advantage of SAS Text Miner to tag each word by part of speech (noun, verb, adjective, etc.). They then used a filter to decide whether to keep or drop each word, and had the program classify each word into a category. They looked at author interactions and applied a PageRank algorithm. They then conducted a sentiment analysis to gauge the emotion in the posts, removed the uninformative posts and kept the ones that seemed noteworthy. They looked at trending topics to see whether chatter on a topic had increased, using a burst detection method, and then used a Markov model to analyze the inter-arrival gaps.
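To give a sense of how a PageRank-style score can rank forum authors by their interactions, here is a minimal Python sketch using power iteration. It is not the group's SAS workflow. It assumes that a reply from one author to another counts as a directed link from the replier to the author being answered, and the author names, the reply graph and the damping factor of 0.85 are illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed graph.

    `links` maps each node to the list of nodes it points to.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical reply graph: each author maps to the authors they replied to.
replies = {
    "ann": ["bob"],
    "bob": ["ann"],
    "cat": ["bob"],
    "dan": ["bob", "ann"],
}
scores = pagerank(replies)
for author in sorted(scores, key=scores.get, reverse=True):
    print(author, round(scores[author], 3))
```

Authors who receive replies from other well-connected authors end up with higher scores, which is one way to identify the most influential voices in a discussion.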
To get a much better understanding of the work that was conducted during this workshop, read the final report here.