Ilse Ipsen, associate director of SAMSI and professor of mathematics at North Carolina State University, recently spoke at the Sigma Xi pizza lunch. The lunch is a monthly gathering co-sponsored by the Science Communicators of North Carolina (SCONC) and the RTP chapter of Sigma Xi.
Ilse’s talk, “Rolling the Dice on Big Data,” focused on how big data is permeating all aspects of our daily lives: from the grocery store, where supermarkets gather data on our personal buying habits, to the analysis of images from space, to the Internet, where Google receives 2 million search queries a minute and 347 blog posts are published every minute of the day. Facebook processes 500 terabytes of information each day, and 30 billion pieces of information are shared on Facebook each month.
To give her audience an understanding of how applied mathematicians approach the enormous problem of sifting through data, she used the example of matching an e-mail from an unknown source, the query, to a set of e-mails received from known authors. The query contains three key words. In her example, she takes three e-mails from known authors and counts the number of times each key word is used in each e-mail and in the query. Then a length is measured for each e-mail and for the query, to account for how many words each one uses. For each e-mail, the count of every key word is multiplied by the matching count in the query, and the products are added up to give a single number. The counts are also squared and summed, and the first number is divided by the result. The e-mail with the best resulting score points to the most likely author of the query.
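The counting, multiplying, squaring and dividing described above resemble what is commonly called cosine similarity. The talk summary does not name the method, so treating it as cosine similarity is an assumption, and the author names and key word counts below are made up for illustration. A minimal sketch in Python:

```python
import math

def cosine_similarity(a, b):
    """Dot product of two count vectors divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an e-mail with none of the key words matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical data: how often each of the three key words appears
# in three e-mails from known authors.
emails = {
    "author_a": [3, 0, 1],
    "author_b": [1, 2, 2],
    "author_c": [0, 4, 0],
}
# Hypothetical query: each key word appears once in the unknown e-mail.
query = [1, 1, 1]

scores = {name: cosine_similarity(counts, query) for name, counts in emails.items()}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

A score close to 1 means the e-mail uses the key words in nearly the same proportions as the query. Dividing by the lengths keeps a long e-mail from winning simply because it contains more words.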
If one were to look at every e-mail written each day, there would be about 294 billion e-mails to sort through, and there are about 250,000 words in the English language. Comparing them all directly would be an enormous task, so many mathematicians and statisticians use the Monte Carlo method to sample the data and narrow down the search.
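To give a feel for what Monte Carlo sampling can look like here, the sketch below estimates the sum of products between two long word-count vectors by looking at only a random handful of positions. This is an illustrative assumption rather than the specific algorithm from the talk, and the function name `sampled_dot` and the toy data are hypothetical.

```python
import random

def sampled_dot(a, b, num_samples, rng):
    """Estimate sum(a[i] * b[i]) from uniformly sampled positions.

    Positions are drawn with replacement. Scaling the sample average by
    the vector length makes the estimate correct on average (unbiased).
    """
    n = len(a)
    total = 0.0
    for _ in range(num_samples):
        i = rng.randrange(n)
        total += a[i] * b[i]
    return n * total / num_samples

# Hypothetical word-count vectors over a 250,000-word vocabulary.
rng = random.Random(0)  # fixed seed so the run is repeatable
vocab_size = 250_000
email_counts = [rng.randint(0, 3) for _ in range(vocab_size)]
query_counts = [rng.randint(0, 3) for _ in range(vocab_size)]

exact = sum(x * y for x, y in zip(email_counts, query_counts))
estimate = sampled_dot(email_counts, query_counts, 5_000, rng)
print(exact, estimate)
```

The estimate touches 5,000 positions instead of 250,000. Its accuracy depends on the number of samples and on how evenly the products are spread. Real word counts are mostly zeros, so practical randomized methods often sample positions with non-uniform probabilities rather than uniformly as in this sketch.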
She explained that a randomized algorithmic approach is fast, easy to implement and simple to use, and is as good as, or perhaps even better than, a deterministic approach.
Ilse spoke to a packed room, including many science writers from the Triangle region, members of Sigma Xi and a high school physics class from Kestrel Heights, a local charter school.