Data Science Challenges - Emotion, Obfuscation, and Pollution

15 November 2017

12 noon - 1 pm

Venue: Building 303S, Room 561

Location: City Campus

Host: Department of Computer Science

Speaker: Jörg Wicker

Abstract

In this talk, I will present data science challenges from the domains of emotional response analysis and privacy, and give an outlook on an application in the domain of biodegradation pathway prediction.

In collaboration with the Max Planck Institute for Chemistry in Germany, we are analysing the small molecules, known as volatile organic compounds (VOCs), that humans exhale.

While VOC-mediated responses and communication among plants and animals are widely studied, little is known about how humans respond to emotional stimuli in terms of the VOCs they exhale. To address this, we generated and analysed multiple large data sets, measured in a cinema during film screenings.

I will give an overview of three stages of this project: the initial analysis of the data based on time series prediction, current work based on film features and change detection, and future plans.
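The abstract only names change detection and does not describe the method used in the project. As a rough illustration of the general idea, the sketch below applies a one-sided CUSUM detector (a standard textbook technique, chosen here as an assumption) to a synthetic, hypothetical VOC concentration trace. The function name `cusum_alarm`, the slack `k`, the threshold `h`, and all data values are illustrative only.

```python
from typing import List, Optional


def cusum_alarm(series: List[float], baseline_len: int, k: float, h: float) -> Optional[int]:
    """One-sided CUSUM: return the first index where an upward shift is flagged, else None.

    The reference level mu0 is the mean of the first `baseline_len` samples;
    k is the allowed slack per sample and h is the alarm threshold.
    """
    mu0 = sum(series[:baseline_len]) / baseline_len
    s = 0.0
    for t, x in enumerate(series):
        s = max(0.0, s + (x - mu0 - k))  # accumulate only evidence of an increase
        if s > h:
            return t
    return None


# Synthetic stand-in for one VOC concentration trace (arbitrary units):
# a flat baseline with small alternating noise, then a step up at t = 60,
# e.g. during a hypothetical tense scene in the film.
trace = [(1.0 if t < 60 else 1.5) + (0.05 if t % 2 == 0 else -0.05) for t in range(120)]

alarm = cusum_alarm(trace, baseline_len=30, k=0.1, h=0.5)
print("shift flagged at t =", alarm)
```

With these settings every pre-shift increment is negative, so the statistic stays at zero until the step, and the alarm fires at t = 61, one sample after the simulated change. A larger `h` trades a longer detection delay for fewer false alarms.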

The second part of the talk addresses an unsolved issue in data mining: the privacy of internet users. Numerous service providers collect and aggregate user data, yet so far no comprehensive solution has been proposed that helps users protect their privacy while using the internet.

Existing solutions focus on the service providers, which store encrypted or transformed data that can still be used for analysis. The major flaw of this approach is that it depends on the service providers doing so; users have no way of actively protecting their own privacy.

In a recent publication, we proposed a new approach that empowers users to exploit the same tool the other side uses, namely data mining, to produce data that obfuscate their identity.

We apply this approach to search engine queries: the search engine's feedback, in the form of personalised advertisements, is fed into a reinforcement learning algorithm that generates new queries intended to confuse the search engine.
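The abstract does not specify which reinforcement learning algorithm is used. As a minimal sketch under stated assumptions, the feedback loop can be pictured as an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning methods. Each arm is a hypothetical decoy query topic, and the reward is 1 when the personalised ad shown afterwards does not reveal the user's true interest. The search engine is simulated by `simulated_ad` with made-up leak probabilities; `TRUE_INTEREST`, `DECOY_TOPICS`, and `LEAK_PROB` are hypothetical names, not part of any real search engine API.

```python
import random

# Hypothetical setup: the user's real interest and a pool of decoy query topics.
TRUE_INTEREST = "health"
DECOY_TOPICS = ["sports", "travel", "cooking", "finance"]

# Simulated search engine (an assumption, not a real API): for each decoy topic,
# the probability that the personalised ad still reveals the true interest.
LEAK_PROB = {"sports": 0.8, "travel": 0.5, "cooking": 0.2, "finance": 0.7}


def simulated_ad(topic: str, rng: random.Random) -> str:
    """Return the ad category a hypothetical engine shows after a decoy query."""
    return TRUE_INTEREST if rng.random() < LEAK_PROB[topic] else topic


def obfuscate(rounds: int = 5000, epsilon: float = 0.1, seed: int = 0):
    """Epsilon-greedy bandit that learns which decoy topic best hides TRUE_INTEREST."""
    rng = random.Random(seed)
    counts = {t: 0 for t in DECOY_TOPICS}
    values = {t: 0.0 for t in DECOY_TOPICS}  # running mean reward per topic
    for _ in range(rounds):
        if rng.random() < epsilon:
            topic = rng.choice(DECOY_TOPICS)  # explore a random decoy topic
        else:
            topic = max(DECOY_TOPICS, key=values.get)  # exploit the best estimate
        ad = simulated_ad(topic, rng)
        reward = 1.0 if ad != TRUE_INTEREST else 0.0  # 1 = engine was confused
        counts[topic] += 1
        values[topic] += (reward - values[topic]) / counts[topic]  # incremental mean
    return values, counts


values, counts = obfuscate()
best = max(values, key=values.get)
print(best, round(values[best], 2))
```

In this toy setting the agent concentrates its decoy queries on the topic whose ads least often reveal the true interest. A real search engine would give non-stationary feedback, since every query also updates the engine's profile of the user, which a full reinforcement learning formulation with explicit state would have to account for.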