Machine Learning Coffee Seminar: Kai Puolamäki, University of Helsinki "Human-guided Data Exploration"
Human-guided data exploration
Professor of Computer Science, University of Helsinki
The outcome of the explorative data analysis (EDA) phase is vital for successful data analysis. EDA is more effective when the user interacts with the system used to carry out the exploration. A good EDA system has three requirements: (i) it must be able to model the information already known by the user and the information learned by the user, (ii) the user must be able to formulate the objectives, and (iii) the system must be able to show the user views that are maximally informative about desired features of the data that are not already known to the user. Furthermore, the system should be fast if it is used interactively. We present the Human Guided Data Exploration framework, which satisfies these requirements and generalizes previous research. This framework allows the user to incorporate existing knowledge into the exploration process, focus on exploring a subset of the data, and compare different complex hypotheses concerning relations in the data. The framework utilises a computationally efficient constrained randomization scheme. To showcase the framework, we developed a free open-source tool, which we used to carry out an empirical evaluation on real-world data sets. Our evaluation shows that the ability to focus on particular subsets and the ability to compare hypotheses are important additions to the interactive iterative data mining process.
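The abstract does not spell out the constrained randomization scheme, so the following is only a minimal sketch of one way such a scheme could look, not the authors' tool. It assumes that what the user already knows is encoded as non-overlapping "tiles" (a set of rows paired with a set of columns): every column is shuffled, but all columns in a tile share one row permutation, so relations inside the tile survive in the randomized sample. The function name `randomize_with_tiles` and the tile representation are hypothetical.

```python
import random


def randomize_with_tiles(data, tiles, seed=None):
    """Return a randomized copy of `data` (a list of equal-length rows).

    `tiles` is a list of (row_indices, column_indices) pairs, assumed here
    not to overlap. Inside a tile all listed columns share one row
    permutation, so their mutual relations are preserved. Cells outside
    every tile are shuffled independently per column.
    """
    rng = random.Random(seed)
    n_rows = len(data)
    n_cols = len(data[0]) if data else 0
    out = [list(row) for row in data]
    covered = [[False] * n_cols for _ in range(n_rows)]

    for rows, cols in tiles:
        rows = list(rows)
        perm = rows[:]
        rng.shuffle(perm)  # one permutation shared by the whole tile
        for c in cols:
            for dst, src in zip(rows, perm):
                out[dst][c] = data[src][c]
                covered[dst][c] = True

    for c in range(n_cols):
        # Cells not covered by any tile: permute freely within the column.
        free = [r for r in range(n_rows) if not covered[r][c]]
        perm = free[:]
        rng.shuffle(perm)
        for dst, src in zip(free, perm):
            out[dst][c] = data[src][c]
    return out


# Toy data: column 1 is always twice column 0; column 2 is unrelated.
data = [[i, 2 * i, (7 * i) % 5] for i in range(50)]

# Hypothetical user knowledge: "columns 0 and 1 are related on all rows".
sample = randomize_with_tiles(data, [(range(50), [0, 1])], seed=1)
```

In `sample`, each column still holds exactly its original values, and column 1 remains twice column 0 in every row, while column 2 is decoupled from both. Under this sketch, comparing the real data with such samples would highlight only structure that the tiles do not already account for.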
For references see:
Puolamäki, Kang, Lijffijt, De Bie, 2016. Interactive Visual Data Exploration with Subjective Feedback. In Proc ECML PKDD 2016, https://doi.org/10.1007/978-3-319-46227-1_14
Puolamäki, Oikarinen, Kang, Lijffijt, De Bie, 2018. Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach. In Proc ICDE 2018, https://arxiv.org/abs/1710.08167
Puolamäki, Oikarinen, Atli, Henelius, 2018. Human-guided data exploration using randomization. https://arxiv.org/abs/1805.07725