Hebbian Principal Component Clustering for Information Retrieval on a Crowdsourcing Platform

Conference: NDES 2012 - Nonlinear Dynamics of Electronic Systems
11.07.2012 - 13.07.2012 in Wolfenbüttel, Germany

Proceedings: NDES 2012

Pages: 4
Language: English
Type: PDF

Niederberger, Thomas; Stoop, Norbert; Ott, Thomas (ZHAW Zurich University of Applied Sciences, Switzerland)
Christen, Markus (University of Zurich, Switzerland & University of Notre Dame, USA)

Crowdsourcing, a distributed process in which tasks are outsourced to a network of people, is increasingly used by companies to generate solutions to problems of various kinds. Thousands of people contribute a large amount of text data, which must be structured while ideas are still being generated in order to avoid repetitions and to maximize the solution space. This is a hard information retrieval problem because the texts are very short and have little predefined structure. We present a solution that involves three steps: text data preprocessing, clustering, and visualization. In this contribution, we focus on clustering and visualization and present a Hebbian network approach that learns the principal components of the data while the data set is continuously growing in size. We compare our approach to standard clustering applications and demonstrate, on a real-world example, that it classifies more reliably.
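
The abstract does not say which Hebbian learning rule the network uses. As an illustration of how a Hebbian unit can track a principal component of a growing data stream, the following minimal sketch uses Oja's rule, a standard Hebbian update with built-in weight normalisation. The rule, the function names (`oja_update`, `stream_first_component`, `make_samples`), the learning rate, and the synthetic two-dimensional data are all assumptions made for this example; none of them is taken from the paper.

```python
import math
import random


def oja_update(w, x, eta):
    """One Hebbian step with Oja's normalisation: w += eta * y * (x - y * w)."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # output of the linear unit
    return [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]


def stream_first_component(samples, dim, eta=0.005, seed=0):
    """Estimate the first principal direction from samples seen one at a time."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]  # start from a random unit vector
    for x in samples:          # each new sample refines the estimate
        w = oja_update(w, x, eta)
    return w


def make_samples(n, seed=1):
    """Hypothetical zero-mean toy data: large variance along (1, 1), small along (1, -1)."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    samples = []
    for _ in range(n):
        a = rng.gauss(0.0, 2.0)  # dominant direction
        b = rng.gauss(0.0, 0.5)  # minor direction
        samples.append([s * (a + b), s * (a - b)])
    return samples


samples = make_samples(5000)
w = stream_first_component(samples, dim=2)
print("estimated first principal direction:", w)
```

Because the weights are updated one sample at a time, newly arriving data can be folded in without recomputing a covariance matrix, which is the property the abstract highlights for a data set that keeps growing. The sign of the learned direction is arbitrary, and extracting several components would require an extension such as Sanger's generalized Hebbian algorithm, which is likewise not confirmed by the source.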