Mutual Information-Based Clustering: Hard or Soft?

Konferenz: SCC 2017 - 11th International ITG Conference on Systems, Communications and Coding
06.02.2017 - 09.02.2017 in Hamburg, Germany

Tagungsband: SCC 2017

Seiten: 6Sprache: EnglischTyp: PDF

Persönliche VDE-Mitglieder erhalten auf diesen Artikel 10% Rabatt

Autoren:
Geiger, Bernhard C.; Amjad, Rana Ali (Institute for Communications Engineering, Technical University of Munich, Germany)

Inhalt:
We investigate mutual information as a cost function for clustering, and show in which cases hard, i.e., deterministic, clusters are optimal. Using convexity properties of mutual information, we show that certain formulations of the information bottleneck problem are solved by hard clusters. Similarly, hard clusters are optimal for the information-theoretic co-clustering problem that deals with simultaneous clustering of two dependent data sets. Hard clusters are not optimal in general for clustering a single dataset based on pairwise (dis-)similarities. We point at interesting and practically relevant special cases of this socalled pairwise clustering problem, for which we can either prove or have evidence that hard clusters are optimal. Our results thus show that one can relax the otherwise combinatorial hard clustering problem to a real-valued optimization problem with the same global optimum.