Eric C. Chi
Eric C. Chi
, Rice UniversityIn this talk, I will discuss a convex formulation of the clustering problem. Formulating clustering as a convex program addresses well-known issues of instability and parameter selection in mainstream approaches. Moreover, the convex clustering solution can be integrated seamlessly into more sophisticated data analysis methods. To illustrate
this point, I show how convex clustering can be used to construct a well-behaved solution to the biclustering problem. In the biclustering problem, we seek to simultaneously group observations and features. The biclustering problem arises in many domains, ranging from text mining and collaborative filtering to identifying structure in high-dimensional
genomic data. In the latter, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. I present a convex formulation of the biclustering problem that possesses a unique global minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it. My approach generates an
entire solution path of possible biclusters as a single tuning parameter is varied. I also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of this work are its simplicity, interpretability, and algorithmic guarantees. I demonstrate the advantages of my approach, which include stably and reproducibly identifying biclusterings on simulated and real microarray data. To conclude, I discuss how the convex biclustering problem is just one example of a broader class of problems with an unknown grouping structure, and how the basic convex clustering formulation can be generalized to solve them.