Three papers accepted to AISTATS 2016

Three papers are accepted at AI & Statistics (AISTATS) conference, that will be held in Cadiz, Spain. More information (explanation, codes, etc) will be posted soon!

Learning sparse additive models with interactions in high dimensions: Joint work with Hemant Tyagi, Bernd Gartner and Andreas Krause.

Abstract. A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is referred to as a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$, where $\mathcal{S} \subset [d]$, $|\mathcal{S}| \ll d$. Assuming $\phi_l$’s and $\mathcal{S}$ to be unknown, the problem of estimating $f$ from its samples has been studied extensively. In this work, we consider a generalized SPAM, allowing for second order interaction terms. For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, the function $f$ is assumed to be of the form: $f(\mathbf{x}) = \sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l_1, l_2) \in \mathcal{S}_2}\phi_{(l_1, l_2)} \mathbf{x}_{(l_1, l_2)}.$ Assuming $\phi_{p},\phi_{(l_1, l_2)}$, $\mathcal{S}_1$ and, $\mathcal{S}_2$ to be unknown, we provide a randomized algorithm that queries $f$ and exactly recovers $\mathcal{S}_1,\mathcal{S}_2$. Consequently, this also enables us to estimate the underlying $\phi_p, \phi_{(l_1, l_2)}$. We derive sample complexity bounds for our scheme and also extend our analysis to include the situation where the queries are corrupted with noise – either stochastic, or arbitrary but bounded. Lastly, we provide simulation results on synthetic data, that validate our theoretical findings.

Convex block-sparse linear regression with expanders, provably: Joint work with Bubacarr Bah, Rouzbeh Hasheminezhad, Quoc Tran-Dinh, Luca Baldassarre, and Volkan Cevher.

Abstract. Sparse matrices are favorable objects in machine learning and optimization. When such matrices are used, in spite of dense ones, the overall complexity requirements in optimization can be significantly reduced in practice, both in terms of space and run-time. Prompted by this observation, we study a convex optimization scheme for block-sparse recovery from linear measurements. To obtain linear sketches, we use expander matrices, i.e., sparse matrices containing only few non-zeros per column. Hitherto, to the best of our knowledge, such algorithmic solutions have been only studied from a non-convex perspective. Our aim here is to theoretically characterize the performance of convex approaches under such setting. Our key novelty is the expression of the recovery error in terms of the model-based norm, while assuring that solution leaves in the model. To achieve this, we show that sparse model-based matrices satisfy a group version of the null-space property. Our experimental findings on synthetic and real applications support our claims for faster recovery in the convex setting – as opposed to using dense sensing matrices, while showing a competitive recovery performance.

Bipartite correlation clustering – Maximizing agreements: Joint work with Megasthenis Asteris, Dimitris Papailiopoulos, and Alex Dimakis.

Abstract. In Bipartite Correlation Clustering (BCC) we are given a complete bipartite graph $G$ with ‘+’ and ‘-‘ edges, and we seek a vertex clustering that maximizes the number of *agreements: the number of all ‘+’ edges within clusters plus all ‘-‘ edges cut across clusters. BCC is known to be NP-hard. We present a novel algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters. Our algorithm outputs a $k$-clustering that provably achieves a number of agreements within a multiplicative ${(1-\delta)}$-factor from the optimal, for any desired accuracy $\delta$. It relies on solving a bilinear maximization on the bi-adjacency matrix of the graph, subject to combinatorial constraints. Our algorithm runs in time exponential in $k$ and $\frac{1}{\delta}$, but linear in the size of the input. In the (unconstrained) BCC setting, we show that $O(\delta^{-1})$ clusters suffice to achieve an ${(1-\delta)}$-factor approximation regardless of the size of the graph. In turn, our $k$-BCC algorithm implies an Efficient PTAS for the BCC objective of maximizing agreements. We supplement our theoretical results with experiments on synthetic and real datasets.