I will be starting as an assistant professor of CS at Rice University in Fall 2018.

(Only peer-reviewed and accepted papers; for the most recent drafts, please check my Google Scholar profile)

## Journals

• Quoc Tran-Dinh, Anastasios Kyrillidis, and Volkan Cevher, “A single-phase, proximal path-following framework”, Mathematics of Operations Research, 2017.

We propose a new proximal, path-following framework for a class of constrained convex problems. We consider settings where the nonlinear—and possibly non-smooth—objective part is endowed with a proximity operator, and the constraint set is equipped with a self-concordant barrier. Our approach relies on the following two main ideas. First, we re-parameterize the optimality condition as an auxiliary problem, such that a good initial point is available; by doing so, a family of alternative paths towards the optimum is generated. Second, we combine the proximal operator with path-following ideas to design a single-phase, proximal, path-following algorithm. We prove that our scheme has the same $O(\sqrt{\nu} \log(1/\varepsilon))$ worst-case iteration complexity as standard approaches [34, 38], without requiring an initial phase, where $\nu$ is the barrier parameter and $\varepsilon$ is a desired accuracy. Our framework allows errors in the calculation of proximal-Newton directions, without sacrificing the worst-case iteration complexity. We demonstrate the merits of our algorithm via three numerical examples, where proximal operators play a key role.


@article{tran2016single,
title={A single-phase, proximal path-following framework},
author={Tran-Dinh, Quoc and Kyrillidis, Anastasios and Cevher, Volkan},
journal={Mathematics of Operations Research},
year={2017}
}

• Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, and Andreas Krause, “Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions”, Information and Inference: A Journal of the IMA, 2017.

A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$, where $\mathcal{S} \subset [d]$, $|\mathcal{S}| \ll d$. Assuming the $\phi_l$'s and $\mathcal{S}$ to be unknown, there is extensive work on estimating $f$ from its samples. In this work, we consider a generalized version of SPAMs, that also allows for the presence of a sparse number of second order interaction terms. For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, with $|\mathcal{S}_1| \ll d, |\mathcal{S}_2| \ll d^2$, the function $f$ is now assumed to be of the form: $\sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l,l^{\prime}) \in \mathcal{S}_2}\phi_{(l,l^{\prime})} (x_l,x_{l^{\prime}})$. Assuming we have the freedom to query $f$ anywhere in its domain, we derive efficient algorithms that provably recover $\mathcal{S}_1,\mathcal{S}_2$ with finite sample bounds. Our analysis covers the noiseless setting where exact samples of $f$ are obtained, and also extends to the noisy setting where the queries are corrupted with noise. For the noisy setting in particular, we consider two noise models, namely: i.i.d. Gaussian noise and arbitrary but bounded noise. Our main methods for identification of $\mathcal{S}_2$ essentially rely on estimation of sparse Hessian matrices, for which we provide two novel compressed sensing-based schemes. Once $\mathcal{S}_1, \mathcal{S}_2$ are known, we show how the individual components $\phi_p$, $\phi_{(l,l^{\prime})}$ can be estimated via additional queries of $f$, with uniform error bounds. Lastly, we provide simulation results on synthetic data that validate our theoretical findings.


@article{tyagi2017algorithms,
title={Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions},
author={Tyagi, Hemant and Kyrillidis, Anastasios and G{\"a}rtner, Bernd and Krause, Andreas},
journal={Information and Inference: A Journal of the IMA},
year={2017}
}
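The query-based identification of active variables can be illustrated with a naive finite-difference scheme. This is only a hypothetical sketch, far less query-efficient than the algorithms in the paper, which rely on compressed sensing of sparse gradients and Hessians:

```python
import numpy as np

def active_coordinates(f, d, n_probes=30, h=0.5, tol=1e-8, seed=0):
    """Find the coordinates that f actually depends on, by querying f and
    checking finite differences at random probe points. Naive illustration:
    the paper's schemes need far fewer, carefully designed queries."""
    rng = np.random.default_rng(seed)
    active = set()
    for _ in range(n_probes):
        x = rng.uniform(-1.0, 1.0, d)
        fx = f(x)
        for i in range(d):
            xp = x.copy()
            xp[i] += h
            if abs(f(xp) - fx) > tol:  # coordinate i influences f
                active.add(i)
    return sorted(active)

# Toy SPAM with one interaction term: S_1 = {2, 7}, S_2 = {(3, 5)}, d = 10.
f = lambda x: np.sin(x[2]) + x[7] ** 2 + x[3] * x[5]
found = active_coordinates(f, d=10)
```

With high probability over the random probes, `found` is exactly the union of the active coordinates, {2, 3, 5, 7}.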

• Luca Baldassarre, Nirav Bhan, Volkan Cevher, Anastasios Kyrillidis and Siddhartha Satpathi, “Group-sparse model selection: Hardness and relaxations”, IEEE Trans. on Information Theory, 2016. (Authors listed in alphabetical order.)

Group-based sparsity models are instrumental in linear and non-linear regression problems. The main premise of these models is the recovery of “interpretable” signals through the identification of their constituent groups, which can also provably translate into substantial savings in the number of measurements for linear models in compressive sensing. In this paper, we establish a combinatorial framework for group-model selection problems and highlight the underlying tractability issues. In particular, we show that the group-model selection problem is equivalent to the well-known NP-hard weighted maximum coverage problem (WMC). Leveraging a graph-based understanding of group models, we describe group structures which enable correct model selection in polynomial time via dynamic programming. Furthermore, we show that popular group structures can be explained by linear inequalities involving totally unimodular matrices, which afford other polynomial time algorithms based on relaxations. We also present a generalization of the group-model that allows for within-group sparsity, which can be used to model hierarchical sparsity. Finally, we study the Pareto frontier between approximation error and sparsity budget of group-sparse approximations for two tractable models, among them the tree-sparsity model, and illustrate selection and computation trade-offs between our framework and the existing convex relaxations.


@article{baldassarre2013group,
title={Group-sparse model selection: Hardness and relaxations},
author={Baldassarre, Luca and Bhan, Nirav and Cevher, Volkan and Kyrillidis, Anastasios and Satpathi, Siddhartha},
journal={IEEE Transactions on Information Theory},
year={2016}
}

• Georgios Skoumas, Dieter Pfoser, Anastasios Kyrillidis and Timos Sellis, “Location estimation using crowdsourced spatial relations”, ACM Transactions on Spatial Algorithms and Systems, vol. 2, issue 2, 2016.

The “crowd” has become a very important geospatial data provider. Specifically, nonexpert users have been providing a wealth of quantitative geospatial data (e.g., geotagged tweets or photos) online. With spatial reasoning being a basic form of human cognition, textual narratives expressing user travel experiences (e.g., travel blogs) would provide an even bigger source of geospatial data. Narratives typically contain qualitative geospatial data in the form of objects and spatial relations (e.g., “St. John’s church is to the North of the Acropolis museum”). The scope of this work is (i) to extract these spatial relations from textual narratives, (ii) to quantify (model) them, and (iii) to reason about object locations based only on the quantified spatial relations. We use information extraction methods to identify toponyms and spatial relations, and we formulate a quantitative approach based on distance and orientation features to represent the latter. Probability density functions (PDFs) for spatial relations are determined by means of a greedy expectation maximization (EM)-based algorithm. These PDFs are then used to estimate unknown object locations. Experiments using a text corpus harvested from travel blog sites establish the considerable location estimation accuracy of the proposed approach on synthetic and real-world scenarios.


@article{skoumas2016location,
author = {Skoumas, Georgios and Pfoser, Dieter and Kyrillidis, Anastasios and Sellis, Timos},
title = {Location Estimation Using Crowdsourced Spatial Relations},
journal = {ACM Trans. Spatial Algorithms Syst.},
volume = {2},
number = {2},
year = {2016},
pages = {5:1--5:23},
publisher = {ACM},
}

• Quoc Tran-Dinh, Anastasios Kyrillidis and Volkan Cevher, “Composite self-concordant minimization”, Journal of Machine Learning Research, 16(Mar):371-416, 2015.

We propose a variable metric framework for minimizing the sum of a self-concordant function and a possibly non-smooth convex function, endowed with an easily computable proximal operator. We theoretically establish the convergence of our framework without relying on the usual Lipschitz gradient assumption on the smooth part. An important highlight of our work is a new set of analytic step-size selection and correction procedures based on the structure of the problem. We describe concrete algorithmic instances of our framework for several interesting applications and demonstrate them numerically on both synthetic and real data.


@article{tran2015composite,
title={Composite Self-Concordant Minimization},
author={Tran-Dinh, Quoc and Kyrillidis, Anastasios and Cevher, Volkan},
journal={Journal of Machine Learning Research},
volume={16},
pages={371--416},
year={2015}
}

• Michail Vlachos, Nikolaos Freris and Anastasios Kyrillidis, “Compressive mining: fast and optimal data mining in the compressed domain”, Very Large Data Bases (VLDB) Journal, Volume 24, Issue 1, February 2015.

Real-world data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier and wavelets). However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower/upper bound on Euclidean distances when each data object is potentially compressed using a different set of orthonormal coefficients. Our technique leads to tighter distance estimates, which translates into more accurate search, learning and mining operations directly in the compressed domain. We formulate the problem of estimating lower/upper distance bounds as an optimization problem. We establish the properties of optimal solutions and leverage the theoretical analysis to develop a fast algorithm to obtain an exact solution to the problem. The suggested solution provides the tightest estimation of the Euclidean norm or the correlation. We show that typical data analysis operations, such as nearest-neighbor search or k-Means clustering, can operate more accurately using the proposed compression and distance reconstruction technique. We compare it with many other prevalent compression and reconstruction techniques, including random projections and PCA-based techniques. We highlight a surprising result, namely that when the data are highly sparse in some basis, our technique may even outperform PCA-based compression. The contributions of this work are generic as our methodology is applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme.


@article{vlachos2015compressive,
title={Compressive mining: fast and optimal data mining in the compressed domain},
author={Vlachos, Michail and Freris, Nikolaos M and Kyrillidis, Anastasios},
journal={The VLDB Journal},
volume={24},
number={1},
pages={1--24},
year={2015},
publisher={Springer-Verlag New York, Inc.}
}

• Quoc Tran-Dinh, Anastasios Kyrillidis and Volkan Cevher, “An inexact proximal path-following algorithm for constrained convex minimization”, SIAM Journal on Optimization (SIOPT), vol. 24, num. 4, p. 1718-1745, 2014.

Many scientific and engineering applications feature nonsmooth convex minimization problems over convex sets. In this paper, we address an important instance of this broad class where we assume that the nonsmooth objective is equipped with a tractable proximity operator and that the convex constraint set affords a self-concordant barrier. We provide a new joint treatment of proximal and self-concordant barrier concepts and illustrate that such problems can be efficiently solved, without the need of lifting the problem dimensions, as in the disciplined convex optimization approach. We propose an inexact path-following algorithmic framework and theoretically characterize the worst-case analytical complexity of this framework when the proximal subproblems are solved inexactly. To show the merits of our framework, we apply its instances to both synthetic and real-world applications, where it shows advantages over standard interior point methods. As a byproduct, we describe how our framework can obtain points on the Pareto frontier of regularized problems with self-concordant objectives in a tuning-free fashion.


@article{tran2014inexact,
title={An inexact proximal path-following algorithm for constrained convex minimization},
author={Tran-Dinh, Quoc and Kyrillidis, Anastasios and Cevher, Volkan},
journal={SIAM Journal on Optimization},
volume={24},
number={4},
pages={1718--1745},
year={2014},
publisher={SIAM}
}

• Anastasios Kyrillidis and George N. Karystinos, “Fixed-rank Rayleigh quotient maximization by an M-PSK sequence”, IEEE Trans. on Communications, Volume 62, Issue 3, pages 961-975, 2014.

Certain optimization problems in communication systems, such as limited-feedback constant-envelope beamforming or noncoherent M-ary phase-shift keying (MPSK) sequence detection, result in the maximization of a fixed-rank positive semidefinite quadratic form over the MPSK alphabet. This form is a special case of the Rayleigh quotient of a matrix and, in general, its maximization by an MPSK sequence is NP-hard. However, if the rank of the matrix is not a function of its size, then the optimal solution can be computed with polynomial complexity in the matrix size. In this work, we develop a new technique to efficiently solve this problem by utilizing auxiliary continuous-valued angles and partitioning the resulting continuous space of solutions into a polynomial-size set of regions, each of which corresponds to a distinct MPSK sequence. The sequence that maximizes the Rayleigh quotient is shown to belong to this polynomial-size set of sequences, thus efficiently reducing the size of the feasible set from exponential to polynomial. Based on this analysis, we also develop an algorithm that constructs this set in polynomial time and show that it is fully parallelizable, memory efficient, and rank scalable. The proposed algorithm compares favorably with other solvers for this problem that have appeared recently in the literature.


@article{kyrillidis2014fixed,
title={Fixed-rank Rayleigh quotient maximization by an MPSK sequence},
author={Kyrillidis, Anastasios and Karystinos, George N},
journal={Communications, IEEE Transactions on},
volume={62},
number={3},
pages={961--975},
year={2014},
publisher={IEEE}
}

• Anastasios Kyrillidis and Volkan Cevher, “Matrix recipes for hard thresholding methods”, Journal of Mathematical Imaging and Vision (JMIV), April 2013, Springer.

In this paper, we present and analyze a new set of low-rank recovery algorithms for linear inverse problems within the class of hard thresholding methods. We provide strategies on how to set up these algorithms via basic ingredients for different configurations to achieve complexity vs. accuracy tradeoffs. Moreover, we study acceleration schemes via memory-based techniques and randomized, ϵ-approximate matrix projections to decrease the computational costs in the recovery process. For most of the configurations, we present theoretical analysis that guarantees convergence under mild problem conditions. Simulation results demonstrate notable performance improvements as compared to state-of-the-art algorithms both in terms of reconstruction accuracy and computational complexity.


@article{kyrillidis2014matrix,
title={Matrix recipes for hard thresholding methods},
author={Kyrillidis, Anastasios and Cevher, Volkan},
journal={Journal of mathematical imaging and vision},
volume={48},
number={2},
pages={235--265},
year={2014},
publisher={Springer}
}
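The basic template behind these methods can be sketched as projected gradient descent with a truncated-SVD projection. This is a minimal sketch under a Gaussian measurement model; the paper's variants add memory-based acceleration and ϵ-approximate projections:

```python
import numpy as np

def rank_project(X, r):
    """Hard thresholding onto rank <= r matrices, via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def matrix_iht(A, y, shape, r, step=0.5, iters=500):
    """Plain matrix IHT for y = A vec(X): gradient step, then rank projection."""
    X = np.zeros(shape)
    for _ in range(iters):
        grad = (A.T @ (A @ X.ravel() - y)).reshape(shape)
        X = rank_project(X - step * grad, r)
    return X

# Recover a rank-2 matrix from Gaussian linear measurements.
rng = np.random.default_rng(2)
n1, n2, r = 8, 6, 2
X_true = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
m = 4 * r * (n1 + n2)  # a few times the degrees of freedom
A = rng.standard_normal((m, n1 * n2)) / np.sqrt(m)
y = A @ X_true.ravel()
X_hat = matrix_iht(A, y, (n1, n2), r)
```

The fixed, conservative step size here is an illustrative choice; the paper analyzes adaptive step-size selections.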

• Nicholas D. Sidiropoulos and Anastasios Kyrillidis, “Multi-way compressed sensing for sparse low-rank tensors”, IEEE Signal Processing Letters, 19(11):757-760, Oct. 2012.

For linear models, compressed sensing theory and methods enable recovery of sparse signals of interest from few measurements -- on the order of the number of nonzero entries, as opposed to the length of the signal of interest. Results of similar flavor have more recently emerged for bilinear models, but no results are available for multilinear models of tensor data. In this contribution, we consider compressed sensing for sparse and low-rank tensors. More specifically, we consider low-rank tensors synthesized as sums of outer products of sparse loading vectors, and a special class of linear dimensionality-reducing transformations that reduce each mode individually. We prove interesting 'oracle' properties showing that it is possible to identify the uncompressed sparse loadings directly from the compressed tensor data. The proofs naturally suggest a two-step recovery process: fitting a low-rank model in the compressed domain, followed by per-mode $\ell_0$ / $\ell_1$ de-compression. This two-step process is also appealing from a computational complexity and memory capacity point of view, especially for big tensor datasets.


@article{sidiropoulos2012multi,
title={Multi-way compressed sensing for sparse low-rank tensors},
author={Sidiropoulos, Nicholas D and Kyrillidis, Anastasios},
journal={Signal Processing Letters, IEEE},
volume={19},
number={11},
pages={757--760},
year={2012},
publisher={IEEE}
}


## Conference papers

• Rajiv Khanna and Anastasios Kyrillidis, “IHT dies hard: Provable accelerated Iterative Hard Thresholding”, Twenty-first International Conference on Artificial Intelligence and Statistics (AISTATS-18), 2018.

We study --both in theory and practice-- the use of momentum motions in classic iterative hard thresholding (IHT) methods. By simply modifying plain IHT, we investigate its convergence behavior on convex optimization criteria with non-convex constraints, under standard assumptions. In diverse scenarios, we observe that acceleration in IHT leads to significant improvements, compared to state-of-the-art projected gradient descent and Frank-Wolfe variants. As a byproduct of our inspection, we study the impact of selecting the momentum parameter: similar to convex settings, two modes of behavior are observed --"rippling" and linear-- depending on the level of momentum.


@inproceedings{khanna2018IHT,
title={IHT dies hard: Provable accelerated Iterative Hard Thresholding},
author={Khanna, Rajiv and Kyrillidis, Anastasios},
booktitle={Twenty-first International Conference on Artificial Intelligence and Statistics (AISTATS)},
year={2018}
}

• Tianyang Li, Liu Liu, Anastasios Kyrillidis, and Constantine Caramanis, “Statistical inference using SGD”, Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018.

We present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.


@inproceedings{li2018statistical,
title={Statistical inference using SGD},
author={Li, Tianyang and Liu, Liu and Kyrillidis, Anastasios and Caramanis, Constantine},
booktitle={Thirty-Second AAAI Conference on Artificial Intelligence (AAAI)},
year={2018}
}
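The core recipe (fixed step size, average the post-burn-in iterates, use the averages for inference) can be sketched on least squares. The step size, burn-in, and iterate count below are illustrative choices, not the paper's prescriptions:

```python
import numpy as np

def sgd_average(X, y, step=0.05, burnin=500, length=5000, seed=1):
    """Fixed-step-size SGD on 0.5*(x_i^T theta - y_i)^2, returning the
    average of the post-burn-in iterates (the quantity used, after proper
    scaling, for statistical inference)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    total = np.zeros(d)
    for t in range(burnin + length):
        i = rng.integers(n)
        theta -= step * (X[i] @ theta - y[i]) * X[i]  # one stochastic gradient step
        if t >= burnin:
            total += theta
    return total / length

# Linear model with known ground truth.
rng = np.random.default_rng(7)
n, d = 2000, 3
theta_star = np.array([1.0, -0.5, 2.0])
X = rng.standard_normal((n, d))
y = X @ theta_star + 0.1 * rng.standard_normal(n)
theta_bar = sgd_average(X, y)
```

Repeating this over independent SGD runs yields approximately normal averages, whose spread can be used for confidence intervals.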

• Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, and Sujay Sanghavi, “Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach”, AI & Statistics Conference (AISTATS), 2017.

We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions. We focus on the non-convex formulation, where any rank-$r$ matrix $X \in \mathbb{R}^{m \times n}$ is represented as $UV^\top$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$. In this paper, we complement recent findings on the non-convex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP.


@inproceedings{park2017non,
title={Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach},
author={Park, Dohyung and Kyrillidis, Anastasios and Caramanis, Constantine and Sanghavi, Sujay},
booktitle={Artificial Intelligence and Statistics (AISTATS)},
year={2017}
}

• Srinadh Bhojanapalli, Anastasios Kyrillidis, and Sujay Sanghavi, “Dropping convexity for faster semi-definite optimization”, Conference on Learning Theory (COLT), 2016.

We study the minimization of a convex function $f(X)$ over the set of $n\times n$ positive semi-definite matrices, but when the problem is recast as $\min_U g(U) := f(UU^\top)$, with $U \in \mathbb{R}^{n \times r}$ and $r\leq n$. We study the performance of gradient descent on $g$---which we refer to as Factored Gradient Descent (FGD)---under standard assumptions on the original function $f$. We provide a rule for selecting the step size and, with this choice, show that the local convergence rate of FGD mirrors that of standard gradient descent on the original $f$: i.e., after $k$ steps, the error is $O(1/k)$ for smooth $f$, and exponentially small in $k$ when $f$ is (restricted) strongly convex. In addition, we provide a procedure to initialize FGD for (restricted) strongly convex objectives and when one only has access to $f$ via a first-order oracle; for several problem instances, such proper initialization leads to global convergence guarantees. FGD and similar procedures are widely used in practice for problems that can be posed as matrix factorization. To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.


@inproceedings{bhojanapalli2016dropping,
title={Dropping convexity for faster semi-definite optimization},
author={Bhojanapalli, Srinadh and Kyrillidis, Anastasios and Sanghavi, Sujay},
booktitle={Conference on Learning Theory (COLT)},
year={2016}
}
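The factored update itself is a one-liner via the chain rule, $\nabla g(U) = (\nabla f(UU^\top) + \nabla f(UU^\top)^\top)U$. Below is a minimal sketch for $f(X) = \tfrac{1}{2}\|X - M\|_F^2$, using a small random initialization and a fixed conservative step, whereas the paper prescribes a specific initialization procedure and step-size rule:

```python
import numpy as np

def fgd(M, r, iters=2000, seed=0):
    """Factored gradient descent on g(U) = 0.5 * ||U U^T - M||_F^2."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    U = 0.1 * rng.standard_normal((n, r))  # small random init (illustrative)
    step = 0.1 / np.linalg.norm(M, 2)      # conservative fixed step size
    for _ in range(iters):
        grad = 2.0 * (U @ U.T - M) @ U     # chain rule on f(X) = 0.5||X - M||_F^2
        U -= step * grad
    return U

# Best rank-r PSD approximation of a rank-r PSD matrix M = B B^T.
rng = np.random.default_rng(3)
n, r = 20, 2
B = rng.standard_normal((n, r))
M = B @ B.T
U = fgd(M, r)
```

Each iteration costs only matrix products with the thin factor $U$, which is the practical appeal of the Burer-Monteiro-style parameterization.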

• Megasthenis Asteris, Anastasios Kyrillidis, Oluwasanmi Koyejo, and Russell Poldrack, “A simple and provable algorithm for sparse diagonal CCA”, International Conference on Machine Learning (ICML), 2016.

Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced canonical variables are maximally correlated. Sparse CCA is NP-hard. We propose a novel combinatorial algorithm for sparse diagonal CCA, i.e., sparse CCA under the additional assumption that variables within each set are standardized and uncorrelated. Our algorithm operates on a low rank approximation of the input data and its computational complexity scales linearly with the number of input variables. It is simple to implement, and parallelizable. In contrast to most existing approaches, our algorithm administers precise control on the sparsity of the extracted canonical vectors, and comes with theoretical data-dependent global approximation guarantees, that hinge on the spectrum of the input data. Finally, it can be straightforwardly adapted to other constrained variants of CCA enforcing structure beyond sparsity. We empirically evaluate the proposed scheme and apply it on a real neuroimaging dataset to investigate associations between brain activity and behavior measurements.


@inproceedings{asteris2016simple,
title={A simple and provable algorithm for sparse diagonal CCA},
author={Asteris, Megasthenis and Kyrillidis, Anastasios and Koyejo, Oluwasanmi and Poldrack, Russell},
booktitle={International Conference on Machine Learning (ICML)},
year={2016}
}
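For intuition on why operating on a low-rank approximation helps: if the cross-correlation matrix were exactly rank-1, the optimal sparse canonical vectors are obtained by simply hard-thresholding and renormalizing its top singular vector pair. A sketch of this rank-1 special case only (the paper's algorithm handles general rank-$r$ approximations, with guarantees depending on the spectrum):

```python
import numpy as np

def top_k_normalized(v, k):
    """Keep the k largest-magnitude entries, then renormalize to unit norm."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out / np.linalg.norm(out)

def sparse_cca_rank1(C, ku, kv):
    """Sparse diagonal CCA when C is (approximated as) rank-1:
    maximize u^T C v s.t. ||u|| = ||v|| = 1, ||u||_0 <= ku, ||v||_0 <= kv.
    The objective separates, so each factor is thresholded independently."""
    U, s, Vt = np.linalg.svd(C)
    return top_k_normalized(U[:, 0], ku), top_k_normalized(Vt[0], kv)

# Exactly rank-1 C built from sparse factors: supports should be recovered.
a = np.array([0.0, 3.0, 0.0, -1.0, 0.0])
b = np.array([2.0, 0.0, 0.0, 1.0])
C = np.outer(a, b)
u, v = sparse_cca_rank1(C, ku=2, kv=2)
```

Note the precise sparsity control: exactly `ku` and `kv` nonzeros in the returned canonical vectors, as emphasized in the abstract.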

• Anastasios Kyrillidis, Bubacarr Bah, Rouzbeh Hasheminezhad, Quoc Tran-Dinh, Luca Baldassarre, and Volkan Cevher, “Convex block-sparse linear regression with expanders, provably”, AI & Statistics Conference (AISTATS), 2016.

Sparse matrices are favorable objects in machine learning and optimization. When such matrices are used in place of dense ones, the overall complexity requirements in optimization can be significantly reduced in practice, both in terms of space and run-time. Prompted by this observation, we study a convex optimization scheme for block-sparse recovery from linear measurements. To obtain linear sketches, we use expander matrices, i.e., sparse matrices containing only a few non-zeros per column. Hitherto, to the best of our knowledge, such algorithmic solutions have been studied only from a non-convex perspective. Our aim here is to theoretically characterize the performance of convex approaches in such a setting. Our key novelty is the expression of the recovery error in terms of the model-based norm, while ensuring that the solution lives in the model. To achieve this, we show that sparse model-based matrices satisfy a group version of the null-space property. Our experimental findings on synthetic and real applications support our claims of faster recovery in the convex setting -- as opposed to using dense sensing matrices -- while showing competitive recovery performance.


@inproceedings{kyrillidis2016convex,
title={Convex block-sparse linear regression with expanders, provably},
author={Kyrillidis, Anastasios and Bah, Bubacarr and Hasheminezhad, Rouzbeh and Tran-Dinh, Quoc and Baldassarre, Luca and Cevher, Volkan},
booktitle={Artificial Intelligence and Statistics (AISTATS)},
year={2016}
}

• Hemant Tyagi, Anastasios Kyrillidis, Bernd Gärtner, and Andreas Krause, “Learning sparse additive models with interactions in high dimensions”, AI & Statistics Conference (AISTATS), 2016.

A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is referred to as a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$, where $\mathcal{S} \subset [d]$, $|\mathcal{S}| \ll d$. Assuming the $\phi_l$'s and $\mathcal{S}$ to be unknown, the problem of estimating $f$ from its samples has been studied extensively. In this work, we consider a generalized SPAM, allowing for *second order* interaction terms. For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, the function $f$ is assumed to be of the form: $f(\mathbf{x}) = \sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l_1, l_2) \in \mathcal{S}_2}\phi_{(l_1, l_2)} (x_{l_1}, x_{l_2}).$ Assuming $\phi_{p},\phi_{(l_1, l_2)}$, $\mathcal{S}_1$, and $\mathcal{S}_2$ to be unknown, we provide a randomized algorithm that queries $f$ and *exactly recovers* $\mathcal{S}_1,\mathcal{S}_2$. Consequently, this also enables us to estimate the underlying $\phi_p, \phi_{(l_1, l_2)}$. We derive sample complexity bounds for our scheme and also extend our analysis to include the situation where the queries are corrupted with noise -- either stochastic, or arbitrary but bounded. Lastly, we provide simulation results on synthetic data that validate our theoretical findings.


@inproceedings{tyagi2016learning,
title={Learning Sparse Additive Models with Interactions in High Dimensions},
author={Tyagi, Hemant and Kyrillidis, Anastasios and G{\"a}rtner, Bernd and Krause, Andreas},
booktitle={Artificial Intelligence and Statistics (AISTATS)},
year={2016}
}

• Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, and Alex Dimakis, “Bipartite correlation clustering: Maximizing agreements”, AI & Statistics Conference (AISTATS), 2016.

In Bipartite Correlation Clustering (BCC) we are given a complete bipartite graph $G$ with '+' and '-' edges, and we seek a vertex clustering that maximizes the number of agreements: the number of all '+' edges within clusters plus all '-' edges cut across clusters. BCC is known to be NP-hard. We present a novel approximation algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters. Our algorithm outputs a $k$-clustering that provably achieves a number of agreements within a multiplicative $(1-\delta)$-factor from the optimal, for any desired accuracy $\delta$. It relies on solving a combinatorially constrained bilinear maximization on the bi-adjacency matrix of $G$. It runs in time exponential in $k$ and $\delta^{-1}$, but linear in the size of the input. Further, we show that, in the (unconstrained) BCC setting, a $(1-\delta)$-approximation can be achieved using $O(\delta^{-1})$ clusters, regardless of the size of the graph. In turn, our $k$-BCC algorithm implies an Efficient PTAS for the BCC objective of maximizing agreements.


@inproceedings{asteris2016bipartite,
title={Bipartite correlation clustering: Maximizing agreements},
author={Asteris, Megasthenis and Kyrillidis, Anastasios and Papailiopoulos, Dimitris and Dimakis, Alexandros G},
booktitle={Artificial Intelligence and Statistics (AISTATS)},
year={2016}
}

• Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, and Alex Dimakis, “Sparse PCA via bipartite matchings”, Neural Information Processing Systems (NIPS), 2015.

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance. Such components can be computed one by one, repeatedly solving the single-component problem and deflating the input data matrix, but this greedy procedure is suboptimal. We present a novel algorithm for sparse PCA that jointly optimizes multiple disjoint components. The extracted features capture variance that lies within a multiplicative factor arbitrarily close to $1$ from the optimal. Our algorithm is combinatorial and computes the desired components by solving multiple instances of the bipartite maximum weight matching problem. Its complexity grows as a low order polynomial in the ambient dimension of the input data, but exponentially in its rank. However, it can be effectively applied on a low-dimensional sketch of the input data. We evaluate our algorithm on real datasets and empirically demonstrate that in many cases it outperforms existing, deflation-based approaches.


@inproceedings{asteris2015sparse,
title={Sparse PCA via bipartite matchings},
author={Asteris, Megasthenis and Papailiopoulos, Dimitris and Kyrillidis, Anastasios and Dimakis, Alexandros G},
booktitle={Advances in Neural Information Processing Systems},
pages={766--774},
year={2015}
}

• Megasthenis Asteris, Anastasios Kyrillidis, Alex Dimakis, Han-Gyol Yi and Bharath Chandrasekaran, “Stay on path: PCA along graph paths”, International Conference on Machine Learning (ICML), 2015.

We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph. In particular, we consider the following setting: given a directed acyclic graph $G$ on $p$ vertices corresponding to variables, the non-zero entries of the extracted principal component must coincide with vertices lying along a path in $G$. From a statistical perspective, information on the underlying network may potentially reduce the number of observations required to recover the population principal component. We consider the canonical estimator which optimally exploits the prior knowledge by solving a non-convex quadratic maximization on the empirical covariance. We introduce a simple network and analyze the estimator under the spiked covariance model. We show that side information potentially improves the statistical complexity. We propose two algorithms to approximate the solution of the constrained quadratic maximization, and recover a component with the desired properties. We empirically evaluate our schemes on synthetic and real datasets.


@inproceedings{asteris2015stay,
title={Stay on path: PCA along graph paths},
author={Asteris, Megasthenis and Kyrillidis, Anastasios and Dimakis, Alexandros G and Yi, Han-Gyol and Chandrasekaran, Bharath},
booktitle={Proceedings of the 32nd International Conference on Machine Learning (ICML-15)},
volume={37},
year={2015}
}

• Michail Vlachos, Francesco Fusco, Harry Mavroforakis, Anastasios Kyrillidis and Vassilis Vasileiadis, “Improving Co-Cluster Quality with Application to Product Recommendations”, ACM CIKM International Conference on Information and Knowledge Management, 2014.

Businesses store an ever-increasing amount of historical customer sales data. Given the availability of such information, it is advantageous to analyze past sales, both for revealing dominant buying patterns, and for providing more targeted recommendations to clients. In this context, co-clustering has proved to be an important data-modeling primitive for revealing latent connections between two sets of entities, such as customers and products. In this work, we introduce a new algorithm for co-clustering that is both scalable and highly resilient to noise. Our method is inspired by $k$-Means and agglomerative hierarchical clustering approaches: $(i)$ first it searches for elementary co-clustering structures and $(ii)$ then combines them into a better, more compact, solution. The algorithm is flexible as it does not require an explicit number of co-clusters as input, and is directly applicable on large data graphs. We apply our methodology on real sales data to analyze and visualize the connections between clients and products. We showcase a real deployment of the system, and how it has been used for driving a recommendation engine. Finally, we demonstrate that the new methodology can discover co-clusters of better quality and relevance than state-of-the-art co-clustering techniques.


@inproceedings{vlachos2014improving,
title={Improving co-cluster quality with application to product recommendations},
author={Vlachos, Michail and Fusco, Francesco and Mavroforakis, Charalambos and Kyrillidis, Anastasios and Vassiliadis, Vassilios G},
booktitle={Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management},
pages={679--688},
year={2014},
organization={ACM}
}

• Dimitris Papailiopoulos, Anastasios Kyrillidis and Christos Boutsidis, “Provable deterministic leverage score sampling”, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2014.

We explain theoretically a curious empirical phenomenon: “Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate”. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption by providing empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which often matches or outperforms the state-of-the-art techniques.


@inproceedings{papailiopoulos2014provable,
title={Provable deterministic leverage score sampling},
author={Papailiopoulos, Dimitris and Kyrillidis, Anastasios and Boutsidis, Christos},
booktitle={Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining},
pages={997--1006},
year={2014},
organization={ACM}
}
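The deterministic rule the abstract analyzes is simple to state: compute rank-$k$ leverage scores and keep the columns with the largest ones. A minimal sketch (function name and return signature are my own, not from the paper):

```python
import numpy as np

def deterministic_leverage_selection(A, k, c):
    """Select the c columns of A with the largest rank-k leverage scores.
    The rank-k leverage score of column j is the squared Euclidean norm
    of the j-th row of V_k, the top-k right singular vectors of A."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k, :] ** 2, axis=0)   # scores sum to k
    cols = np.sort(np.argsort(scores)[::-1][:c])  # top-c, deterministic
    return A[:, cols], cols, scores
```

The paper's point is that this derandomized selection is provably competitive with randomized leverage score sampling whenever the scores decay like a moderately steep power law.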

• Anastasios Kyrillidis, Rabeeh Karimi Mahabadi, Quoc Tran-Dinh and Volkan Cevher, “Scalable sparse covariance estimation via self-concordance”, AAAI Conference on Artificial Intelligence (AAAI-14), 2014.



@inproceedings{kyrillidis2014scalable,
title={Scalable Sparse Covariance Estimation via Self-Concordance},
author={Kyrillidis, Anastasios and Mahabadi, Rabeeh Karimi and Dinh, Quoc Tran and Cevher, Volkan},
booktitle={Twenty-Eighth AAAI Conference on Artificial Intelligence},
year={2014}
}

• Anastasios Kyrillidis, Michail Vlachos and Anastasios Zouzias, “Approximate matrix multiplication with application to linear embeddings”, IEEE ISIT, 2014.

In this paper, we study the problem of approximately computing the product of two real matrices. In particular, we analyze a dimensionality-reduction-based approximation algorithm due to Sarlos, introducing the notion of nuclear rank as the ratio of the nuclear norm over the spectral norm. The presented bound has improved dependence with respect to the approximation error (as compared to previous approaches), whereas the subspace onto which we project the input matrices has dimension proportional to the maximum of their nuclear ranks and is independent of the input dimensions. In addition, we provide an application of this result to linear low-dimensional embeddings. Namely, we show that any Euclidean point-set with bounded nuclear rank is amenable to projection onto a number of dimensions that is independent of the input dimensionality, while achieving additive error guarantees.


@inproceedings{kyrillidis2014approximate,
title={Approximate matrix multiplication with application to linear embeddings},
author={Kyrillidis, Anastasios and Vlachos, Michail and Zouzias, Anastasios},
booktitle={2014 IEEE International Symposium on Information Theory},
pages={2182--2186},
year={2014},
organization={IEEE}
}
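The dimensionality-reduction idea behind this line of work can be illustrated in a few lines: compress the shared (inner) dimension with a random sketch before multiplying. This is a generic Gaussian-sketch sketch of the technique, not the paper's analysis; the scaling makes the sketch an unbiased estimator of the product.

```python
import numpy as np

def sketched_product(A, B, t, seed=0):
    """Approximate A @ B by projecting the shared dimension onto a
    random t-dimensional subspace: A @ B ~ (A @ S.T) @ (S @ B), where
    S is a scaled Gaussian sketch with E[S.T @ S] = I."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((t, A.shape[1])) / np.sqrt(t)
    return (A @ S.T) @ (S @ B)
```

The multiplication cost drops from $O(mnp)$ to $O(t(mn + np + mp))$, and the error shrinks as the sketch dimension $t$ grows.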

• Anastasios Kyrillidis and Anastasios Zouzias, “Non-uniform feature sampling for decision tree ensembles”, IEEE ICASSP, 2014.

We study the effectiveness of non-uniform randomized feature selection in decision tree classification. We experimentally evaluate two feature selection methodologies, based on information extracted from the provided dataset: $(i)$ leverage scores-based and $(ii)$ norm-based feature selection. Experimental evaluation of the proposed feature selection techniques indicates that such approaches can be more effective than naive uniform feature selection, while achieving performance comparable to the random forest algorithm.


@inproceedings{kyrillidis2014non,
title={Non-uniform feature sampling for decision tree ensembles},
author={Kyrillidis, Anastasios and Zouzias, Anastasios},
booktitle={2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={4548--4552},
year={2014},
organization={IEEE}
}
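As an illustration of the norm-based variant mentioned in the abstract, features can be sampled with probability proportional to their (squared) column norms rather than uniformly. The helper below is a hypothetical sketch, not code from the paper:

```python
import numpy as np

def norm_based_feature_sample(X, m, seed=0):
    """Sample m distinct feature indices of the data matrix X with
    probability proportional to each column's squared Euclidean norm
    (one of the two data-aware schemes the paper evaluates; the other
    uses leverage scores)."""
    norms = (X ** 2).sum(axis=0)
    rng = np.random.default_rng(seed)
    return rng.choice(X.shape[1], size=m, replace=False,
                      p=norms / norms.sum())
```

Each tree in the ensemble would then be grown on its own such sample, biasing splits toward high-energy features while keeping the randomization that makes ensembles work.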

• George Skoumas, Dieter Pfoser and Anastasios Kyrillidis, “On quantifying qualitative geospatial data: A probabilistic approach”, ACM GEOCROWD, 2013.

Living in the era of data deluge, we have witnessed a web content explosion, largely due to the massive availability of User-Generated Content (UGC). In this work, we specifically consider the problem of geospatial information extraction and representation, where one can exploit diverse sources of information (such as image, audio, and text data), going beyond traditional volunteered geographic information. Our ambition is to include available narrative information in an effort to better explain geospatial relationships: with spatial reasoning being a basic form of human cognition, narratives expressing such experiences typically contain qualitative spatial data, i.e., spatial objects and spatial relationships. To this end, we formulate a quantitative approach for the representation of qualitative spatial relations extracted from UGC in the form of texts. The proposed method quantifies such relations based on multiple text observations. Such observations provide distance and orientation features which are utilized by a greedy Expectation Maximization-based (EM) algorithm to infer a probability distribution over predefined spatial relationships; the latter represent the quantified relationships under user-defined probabilistic assumptions. We evaluate the applicability and quality of the proposed approach using real UGC data originating from an actual travel blog text corpus. To verify the result quality, we generate grid-based “maps” visualizing the spatial extent of the various relations.


@inproceedings{skoumas2013quantifying,
title={On quantifying qualitative geospatial data: a probabilistic approach},
author={Skoumas, Georgios and Pfoser, Dieter and Kyrillidis, Anastasios},
booktitle={Proceedings of the Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information},
pages={71--78},
year={2013},
organization={ACM}
}

• Stephen Becker, Volkan Cevher and Anastasios Kyrillidis, “Randomized low-memory singular value projection”, 10th International Conference on Sampling Theory and Applications (SampTA), 2013. (Authors listed in alphabetical order.)

Affine rank minimization algorithms typically rely on calculating the gradient of a data error followed by a singular value decomposition at every iteration. Because these two steps are expensive, heuristic approximations are often used to reduce computational burden. To this end, we propose a recovery scheme that merges the two steps with randomized approximations, and as a result, operates on space proportional to the degrees of freedom in the problem. We theoretically establish the estimation guarantees of the algorithm as a function of approximation tolerance. While the theoretical approximation requirements are overly pessimistic, we demonstrate that in practice the algorithm performs well on the quantum tomography recovery problem.


@inproceedings{becker2013randomized,
title={Randomized Low-Memory Singular Value Projection},
author={Becker, Stephen and Cevher, Volkan and Kyrillidis, Anastasios},
booktitle={10th International Conference on Sampling Theory and Applications (Sampta)},
year={2013}
}

• Anastasios Kyrillidis, Stephen Becker, Volkan Cevher and Christoph Koch, “Sparse projections onto the simplex”, International Conference on Machine Learning (ICML), 2013.

Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the $\ell_1$-norm. However, several important learning applications cannot benefit from this approach as they feature these convex norms as constraints in addition to the non-convex rank and sparsity constraints. In this setting, we derive efficient sparse projections onto the simplex and its extension, and illustrate how to use them to solve high-dimensional learning problems in quantum tomography, sparse density estimation and portfolio selection with non-convex constraints.


@inproceedings{kyrillidis2013sparse,
title={Sparse projections onto the simplex},
author={Kyrillidis, Anastasios and Becker, Stephen and Cevher, Volkan and Koch, Christoph},
booktitle={Proceedings of The 30th International Conference on Machine Learning},
pages={235--243},
year={2013}
}
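A natural two-step rule for the kind of projection this paper studies is: keep the $k$ largest entries, then project them onto the simplex with the standard sort-based algorithm. Treat the snippet below as an illustrative sketch under that rule, not the paper's full derivation:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex,
    via the standard sort-and-threshold algorithm."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def sparse_simplex_projection(v, k):
    """Keep the k largest entries of v, then project them onto the
    simplex; the result is k-sparse, nonnegative, and sums to one."""
    out = np.zeros_like(v)
    idx = np.argsort(v)[::-1][:k]
    out[idx] = project_simplex(v[idx])
    return out
```

Such projections let projected-gradient methods operate directly on the intersection of the $\ell_0$ and simplex constraints, which is what the quantum tomography, density estimation, and portfolio applications in the paper require.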

• Quoc Tran Dinh, Anastasios Kyrillidis and Volkan Cevher, “A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions”, International Conference on Machine Learning (ICML), 2013.

We propose an algorithmic framework for convex minimization problems of composite functions with two terms: a self-concordant part and a possibly nonsmooth regularization part. Our method is a new proximal Newton algorithm with local quadratic convergence rate. As a specific problem instance, we consider sparse precision matrix estimation problems in graph learning. Via a careful dual formulation and a novel analytic stepsize selection, we instantiate an algorithm within our framework for graph learning that avoids Cholesky decompositions and matrix inversions, making it attractive for parallel and distributed implementations.


@inproceedings{dinh2013proximal,
title={A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions},
author={Dinh, Quoc T and Kyrillidis, Anastasios and Cevher, Volkan},
booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML-13)},
pages={271--279},
year={2013}
}

• Anastasios Kyrillidis and Volkan Cevher, “Fast proximal algorithms for self-concordant minimization with application to sparse graph selection”, IEEE ICASSP, 2013.

The convex $\ell_1$-regularized log det divergence criterion has been shown to produce theoretically consistent graph learning. However, this objective function is challenging since the $\ell_1$-regularization is nonsmooth, the log det term does not have a globally Lipschitz-continuous gradient, and the problem is high-dimensional. Using the self-concordant property of the objective, we propose a new adaptive step size selection and present the (F)PS ((F)ast Proximal algorithms for Self-concordant functions) algorithmic framework which has linear convergence and exhibits superior empirical results as compared to state-of-the-art first order methods.


@inproceedings{kyrillidis2013fast,
title={Fast proximal algorithms for self-concordant function minimization with application to sparse graph selection},
author={Kyrillidis, Anastasios and Cevher, Volkan},
booktitle={2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
pages={6585--6589},
year={2013},
organization={IEEE}
}

• Anastasios Kyrillidis and Volkan Cevher, “Matrix ALPS: Accelerated low rank and sparse matrix reconstruction”, IEEE SSP, 2012.

We propose Matrix ALPS for recovering a sparse plus low-rank decomposition of a matrix given its corrupted and incomplete linear measurements. Our approach is a first-order projected gradient method over non-convex sets, and it exploits a well-known memory-based acceleration technique. We theoretically characterize the convergence properties of Matrix ALPS using the stable embedding properties of the linear measurement operator. We then numerically illustrate that our algorithm outperforms the existing convex as well as non-convex state-of-the-art algorithms in computational efficiency without sacrificing stability.


@inproceedings{kyrillidis2012matrix,
title={Matrix {ALPS}: {A}ccelerated low rank and sparse matrix reconstruction},
author={Kyrillidis, Anastasios and Cevher, Volkan},
booktitle={2012 IEEE Statistical Signal Processing Workshop (SSP)},
pages={185--188},
year={2012},
organization={IEEE}
}
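Matrix ALPS itself handles general linear measurements and adds memory-based acceleration; as a bare-bones illustration of the two non-convex projections it builds on, here is a plain alternating scheme that splits a fully observed matrix $Y$ into rank-$r$ plus $k$-sparse parts (a sketch, not the paper's algorithm):

```python
import numpy as np

def project_rank(X, r):
    """Best rank-r approximation of X via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def project_sparse(X, k):
    """Keep the k largest-magnitude entries of X, zero out the rest."""
    out = np.zeros_like(X)
    keep = np.unravel_index(np.argsort(np.abs(X), axis=None)[-k:], X.shape)
    out[keep] = X[keep]
    return out

def lowrank_plus_sparse(Y, r, k, iters=50):
    """Alternate the two projections to fit Y ~ L + S with
    rank(L) <= r and ||S||_0 <= k."""
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(iters):
        L = project_rank(Y - S, r)
        S = project_sparse(Y - L, k)
    return L, S
```

Each update is the exact projection of the current residual onto its constraint set, so the fit is monotone non-increasing; the acceleration and measurement-operator machinery in the paper is what makes this idea competitive at scale.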

• Anastasios Kyrillidis and Volkan Cevher, “Combinatorial selection and least absolute shrinkage via the CLASH algorithm”, IEEE ISIT, 2012.

The least absolute shrinkage and selection operator (LASSO) for linear regression exploits the geometric interplay of the $\ell_2$-data error objective and the $\ell_1$-norm constraint to arbitrarily select sparse models. Guiding this uninformed selection process with sparsity models has been precisely the center of attention over the last decade in order to improve learning performance. To this end, we alter the selection process of LASSO to explicitly leverage combinatorial sparsity models (CSMs) via the combinatorial selection and least absolute shrinkage (CLASH) operator. We provide concrete guidelines on how to leverage combinatorial constraints within CLASH, and characterize CLASH’s guarantees as a function of the set-restricted isometry constants of the sensing matrix. Finally, our experimental results show that CLASH can outperform both LASSO and model-based compressive sensing in sparse estimation.


@inproceedings{kyrillidis2012combinatorial,
title={Combinatorial selection and least absolute shrinkage via the {CLASH} algorithm},
author={Kyrillidis, Anastasios and Cevher, Volkan},
booktitle={Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on},
pages={2216--2220},
year={2012},
organization={IEEE}
}

• Anastasios Kyrillidis, Gilles Puy and Volkan Cevher, “Hard thresholding with norm constraints”, IEEE ICASSP, 2012.

We introduce a new sparse recovery paradigm, called Normed Pursuits, where efficient algorithms from combinatorial and convex optimization interface for interpretable and model-based solutions. Synthetic and real data experiments illustrate that Normed Pursuits can significantly enhance the performance of both hard thresholding methods and convex solvers in sparse recovery.


@inproceedings{kyrillidis2012hard,
title={Hard thresholding with norm constraints},
author={Kyrillidis, Anastasios and Puy, Gilles and Cevher, Volkan},
booktitle={2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={3645--3648},
year={2012},
organization={IEEE}
}

• Anastasios Kyrillidis and Volkan Cevher, “Recipes on hard thresholding methods”, 4th IEEE CAMSAP, 2011.

Compressive sensing (CS) is a data acquisition and recovery technique for finding sparse solutions to linear inverse problems from sub-Nyquist measurements. CS features a wide range of computationally efficient and robust signal recovery methods, based on sparsity seeking optimization. In this paper, we present and analyze a class of sparse recovery algorithms, known as hard thresholding methods. We provide optimal strategies on how to set up these algorithms via basic “ingredients” for different configurations to achieve complexity vs. accuracy tradeoffs. Simulation results demonstrate notable performance improvements compared to state-of-the-art algorithms both in terms of data reconstruction and computational complexity.


@inproceedings{kyrillidis2011recipes,
title={Recipes on hard thresholding methods},
author={Kyrillidis, Anastasios and Cevher, Volkan},
booktitle={4th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)},
year={2011}
}
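The class of algorithms this paper analyzes can be summarized by one iteration: take a gradient step on the data error, then hard-threshold to the $k$ largest entries. A minimal iterative hard thresholding (IHT) sketch, using a conservative fixed step size rather than any of the paper's tuned "ingredients":

```python
import numpy as np

def iht(A, y, k, iters=300):
    """Iterative hard thresholding for y ~ A x with k-sparse x:
    a gradient step on 0.5 * ||y - A x||^2 followed by keeping the
    k largest-magnitude entries of the iterate."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # ensures monotone descent
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x + step * (A.T @ (y - A @ x))
        keep = np.argsort(np.abs(g))[::-1][:k]  # hard thresholding
        x = np.zeros_like(x)
        x[keep] = g[keep]
    return x
```

The "recipes" in the paper concern exactly the knobs left fixed here: step size rules, memory/acceleration terms, and support-selection variants, each trading accuracy against per-iteration cost.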

• Anastasios Kyrillidis and George N. Karystinos, “Rank-deficient quadratic-form maximization over M-phase alphabet: Polynomial-complexity solvability and algorithmic developments”, IEEE ICASSP, 2011.

The maximization of a positive (semi)definite complex quadratic form over a finite alphabet is NP-hard and achieved through exhaustive search when the form has full rank. However, if the form is rank-deficient, the optimal solution can be computed with only polynomial complexity in the length $N$ of the maximizing vector. In this work, we consider the general case of a rank-$D$ positive (semi)definite complex quadratic form and develop a method that maximizes the form with respect to an $M$-phase vector with polynomial complexity. The proposed method efficiently reduces the size of the feasible set from exponential to polynomial. We also develop an algorithm that constructs the polynomial-size candidate set in polynomial time and observe that it is fully parallelizable and rank-scalable.


@inproceedings{kyrillidis2011rank,
title={Rank-deficient quadratic-form maximization over $M$-phase alphabet: Polynomial-complexity solvability and algorithmic developments},
author={Kyrillidis, Anastasios and Karystinos, George},
booktitle={2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={3856--3859},
year={2011},
organization={IEEE}
}


## Book chapters

• Volkan Cevher, Sina Jafarpour and Anastasios Kyrillidis, “Linear inverse problems with norm and sparsity constraints”, in Practical Applications of Sparse Modeling, Sept. 2014, MIT Press. (Authors listed in alphabetical order.)

We describe two nonconventional algorithms for linear regression, called GAME and CLASH. The salient characteristic of these approaches is that they exploit the convex $\ell_1$-ball and non-convex $\ell_0$-sparsity constraints jointly in sparse recovery. To establish the theoretical approximation guarantees of GAME and CLASH, we cover an interesting range of topics from game theory, convex and combinatorial optimization. We illustrate that these approaches lead to improved theoretical guarantees and empirical performance beyond convex and non-convex solvers alone.


@article{cevher2014linear,
title={Linear inverse problems with norm and sparsity constraints},
author={Cevher, Volkan and Jafarpour, Sina and Kyrillidis, Anastasios},
journal={Practical Applications of Sparse Modeling},
pages={179},
year={2014},
publisher={MIT Press}
}

• Anastasios Kyrillidis, Luca Baldassarre, Marwa El-Halabi, Quoc Tran-Dinh and Volkan Cevher, “Structured sparsity: discrete and convex approaches”, in Compressed Sensing and its Applications, Springer, 2014.

Compressive sensing (CS) exploits sparsity to recover sparse or compressible signals from dimensionality reducing, non-adaptive sensing mechanisms. Sparsity is also used to enhance interpretability in machine learning and statistics applications: While the ambient dimension is vast in modern data analysis problems, the relevant information therein typically resides in a much lower dimensional space. However, many solutions proposed nowadays do not leverage the true underlying structure. Recent results in CS extend the simple sparsity idea to more sophisticated structured sparsity models, which describe the interdependency between the nonzero components of a signal, increasing the interpretability of the results and leading to better recovery performance. In order to better understand the impact of structured sparsity, in this chapter we analyze the connections between the discrete models and their convex relaxations, highlighting their relative advantages. We start with the general group sparse model and then elaborate on two important special cases: the dispersive and the hierarchical models. For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations. We also consider more general structures as defined by set functions and present their convex proxies. Further, we discuss efficient optimization solutions for structured sparsity problems and illustrate structured sparsity in action via three applications.


@incollection{kyrillidis2015structured,
title={Structured sparsity: Discrete and convex approaches},
author={Kyrillidis, Anastasios and Baldassarre, Luca and El Halabi, Marwa and Tran-Dinh, Quoc and Cevher, Volkan},
booktitle={Compressed Sensing and its Applications},
pages={341--387},
year={2015},
publisher={Springer}
}