(Only peerreviewed and accepted papers; for the most recent drafts, please check my Google Scholar profile)
Journals

Quoc Tran Dinh, Anastasios Kyrillidis, and Volkan Cevher, ``A singlephase, proximal pathfollowing framework”, Mathematics of Operations Research, 2017.
We propose a new proximal, pathfollowing framework for a class of constrained convex problems. We consider settings where the nonlinear—and possibly nonsmooth—objective part is endowed with a proximity operator, and the constraint set is equipped with a selfconcordant barrier. Our approach relies on the following two main ideas. First, we reparameterize the optimality condition as an auxiliary problem, such that a good initial point is available; by doing so, a family of alternative paths towards the optimum is generated. Second, we combine the proximal operator with pathfollowing ideas to design a singlephase, proximal, pathfollowing algorithm. We prove that our scheme has the same $O(\sqrt{\nu} \log(1/\varepsilon))$ worstcase iterationcomplexity with standard approaches [34, 38] without requiring an initial phase, where $\nu$ is the barrier parameter and $\varepsilon$ is a desired accuracy. Our framework allows errors in the calculation of proximal Newton directions, without sacrificing the worstcase iteration complexity. We demonstrate the merits of our algorithm via three numerical examples, where proximal operators play a key role.
@article{tran2016single, title={A singlephase, proximal pathfollowing framework}, author={TranDinh, Quoc and Kyrillidis, Anastasios and Cevher, Volkan}, journal={arXiv preprint arXiv:1603.01681}, year={2016} }

Hemant Tyagi, Anastasios Kyrillidis, Bernd Gartner, and Andreas Krause, ``Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions”, Information and Inference: A journal of the IMA, 2017.
A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$ where $\mathcal{S} \subset [d]$, $\mathcal{S} \ll d$. Assuming $\phi$'s, $\mathcal{S}$ to be unknown, there exists extensive work for estimating $f$ from its samples. In this work, we consider a generalized version of SPAMs, that also allows for the presence of a sparse number of second order interaction terms. For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, with $\mathcal{S}_1 \ll d, \mathcal{S}_2 \ll d^2$, the function $f$ is now assumed to be of the form: $\sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l,l^{\prime}) \in \mathcal{S}_2}\phi_{(l,l^{\prime})} (x_l,x_{l^{\prime}})$. Assuming we have the freedom to query $f$ anywhere in its domain, we derive efficient algorithms that provably recover $\mathcal{S}_1,\mathcal{S}_2$ with finite sample bounds. Our analysis covers the noiseless setting where exact samples of $f$ are obtained, and also extends to the noisy setting where the queries are corrupted with noise. For the noisy setting in particular, we consider two noise models namely: i.i.d Gaussian noise and arbitrary but bounded noise. Our main methods for identification of $\mathcal{S}_2$ essentially rely on estimation of sparse Hessian matrices, for which we provide two novel compressed sensing based schemes. Once $\mathcal{S}_1, \mathcal{S}_2$ are known, we show how the individual components $\phi_p$, $\phi_{(l,l^{\prime})}$ can be estimated via additional queries of $f$, with uniform error bounds. Lastly, we provide simulation results on synthetic data that validate our theoretical findings.
@article{tyagi2017algorithms, title={Algorithms for Learning Sparse Additive Models with Interactions in High Dimensions}, author={Tyagi, Hemant and Kyrillidis, Anastasios and G{\"a}rtner, Bernd and Krause, Andreas}, journal={Information and Inference: A journal of the IMA}, year={2017} }

Luca Baldassarre, Nirav Bhan, Volkan Cevher, Anastasios Kyrillidis and Siddhartha Satpathi, ``Groupsparse model selection: Hardness and relaxations”, IEEE Trans. on Information Theory, 2016. (Authors listed in alphabetical order.)
Groupbased sparsity models are instrumental in linear and nonlinear regression problems. The main premise of these models is the recovery of “interpretable” signals through the identification of their constituent groups, which can also provably translate in substantial savings in the number of measurements for linear models in compressive sensing. In this paper, we establish a combinatorial framework for groupmodel selection problems and highlight the underlying tractability issues. In particular, we show that the groupmodel selection problem is equivalent to the wellknown NPhard weighted maximum coverage problem (WMC). Leveraging a graphbased understanding of group models, we describe group structures which enable correct model selection in polynomial time via dynamic programming. Furthermore, we show that popular groups structures can be explained by linear inequalities involving totally unimodular matrices, which afford other polynomial time algorithms based on relaxations. We also present a generalization of the groupmodel that allows for within group sparsity, which can be used to model hierarchical sparsity. Finally, we study the Pareto frontier between approximation error and sparsity budget of group sparse approximations for two tractable models, among which the tree sparsity model, and illustrate selection and computation tradeoffs between our framework and the existing convex relaxations.
@article{baldassarre2013group, title={Groupsparse model selection: Hardness and relaxations}, author={Baldassarre, Luca and Bhan, Nirav and Cevher, Volkan and Kyrillidis, Anastasios and Satpathi, Siddhartha}, journal={arXiv preprint arXiv:1303.3207}, year={2013} }

Georgios Skoumas, Dieter Pfoser, Anastasios Kyrillidis and Timos Sellis, ``Location estimation using crowdsourced spatial relations”, ACM Transactions on Spatial Algorithms and Systems, vol. 2, issue 2, 2016.
The “crowd” has become a very important geospatial data provider. Specifically, nonexpert users have been providing a wealth of quantitative geospatial data (e.g., geotagged tweets or photos, online). With spatial reasoning being a basic form of human cognition, textual narratives expressing user travel experiences (e.g., travel blogs) would provide an even bigger source of geospatial data. Narratives typically contain qualitative geospatial data in the form of objects and spatial relations (e.g., “St. John’s church is to the North of the Acropolis museum.” The scope of this work is (i) to extract these spatial relations from textual narratives, (ii) to quantify (model) them, and (iii) to reason about object locations based only on the quantified spatial relations. We use information extraction methods to identify toponyms and spatial relations, and we formulate a quantitative approach based on distance and orientation features to represent the latter. Probability density functions (PDFs) for spatial relations are determined by means of a greedy expectation maximization (EM)based algorithm. These PDFs are then used to estimate unknown object locations. Experiments using a text corpus harvested from travel blog sites establish the considerable location estimation accuracy of the proposed approach on synthetic and realworld scenarios.
@article{skoumas2016location, author = {Skoumas, Georgios and Pfoser, Dieter and Kyrillidis, Anastasios and Sellis, Timos}, title = {Location Estimation Using Crowdsourced Spatial Relations}, journal = {ACM Trans. Spatial Algorithms Syst.}, volume = {2}, number = {2}, year = {2016}, pages = {5:15:23}, publisher = {ACM}, }

Quoc Tran Dinh, Anastasios Kyrillidis and Volkan Cevher, ``Composite selfconcordant minimization”, Journal of Machine Learning Research, 16(Mar):371−416, 2015.
We propose a variable metric framework for minimizing the sum of a selfconcordant function and a possibly nonsmooth convex function, endowed with an easily computable proximal operator. We theoretically establish the convergence of our framework without relying on the usual Lipschitz gradient assumption on the smooth part. An important highlight of our work is a new set of analytic stepsize selection and correction procedures based on the structure of the problem. We describe concrete algorithmic instances of our framework for several interesting applications and demonstrate them numerically on both synthetic and real data.
@article{tran2015composite, title={Composite SelfConcordant Minimization}, author={TranDinh, Quoc and Kyrillidis, Anastasios and Cevher, Volkan}, journal={Journal of Machine Learning Research}, volume={16}, pages={371416}, year={2015} }

Michail Vlachos, Nikolaos Freris and Anastasios Kyrillidis, ``Compressive mining: fast and optimal data mining in the compressed domain”, Very Large Data Bases (VLDB) Journal, Volume 24 Issue 1, February 2015.
Realworld data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier and wavelets). However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower/upper bound on Euclidean distances when each data object is potentially compressed using a different set of orthonormal coefficients. Our technique leads to tighter distance estimates, which translates into more accurate search, learning and mining operations directly in the compressed domain. We formulate the problem of estimating lower/upper distance bounds as an optimization problem. We establish the properties of optimal solutions and leverage the theoretical analysis to develop a fast algorithm to obtain an exact solution to the problem. The suggested solution provides the tightest estimation of the Euclidean norm or the correlation. We show that typical data analysis operations, such as nearestneighbor search or kMeans clustering, can operate more accurately using the proposed compression and distance reconstruction technique. We compare it with many other prevalent compression and reconstruction techniques, including random projections and PCAbased techniques. We highlight a surprising result, namely that when the data are highly sparse in some basis, our technique may even outperform PCAbased compression. The contributions of this work are generic as our methodology is applicable to any sequential or highdimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme.
@article{vlachos2015compressive, title={Compressive mining: fast and optimal data mining in the compressed domain}, author={Vlachos, Michail and Freris, Nikolaos M and Kyrillidis, Anastasios}, journal={The VLDB Journal—The International Journal on Very Large Data Bases}, volume={24}, number={1}, pages={124}, year={2015}, publisher={SpringerVerlag New York, Inc.} }

Quoc TranDinh, Anastasios Kyrillidis and Volkan Cevher, ``An inexact proximal pathfollowing algorithm for constrained convex minimization”, SIAM Journal on Optimization (SIOPT), vol. 24, num. 4, p. 17181745, 2014.
Many scientific and engineering applications feature nonsmooth convex minimization problems over convex sets. In this paper, we address an important instance of this broad class where we assume that the nonsmooth objective is equipped with a tractable proximity operator and that the convex constraint set affords a selfconcordant barrier. We provide a new joint treatment of proximal and selfconcordant barrier concepts and illustrate that such problems can be efficiently solved, without the need of lifting the problem dimensions, as in disciplined convex optimization approach. We propose an inexact pathfollowing algorithmic framework and theoretically characterize the worstcase analytical complexity of this framework when the proximal subproblems are solved inexactly. To show the merits of our framework, we apply its instances to both synthetic and realworld applications, where it shows advantages over standard interior point methods. As a byproduct, we describe how our framework can obtain points on the Pareto frontier of regularized problems with selfconcordant objectives in a tuning free fashion.
@article{tran2014inexact, title={An inexact proximal pathfollowing algorithm for constrained convex minimization}, author={TranDinh, Quoc and Kyrillidis, Anastasios and Cevher, Volkan}, journal={SIAM Journal on Optimization}, volume={24}, number={4}, pages={17181745}, year={2014}, publisher={SIAM} }

Anastasios Kyrillidis and George. N. Karystinos, ``Fixedrank Rayleigh quotient maximization by an MPSK sequence”, IEEE Trans. on Communications, Volume:62, Issue:3, pages 961975, 2014.
Certain optimization problems in communication systems, such as limitedfeedback constantenvelope beamforming or noncoherent Mary phaseshift keying (MPSK) sequence detection, result in the maximization of a fixedrank positive semidefinite quadratic form over the MPSK alphabet. This form is a special case of the Rayleigh quotient of a matrix and, in general, its maximization by an MPSK sequence is NPhard. However, if the rank of the matrix is not a function of its size, then the optimal solution can be computed with polynomial complexity in the matrix size. In this work, we develop a new technique to efficiently solve this problem by utilizing auxiliary continuousvalued angles and partitioning the resulting continuous space of solutions into a polynomialsize set of regions, each of which corresponds to a distinct MPSK sequence. The sequence that maximizes the Rayleigh quotient is shown to belong to this polynomialsize set of sequences, thus efficiently reducing the size of the feasible set from exponential to polynomial. Based on this analysis, we also develop an algorithm that constructs this set in polynomial time and show that it is fully parallelizable, memory efficient, and rank scalable. The proposed algorithm compares favorably with other solvers for this problem that have appeared recently in the literature.
@article{kyrillidis2014fixed, title={Fixedrank Rayleigh quotient maximization by an MPSK sequence}, author={Kyrillidis, Anastasios and Karystinos, George N}, journal={Communications, IEEE Transactions on}, volume={62}, number={3}, pages={961975}, year={2014}, publisher={IEEE} }

Anastasios Kyrillidis and Volkan Cevher, ``Matrix recipes for hard thresholding methods”, Journal of Mathematical Imaging and Vision (JMIV), April 2013, Springer.
In this paper, we present and analyze a new set of lowrank recovery algorithms for linear inverse problems within the class of hard thresholding methods. We provide strategies on how to set up these algorithms via basic ingredients for different configurations to achieve complexity vs. accuracy tradeoffs. Moreover, we study acceleration schemes via memorybased techniques and randomized, ϵapproximate matrix projections to decrease the computational costs in the recovery process. For most of the configurations, we present theoretical analysis that guarantees convergence under mild problem conditions. Simulation results demonstrate notable performance improvements as compared to stateoftheart algorithms both in terms of reconstruction accuracy and computational complexity.
@article{kyrillidis2014matrix, title={Matrix recipes for hard thresholding methods}, author={Kyrillidis, Anastasios and Cevher, Volkan}, journal={Journal of mathematical imaging and vision}, volume={48}, number={2}, pages={235265}, year={2014}, publisher={Springer} }

Nikolaos D. Sidiropoulos and Anastasios Kyrillidis, ``Multiway compressed sensing for sparse low rank tensors”, IEEE Signal Processing Letters, 19(11):757760, Oct. 2012.
For linear models, compressed sensing theory and methods enable recovery of sparse signals of interest from few measurements  in the order of the number of nonzero entries as opposed to the length of the signal of interest. Results of similar flavor have more recently emerged for bilinear models, but no results are available for multilinear models of tensor data. In this contribution, we consider compressed sensing for sparse and lowrank tensors. More specifically, we consider lowrank tensors synthesized as sums of outer products of sparse loading vectors, and a special class of linear dimensionalityreducing transformations that reduce each mode individually. We prove interesting `oracle' properties showing that it is possible to identify the uncompressed sparse loadings directly from the compressed tensor data. The proofs naturally suggest a twostep recovery process: fitting a lowrank model in compressed domain, followed by permode $\ell_0$ / $\ell_1$ decompression. This twostep process is also appealing from a computational complexity and memory capacity point of view, especially for big tensor datasets.
@article{sidiropoulos2012multi, title={Multiway compressed sensing for sparse lowrank tensors}, author={Sidiropoulos, Nicholas D and Kyrillidis, Anastasios}, journal={Signal Processing Letters, IEEE}, volume={19}, number={11}, pages={757760}, year={2012}, publisher={IEEE} }
Conference papers

Rajiv Khanna and Anastasios Kyrillidis, ``IHT dies hard: Provable accelerated Iterative Hard Thresholding”, Twentyfirst International Conference on Artificial Intelligence and Statistics (AISTATS18), 2018.
We study both in theory and practice the use of momentum motions in classic iterative hard thresholding (IHT) methods. By simply modifying plain IHT, we investigate its convergence behavior on convex optimization criteria with nonconvex constraints, under standard assumptions. In diverse scenaria, we observe that acceleration in IHT leads to significant improvements, compared to state of the art projected gradient descent and FrankWolfe variants. As a byproduct of our inspection, we study the impact of selecting the momentum parameter: similar to convex settings, two modes of behavior are observed "rippling" and linear depending on the level of momentum.
@article{khanna2018IHT, title={IHT dies hard: Provable accelerated Iterative Hard Thresholding}, author={Rajiv Khanna and Anastasios Kyrillidis}, journal={Twentyfirst International Conference on Artificial Intelligence and Statistics (AISTATS18)}, year={2018} }

Tianyang Li, Liu Liu, Anastasios Kyrillidis, and Constantine Caramanis, ``Statistical inference using SGD”, ThirtySecond AAAI Conference on Artificial Intelligence (AAAI18), 2018.
We present a novel method for frequentist statistical inference in $M$estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the OrnsteinUhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGDbased inference procedure is a first order method, and is wellsuited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
@article{li2018statistical, title={Statistical inference using SGD}, author={Tianyang Li, Liu Liu, Anastasios Kyrillidis, and Constantine Caramanis}, journal={ThirtySecond AAAI Conference on Artificial Intelligence (AAAI18)}, year={2018} }

Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, and Sujay Sanghavi, ``Nonsquare matrix sensing without spurious local minima via the BurerMonteiro approach”, AI & Statistics Conference (AISTATS), 2017.
We consider the nonsquare matrix sensing problem, under restricted isometry property (RIP) assumptions. We focus on the nonconvex formulation, where any rank$r$ matrix $X \in \mathbb{R}^{m \times n}$ is represented as $UV^\top$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$. In this paper, we complement recent findings on the nonconvex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP.
@article{park2017non, title={Nonsquare matrix sensing without spurious local minima via the BurerMonteiro approach}, author={Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, and Sujay Sanghavi}, journal={AI & Statistics Conference (AISTATS)}, year={2017} }

Srinadh Bhojanapalli, Anastasios Kyrillidis, and Sujay Sanghavi, ``Dropping convexity for faster semidefinite optimization”, Conference on Learning Theory (COLT), 2016.
We study the minimization of a convex function $f(X)$ over the set of $n\times n$ positive semidefinite matrices, but when the problem is recast as $\min_U g(U) := f(UU^\top)$, with $U \in \mathbb{R}^{n \times r}$ and $r\leq n$. We study the performance of gradient descent on $g$which we refer to as Factored Gradient Descent (FGD)under standard assumptions on the original function $f$. We provide a rule for selecting the step size and, with this choice, show that the local convergence rate of FGD mirrors that of standard gradient descent on the original $f$: i.e., after $k$ steps, the error is $O(1/k)$ for smooth $f$, and exponentially small in $k$ when $f$ is (restricted) strongly convex. In addition, we provide a procedure to initialize FGD for (restricted) strongly convex objectives and when one only has access to $f$ via a firstorder oracle; for several problem instances, such proper initialization leads to global convergence guarantees. FGD and similar procedures are widely used in practice for problems that can be posed as matrix factorization. To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.
@article{bhojanapalli2016dropping, title={Dropping convexity for faster semidefinite optimization}, author={Srinadh Bhojanapalli, Anastasios Kyrillidis, and Sujay Sanghavi}, journal={Conference on Learning Theory (COLT)}, year={2016} }

Megasthenis Asteris, Anastasios Kyrillidis, Oluwasanmi Koyejo, and Russell Poldrack, ``A simple and provable algorithm for sparse diagonal CCA”, International Conference on Machine Learning (ICML), 2016.
Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced canonical variables are maximally correlated. Sparse CCA is NPhard. We propose a novel combinatorial algorithm for sparse diagonal CCA, i.e., sparse CCA under the additional assumption that variables within each set are standardized and uncorrelated. Our algorithm operates on a low rank approximation of the input data and its computational complexity scales linearly with the number of input variables. It is simple to implement, and parallelizable. In contrast to most existing approaches, our algorithm administers precise control on the sparsity of the extracted canonical vectors, and comes with theoretical datadependent global approximation guarantees, that hinge on the spectrum of the input data. Finally, it can be straightforwardly adapted to other constrained variants of CCA enforcing structure beyond sparsity. We empirically evaluate the proposed scheme and apply it on a real neuroimaging dataset to investigate associations between brain activity and behavior measurements.
@article{asterix2016simple, title={A simple and provable algorithm for sparse diagonal CCA}, author={Asterix, Megasthenis and Kyrillidis, Anastasios and Koyejo, Oluwasanmi and Poldrack, Russell}, journal={arXiv preprint arXiv:1605.08961}, year={2016} }

Anastasios Kyrillidis, Bubacarr Bah, Rouzbeh Hasheminezhad, Quoc TranDinh, Luca Baldassarre, and Volkan Cevher, ``Convex blocksparse linear regression with expanders, provably”, AI & Statistics Conference (AISTATS), 2016.
Sparse matrices are favorable objects in machine learning and optimization. When such matrices are used, in place of dense ones, the overall complexity requirements in optimization can be significantly reduced in practice, both in terms of space and runtime. Prompted by this observation, we study a convex optimization scheme for blocksparse recovery from linear measurements. To obtain linear sketches, we use expander matrices, i.e., sparse matrices containing only few nonzeros per column. Hitherto, to the best of our knowledge, such algorithmic solutions have been only studied from a nonconvex perspective. Our aim here is to theoretically characterize the performance of convex approaches under such setting. Our key novelty is the expression of the recovery error in terms of the modelbased norm, while assuring that solution lives in the model. To achieve this, we show that sparse modelbased matrices satisfy a group version of the nullspace property. Our experimental findings on synthetic and real applications support our claims for faster recovery in the convex setting  as opposed to using dense sensing matrices, while showing a competitive recovery performance.
@article{kyrillidis2016convex, title={Convex blocksparse linear regression with expandersprovably}, author={Kyrillidis, Anastasios and Bah, Bubacarr and Hasheminezhad, Rouzbeh and TranDinh, Quoc and Baldassarre, Luca and Cevher, Volkan}, journal={arXiv preprint arXiv:1603.06313}, year={2016} }

Hemant Tyagi, Anastasios Kyrillidis, Bernd Gartner, and Andreas Krause, ``Learning sparse additive models with interactions in high dimensions”, AI & Statistics Conference (AISTATS), 2016.
A function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is referred to as a Sparse Additive Model (SPAM), if it is of the form $f(\mathbf{x}) = \sum_{l \in \mathcal{S}}\phi_{l}(x_l)$, where $\mathcal{S} \subset [d]$, $\mathcal{S} \ll d$. Assuming $\phi_l$'s and $\mathcal{S}$ to be unknown, the problem of estimating $f$ from its samples has been studied extensively. In this work, we consider a generalized SPAM, allowing for *second order* interaction terms. For some $\mathcal{S}_1 \subset [d], \mathcal{S}_2 \subset {[d] \choose 2}$, the function $f$ is assumed to be of the form: $f(\mathbf{x}) = \sum_{p \in \mathcal{S}_1}\phi_{p} (x_p) + \sum_{(l_1, l_2) \in \mathcal{S}_2}\phi_{(l_1, l_2)} \mathbf{x}_{(l_1, l_2)}.$ Assuming $\phi_{p},\phi_{(l_1, l_2)}$, $\mathcal{S}_1$ and, $\mathcal{S}_2$ to be unknown, we provide a randomized algorithm that queries $f$ and *exactly recovers* $\mathcal{S}_1,\mathcal{S}_2$. Consequently, this also enables us to estimate the underlying $\phi_p, \phi_{(l_1, l_2)}$. We derive sample complexity bounds for our scheme and also extend our analysis to include the situation where the queries are corrupted with noise  either stochastic, or arbitrary but bounded. Lastly, we provide simulation results on synthetic data, that validate our theoretical findings.
@article{kyrillidis2016convex, title={Learning Sparse Additive Models with Interactions in High Dimensions}, author={Tyagi, Hemant and Kyrillidis, Anastasios and Gartner, Bernd and Krause, Andreas}, journal={arXiv preprint arXiv:1604.05307}, year={2016} }

Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, and Alex Dimakis, ``Bipartite correlation clustering: Maximizing agreements”, AI & Statistics Conference (AISTATS), 2016.
In Bipartite Correlation Clustering (BCC) we are given a complete bipartite graph G with `+' and `' edges, and we seek a vertex clustering that maximizes the number of agreements: the number of all `+' edges within clusters plus all `' edges cut across clusters. BCC is known to be NPhard. We present a novel approximation algorithm for kBCC, a variant of BCC with an upper bound k on the number of clusters. Our algorithm outputs a kclustering that provably achieves a number of agreements within a multiplicative (1−δ)factor from the optimal, for any desired accuracy δ. It relies on solving a combinatorially constrained bilinear maximization on the biadjacency matrix of G. It runs in time exponential in k and δ−1, but linear in the size of the input. Further, we show that, in the (unconstrained) BCC setting, an (1−δ)approximation can be achieved by O(δ−1) clusters regardless of the size of the graph. In turn, our kBCC algorithm implies an Efficient PTAS for the BCC objective of maximizing agreements.
@article{asteris2016bipartite, title={Bipartite Correlation ClusteringMaximizing Agreements}, author={Asteris, Megasthenis and Kyrillidis, Anastasios and Papailiopoulos, Dimitris and Dimakis, Alexandros G}, journal={arXiv preprint arXiv:1603.02782}, year={2016} }

Megasthenis Asteris, Dimitris Papailiopoulos, Anastasios Kyrillidis, and Alex Dimakis, ``Space PCA via bipartite matchings”, Neural Information Processing Systems (NIPS), 2015.
We consider the following multicomponent sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance. Such components can be computed one by one, repeatedly solving the singlecomponent problem and deflating the input data matrix, but this greedy procedure is suboptimal. We present a novel algorithm for sparse PCA that jointly optimizes multiple disjoint components. The extracted features capture variance that lies within a multiplicative factor arbitrarily close to $1$ from the optimal. Our algorithm is combinatorial and computes the desired components by solving multiple instances of the bipartite maximum weight matching problem. Its complexity grows as a low order polynomial in the ambient dimension of the input data, but exponentially in its rank. However, it can be effectively applied on a lowdimensional sketch of the input data. We evaluate our algorithm on real datasets and empirically demonstrate that in many cases it outperforms existing, deflationbased approaches.
@inproceedings{asteris2015sparse, title={Sparse pca via bipartite matchings}, author={Asteris, Megasthenis and Papailiopoulos, Dimitris and Kyrillidis, Anastasios and Dimakis, Alexandros G}, booktitle={Advances in Neural Information Processing Systems}, pages={766774}, year={2015} }

Megasthenis Asteris, Anastasios Kyrillidis, Alex Dimakis, HanGyol Yi and Bharath Chandrasekaran, ``Stay on path: PCA along graph paths”, International Conference on Machine Learning (ICML), 2015.
We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph. In particular, we consider the following setting: given a directed acyclic graph~$G$ on $p$ vertices corresponding to variables, the nonzero entries of the extracted principal component must coincide with vertices lying along a path in $G$. From a statistical perspective, information on the underlying network may potentially reduce the number of observations required to recover the population principal component. We consider the canonical estimator which optimally exploits the prior knowledge by solving a nonconvex quadratic maximization on the empirical covariance. We introduce a simple network and analyze the estimator under the spiked covariance model. We show that side information potentially improves the statistical complexity. We propose two algorithms to approximate the solution of the constrained quadratic maximization, and recover a component with the desired properties. We empirically evaluate our schemes on synthetic and real datasets.
@inproceedings{asteris2015stay, title={Stay on path: PCA along graph paths}, author={Asteris, Megasthenis and Kyrillidis, Anastasios and Dimakis, EDU Alexandros G and Yi, HanGyol and Chandrasekaran, Bharath}, booktitle={Proceedings of the 32nd International Conference on Machine Learning (ICML15)}, volume={37}, year={2015} }

Michail Vlachos, Francesco Fusco, Harry Mavroforakis, Anastasios Kyrillidis and Vassilis Vasileiadis, ``Improving CoCluster Quality with Application to Product Recommendations”, ACM CIKM International Conference on Information and Knowledge Management, 2014.
Businesses store an ever increasing amount of historical customer sales data. Given the availability of such information, it is advantageous to analyze past sales, both for revealing dominant buying patterns, and for providing more targeted recommendations to clients. In this context, coclustering has proved to be an important datamodeling primitive for revealing latent connections between two sets of entities, such as customers and products. In this work, we introduce a new algorithm for coclustering that is both scalable and highly resilient to noise. Our method is inspired by $k$Means and agglomerative hierarchical clustering approaches: $(i)$ first it searches for elementary coclustering structures and $(ii)$ then combines them into a better, more compact, solution. The algorithm is flexible as it does not require an explicit number of coclusters as input, and is directly applicable on large data graphs. We apply our methodology on real sales data to analyze and visualize the connections between clients and products. We showcase a real deployment of the system, and how it has been used for driving a recommendation engine. Finally, we demonstrate that the new methodology can discover coclusters of better quality and relevance than stateoftheart coclustering techniques.
@inproceedings{vlachos2014improving, title={Improving cocluster quality with application to product recommendations}, author={Vlachos, Michail and Fusco, Francesco and Mavroforakis, Charalambos and Kyrillidis, Anastasios and Vassiliadis, Vassilios G}, booktitle={Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management}, pages={679688}, year={2014}, organization={ACM} }

Dimitris Papailiopoulos, Anastasios Kyrillidis and Christos Boutsidis, ``Provable deterministic leverage scores sampling”, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2014.
We explain theoretically a curious empirical phenomenon: “Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good lowrank matrix surrogate”. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep powerlaw decay. We support this powerlaw assumption by providing empirical evidence that such decay laws are abundant in realworld data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which many times matches or outperforms the stateoftheart techniques.
@inproceedings{papailiopoulos2014provable, title={Provable deterministic leverage score sampling}, author={Papailiopoulos, Dimitris and Kyrillidis, Anastasios and Boutsidis, Christos}, booktitle={Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining}, pages={9971006}, year={2014}, organization={ACM} }

Anastasios Kyrillidis, Rabeeh Karimi Mahabadi, Quoc TranDinh and Volkan Cevher, ``Scalable sparse covariance estimation via selfconcordance”, AAAI Conference on Artificial Intelligence (AAAI14), 2014.
We explain theoretically a curious empirical phenomenon: “Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good lowrank matrix surrogate”. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep powerlaw decay. We support this powerlaw assumption by providing empirical evidence that such decay laws are abundant in realworld data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which many times matches or outperforms the stateoftheart techniques.
@inproceedings{kyrillidis2014scalable, title={Scalable Sparse Covariance Estimation via SelfConcordance}, author={Kyrillidis, Anastasios and Mahabadi, Rabeeh Karimi and Dinh, Quoc Tran and Cevher, Volkan}, booktitle={TwentyEighth AAAI Conference on Artificial Intelligence}, year={2014} }

Anastasios Kyrillidis, Michail Vlachos and Anastasios Zouzias, ``Approximate matrix multiplication with application to linear embeddings”, IEEE ISIT Symposium, 2014.
In this paper, we study the problem of approximately computing the product of two real matrices. In particular, we analyze a dimensionalityreductionbased approximation algorithm due to Sarlos, introducing the notion of nuclear rank as the ratio of the nuclear norm over the spectral norm. The presented bound has improved dependence with respect to the approximation error (as compared to previous approaches), whereas the subspace – on which we project the input matrices – has dimensions proportional to the maximum of their nuclear rank and it is independent of the input dimensions. In addition, we provide an application of this result to linear lowdimensional embeddings. Namely, we show that any Euclidean pointset with bounded nuclear rank is amenable to projection onto number of dimensions that is independent of the input dimensionality, while achieving additive error guarantees.
@inproceedings{kyrillidis2014approximate, title={Approximate matrix multiplication with application to linear embeddings}, author={Kyrillidis, Anastasios and Vlachos, Michail and Zouzias, Anastasios}, booktitle={2014 IEEE International Symposium on Information Theory}, pages={21822186}, year={2014}, organization={IEEE} }

Anastasios Kyrillidis and Anastasios Zouzias, ``Nonuniform feature sampling in decision tree ensembles”, IEEE ICASSP, 2014.
We study the effectiveness of nonuniform randomized feature selection in decision tree classification. We experimentally evaluate two feature selection methodologies, based on information extracted from the provided dataset: $(i)$ leverage scoresbased and $(ii)$ normbased feature selection. Experimental evaluation of the proposed feature selection techniques indicate that such approaches might be more effective compared to naive uniform feature selection and moreover having comparable performance to the random forest algorithm.
@inproceedings{kyrillidis2014non, title={Nonuniform feature sampling for decision tree ensembles}, author={Kyrillidis, Anastasios and Zouzias, Anastasios}, booktitle={2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages={45484552}, year={2014}, organization={IEEE} }

George Skoumas, Dieter Pfoser and Anastasios Kyrillidis, ``On quantifying qualitative geospatial data: A probabilistic approach”, ACM GEOCROWD, 2013.
Living in the era of data deluge, we have witnessed a web content explosion, largely due to the massive availability of UserGenerated Content (UGC). In this work, we specifically consider the problem of geospatial information extraction and representation, where one can exploit diverse sources of information (such as image and audio data, text data, etc), going beyond traditional volunteered geographic information. Our ambition is to include available narrative information in an effort to better explain geospatial relationships: with spatial reasoning being a basic form of human cognition, narratives expressing such experiences typically contain qualitative spatial data, i.e., spatial objects and spatial relationships. To this end, we formulate a quantitative approach for the representation of qualitative spatial relations extracted from UGC in the form of texts. The proposed method quantifies such relations based on multiple text observations. Such observations provide distance and orientation features which are utilized by a greedy Expectation Maximizationbased (EM) algorithm to infer a probability distribution over predefined spatial relationships; the latter represent the quantified relationships under userdefined probabilistic assumptions. We evaluate the applicability and quality of the proposed approach using real UGC data originating from an actual travel blog text corpus. To verify the result quality, we generate gridbased “maps” visualizing the spatial extent of the various relations.
@inproceedings{skoumas2013quantifying, title={On quantifying qualitative geospatial data: a probabilistic approach}, author={Skoumas, Georgios and Pfoser, Dieter and Kyrillidis, Anastasios}, booktitle={Proceedings of the Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information}, pages={7178}, year={2013}, organization={ACM} }

Stephen Becker, Volkan Cevher and Anastasios Kyrillidis, ``Randomized lowmemory singular value projection”, 10th International Conference on Sampling Theory and Applications (SampTA), 2013. (Authors listed in alphabetical order.)
Affine rank minimization algorithms typically rely on calculating the gradient of a data error followed by a singular value decomposition at every iteration. Because these two steps are expensive, heuristic approximations are often used to reduce computational burden. To this end, we propose a recovery scheme that merges the two steps with randomized approximations, and as a result, operates on space proportional to the degrees of freedom in the problem. We theoretically establish the estimation guarantees of the algorithm as a function of approximation tolerance. While the theoretical approximation requirements are overly pessimistic, we demonstrate that in practice the algorithm performs well on the quantum tomography recovery problem.
@inproceedings{becker2013randomized, title={Randomized LowMemory Singular Value Projection}, author={Becker, Stephen and Cevher, Volkan and Kyrillidis, Anastasios}, booktitle={10th International Conference on Sampling Theory and Applications (Sampta)}, year={2013} }

Anastasios Kyrillidis, Stephen Becker, Volkan Cevher and Christoph Koch, ``Sparse projections onto the simplex”, International Conference on Machine Learning (ICML), 2013.
Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the $\ell_1$norm. However, several important learning applications cannot benefit from this approach as they feature these convex norms as constraints in addition to the nonconvex rank and sparsity constraints. In this setting, we derive efficient sparse projections onto the simplex and its extension, and illustrate how to use them to solve highdimensional learning problems in quantum tomography, sparse density estimation and portfolio selection with nonconvex constraints.
@inproceedings{kyrillidis2013sparse, title={Sparse projections onto the simplex}, author={Kyrillidis, Anastasios and Becker, Stephen and Cevher, Volkan and Koch, Christoph}, booktitle={Proceedings of The 30th International Conference on Machine Learning}, pages={235243}, year={2013} }

Quoc Tran Dinh, Anastasios Kyrillidis and Volkan Cevher, ``A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions”, International Conference on Machine Learning (ICML), 2013.
We propose an algorithmic framework for convex minimization problems of composite functions with two terms: a selfconcordant part and a possibly nonsmooth regularization part. Our method is a new proximal Newton algorithm with local quadratic convergence rate. As a specific problem instance, we consider sparse precision matrix estimation problems in graph learning. Via a careful dual formulation and a novel analytic stepsize selection, we instantiate an algorithm within our framework for graph learning that avoids Cholesky decompositions and matrix inversions, making it attractive for parallel and distributed implementations.
@inproceedings{dinh2013proximal, title={A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions}, author={Dinh, Quoc T and Kyrillidis, Anastasios and Cevher, Volkan}, booktitle={Proceedings of the 30th International Conference on Machine Learning (ICML13)}, pages={271279}, year={2013} }

Anastasios Kyrillidis and Volkan Cevher, ``Fast proximal algorithms for selfconcordant minimization with application to sparse graph selection”, IEEE ICASSP, 2013.
The convex $\ell_1$regularized log det divergence criterion has been shown to produce theoretically consistent graph learning. However, this objective function is challenging since the $\ell_1$regularization is nonsmooth, the log det objective is not globally Lipschitz gradient function, and the problem is highdimensional. Using the selfconcordant property of the objective, we propose a new adaptive step size selection and present the (F)PS ((F)ast Proximal algorithms for Selfconcordant functions) algorithmic framework which has linear convergence and exhibits superior empirical results as compared to stateoftheart first order methods.
@inproceedings{kyrillidis2013fast, title={Fast proximal algorithms for selfconcordant function minimization with application to sparse graph selection}, author={Kyrillidis, Anastasios and Cevher, Volkan}, booktitle={2013 IEEE International Conference on Acoustics, Speech and Signal Processing}, pages={65856589}, year={2013}, organization={IEEE} }

Anastasios Kyrillidis and Volkan Cevher, ``Matrix ALPS: Accelerated low rank and sparse matrix reconstruction”, IEEE SSP, 2012.
We propose Matrix ALPS for recovering a sparse plus lowrank decomposition of a matrix given its corrupted and incomplete linear measurements. Our approach is a firstorder projected gradient method over nonconvex sets, and it exploits a wellknown memorybased acceleration technique. We theoretically characterize the convergence properties of Matrix ALPS using the stable embedding properties of the linear measurement operator. We then numerically illustrate that our algorithm outperforms the existing convex as well as nonconvex stateoftheart algorithms in computational efficiency without sacrificing stability.
@inproceedings{kyrillidis2012matrix, title={Matrix {ALPS}: {A}ccelerated low rank and sparse matrix reconstruction}, author={Kyrillidis, Anastasios and Cevher, Volkan}, booktitle={2012 IEEE Statistical Signal Processing Workshop (SSP)}, pages={185188}, year={2012}, organization={IEEE} }

Anastasios Kyrillidis and Volkan Cevher, ``Combinatorial selection and least absolute shrinkage via the CLASH algorithm”, IEEE ISIT, 2012.
The least absolute shrinkage and selection operator (LASSO) for linear regression exploits the geometric interplay of the $\ell_2$data error objective and the $\ell_1$norm constraint to arbitrarily select sparse models. Guiding this uninformed selection process with sparsity models has been precisely the center of attention over the last decade in order to improve learning performance. To this end, we alter the selection process of LASSO to explicitly leverage combinatorial sparsity models (CSMs) via the combinatorial selection and least absolute shrinkage (CLASH) operator. We provide concrete guidelines how to leverage combinatorial constraints within CLASH, and characterize CLASH’s guarantees as a function of the set restricted isometry constants of the sensing matrix. Finally, our experimental results show that CLASH can outperform both LASSO and modelbased compressive sensing in sparse estimation.
@inproceedings{kyrillidis2012combinatorial, title={Combinatorial selection and least absolute shrinkage via the {CLASH} algorithm}, author={Kyrillidis, Anastasios and Cevher, Volkan}, booktitle={Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on}, pages={22162220}, year={2012}, organization={IEEE} }

Anastasios Kyrillidis, Gilles Puy and Volkan Cevher, ``Hard thresholding with norm constraints”, IEEE ICASSP, 2012.
We introduce a new sparse recovery paradigm, called Normed Pursuits, where efficient algorithms from combinatorial and convex optimization interface for interpretable and modelbased solutions. Synthetic and real data experiments illustrate that Normed Pursuits can significantly enhance the performance of both hard thresholding methods and convex solvers in sparse recovery.
@inproceedings{kyrillidis2012hard, title={Hard thresholding with norm constraints}, author={Kyrillidis, Anastasios and Puy, Gilles and Cevher, Volkan}, booktitle={2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages={36453648}, year={2012}, organization={IEEE} }

Anastasios Kyrillidis and Volkan Cevher, ``Recipes on hard thresholding methods”, 4th IEEE CAMSAP, 2011.
Compressive sensing (CS) is a data acquisition and recovery technique for finding sparse solutions to linear inverse problems from subNyquist measurements. CS features a wide range of computationally efficient and robust signal recovery methods, based on sparsity seeking optimization. In this paper, we present and analyze a class of sparse recovery algorithms, known as hard thresholding methods. We provide optimal strategies on how to set up these algorithms via basic “ingredients” for different configurations to achieve complexity vs. accuracy tradeoffs. Simulation results demonstrate notable performance improvements compared to stateoftheart algorithms both in terms of data reconstruction and computational complexity.
@inproceedings{kyrillidis2011recipes, title={Recipes on hard thresholding methods}, author={Kyrillidis, Anastasios and Cevher, Volkan}, booktitle={4th IEEE International Workshop on Computational Advances in MultiSensor Adaptive Processing (CAMSAP)}, year={2011} }

Anastasios Kyrillidis and George. N. Karystinos, ``Rankdeficient quadraticform maximization over Mphase alphabet: Polynomialcomplexity solvability and algorithmic developments”, IEEE ICASSP, 2011.
The maximization of a positive (semi)definite complex quadratic form over a finite alphabet is NPhard and achieved through exhaustive search when the form has full rank. However, if the form is rankdeficient, the optimal solution can be computed with only polynomial complexity in the length $N$ of the maximizing vector. In this work, we consider the general case of a rank$D$ positive (semi)definite complex quadratic form and develop a method that maximizes the form with respect to a $M$phase vector with polynomial complexity. The proposed method efficiently reduces the size of the feasible set from exponential to polynomial. We also develop an algorithm that constructs the polynomialsize candidate set in polynomial time and observe that it is fully parallelizable and rankscalable.
@inproceedings{kyrillidis2011rank, title={Rankdeficient quadraticform maximization over $M$phase alphabet: Polynomialcomplexity solvability and algorithmic developments}, author={Kyrillidis, Anastasios and Karystinos, George}, booktitle={2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages={38563859}, year={2011}, organization={IEEE} }
Book chapters

Volkan Cevher, Sina Jafarpour and Anastasios Kyrillidis, ``Linear inverse problems with norm and sparsity constraints”, in Practical Applications of Sparse Modeling, Sept. 2014, MIT Press, (Authors listed in alphabetical order).
We describe two nonconventional algorithms for linear regression, called GAME and CLASH. The salient characteristics of these approaches is that they exploit the convex $\ell_1$ball and nonconvex $\ell_0$sparsity constraints jointly in sparse recovery. To establish the theoretical approximation guarantees of GAME and CLASH, we cover an interesting range of topics from game theory, convex and combinatorial optimization. We illustrate that these approaches lead to improved theoretical guarantees and empirical performance beyond convex and nonconvex solvers alone.
@article{cevher2014linear, title={Linear inverse problems with norm and sparsity constraints}, author={Cevher, Volkan and Jafarpour, Sina and Kyrillidis, Anastasios}, journal={Practical Applications of Sparse Modeling}, pages={179}, year={2014}, publisher={MIT Press} }

Anastasios Kyrillidis, Luca Baldassarre, Marwa ElHalabi, Quoc TranDinh and Volkan Cevher, ``Structured sparsity: discrete and convex approaches”, in Compressed sensing and its application, Springer, 2014.
Compressive sensing (CS) exploits sparsity to recover sparse or compressible signals from dimensionality reducing, nonadaptive sensing mechanisms. Sparsity is also used to enhance interpretability in machine learning and statistics applications: While the ambient dimension is vast in modern data analysis problems, the relevant information therein typically resides in a much lower dimensional space. However, many solutions proposed nowadays do not leverage the true underlying structure. Recent results in CS extend the simple sparsity idea to more sophisticated structured sparsity models, which describe the interdependency between the nonzero components of a signal, allowing to increase the interpretability of the results and lead to better recovery performance. In order to better understand the impact of structured sparsity, in this chapter we analyze the connections between the discrete models and their convex relaxations, highlighting their relative advantages. We start with the general group sparse model and then elaborate on two important special cases: the dispersive and the hierarchical models. For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations. We also consider more general structures as defined by set functions and present their convex proxies. Further, we discuss efficient optimization solutions for structured sparsity problems and illustrate structured sparsity in action via three applications.
@incollection{kyrillidis2015structured, title={Structured sparsity: Discrete and convex approaches}, author={Kyrillidis, Anastasios and Baldassarre, Luca and El Halabi, Marwa and TranDinh, Quoc and Cevher, Volkan}, booktitle={Compressed Sensing and its Applications}, pages={341387}, year={2015}, publisher={Springer} }
Theses
 Anastasios Kyrillidis, ``Rigorous optimization recipes for sparse and low rank inverse problems with applications in data sciences”, Ph.D. Thesis, School of Computer and Communications, EPFL, September 2014.
 Anastasios Kyrillidis, ``Polynomialcomplexity computation of the Mphase vector that maximizes a rankdeficient quadratic form”, M.Sc. Thesis, Dept. Electronic Engineering and Computer Science, Technical University of Crete, July 2010.