In our previous post, we threw 15 GPUs at Max-3-Cut and pushed the rank-2 algorithm to \(n = 3{,}600\) nodes. The rank-2 approach is powerful — it enumerates \(O(n^3)\) candidate solutions and provably contains the global optimizer when the objective has rank 2 — but its cubic scaling hits a wall around a few thousand nodes, even with parallelism.
What if we drop one rank? The rank-1 algorithm reduces the candidate set to \(O(n)\) and needs no GPU at all. The trade-off seems clear: less computation, worse quality. But three algorithmic insights change this picture dramatically:

1. **Incremental scoring** turns each candidate evaluation from \(O(\text{nnz})\) into \(O(1)\), with only an \(O(d)\) cache update per accepted flip.
2. **Combining the top two eigenvectors** into one complex vector makes all three partitions reachable, not just two.
3. **Using the rank-1 solution as a warm start** for greedy local search gives a hybrid that beats either method alone.
Together, these let us solve million-node graphs in minutes on a laptop-class CPU — and the hybrid solver beats greedy local search on every single instance we tested.
A key theme of this work is that rank-1 serves a dual role. As a standalone solver, the rank-1 phase sweep produces a complete partition by rounding eigenvector phases to roots of unity. On graphs with strong spectral structure — torus, lattice, and other highly regular families — this solution is already optimal or near-optimal. On torus graphs, rank-1 beats every other method we tested, including simulated annealing.
But rank-1 also serves as an initializer for other algorithms. Local search methods like greedy hill-climbing are fast but trapped by their starting point: from a random initialization, they converge to whatever local optimum is nearest. The rank-1 solution provides a spectrally informed starting point that encodes global structure of the graph — information that pure local search is blind to. We call this combination Hybrid: run rank-1 first, then run greedy local search starting from the rank-1 partition instead of a random one.
The distinction matters because the hybrid solver is not simply “rank-1 but better.” It is a different algorithm that composes spectral global information (rank-1) with local optimization (greedy). The two components have complementary strengths: rank-1 captures global structure but ignores local details; greedy refines local details but is blind to global structure. Together, they consistently outperform either method alone.
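To make the composition concrete, here is a minimal dense-matrix sketch of the greedy refinement step. The function name `greedy_refine` and its structure are our illustration under stated assumptions, not the repo's `src/hybrid.py`; it uses the same incremental gain formula the sweep relies on.

```python
import numpy as np

def greedy_refine(Q, z, labels, max_passes=10):
    """Greedy hill-climbing on the score z^H Q z: repeatedly move single
    nodes to whichever label most increases the score, until no move helps.
    Dense Q for clarity; a hypothetical sketch, not the repo's hybrid.py."""
    Qz = Q @ z                        # cached product, maintained incrementally
    for _ in range(max_passes):
        improved = False
        for i in range(Q.shape[0]):
            best_gain, best_label = 0.0, z[i]
            for lab in labels:
                dz = lab - z[i]
                # O(1) score change for moving node i to label `lab`
                gain = 2 * np.real(np.conj(dz) * Qz[i]) + Q[i, i] * abs(dz) ** 2
                if gain > best_gain + 1e-12:
                    best_gain, best_label = gain, lab
            if best_label != z[i]:
                Qz += Q[:, i] * (best_label - z[i])   # keep the cache in sync
                z[i] = best_label
                improved = True
        if not improved:
            break
    return z
```

The hybrid solver is then just composition: compute the rank-1 partition and pass it in as `z` instead of a random assignment.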
Recall the setup. Given a graph Laplacian \(\mathbf{Q}\) and its top eigenvector \(\mathbf{v}\), the rank-1 phase sweep constructs an initial partition \(\mathbf{z}\) from \(\mathbf{v}\)'s phases, then sweeps through \(n\) boundary points, flipping one node at a time. At each step, we need the new score \(\mathbf{z}^\dagger \mathbf{Q} \mathbf{z}\).
The naive approach recomputes this score from scratch: multiply \(\mathbf{Q}\mathbf{z}\), then dot with \(\mathbf{z}\). Each evaluation costs \(O(\text{nnz})\) for a sparse matrix with nnz non-zeros. Over \(n\) sweeps, the total is \(O(n \cdot \text{nnz})\). For a 5-regular graph (where nnz \(= 5n\)), this gives \(O(n^2)\) — manageable at \(n = 1{,}000\), but 17 hours at \(n = 100{,}000\).
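Written out, the naive evaluation is a one-liner (a sketch; the function name is ours), which makes the later savings concrete:

```python
import numpy as np
from scipy.sparse import csc_matrix

def score(Q, z):
    """Naive objective evaluation: one sparse mat-vec plus a dot product,
    O(nnz) per call. This is exactly what the incremental update avoids."""
    return float(np.real(np.conj(z) @ (Q @ z)))
```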
When one node \(i\) flips from \(z_\text{old}\) to \(z_\text{new}\), the score change has a closed-form expression:

\[
\Delta = 2\,\operatorname{Re}\!\big(\overline{\delta z}\,(\mathbf{Q}\mathbf{z})_i\big) + Q_{ii}\,|\delta z|^2,
\]
where \(\delta z = z_\text{new} - z_\text{old}\). This is \(O(1)\) per candidate. The catch is maintaining the cached product \(\mathbf{Q}\mathbf{z}\), which requires updating the rows affected by the flip: \((\mathbf{Q}\mathbf{z})_j \mathrel{+}= Q_{ji} \cdot \delta z\) for each neighbor \(j\) of node \(i\).
For a graph with average degree \(d\), each update touches \(d\) entries. Over \(n\) flips, the total cost is \(O(n \cdot d)\) instead of \(O(n^2)\). On a 5-regular graph, that is an \(n/5\) speedup — a factor of 20,000 at \(n = 100{,}000\).
The critical implementation detail: index directly into the CSC sparse matrix structure (`indptr`, `indices`, `data`). A seemingly equivalent approach — `Q.getcol(i).toarray()` — allocates an \(n\)-dimensional vector on every call, turning \(O(d)\) into \(O(n)\).
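A minimal sketch of the cache update via direct CSC indexing (the helper name `apply_flip` is ours):

```python
import numpy as np
from scipy.sparse import csc_matrix

def apply_flip(Q_csc, Qz, i, dz):
    """Update the cached product Qz in place after node i changes by dz.
    Touches only the d nonzeros of column i, via direct CSC indexing --
    no O(n) temporary like Q.getcol(i).toarray() would allocate."""
    start, end = Q_csc.indptr[i], Q_csc.indptr[i + 1]
    rows = Q_csc.indices[start:end]   # neighbors j with Q[j, i] != 0
    vals = Q_csc.data[start:end]
    Qz[rows] += vals * dz             # (Qz)_j += Q[j, i] * dz
```

After the sweep flips node \(i\) from \(z_\text{old}\) to \(z_\text{new}\), one call with `dz = z_new - z_old` keeps the cache exact.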
For Max-3-Cut, we need to assign each node to one of three partitions, represented by the cube roots of unity: \(\{1, \omega, \omega^2\}\) where \(\omega = e^{2\pi i/3}\). The rank-1 algorithm maps eigenvector entries to partitions based on their complex phase angles.
The problem: graph Laplacians are real symmetric, so their eigenvectors are real-valued. A real number has a complex phase of either 0 or \(\pi\), mapping to at most 2 of the 3 partitions. One-third of the solution space is unreachable.
Instead of using a single real eigenvector, combine the top two into one complex vector:

\[
q_i = v^{(1)}_i + \mathrm{i}\,v^{(2)}_i,
\]

where \(\mathbf{v}^{(1)}\) and \(\mathbf{v}^{(2)}\) are the eigenvectors of the two largest eigenvalues.
Now \(\text{arg}(q_i)\) spans the full \([0, 2\pi)\) circle, and all three partitions are reachable. The cost is one extra eigenvalue computation — negligible compared to the sweep.
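The rounding step can be sketched as follows, assuming `v1` and `v2` hold the top two eigenvectors (the function name is our illustration):

```python
import numpy as np

def round_to_roots(v1, v2, K=3):
    """Combine two real eigenvectors into a complex vector and round each
    entry's phase to the nearest K-th root of unity. Sketch only; v1, v2
    are assumed to be the top two eigenvectors of the Laplacian."""
    q = v1 + 1j * v2                        # phases now span the full circle
    theta = np.angle(q) % (2 * np.pi)       # phase in [0, 2*pi)
    k = np.rint(theta / (2 * np.pi / K)).astype(int) % K
    return np.exp(2j * np.pi * k / K)       # one of {1, w, w^2} per node
```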
The effect on torus graphs is striking: with the fix, rank-1 finds the best solution among all methods, including simulated annealing. Without it, rank-1 is limited to a bipartition of a 3-colorable structure.
With incremental scoring, the sweep phase scales linearly in \(n \cdot d\). The bottleneck shifts to the eigenvalue computation (ARPACK via `scipy.sparse.linalg.eigsh`), which scales roughly as \(O(\text{nnz} \cdot k_\text{iter})\).
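For reference, a minimal sketch of that eigenvalue step (the wrapper name is ours; `which='LA'` requests the largest algebraic eigenvalues):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import eigsh

def top_eigvecs(Q, k=2):
    """Top-k eigenpairs of a symmetric sparse matrix via ARPACK.
    Each Lanczos iteration costs one sparse mat-vec, roughly O(nnz)."""
    vals, vecs = eigsh(Q, k=k, which='LA')
    order = np.argsort(vals)[::-1]      # sort descending by eigenvalue
    return vals[order], vecs[:, order]
```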
We tested five methods on 45 graph instances across five families: 5-regular, toroidal grid, Erdős–Rényi, Delaunay triangulations, and road networks.
Synthetic: We test on three standard graph families at sizes \(n = 10{,}000\) to \(n = 1{,}000{,}000\), each with 3 random seeds: 5-regular (every node has exactly 5 neighbors), toroidal grid (periodic 2D lattice), and Erdős–Rényi \(G(n, 10/n)\).
Real-world: We include two families from standard benchmark collections:
Delaunay meshes are planar triangulations of random point sets, spanning roughly 1K to 524K nodes. roadNet-PA and roadNet-TX are the full road networks of Pennsylvania and Texas, where nodes represent intersections and edges represent road segments. These are large, sparse, near-planar graphs with low average degree and pronounced geographic community structure.

Across all 45 instances, the hybrid solver (rank-1 + greedy) produces a better cut than greedy alone. The margin ranges from +0.3% on road networks to +3.9% on torus graphs. This is not a fluke — the spectral warm-start consistently guides greedy to a better local optimum.
On toroidal grids, rank-1 beats every other method at every scale from 10K to 1M nodes. The spectral structure of the torus is perfectly captured by two eigenvectors. Greedy, which is blind to global structure, finds a solution 3.9% worse. Even SA — with its ability to escape local optima — falls 0.5–2.1% short.
On 5-regular graphs, SA is the best method at \(n \leq 100{,}000\), beating hybrid by about 1%. But at \(n \geq 500{,}000\), the tables turn: SA's fixed iteration budget becomes insufficient, while hybrid's warm-start advantage holds. At \(n = 1{,}000{,}000\), hybrid beats SA by 1.1%.
On Delaunay meshes (1K–524K nodes) and road networks (1M+ nodes), rank-1 alone is weak — only 66–80% of greedy. These graphs lack the clean spectral structure of torus or regular graphs. But the hybrid warm-start still adds 0.3–1.1% over greedy, demonstrating that even a mediocre spectral starting point beats a random one.
Rank-1 is a building block, not the final product. Three directions build on it:
The rank-2 algorithm enumerates \(O(n^3)\) candidates, which is intractable for large \(n\). But preliminary experiments show that random sampling of just 0.1% of rank-2 candidates recovers 99.8% of the exact optimum. If this holds at larger \(n\), it would make rank-2 practical at \(n = 10{,}000+\) without any GPUs.
Can the rank-1 solution help make rank-2 more efficient? The connection is not obvious — the two algorithms operate in different spaces — but several plausible mechanisms suggest themselves.
These ideas remain to be validated experimentally. Whether a rigorous theoretical connection exists — e.g., a bound on how many rank-2 candidates fall “near” the rank-1 optimum — is an open question.
Our results show a clear pattern: rank-1 excels on graphs with strong spectral structure (large eigengap, low effective rank) and struggles on spectrally diffuse graphs. The eigengap \(\lambda_1 - \lambda_2\) and the ratio \(\lambda_1 / \text{tr}(\mathbf{Q})\) may predict when spectral methods will outperform metaheuristics, enabling algorithm selection before solving.
All code and experiment scripts are available on GitHub:
github.com/barbara-su/MaxKCutParallel
```shell
# Rank-1 + Hybrid (single CPU, any graph size)
python src/hybrid.py --q_path data/Q.npy --v_path data/V.npy --K 3

# Generate a 5-regular graph and solve
python src/graph_generators/gen_regular_random.py --n 100000 --degree 5 --rank 2 --seed 42
python src/hybrid.py --q_path instances/regular/Q_regular_100000_seed_42.npy \
    --v_path instances/regular/V_regular_100000_seed_42.npy --K 3

# Compare all baselines
python src/baselines.py --q_path data/Q.npy --K 3 --methods random,greedy,sa
```