Yihong Wu
Recovering a Hidden Hamiltonian Cycle via Linear Programming
Abstract: We introduce the problem of hidden Hamiltonian cycle
recovery, where there is an unknown Hamiltonian cycle in an $n$-vertex
complete graph that needs to be inferred from noisy edge
measurements. The measurements are independent and distributed
according to $P_n$ for edges in the cycle and $Q_n$ otherwise. This
formulation is motivated by a problem in genome assembly, where the
goal is to order a set of contigs (genome subsequences) according to
their positions on the genome using long-range linking measurements
between the contigs. Computing the maximum likelihood estimate in
this model reduces to a Traveling Salesman Problem (TSP). Despite the
NP-hardness of TSP, we show that a simple linear programming (LP)
relaxation, namely the fractional $2$-factor (F2F) LP, recovers the
hidden Hamiltonian cycle with high probability as $n \to \infty$
provided that $\alpha_n - \log n \to \infty$, where $\alpha_n
\triangleq -2 \log \int \sqrt{d P_n d Q_n}$ is the R\'enyi divergence
of order $\frac{1}{2}$. This condition is information-theoretically
optimal in the sense that, under mild distributional assumptions,
$\alpha_n \geq (1+o(1)) \log n$ is necessary for any algorithm to
succeed regardless of the computational cost.
Departing from the usual proof techniques based on dual witness
construction, the analysis relies on the combinatorial
characterization (in particular, the half-integrality) of the extreme
points of the F2F polytope. Represented as bicolored multi-graphs,
these extreme points are further decomposed into simpler
``blossom-type'' structures for the large deviation analysis and
counting arguments. Evaluation of the algorithm on real data shows
improvements over existing approaches.
This is joint work with Vivek Bagaria (Stanford), Jian Ding (Penn),
David Tse (Stanford) and Jiaming Xu (Purdue).