Randomization in
Graph Optimization Problems
David Karger
MIT
http://theory.lcs.mit.edu/~karger

Randomized Algorithms
Flip coins to decide what to do next
Avoid hard work of making “right” choice
Often faster and simpler than deterministic algorithms
Different from average-case analysis
Input is worst case
Algorithm adds randomness

Methods
Random selection
if most candidate choices “good”, then a random choice is probably good
Monte Carlo simulation
simulations estimate event likelihoods
Random sampling
generate a small random subproblem
solve, extrapolate to whole problem
Randomized Rounding for approximation

Cuts in Graphs
Focus on undirected graphs
A cut is a vertex partition
Value is number (or total weight) of crossing edges
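
As a concrete illustration (not from the talk), a few lines of Python count the edges crossing a partition; the helper name cut_value and the 4-cycle example are mine:

    def cut_value(edges, S):
        # Value of the cut (S, V-S): edges with exactly one endpoint in S.
        S = set(S)
        return sum(1 for u, v in edges if (u in S) != (v in S))

    # Example: a 4-cycle; splitting it into opposite pairs cuts 2 edges.
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    assert cut_value(cycle, {0, 1}) == 2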

Optimization with Cuts
Cut values determine solution of many graph optimization problems:
min-cut / max-flow
multicommodity flow (sort-of)
bisection / separator
network reliability
network design
Randomization helps solve these problems

Presentation Assumption
For entire presentation, we consider unweighted graphs (all edges have weight/capacity one)
All results apply unchanged to arbitrarily weighted graphs
Integer weights = parallel edges
Rational weights scale to integers
Analysis unaffected
Some implementation details

Basic Probability
Conditional probability
Pr[A ∩ B] = Pr[A] × Pr[B | A]
Independent events multiply:
Pr[A ∩ B] = Pr[A] × Pr[B]
Linearity of expectation:
E[X+Y] = E[X] + E[Y]
Union Bound
Pr[X ∪ Y] ≤ Pr[X] + Pr[Y]

Random Selection for
Minimum Cuts
Random choices are good
when problems are rare

Minimum Cut
Smallest cut of graph
Cheapest way to separate into 2 parts
Various applications:
network reliability (small cuts are weakest)
subtour elimination constraints for TSP
separation oracle for network design
Not s-t min-cut

Max-flow/Min-cut
s-t flow: edge-disjoint packing of s-t paths
s-t cut: a cut separating s and t
[FF]: s-t max-flow = s-t min-cut
max-flow saturates all s-t min-cuts
most efficient way to find s-t min-cuts
[GH]: min-cut is “all-pairs” s-t min-cut
find using n flow computations

Flow Algorithms
Push-relabel [GT]:
push “excess” around graph till it’s gone
max-flow in O*(mn) (note: O* hides logs)
Recent O*(m^{3/2}) [GR]
min-cut in O*(mn^2) --- “harder” than flow
Pipelining [HO]:
save push/relabel data between flows
min-cut in O*(mn) --- “as easy” as flow

Contraction
Find edge that doesn’t cross min-cut
Contract (merge) endpoints to 1 vertex

Contraction Algorithm
Repeat  n - 2  times:
find non-min-cut edge
contract it (keep parallel edges)
Each contraction decrements #vertices
At end, 2 vertices left
unique cut
corresponds to min-cut of starting graph

Picking an Edge
Must contract non-min-cut edges
[NI]: O(m) time algorithm to pick edge
n contractions: O(mn) time for min-cut
slightly faster than flows
If only we could find such an edge faster…

Randomize
Repeat until 2 vertices remain
pick a random edge
contract it
(keep fingers crossed)
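
A minimal Python sketch of one trial of this algorithm, assuming a connected graph given as an edge list on vertices 0..n-1; the union-find bookkeeping and the name contract_trial are my additions:

    import random

    def contract_trial(n, edges):
        # One run of random contraction. Returns the value of the cut found;
        # it is the min-cut with probability at least 2/(n(n-1)).
        parent = list(range(n))                 # union-find over super-vertices
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path compression
                x = parent[x]
            return x
        remaining = n
        while remaining > 2:
            u, v = random.choice(edges)         # pick a uniformly random edge
            ru, rv = find(u), find(v)
            if ru != rv:                        # skip self-loops of merged vertices
                parent[ru] = rv                 # contract: merge the endpoints
                remaining -= 1
        # Edges whose endpoints ended up in different super-vertices
        # are exactly the edges crossing the resulting 2-way cut.
        return sum(1 for u, v in edges if find(u) != find(v))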

Analysis I
Min-cut is small---few edges
Suppose graph has min-cut c
Then minimum degree at least c
Thus at least nc/2 edges
Random edge is probably safe
Pr[min-cut edge]  ≤  c/(nc/2)
                  =  2/n
(easy generalization to capacitated case)

Analysis II
Algorithm succeeds if never accidentally contracts min-cut edge
Contracts #vertices from n down to 2
When k vertices, chance of error is 2/k
thus, chance of being right is 1-2/k
Pr[always right] is product of probabilities of being right each time

Analysis III
Pr[always right] = (1 − 2/n)(1 − 2/(n−1)) ⋯ (1 − 2/3)
                 = [(n−2)/n] × [(n−3)/(n−1)] × ⋯ × [1/3]
                 = 2/n(n−1)  ≥  2/n^2

Repetition
Repetition amplifies success probability
basic failure probability 1 − 2/n^2
so repeat 7n^2 times

How fast?
Easy to perform 1 trial in O(m) time
just use array of edges, no data structures
But need n^2 trials: O(mn^2) time
Simpler than flows, but slower
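
A sketch of the repetition wrapper, reusing contract_trial from the earlier sketch; the 7n^2 trial count follows the slide above:

    def min_cut_by_repetition(n, edges, trials=None):
        # Run many independent trials and keep the best cut seen.
        # With 7n^2 trials, failure probability is (1 - 2/n^2)^(7n^2) < e^-14.
        if trials is None:
            trials = 7 * n * n
        return min(contract_trial(n, edges) for _ in range(trials))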

An improvement [KS]
When k vertices, error probability 2/k
big when k small
Idea: once k small, change algorithm
algorithm needs to be safer
but can afford to be slower
Amplify by repetition!
Repeat base algorithm many times

Recursive Algorithm
Algorithm RCA ( G, n )
{G has n vertices}
 repeat twice
randomly contract G to n/√2 vertices
RCA(G, n/√2)
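
A hedged Python sketch of RCA under the same edge-list representation; contract_to (contract down to a target size and relabel) and the small base case are my additions, and contract_trial is reused from the earlier sketch:

    import math, random

    def contract_to(n, edges, target):
        # Randomly contract a connected graph down to `target` super-vertices;
        # return (target, relabeled edge list with self-loops dropped).
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        remaining = n
        while remaining > target:
            u, v = random.choice(edges)
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                remaining -= 1
        labels, new_edges = {}, []
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                a = labels.setdefault(ru, len(labels))
                b = labels.setdefault(rv, len(labels))
                new_edges.append((a, b))
        return len(labels), new_edges

    def rca(n, edges):
        # Recursive contraction: contract to ~n/sqrt(2), recurse twice, keep the best.
        if n <= 6:                    # small base case: brute-force by repeated trials
            return min(contract_trial(n, edges) for _ in range(25))
        target = int(math.ceil(1 + n / math.sqrt(2)))
        return min(rca(*contract_to(n, edges, target)) for _ in range(2))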

Main Theorem
On any capacitated, undirected graph, Algorithm RCA
runs in O*(n^2) time with simple structures
finds min-cut with probability ≥ 1/log n
Thus, O(log n) repetitions suffice to find the minimum cut (failure probability 10^{-6}) in O(n^2 log^2 n) time.

Proof Outline
Graph has O(n^2) (capacitated) edges
So O(n^2) work to contract, then two subproblems of size n/√2
T(n) = 2 T(n/√2) + O(n^2) = O(n^2 log n)
Algorithm fails if both iterations fail
Iteration succeeds if contractions and recursion succeed
P(n) = 1 − [1 − ½ P(n/√2)]^2 = Ω(1/log n)

Failure Modes
Monte Carlo algorithms always run fast and probably give you the right answer
Las Vegas algorithms probably run fast and always give you the right answer
To make a Monte Carlo algorithm Las Vegas, need a way to check answer
repeat till answer is right
No fast min-cut check known (flow slow!)

How do we verify
 a minimum cut?

Enumerating Cuts
The probabilistic method, backwards

Cut Counting
Original CA finds any given min-cut with probability at least 2/n(n-1)
Only one cut found
Disjoint events, so probabilities add
So at most n(n-1)/2 min-cuts
otherwise, probabilities would sum to more than one
Tight
Cycle has exactly this many min-cuts

Enumeration
RCA finds any given min-cut with probability Ω(1/log n)
If run O(log^2 n) times, probability of missing a given min-cut drops to 1/n^3
But only n^2 min-cuts
So, probability of missing any is at most 1/n
So, with probability 1 − 1/n, find all
O(n^2 log^3 n) time

Generalization
If G has min-cut c, a cut of value ≤ αc is an α-mincut
Lemma: contraction algorithm finds any given α-mincut with probability Ω(n^{-2α})
Proof: just add a factor to basic analysis
Corollary: O(n^{2α}) α-mincuts
Corollary: can find all in O*(n^{2α}) time
Just change contraction factor in RCA

Summary
A simple fast min-cut algorithm
Random selection avoids rare problems
Generalization to near-minimum cuts
Bound on number of small cuts
Probabilistic method, backwards

Network Reliability
Monte Carlo estimation

The Problem
Input:
Graph G with n vertices
Edge failure probabilities
For simplicity, fix a single p
Output:
FAIL(p): probability G is disconnected by edge failures

Approximation Algorithms
Computing FAIL(p) is #P-complete [V]
Exact algorithm seems unlikely
Approximation scheme
Given G, p, ε, outputs ε-approximation
May be randomized:
succeed with high probability
Fully polynomial (FPRAS) if runtime is polynomial in n, 1/ε

Monte Carlo Simulation
Flip a coin for each edge, test graph
k failures in t trials ⇒ FAIL(p) ≈ k/t
E[k/t] = FAIL(p)
How many trials needed for confidence?
“bad luck” on trials can yield bad estimate
clearly need at least 1/FAIL(p)
Chernoff bound: O*(1/(ε^2 FAIL(p))) suffice to give probable accuracy within ε
Time O*(m/(ε^2 FAIL(p)))
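
A direct Python rendering of this naive simulation; the helper names connected and estimate_fail are mine, with a simple DFS standing in for the connectivity test:

    import random

    def connected(n, edges):
        # DFS connectivity test on the surviving graph.
        adj = [[] for _ in range(n)]
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == n

    def estimate_fail(n, edges, p, trials):
        # Fail each edge independently with probability p;
        # FAIL(p) is estimated by the fraction of disconnecting trials.
        failures = 0
        for _ in range(trials):
            survivors = [e for e in edges if random.random() >= p]
            failures += 0 if connected(n, survivors) else 1
        return failures / trials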

Chernoff Bound
Random variables X_i ∈ [0,1]
Sum X = Σ X_i
Bound deviation from expectation
Pr[ |X − E[X]| ≥ εE[X] ]  <  exp(−ε^2 E[X]/4)
If E[X] ≥ 4(log n)/ε^2, “tight concentration”
Deviation by ε has probability < 1/n
No one variable is a big part of E[X]

Application
Let X_i = 1 if trial i is a failure, else 0
Let X = X_1 + … + X_t
Then E[X] = t · FAIL(p)
Chernoff says X is within relative ε of E[X] with probability 1 − exp(−ε^2 t FAIL(p)/4)
So choose t to cancel other terms
“High probability”: t = O(log n / (ε^2 FAIL(p)))
Deviation by ε has probability < 1/n

Review
Contraction Algorithm
O(n^{2α}) α-mincuts
Enumerate in O*(n^{2α}) time

Network reliability problem
Random edge failures
Estimate FAIL(p) = Pr[graph disconnects]
Naïve Monte Carlo simulation
Chernoff bound---“tight concentration”
Pr[ |X − E[X]| ≥ εE[X] ]  <  exp(−ε^2 E[X]/4)
O(log n / (ε^2 FAIL(p))) trials expect O(log n / ε^2) network failures---good for Chernoff
So estimate within ε in O*(m/(ε^2 FAIL(p))) time

Rare Events
When FAIL(p) too small, takes too long to collect sufficient statistics
Solution: skew trials to make interesting event more likely
But in a way that lets you recover the original probability

DNF Counting
Given DNF formula (OR of ANDs)
(e1 ∧ e2 ∧ e3) ∨ (e1 ∧ e4) ∨ (e2 ∧ e6)
Each variable set true with probability p
Estimate Pr[formula true]
#P-complete
[KL, KLM] FPRAS
Skew to make true outcomes “common”
Time linear in formula size

Rewrite problem
Assume p=1/2
Count satisfying assignments
“Satisfaction matrix”
S_ij = 1 if ith assignment satisfies jth clause
We want number of nonzero rows
Randomly sampling rows won’t work
Might be too few nonzeros

New sample space
So normalize every nonzero row to sum to one (divide by number of nonzeros)
Now sum of nonzeros is desired value
So sufficient to estimate average nonzero

Sampling Nonzeros
We know number of nonzeros/column
If satisfy given clause, all variables in clause must be true
All other variables unconstrained
Estimate average by random sampling
Know number of nonzeros/column
So can pick random column
Then pick random true-for-column assignment
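
A sketch of this estimator for the monotone DNFs that arise here (each clause is an AND of un-negated variables, p = 1/2); the name karp_luby and the encoding of clauses as sets of variable indices are my choices:

    import random

    def karp_luby(n_vars, clauses, samples):
        # Estimate the number of satisfying assignments of a monotone DNF.
        # clauses: list of sets of variable indices that must all be True.
        weights = [2 ** (n_vars - len(c)) for c in clauses]  # nonzeros per column
        total = sum(weights)                                 # all nonzero entries
        acc = 0.0
        for _ in range(samples):
            # Pick a random column (clause) proportionally to its nonzeros,
            # then a uniform assignment satisfying that clause.
            j = random.choices(range(len(clauses)), weights=weights)[0]
            assign = [random.random() < 0.5 for _ in range(n_vars)]
            for v in clauses[j]:
                assign[v] = True
            # Weight by 1/(number of clauses satisfied): the normalized row entry.
            sat = sum(1 for c in clauses if all(assign[v] for v in c))
            acc += 1.0 / sat
        return total * acc / samples   # estimated count of nonzero rows

    # Pr[formula true] is then karp_luby(...) / 2**n_vars.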

Few Samples Needed
Suppose k clauses
Then E[sample] > 1/k
1 ≤ satisfied clauses ≤ k
1 ≥ sample value ≥ 1/k
Adding O(k log n / ε^2) samples gives “large” mean
So Chernoff says sample mean is  probably good estimate

Reliability Connection
Reliability as DNF counting:
Variable per edge, true if edge fails
Cut fails if all edges do (AND of edge vars)
Graph fails if some cut does (OR of cuts)
FAIL(p)=Pr[formula true]

Focus on Small Cuts
Fact: FAIL(p) > p^c
Theorem: if p^c = n^{-(2+δ)} then
     Pr[cut of value > αc fails] < n^{-αδ}
Corollary: FAIL(p) ≈ Pr[some cut of value ≤ αc fails],
      where α = 1 + 2/δ
Recall: O(n^{2α}) α-mincuts
Enumerate with RCA, run DNF counting

Proof of Theorem
Given p^c = n^{-(2+δ)}
At most n^{2α} cuts have value αc
Each fails with probability p^{αc} = n^{-α(2+δ)}
Pr[some cut of value αc fails] = O(n^{-αδ})
Sum over all α > 1

Algorithm
RCA can enumerate all α-minimum cuts with high probability in O(n^{2α}) time.
Given the α-minimum cuts, can ε-estimate the probability one fails via Monte Carlo simulation for DNF counting (formula size O(n^{2α}))
Corollary: when FAIL(p) < n^{-(2+δ)}, can
ε-approximate it in O(c n^{2+4/δ}) time

Combine
For large FAIL(p), naïve Monte Carlo
For small FAIL(p), RCA/DNF counting
Balance: ε-approx. in O(mn^{3.5}/ε^2) time
Implementations show practical for hundreds of nodes
Again, no way to verify correctness

Summary
Naïve Monte Carlo simulation works well for common events
Need to adapt for rare events
Cut structure and DNF counting lets us do this for network reliability

Random Sampling
More min-cut algorithms

Random Sampling
General tool for faster algorithms:
pick a small, representative sample
analyze it quickly (small)
extrapolate to original (representative)
Speed-accuracy tradeoff
smaller sample means less time
but also less accuracy

Min-cut Duality
[Edmonds]: min-cut=max tree packing
convert to directed graph
“source” vertex s (doesn’t matter which)
spanning trees directed away from s
[Gabow] “augmenting trees”
add a tree in O*(m) time
min-cut c (via max packing) in O*(mc)
great if m and c are small…

Example

Sampling for Approximation

Random Sampling
[Gabow] scheme great if m, c small
Random sampling
reduces m, c
scales cut values (in expectation)
if pick half the edges, get half of each cut
So find tree packings, cuts in samples

Sampling Theorem
Given graph G, build a sample G(p) by including each edge with probability p
Cut of value v in G has expected value pv in G(p)
Definition: “constant” ρ = 8(ln n)/ε^2
Theorem: With high probability, all cuts in G(ρ/c) have (1 ± ε) times their expected values.
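
In code, building the sample is one line; the wrapper below (my naming) picks p = ρ/c with ρ = 8(ln n)/ε^2 as in the theorem:

    import math, random

    def sample_graph(edges, p):
        # G(p): keep each edge independently with probability p.
        return [e for e in edges if random.random() < p]

    def skeleton(n, edges, c, eps):
        # Sample at rate rho/c so every cut is within (1 +- eps) of its
        # expectation with high probability (per the sampling theorem).
        rho = 8 * math.log(n) / eps ** 2
        return sample_graph(edges, min(1.0, rho / c))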

A Simple Application
[Gabow] packs trees in O*(mc) time
Build G(ρ/c)
minimum expected cut ρ
by theorem, min-cut probably near ρ
find min-cut in time O*(ρm) using [Gabow]
corresponds to near-min-cut in G
Result: (1+ε) times min-cut in O*(ρm) time

Proof of Sampling: Idea
Chernoff bound says probability of large deviation in cut value is small
Problem: exponentially many cuts.  Perhaps some deviate a great deal
Solution: showed few small cuts
only small cuts likely to deviate much
but few, so Chernoff bound applies

Proof of Sampling
Sampled with probability ρ/c,
a cut of value αc has mean αρ
[Chernoff]: deviates from expected size by more than ε with probability at most n^{-3α}
At most n^{2α} cuts have value αc
Pr[some cut of value αc deviates] = O(n^{-α})
Sum over all α ≥ 1

Las Vegas Algorithms
Finding Good Certificates

Approximate Tree Packing
Break edges into c/ρ random groups
Each looks like a sample at rate ρ/c
O*(ρm/c) edges
each has min expected cut ρ
so theorem says min-cut ≥ (1 − ε)ρ
So each has a packing of size (1 − ε)ρ
[Gabow] finds in time O*(ρ^2 m/c) per group
so overall time is c × O*(ρ^2 m/c) = O*(ρ^2 m)

Las Vegas Algorithm
Packing algorithm is Monte Carlo
Previously found approximate cut (faster)
If close, each “certifies” other
Cut exceeds optimum cut
Packing below optimum cut
If not, re-run both
Result: Las Vegas, expected time O*(ρ^2 m)

Exact Algorithm
Randomly partition edges in two groups
each like a 1/2-sample: ε = O*(c^{-1/2})
Recursively pack trees in each half
c/2 − O*(c^{1/2}) trees
Merge packings
gives packing of size c − O*(c^{1/2})
augment to maximum packing: O*(mc^{1/2})
T(m,c) = 2T(m/2, c/2) + O*(mc^{1/2}) = O*(mc^{1/2})

Nearly Linear Time

Analyze Trees
Recall: [G] packs c (directed)-edge disjoint spanning trees
Corollary: in such a packing, some tree crosses min-cut only twice
To find min-cut:
find tree packing
find smallest cut with 2 tree edges crossing
Problem: packing takes O*(mc) time

Constraint trees
Min-cut c:
c directed trees
2c directed min-cut edges
On average, two min-cut edges/tree
Definition:
a tree 2-crosses a cut if exactly two of its edges cross the cut

Finding the Cut
From crossing tree edges, deduce cut
Remove tree edges

Sampling
Solution: use G(ρ/c) with ε = 1/8
pack O*(ρ) trees in O*(m) time
original min-cut has (1+ε)ρ edges in G(ρ/c)
some tree 2-crosses it in G(ρ/c)
…and thus 2-crosses it in G
Analyze O*(ρ) trees in G
time O*(m) per tree
Monte Carlo

Simplify
Discuss case where one tree edge crosses min-cut

Analyzing a tree
Root the tree, so each cut 1-crossed by a tree edge corresponds to a subtree
Use dynamic program up from leaves to determine subtree cuts efficiently
Given cuts at children of a node, compute cut at parent
Definitions:
v↓ denotes the nodes below v
C(v↓) is the value of the cut at subtree v↓

The Dynamic Program
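A sketch of that dynamic program, assuming the tree arrives as a parent array. The recurrence used here is C(v↓) = (sum of degrees in v↓) − 2 × (edges with both endpoints in v↓), where an edge lies inside v↓ exactly when its LCA does; the naive LCA walk is my shortcut (the real algorithm computes all LCAs in O(m), as the next slide notes):

    def subtree_cuts(n, edges, parent, root):
        # C[v] = value of the cut separating subtree v-down from the rest.
        children = [[] for _ in range(n)]
        for v in range(n):
            if v != root:
                children[parent[v]].append(v)
        order, depth = [root], [0] * n        # BFS order: parents before children
        for v in order:
            for w in children[v]:
                depth[w] = depth[v] + 1
                order.append(w)
        def lca(x, y):                        # naive LCA by walking upward
            while x != y:
                if depth[x] < depth[y]:
                    x, y = y, x
                x = parent[x]
            return x
        deg, inner = [0] * n, [0] * n         # inner[v] = edges whose LCA is v
        for x, y in edges:
            deg[x] += 1
            deg[y] += 1
            inner[lca(x, y)] += 1
        D, I, C = deg[:], inner[:], [0] * n
        for v in reversed(order):             # leaves first
            for w in children[v]:
                D[v] += D[w]                  # degrees inside v-down
                I[v] += I[w]                  # edges inside v-down
            C[v] = D[v] - 2 * I[v]            # crossing = degrees - 2*internal
        return C

At a leaf the recurrence degenerates to C = degree, matching the slide that follows.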

Algorithm: 1-Crossing Trees
Compute edges’ LCA’s: O(m)
Compute “cuts” at leaves
Cut values = degrees
each edge incident on at most two leaves
total time O(m)
Dynamic program upwards: O(n)
Total: O(m+n)

2-Crossing Trees
Cut corresponds to two subtrees:

Linear Time
Bottleneck is C(v↓, w↓) computations
Avoid them: find the right “twin” w for each v

Network Design
Randomized Rounding

Problem Statement
Given vertices, and cost c_vw to buy an edge from v to w, find minimum cost purchase that creates a graph with desired connectivity properties
Example: minimum cost k-connected graph.
Generally NP-hard
Recent approximation algorithms [GW],[JV]

Integer Linear Program
Variable x_vw = 1 if buy edge vw
Solution cost Σ x_vw c_vw
Constraint: for every cut, Σ x_vw ≥ k
Relaxing integrality gives tractable LP
Exponentially many cuts
But separation oracles exist (e.g., min-cut)
What is integrality gap?

Randomized Rounding
Given LP solution values x_vw
Build graph where vw is present with probability x_vw
Expected cost is at most opt: Σ x_vw c_vw
Expected number of edges crossing any cut satisfies constraint
If expected number large for every cut, sampling theorem applies
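
The rounding step itself is tiny; a sketch assuming the LP values arrive as a dict keyed by edge (my encoding):

    import random

    def round_lp(x):
        # Randomized rounding: buy edge e independently with probability x[e].
        # E[cost] = sum of x[e]*c[e]; every cut's expected crossing count equals
        # its LP value (>= k), so the sampling theorem controls deviations.
        return [e for e, frac in x.items() if random.random() < frac]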

k-connected subgraph
Fractional solution is k-connected
So every cut has (expected) k edges crossing in rounded solution
Sampling theorem says every cut has at least k − (k log n)^{1/2} edges
Close approximation for large k
Can often repair: e.g., get k-connected subgraph at cost 1 + ((log n)/k)^{1/2} times min

Nonuniform Sampling
Concentrate on the important things
[Benczur-Karger, Karger, Karger-Levine]

s-t Min-Cuts
Recall: if G has min-cut c, then in G(ρ/c) all cuts approximate their expected values to within ε.
Applications:

The Problem
Cut sampling relied on Chernoff bound
Chernoff bounds required that no one edge is a large fraction of the expectation of a cut it crosses
If sample rate ≈ 1/c, each edge across a min-cut is too significant

Biased Sampling
Original sampling theorem weak when
large m
small c
But if m is large
then G has dense regions
where c must be large
where we can sample more sparsely

Problem                 Old Time        New Time
Approx. s-t min-cut     O*(mn)          O*(n^2/ε^2)
Approx. s-t max-flow    O*(m^{3/2})     O*(mn^{1/2}/ε)
Flow of value v         O*(mv)          O*(n^{11/9} v)
Approx. bisection       O*(m^2)         O*(n^2/ε^2)
m ⇒ n/ε^2 in weighted, undirected graphs

Strong Components
Definition: A k-strong component is a maximal vertex-induced subgraph with min-cut at least k.

Nonuniform Sampling
Definition: An edge is k-strong if its endpoints are in same k-component.
Stricter than k-connected endpoints.
Definition: The strong connectivity c_e of edge e is the largest k for which e is k-strong.
Plan: sample dense regions lightly

Nonuniform Sampling
Idea: if an edge is k-strong, then it is in a k-connected graph
So “safe” to sample with probability 1/k
Problem: if we sample edges with different probabilities, E[cut value] gets messy
Solution: if we sample e with probability p_e, give it weight 1/p_e
Then E[cut value] = original cut value

Compression Theorem
Definition: Given compression probabilities p_e, the compressed graph G[p_e]
includes edge e with probability p_e and
gives it weight 1/p_e if included
Note E[G[p_e]] = G
Theorem: G[ρ/c_e]
approximates all cuts by ε
has O(ρn) edges
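
A sketch of the compression step, assuming edge strengths are already known (the name compress and the (edge, weight) output format are mine):

    import random

    def compress(edges, strength, rho):
        # Keep edge e with probability p_e = min(1, rho/strength[e]) and,
        # if kept, give it weight 1/p_e so every cut keeps its expectation.
        out = []
        for e in edges:
            p = min(1.0, rho / strength[e])
            if random.random() < p:
                out.append((e, 1.0 / p))
        return out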

Proof (approximation)
Basic idea: in a k-strong component, edges get sampled with prob. ρ/k
original sampling theorem works
Problem: some edges may be in stronger components, sampled less
Induct up from strongest components:
apply original sampling theorem inside
then “freeze” so don’t affect weaker parts

Strength Lemma
Lemma:    Σ 1/c_e  ≤  n
Consider connected component C of G
Suppose C has min-cut k
Then every edge e in C has c_e ≥ k
So the k edges crossing C’s min-cut have
Σ 1/c_e  ≤  Σ 1/k  =  k(1/k)  =  1
Delete these edges (“cost” 1)
Repeat n - 1 times: no more edges!

Proof (edge count)
Edge e included with probability ρ/c_e
So expected number is Σ ρ/c_e
We saw Σ 1/c_e ≤ n
So expected number at most ρn

Construction
To sample, must find edge strengths
can’t, but approximation suffices
Sparse certificates identify weak edges:
construct in linear time [NI]
contain all edges crossing cuts ≤ k
iterate until strong components emerge
Iterate for 2^i-strong edges, all i
tricks turn it strongly polynomial

Certificate Algorithm
Repeat k times
Find a spanning forest
Delete it
Each iteration deletes one edge from every cut (forest is spanning)
So at end, any edge crossing a cut of size ≤ k is deleted
[NI] merge all iterations in O(m) time
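
A straightforward sketch of the k-forest certificate exactly as the slide describes it, computing the forests one at a time rather than with the merged O(m) scan of [NI]:

    def sparse_certificate(n, edges, k):
        # Repeat k times: extract a spanning forest and delete it.
        # Every edge crossing a cut of value <= k lands in some forest.
        remaining, certificate = list(edges), []
        for _ in range(k):
            parent = list(range(n))
            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x
            kept = []
            for u, v in remaining:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    certificate.append((u, v))   # forest edge
                else:
                    kept.append((u, v))          # stays for later rounds
            remaining = kept
        return certificate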

Flows
Uniform sampling led to flow algorithms
Randomly partition edges
Merge flows from each partition element
Compression problematic
Edge capacities changed
So flow path capacities distorted
Flow in compressed graph doesn’t fit in original graph

Smoothing
If edge has strength c_e, divide it into βρ/c_e edges of capacity c_e/(βρ)
Creates βρ Σ 1/c_e ≤ βρn edges
Now each edge is only a 1/(βρ) fraction of any cut of its strong component
So sampling a 1/β fraction works
So dividing into β groups works
Yields a (1 − ε) max-flow

Cleanup
Approximate max-flow can be made exact by augmenting paths
Integrality problems
Augmenting paths fast for small integer flow
But breakup by smoothing ruins integrality
Surmountable
Flows in dense and sparse parts separable
Result: max-flow in O*(n^{11/9} v) time

Proof By Picture

Compress Dense Regions

Solve Sparse Flow

Replace Dense Parts
(keep flow in sparse bits)

“Fill In” Dense Parts

Conclusions

Conclusion
Randomization is a crucial tool for algorithm design
Often yields algorithms that are faster or simpler than traditional counterparts
In particular, gives significant improvements for core problems in graph algorithms

Randomized Methods
Random selection
if most candidate choices “good”, then a random choice is probably good
Monte Carlo simulation
simulations estimate event likelihoods
Random sampling
generate a small random subproblem
solve, extrapolate to whole problem
Randomized Rounding for approximation

Random Selection
When most choices good, do one at random
Recursive contraction algorithm for minimum cuts
Extremely simple (also to implement)
Fast in theory and in practice [CGKLS]

Monte Carlo
To estimate event likelihood, run trials
Slow for very rare events
Bias samples to reveal rare event
FPRAS for network reliability

Random Sampling
Generate representative subproblem
Use it to estimate solution to whole
Gives approximate solution
May be quickly repaired to exact solution
Bias sample toward “important” or “sensitive” parts of problem
New max-flow and min-cut algorithms

Randomized Rounding
Convert fractional to integral solutions
Get approximation algorithms for integer programs
“Sampling” from a well designed sample space of feasible solutions
Good approximations for network design.

Generalization
Our techniques work because undirected graphs are matroids
All our results extend/are special cases
Packing bases
Finding minimum “quotients”
Matroid optimization (MST)

Directed Graphs?
Directed graphs are not matroids
Directed graphs can have lots of minimum cuts
Sampling doesn’t appear to work
Residual graphs for flows are directed
Precludes obvious recursive solutions to flow problems

Open problems
Flow in O(nv) time (complete the m ⇒ n reduction)
Eliminate v dependence
Apply to weighted graphs with large flows
Flow in O(m) time?
Las Vegas algorithms
Finding good certificates
Deterministic algorithms
Deterministic construction of “samples”
Deterministically compress a graph

Randomization in
Graph Optimization Problems
David Karger
MIT
http://theory.lcs.mit.edu/~karger
karger@mit.edu