Last time we introduced FPT functions, algorithms, and reductions. We also gave a simple 2^k n time algorithm for k-VC. Our plan for today is as follows:
- give improved FPT algorithms for k-VC
- introduce kernelization
- give kernels for k-VC

Recall the k-VC problem: Given G=(V,E) and an integer k, determine whether there is S \subseteq V with |S| \leq k so that for every (u,v) \in E, either u \in S or v \in S or both.

The 2^k n algorithm A(G,S) proceeded by recursively building up a potential VC S, where S starts empty, and while |S| < k and some edge (u,v) is still uncovered, we branch on putting u or v into S. Here is a first improvement. If every node of G has degree <= 2, then G is a disjoint union of paths and cycles, and we can find a minimum VC in polynomial time. So assume there is a node v with degree >= 3. Now modify the branching step:

Accept iff (A(G-{v}, S cup {v}) accepts or A(G-N(v)-{v}, S cup N(v)) accepts).
[Either v is in the vertex cover, or it isn't and all of its neighbors must be!]

The recurrence is then T(n,k) <= T(n-1,k-1) + T(n-4,k-3) + poly(n), with T(n,0) = O(1), which solves to ... what?

How do we analyze this? We know the solution must be of the form x^k*poly(n) (it can't be worse than before!), and we want to determine x. So our recurrence is

x^k*poly(n) <= x^{k-1}*poly(n-1) + x^{k-3}*poly(n-4) + poly(n).

Think of another recurrence, T'(k) = T'(k-1) + T'(k-3), and look for x such that T'(k) = x^k. OK, let's solve x^k = x^{k-1} + x^{k-3} --> x^3 - x^2 - 1 = 0 --> x = 1.4656...

We get T(n,k) <= 1.4656^k poly(n). QED

=======================

In general:

Theorem: For every recurrence of the form $T(n) = T(n-L_1) + T(n-L_2) + \cdots + T(n-L_i) + O^*(1)$ we have $T(n) = O^*(r(L_1,\ldots,L_i)^n)$, where $r(L_1,\ldots,L_i)$ is the unique positive root of the function $p(x) = 1 - \sum_{j=1}^i x^{-L_j}$.

Proof (sketch): Guess a solution of the form T(n) = x^n and plug it into both sides: x^n = \sum_j x^{n-L_j}. Divide through by x^n: 1 = \sum_j x^{-L_j}. So x satisfies 1 - \sum_j x^{-L_j} = 0, that is, p(x) = 0. (Note p is strictly increasing for x > 0, so this root is unique.) QED

For example, consider $T(n) \leq T(n-1) + T(n-2) + O^*(1)$. The equation to solve is $1 - 1/x - 1/x^2 = 0$ ==> x^2 - x - 1 = 0. The roots are x = 1.618... (the golden ratio) and x = -0.618033... Taking the positive root, we get T(n) <= O*(1.6181^n).
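These branching factors are easy to compute numerically. Below is a short Python sketch of how one might do it (the helper name branching_factor is our choice): it finds the root of p(x) = 1 - sum_j x^{-L_j} by bisection, using the fact that p is strictly increasing for x > 0. The last two calls check constants that come up later in the lecture.

    def branching_factor(Ls, tol=1e-12):
        """Root x > 0 of 1 - sum_j x^(-L_j) = 0, for T(n) <= sum_j T(n - L_j)."""
        p = lambda x: 1 - sum(x ** (-L) for L in Ls)
        lo, hi = 1.0, 2.0
        while p(hi) < 0:              # p(1) <= 0; grow hi until p(hi) >= 0
            hi *= 2
        while hi - lo > tol:          # bisect, keeping p(lo) <= 0 <= p(hi)
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if p(mid) < 0 else (lo, mid)
        return hi

    print(branching_factor([1, 2]))   # ~1.6180  for T(n-1)+T(n-2)
    print(branching_factor([1, 3]))   # ~1.4656  for T(k-1)+T(k-3)
    print(branching_factor([1, 4]))   # ~1.3803  for T(k-1)+T(k-4)
    print(branching_factor([2, 3]))   # ~1.3247  for T(k-2)+T(k-3)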
==================

So far we branched on nodes of degree >= 3 until there were no more such nodes, and then solved the rest in polytime. You can get even faster with more case analysis and more thought about the problem.

Suppose we keep branching on a node of degree >= 4: then the running-time recurrence becomes T(k) <= T(k-1) + T(k-4) (either the node v is in S, or all of its >= 4 neighbors are in S). If that were the only case, then our running time would be about 1.3803^k < 1.39^k. But there are other cases. What if there are no nodes of degree >= 4? We need to do something different from before.

So suppose the maximum degree of G is 3. (If the max degree is <= 2, we can solve in polynomial time.) We can add more rules before the branching step:
- Suppose there is a v with deg(v) = 1. Then delete v and put its neighbor in S. (Why does this work?)
- Suppose there is a v with deg(v) = 2. Let its nbrs be u and w.
--- If (u,w) in E, then (u,v,w) is a triangle. Put u,w in S and delete v. (Why?)
--- If (u,w) not in E, then we "fold" u, v, and w into each other: merge them into one vertex (uvw) with neighborhood (N(u) cup N(w)) - {v}, and reduce k by 1.

Why does folding work?
Claim: The original G has a VC of size k iff the new G has a VC of size k-1.
Proof: G has a min VC of size k that doesn't contain v <=> G has a min VC of size k containing both u and w <=> the new G has a VC of size k-1 containing (uvw). Otherwise, G has a min VC of size k that contains v <=> G has a min VC of size k that contains *neither* u nor w <=> the new G has a VC of size k-1 without (uvw).
(Convince yourself of these statements!!) QED

--

Now suppose there are no vertices with degree 1 or 2, so every node has degree exactly 3. Consider branching on a degree-3 vertex v. When we put v in the cover and remove it from the graph, all nbrs of v now have degree 2, so we can fold at least one of them and reduce k by 1 more; so k drops by at least 2. When v is not in the cover, we put its 3 neighbors in the cover, which puts 3 vertices in the VC. In this case we have the recurrence T(k) <= T(k-2) + T(k-3), which is in fact better: < 1.33^k (the root of x^3 = x + 1 is about 1.3247).

So in total our running-time recurrence is T(k) <= max{T(k-1)+T(k-4), T(k-2)+T(k-3)} < 1.39^k.

===

Best known: [Chen, Kanj, Xia '06] 1.2738^k, via lots more case analysis. Could we just keep improving with more and more case analysis and more branching rules, and get O*((1+eps)^k) time for every eps > 0? That would contradict ETH...
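Before moving on, here is a minimal Python sketch of the branching method, under some assumptions: the graph is a dict mapping each vertex to the set of its neighbors, and only the basic "v or N(v)" branch is implemented. The degree-1/degree-2 rules and folding above are left out, so this matches the simpler analysis rather than the 1.2738^k algorithm.

    def delete(G, X):
        """Return a copy of G with the vertices in X removed."""
        return {u: {w for w in nbrs if w not in X}
                for u, nbrs in G.items() if u not in X}

    def has_vc(G, k):
        """Does G have a vertex cover of size <= k?"""
        if k < 0:
            return False
        if all(not nbrs for nbrs in G.values()):
            return True                      # no edges left to cover
        v = max(G, key=lambda u: len(G[u]))  # branch on a max-degree vertex
        if has_vc(delete(G, {v}), k - 1):    # case 1: v is in the cover
            return True
        # case 2: v is not in the cover, so all of N(v) must be
        return has_vc(delete(G, G[v] | {v}), k - len(G[v]))

    # e.g., a triangle has a VC of size 2 but not of size 1:
    G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
    assert has_vc(G, 2) and not has_vc(G, 1)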
=======================

A Theory of Problem Compression

There is a fundamentally different way of thinking about FPT problems, via a notion of "instance compression": when can we "compress" instances of a problem down to smaller, equivalent instances? Note that this is not like data compression in the usual sense: we can potentially compress much better than data compression, because we only have to preserve ONE bit: whether the instance is an "accept" or a "reject"! For example, if a problem is in P and we are allowed polynomial time to "compress" it, we can compress every instance down to a constant-sized instance: [Just run a polynomial-time algorithm for solving the problem -- if it outputs "accept" then output a constant-sized yes-instance, otherwise output a constant-sized no-instance.]

Parametrization gives us a nice way of formalizing this.

Def: Let L be a parametric problem. We say that L has a kernel (or is "kernelizable") if there is a polynomial-time reduction M from L to itself such that for (y,k') = M(x,k), we have k' <= k and |y| <= h(k) for some computable h : N -> N.

The function h(k) is called the size of the kernel: the "hard part" of the instance (since it only took polynomial time to compress down to h(k) size). The polytime reduction M is the "kernelization algorithm". Being kernelizable means we can compress arbitrary instances of L so that the size of the entire problem instance depends only on the parameter k.

Concrete examples of this will come later. The first order of business is to demonstrate an important equivalence:

Theorem: Let L be parametric and *decidable*. L is in FPT <=> L has a kernel.

Proof: (<=) Suppose L has a kernel. Run the poly-time reduction from L to itself and get an output instance of size at most h(k). Because L is decidable, we can solve that instance in f(h(k)) time for some computable f. Hence we get an FPT algorithm for L.

(=>) Suppose L is in FPT, say via an algorithm A that runs in f(k) + n^c time. Here's a kernelization algorithm for L: Given (x,k),
1. If |x|^c < f(k), simply output (x,k). The output has size |x| < f(k)^{1/c} <= f(k) (assuming wlog f(k) >= 1), which is bounded by a computable function of k alone.
2. If |x|^c >= f(k), then f(k) + n^c = O(n^c): on such instances, the algorithm A is actually a polynomial-time algorithm! So we can reduce L to itself with a constant-size kernel: run A on (x,k);
   if A(x,k) accepts, then output a constant-size yes-instance of L;
   if it rejects, then output a constant-size no-instance of L.
QED

Kernelization becomes really interesting when we ask whether an FPT problem has *polynomial-size* kernels:

Def: A parametric problem L has "feasible" kernels if there is a poly-time reduction from L to itself that outputs instances of poly(k) size.

This is an "extreme" form of problem compression. There is an asymmetry between FPT algorithms and kernelizations:

Def: A "feasible" FPT problem has an algorithm running in O*(2^{poly(k)}) time.

Consider an NP problem L and some parameterized version L' (where we stick a parameter k in with its input).

Prop: A feasible kernelization for L' implies a feasible FPT algorithm for L'.
Proof: Run the polytime kernelization algorithm for L' and get an instance of size poly(k). NP problems of size n are solvable in 2^{poly(n)} time, so we can run a brute-force algorithm on the kernel in 2^{poly(k)} time.

However, a problem may be "feasible FPT" but *not* necessarily have a feasible kernelization.

Example: k-path: given a graph on n nodes and a parameter k, is there a simple path of length k in the graph? We will see an O*(2^k) time algorithm, but no kernelization of poly(k) size (unless something unlikely happens in complexity theory).

===

Theorem: k-Vertex Cover has kernels of O(k^2) edges. [That is, there is a polytime reduction from k-VC to itself which outputs graphs having at most O(k^2) edges.]

Proof: Initially S is empty. Given a graph G and parameter k, note that if a node has degree > k, then it must be in the vertex cover! (Otherwise all of its > k neighbors would have to be, but that would be a VC larger than k.) So: while there is a node of degree > k in the graph, remove that node from the graph, put it in the vertex cover S, and reduce k by 1.
// now the graph has max degree <= k
At this point each vertex can cover at most k edges, so if more than k^2 edges remain (for the current value of k), there is no VC of size <= k, and we output a constant-size no-instance. Otherwise, output the remaining graph and the current parameter k. This is a polynomial-time reduction from k-Vertex Cover to itself, such that the original graph has a k-VC iff the output graph has a k'-VC for some k' <= k, and the output graph has O(k^2) edges. QED

If the polynomial-time hierarchy doesn't collapse, there aren't kernels of size k^{1.999}.
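Here is what the O(k^2)-edge kernelization above looks like in code -- a Python sketch with our own function name and graph representation (an adjacency dict); returning None stands in for "output a constant-size no-instance":

    def vc_kernel(G, k):
        """Apply the high-degree rule; return a kernel (G', k') or None."""
        G = {u: set(nbrs) for u, nbrs in G.items()}   # work on a copy
        while k >= 0:
            v = next((u for u in G if len(G[u]) > k), None)
            if v is None:
                break                 # max degree is now <= k
            # any VC of size <= k must contain v, else its > k nbrs all would be
            for w in G[v]:
                G[w].discard(v)
            del G[v]
            k -= 1
        m = sum(len(nbrs) for nbrs in G.values()) // 2
        if k < 0 or m > k * k:        # max degree <= k: a k-VC covers <= k^2 edges
            return None
        G = {u: nbrs for u, nbrs in G.items() if nbrs}  # drop isolated vertices
        return G, k                   # <= k^2 edges, <= 2k^2 vertices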
Here's a tighter version of this result, which can be useful.

Theorem: k-Vertex Cover has kernels with <= 2k vertices (and hence O(k^2) edges).

Before we prove this theorem, let's take an aside into Linear Programming. Write an "LP relaxation" for k-Vertex Cover: we make a real-valued variable x_v for each vertex of our graph.

min sum_v x_v
subject to: x_v >= 0 and x_v <= 1 for all v in V
            x_u + x_v >= 1 for all (u,v) in E.

(This is in contrast to the IP formulation, which would force each x_v to be an integer.)

Theorem [Khachiyan'79, Karmarkar'84]: LPs are solvable in polynomial time!

So we can solve this LP and get values a_v in [0,1] for each vertex v. This LP can be used to get a 2-approximation to VC: output a vertex cover of size at most twice that of the minimum vertex cover.

Define S := {v | a_v >= 1/2}.
Claim: S is a vertex cover.
Proof: For every edge (u,v), we know a_u + a_v >= 1, so at least one of a_u, a_v is >= 1/2, and hence at least one of u,v was put in S.
Claim: |S| <= 2*(minimum vertex cover size).
Proof: We have sum_v a_v <= (minimum vertex cover size), since the LP is a "relaxation" of the IP. On the other hand, sum_{v in S} a_v >= |S|/2, since every v in S has a_v >= 1/2. Hence the claim holds.

=========================

Now let's prove k-VC has a kernel with <= 2k vertices. Partition the vertex set into
P = {v in V | a_v > 1/2}, Z = {v in V | a_v = 1/2}, N = {v in V | a_v < 1/2}.

Claim: Suppose G has a min vertex cover of size k. Then (a) P cup Z contains a min VC, and in fact one that includes all of P, and (b) |P| + |Z| <= 2k.

Proof: We already proved (b) above, since P cup Z is exactly the set S from the 2-approximation. For (a), observe:
1. There are no edges between N and Z: for any such edge (u,v) we would have a_u + a_v < 1, violating an LP constraint.
2. There are no edges inside N, for the same reason.
(*) So all edges touching N must go to P.

Now let S be any min VC (of size k). Form S' from S by removing N cap S and adding P \ S. Clearly S' is contained in P cup Z and contains all of P. We want to show that (1) S' is a vertex cover, and (2) |S'| <= |S| (hence S' is also a min VC).

(1) follows from (*), since the only edges that N cap S was responsible for have their other endpoints in P, and all of P is in S'.

(2) It suffices to show that |N cap S| >= |P \ S| (we remove at least as many vertices as we add). Consider all edges between P \ S and (P cup Z) cap S. Notice that for any such edge (u,v) one must have a_u + a_v > 1 (at least one endpoint is in P, with value > 1/2, and the other is in P or Z, with value >= 1/2) -- so there is room to decrease the values on P \ S. Let eps be the minimum over all u in P \ S of (a_u - 1/2); note that eps > 0. Now set
a'_u = a_u - eps for all u in P \ S,
a'_u = a_u + eps for all u in N cap S,
a'_u = a_u otherwise.

Claim: This a' is still a feasible solution to the LP.
Proof: First note that 0 <= a'_u <= 1 for all u. To check the rest of feasibility, it suffices to check the edge constraints at nodes u in P \ S, since these are the only nodes whose values decreased. Every edge incident to such a u has its other endpoint in S (S is a vertex cover and u is not in S), so the following two cases cover all edges out of P \ S:
(1) For an edge (u,v) with u in P \ S and v in N cap S, we have a'_u + a'_v = a_u + a_v >= 1 (we subtracted eps from one endpoint and added it to the other).
(2) For an edge (u,v) with u in P \ S and v in (P cup Z) cap S, we have a'_u + a'_v = (a_u - eps) + a_v >= 1/2 + 1/2 = 1, by the choice of eps [a_u - eps >= a_u - (a_u - 1/2) = 1/2, and a_v >= 1/2 because v is in P cup Z]. QED

Finally, consider the VALUE of the LP solution a'. It is:
sum_u a'_u = sum_u a_u - eps*(|P \ S| - |N cap S|),
because we subtract eps for each node in P \ S and add eps for each node in N cap S. Since a is already an optimal solution to the LP, a' can't achieve a smaller value. So we must have |P \ S| - |N cap S| <= 0, i.e., |N cap S| >= |P \ S|. QED

Corollary: k-VC has a 2k-node kernel.
Proof: Solve the LP as above and partition into P, Z, N. By the claim, some min VC contains all of P and avoids all of N. So put the nodes of P into the cover, reduce k by |P|, and delete P and N from the graph. By (*) this removes every edge touching N, so what remains is the graph induced on Z, which has at most 2k vertices by (b). QED

State of the art in kernelization for VC:
* Best known kernel in terms of vertices: 2k - c*log(k) vertices, for any constant c.
* If there is a kernel for k-VC with O(k^{2-eps}) edges, then coNP is contained in NP/poly. (cool stuff in complexity... unlikely stuff, though)
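Finally, here is a Python sketch of the LP step, using scipy.optimize.linprog (assumed available). Caveats: the function name is our choice, and a floating-point solver plus a small tolerance stands in for exact comparison with 1/2; the VC relaxation always has a half-integral optimal vertex, but a careful implementation would solve it exactly (e.g., via bipartite matching).

    import numpy as np
    from scipy.optimize import linprog

    def lp_partition(vertices, edges, k, tol=1e-6):
        """Solve the VC relaxation; return (P, Z, k - |P|) or None."""
        if not edges:
            return [], [], k         # no constraints: LP optimum is all-zeros
        idx = {v: i for i, v in enumerate(vertices)}
        # linprog minimizes c.x subject to A_ub x <= b_ub, so write each
        # constraint x_u + x_v >= 1 as -x_u - x_v <= -1 (one row per edge)
        A = np.zeros((len(edges), len(vertices)))
        for r, (u, w) in enumerate(edges):
            A[r, idx[u]] = A[r, idx[w]] = -1.0
        res = linprog(np.ones(len(vertices)), A_ub=A, b_ub=-np.ones(len(edges)),
                      bounds=[(0, 1)] * len(vertices))
        if res.fun > k + tol:        # LP value > k already => no VC of size <= k
            return None
        a = res.x
        P = [v for v in vertices if a[idx[v]] > 0.5 + tol]
        Z = [v for v in vertices if abs(a[idx[v]] - 0.5) <= tol]
        # some min VC contains all of P and avoids N, so: put P in the
        # cover and keep only the graph induced on Z (at most 2k vertices)
        return P, Z, k - len(P)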