6.S078 Lecture 9: k-SUM Algorithms
===========================

Announcements:
- Open Problem Session again tonight! More progress on Parity-SAT... and the others too?
- Lecture notes are still coming! If you have any questions about content from lectures, please post on Piazza.

***

k-SUM problem.
Given: n integers (positive and negative).
Decide: Are there k (distinct) numbers which sum to zero?

We'll generally assume numbers fit into a word, so additions, comparisons, subtractions, etc. take O(1) time.

Fact: Let k >= 2. k-SUM on n numbers reduces to O(n) instances of (k-1)-SUM on k-1 parts with n numbers.

Proof: Randomly partition the numbers into two parts: independently, each number goes in part 1 with prob 1/k, and in part 2 with prob 1-1/k.
Suppose there's a k-SUM solution a_1,...,a_k. Then
  Pr[a_1 in part 1, and a_2,...,a_k in part 2] >= (1/k)*(1-1/k)^{k-1} >= 1/(ek).
(Repeat the random partitioning O(k) times to make the success probability a constant.)
For each number x in part 1:
  Add x to all numbers in part 2, and call (k-1)-SUM on part 2.
  If any call returns "yes" then return "yes".
QED

Theorem: 2-SUM is in O(n) randomized time.

Proof: Given a list L of n numbers, make a new list L' = {-a_i | a_i in L}. We want to determine whether L ∩ L' is empty or not.
For O(n log n) time: sort the numbers in O(n log n) time, then for each a_i in the list, binary search for -a_i in O(log n) time.

This can be improved using hash functions and word tricks. In particular, if:
- each number can be stored in a word,
- we can populate a hash table of O(n^2) size with O(n) elements in O(n) time,
- we can randomly access any entry of a table in O(1) time,
then we can get O(n) randomized time.

More details: Suppose the numbers have m-bit representations, so we can think of each x as a Boolean vector of length m. Let -x be the integer -1*x, also written as a vector of length m.
Pick a random (c + 2*log n)-by-m Boolean matrix M, for a large constant c.
Define h_+(x) = M*x and h_{-}(x) = M*(-x), with arithmetic over GF(2).

Fact: For x != y, Pr_M[Mx = My] = 2^{-(c + 2*log n)} = 1/(2^c * n^2).

Suppose there's a 2-SUM solution a_1, a_2 among the n numbers. Then h_+(a_1) = h_{-}(a_2).
If there's no solution, then by a union bound over all n^2 pairs,
  Pr[exists a_1, a_2 s.t. h_+(a_1) = h_{-}(a_2)] <= n^2/2^{c + 2*log n} = 1/2^c.

Make lists L' and L'', where L' = {h_+(a_i) | a_i in L} and L'' = {h_{-}(a_i) | a_i in L}.
Build a hash table with 2^{c + 2*log n} entries, whose J-th entry contains i in [n] <=> h_{-}(a_i) = J.
Assume we can access any entry in O(1) time, and output its contents in O(ℓ) time, where ℓ is the number of words storing the contents.
Then for each h_+(a_i) in L', we can look up the list of j's in the h_+(a_i)-th entry of the table, and check whether any a_j among them forms a real 2-SUM solution with a_i.
QED

Cor: 3-SUM is in O(n^2) time.

Proof: Combine the Fact with the 2-SUM algorithm.

But in fact you don't need hash tables for O(n^2) time...
Start by sorting in O(n log n) time. For each number a_i in the list, we keep two pointers into the sorted list: p1 at the beginning, and p2 at the end. Let b_i be the number at p1, and c_i the number at p2.
Repeat until the pointers pass each other:
  If a_i = b_i, move p1 to the right (we want distinct numbers).
  If a_i = c_i, move p2 to the left.
  If a_i + b_i + c_i = 0, then return the triple.
  If a_i + b_i + c_i > 0, then move p2 to the left (to get a smaller sum, we have to decrease c_i).
  If a_i + b_i + c_i < 0, then move p1 to the right (to get a larger sum, we have to increase b_i).
Return "no solution".
For each a_i, this procedure takes O(n) time to search for the other two numbers, so we get O(n^2) time overall.
QED

Fact: Let k >= 2. k-SUM on n numbers reduces to 2-SUM on 2 parts with n^{ceil(k/2)} numbers.

Proof: WLOG assume the instance of k-SUM has k parts, and we want to pick exactly one number from each part.
Enumerate all n^{floor(k/2)} choices of floor(k/2) numbers from the first floor(k/2) parts, and form the list
  L = {sum_i a_i | a_i is in part i, for all i = 1,...,floor(k/2)}.
Similarly, for all n^{ceil(k/2)} choices from the last ceil(k/2) parts, form the list
  L' = {sum_i a_i | a_i is in part floor(k/2)+i, for all i = 1,...,ceil(k/2)}.
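For concreteness, here is a sketch of this list-building step in Python (the function and variable names are mine; each part is given as a list of ints):

```python
from itertools import product

def build_sum_lists(parts):
    """Meet-in-the-middle for k-SUM: form all partial sums from the
    first floor(k/2) parts, and from the remaining ceil(k/2) parts."""
    half = len(parts) // 2  # floor(k/2)
    # L: sums of one number chosen from each of the first floor(k/2) parts
    L = [sum(choice) for choice in product(*parts[:half])]
    # L': sums of one number chosen from each of the last ceil(k/2) parts
    Lprime = [sum(choice) for choice in product(*parts[half:])]
    return L, Lprime
```

Building the lists costs O(n^{ceil(k/2)}) time and space.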
Now we wish to find a number in L and a number in L' which sum to 0.
QED

Cor: 4-SUM is in O(n^2) time, and k-SUM is in n^{ceil(k/2)} time.

k-SUM Conjecture: For every k >= 2 and eps > 0, k-SUM cannot be solved in n^{ceil(k/2)-eps} time.

Note this implies that for **odd** k, k-SUM and (k+1)-SUM have essentially the same time complexity. However, we don't really know improvements for these problems beyond small log factors...

***

Algorithm for 3SUM. [BDP'05]

Below is a different algorithm, using [LVWW'***??]. There are roughly three moving parts:
1. Self-reduction for 3SUM -> reduce to small instances. (Similar in spirit to OV.)
2. Randomized reduction for 3SUM -> reduce the *domain* to be small, once the instance is small.
3. Fast look-up table for small instances with small domain.

1. Self-reduction: [LVWW'***??]

Deterministic O~(n log n + n^2/s^2)-time reduction from 3-SUM on n numbers to O(n^2/s^2) instances of 3-SUM on O(s) numbers.
Recall this was very easy for OV... not so straightforward for 3-SUM! The self-reduction works in the Real RAM as well!

Proof Idea: Sort the numbers, and partition the sorted order into O(n/s) "buckets" of O(s) numbers each. Argue that there are at most O(n^2/s^2) triples of buckets that could possibly contain a 3-SUM solution, and that these triples can be computed in O~(1) time each.

I'll use [n] = {-n,-n+1,...,0,1,...,n} (a little non-standard).

2. Randomized reduction:

Theorem: For every c >= 1, there is a d >= 3 such that for any integer m, there is a family of hash functions H = {h : [2^m] -> [s^d * loglog(m)]}, where each h(x) is computable in O~(m) time, and for *every* set S of s numbers in [2^m]:
  If S has a 3SUM, then Pr_{h in H}[h(S) has a 3SUM among 3 targets] = 1.
  If S doesn't have a 3SUM, then Pr_{h in H}[h(S) has a 3SUM among 3 targets] <= 1/s^c.
(Here the 3 targets are a function of h.)

Proof: Hash every number in [2^m] modulo a random prime p in [2^t], for a parameter t set below. Note there are >= Omega(2^t/t) primes in this interval, by the prime number theorem.
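As a quick illustration, this mod-p hashing step might look as follows (a toy sketch: trial-division primality testing, and all names are mine):

```python
import random

def random_prime(t):
    """Sample a random prime p < 2^t (trial division; fine for small t)."""
    while True:
        p = random.randrange(2, 2 ** t)
        if all(p % q != 0 for q in range(2, int(p ** 0.5) + 1)):
            return p

def hash_mod_p(S, t):
    """Map each number to its residue mod a random prime p in [2^t].
    A true 3-SUM survives: a+b+c = 0 implies (a+b+c) mod p = 0."""
    p = random_prime(t)
    return [x % p for x in S], p
```

Note that in Python, x % p already lands in {0,1,...,p-1} even for negative x, which matches the cast-back-to-nonnegative-integers step described below.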
For every triple (a,b,c) of numbers:
- If a+b+c = 0, then a+b+c = 0 mod p.
- If a+b+c != 0, then |a+b+c| <= 3*2^m has at most O(m) prime factors, so Pr[a+b+c = 0 mod p] <= O(mt/2^t).

Now pick any set S of s numbers and hash it to h(S). There are at most s^3 triples in h(S) to consider, so
  Pr[exists a,b,c in S with a+b+c != 0 but a+b+c = 0 mod p] <= O(m*t*s^3/2^t).

Finally, to reduce the hashed set h(S) back to integers, we cast the s numbers mod p back to integers in {0,1,...,p-1}, and make three calls to 3-SUM on s integers: we look for 3 numbers summing to 0 in one call, to p in another call, and to 2p in the third. (The total sum of any triple is less than 3p.)

Set t = c*log(s) + log(m) for large c >= 1; then the error is <= log(m)/poly(s).

Don't like the dependence on m? Hash again! Now our domain is [s^c * m] instead of [2^m]. When we hash mod a random prime in [2^t] again,
  Pr[exists a,b,c with a+b+c != 0 but a+b+c = 0 mod p] <= O(log(s^c * m)*t*s^3/2^t).
Set t = c*log(s) + loglog(m); then the error is <= loglog(m)/poly(s).
We can keep repeating this hashing to drive down the dependence on m, as desired... QED

Idea: We can do this hashing repeatedly, until loglog...log(m) <= s. Then the domain's dependence is only on s.

Note: If we had *real-valued* inputs (and worked on the Real RAM), the above hashing tricks would not work at all...

Note: This reduction also shows that WLOG we can assume 3-SUM on n numbers is over the domain [poly(n)].

3. Fast Look-up Table.

Fact: There is a data structure of size s^{O(s)} that can answer any 3-SUM instance on s numbers in domain [s^c].

Proof: There are at most (2*s^c + 1)^s = s^{O(s)} such instances. Write them all down one by one, and compute their answers. Store all the answers in a look-up table of s^{O(s)} bits. QED

Assume a lookup in a table of size T takes time L(T). Usually, L(T) <= O(log T), or L(T) = O(1).

Finally...

3-SUM Algorithm: Let s = a parameter.
0. Construct the look-up table for 3-SUM on s numbers, as in the Fact.
1. Apply the randomized reduction to 3-SUM on all n numbers, mapping them to the domain [s^d]. (For *any* subset of O(s) numbers, the hashed instance errs on 3SUM with probability <= 1/s^3.)
2. Run the self-reduction: for each of the O(n^2/s^2) calls to 3-SUM on O(s) numbers, restrict the O(s) numbers to the domain [s^d], and consult the look-up table.
3. For each call made by the self-reduction, the look-up table gives the correct answer with probability >= 1 - 1/s^3. So we expect a <= 1/s^3 fraction of the answers to our O(n^2/s^2) calls to be incorrect.
4. If more than 100*n^2/s^5 calls say "yes", then return "yes". (On a "no" instance, we expect <= n^2/s^5 calls to say "yes".)
5. Otherwise, search the false positives: for each of the O(n^2/s^5) "yes" calls, search the relevant set of O(s) numbers directly for a 3-SUM solution, in O(n^2/s^5 * s^2) <= O(n^2/s^3) total time. (Note: this is negligible in comparison!)

Total running time:
  O(n^2/s^2) calls * L(s^{O(s)}) time per lookup
  + O(n^2/s^3) false-positive search
  + s^{O(s)} time to set up the look-up table.

Set s = eps*(log n)/(log log n), so that s^{O(s)} <= n^{O(eps)}. Then the running time is:
  O(n^2 * L(n^{O(eps)}) * (log log n)^2/(log n)^2).
For L(T) <= O(log T), we have L(n^{O(eps)}) <= O(log n), so we save a log-factor. If L(T) <= O(1), we save a log^2-factor.
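For reference, here is a sketch of the quadratic baseline that the above machinery shaves log factors off: the sorted two-pointer 3-SUM routine from earlier in these notes (function and variable names are mine):

```python
def three_sum(nums):
    """O(n^2) two-pointer 3-SUM: return a triple of distinct numbers
    summing to 0, or None if no such triple exists."""
    a = sorted(nums)
    n = len(a)
    for i in range(n):
        p1, p2 = 0, n - 1
        while p1 < p2:
            if a[p1] == a[i]:        # want distinct numbers
                p1 += 1
                continue
            if a[p2] == a[i]:
                p2 -= 1
                continue
            total = a[i] + a[p1] + a[p2]
            if total == 0:
                return (a[i], a[p1], a[p2])
            if total > 0:
                p2 -= 1              # to get a smaller sum, decrease c_i
            else:
                p1 += 1              # to get a larger sum, increase b_i
    return None
```

(Skipping values equal to a_i matches the "distinct numbers" convention in the two-pointer proof above.)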