Lecture on Algorithms for the Orthogonal Vectors (OV) Problem.

As before, the exercises are optional, but recommended! They are there to get you thinking about the problems.

==============================

Recall the OV problem.

Input: v_1, ..., v_n in {0,1}^d, n vectors in d dimensions
Decide: are there i, j such that <v_i, v_j> = 0?

I'll sometimes use OV_{n,d} when I want to emphasize that the number of vectors is n and their dimensionality is d.

OV is a very basic and versatile problem. Here is an interesting application: partial match queries.

In the Partial Match problem, we are given a "database" of n binary strings, and a list of n "queries" which are strings in {0,1,?}^*. (Here, "?" represents a wildcard.) We say that a query q = q_1 ... q_d matches a string x = x_1 ... x_d if for all i = 1,...,d, if q_i in {0,1} then q_i = x_i.

Output: Determine, for all n queries, which of them match some string in the database.

This problem is equivalent to OV!

Claim: OV and Partial Match are subquadratic-time equivalent. Formally: there is an e > 0 such that OV is in n^{2-e}*poly(d) time <=> there is an e > 0 such that Partial Match is in n^{2-e}*poly(d) time.

[Note: OV is a decision problem, while Partial Match, as stated above, is a function problem: one outputs n bits.]

On your upcoming homework, you will prove a partial result in this direction.

Let's discuss some algorithms for OV.

1. Trivial brute-force algorithm for OV: O(n^2*d) time.

2. There is a "folklore" algorithm which is good when the dimension d << log(n): O(dn*2^d) time.

========================
Exercise: Can you find this algorithm?
========================

3. There is also a poly(d)*(n + 2^d) time algorithm. Here is one route to such an algorithm. We can reduce OV to the following "subset query" problem: given subsets S_1,...,S_n of {1,...,d}, are there i,j such that S_i is a proper subset of S_j? Then we can solve this problem by building a dynamic program with O(2^d) "cells", where each cell corresponds to a subset of {1,...,d}. You will complete this algorithm (from the above hints) on your homework.

4. For large d, there is an algorithm running faster than n^2*d:

[Gum-Lipton'01] O(n^2*d^{omega-2}) time, where omega < 2.373 is the matrix multiplication exponent.

Consider an n by d matrix A whose rows are the vectors v_1,...,v_n.

Observation: The (i,j) entry of A*A^T is <v_i, v_j>. (Here A^T is the transpose of A.)

Thus we can think of the OV problem in the following way: we are given an n by d matrix A, and we wish to determine if A*A^T contains a 0 entry, in time less than the total number of entries in A*A^T.
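(Aside: here is a minimal Python sketch of the two views above, for illustration only. The function names are mine, and numpy's dense matrix product does not implement a fast omega-exploiting algorithm; the point of the second function is just the reduction of OV to checking A*A^T for a zero entry.)

    import numpy as np

    def ov_brute_force(vectors):
        """Trivial O(n^2 * d) algorithm: test every pair of vectors directly."""
        n = len(vectors)
        for i in range(n):
            for j in range(i + 1, n):
                # <v_i, v_j> = 0 iff no coordinate is 1 in both vectors.
                if not any(a and b for a, b in zip(vectors[i], vectors[j])):
                    return True
        return False

    def ov_via_gram_matrix(vectors):
        """The [Gum-Lipton'01] view: the (i,j) entry of A*A^T is <v_i, v_j>,
        so an OV pair exists iff A*A^T has a zero entry off the diagonal."""
        A = np.array(vectors)    # n x d 0/1 matrix with rows v_1,...,v_n
        G = A @ A.T              # n x n matrix of all pairwise inner products
        np.fill_diagonal(G, 1)   # ignore <v_i, v_i>: we want pairs i != j
        return bool((G == 0).any())

    # Example: v_1 = (1,0,1) and v_3 = (0,1,0) are orthogonal.
    vs = [(1, 0, 1), (1, 1, 0), (0, 1, 0)]
    print(ov_brute_force(vs), ov_via_gram_matrix(vs))   # True True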
From the reduction given in lecture 1 (and the sparsification lemma described in lecture 2) we have:

SETH => for every eps > 0, OV is not solvable in n^{2-eps}*2^{o(d)} time.

(Recall, 2^{o(d)} means "less than 2^{alpha*d} for all alpha > 0".)

So assuming SETH, there is no algorithm that runs in time "subquadratic in n and subexponential in d".

5. In this lecture, we'll describe the algorithm of [Abboud-Williams-Yu'15], which shows:

Main Thm: OV for vectors of dimension c*log(n) can be solved in (randomized) time n^{2-1/O(log c)}. The algorithm works even if c is itself a function of n, up to c < 2^{sqrt{log n}}.

Corollary: For all c >= 1, there is an eps > 0 such that OV for vectors of dimension c*log(n) can be solved in time n^{2-eps}.

If we could swap the quantifiers above, we would refute SETH. In other words: if there is an eps > 0 such that for all c >= 1, OV for vectors of dimension c*log(n) can be solved in time n^{2-eps}, then SETH is false. This follows from the fact that k-SAT can be sparsified: we can reduce k-SAT formulas to the case of k-SAT with only c_k*n clauses, for some constant c_k.

=====

The starting point for the OV Algorithm.

We begin with a simple "self-reduction" for the OV problem. (A "self-reduction for OV" is a reduction that reduces the OV problem to multiple "smaller" OV problems.)

Thm: For any parameter s in {1,...,n}, we can reduce any instance of OV_{n,d} to O(n^2/s^2) instances of OV_{2s,d}.

Proof: Given an instance of OV with n vectors in d dimensions, the reduction works as follows:
(1) Divide the n vectors into O(n/s) groups, where each group has at most s vectors.
(2) For all O((n/s)^2) pairs of groups, call OV_{2s,d} on the union of the two groups.
(3) If any call returns "yes", then output "yes" (there's an OV pair); else output "no OV pair".
QED

Note that the obvious algorithm for OV_{2s,d} takes O(s^2*d) time, so if we applied that algorithm within the self-reduction, we'd only get a running time of O((n/s)^2 * s^2*d) <= O(n^2*d).

The key idea behind the Main Thm is the following: represent the function OV_{2s,d} in some "interesting" way, so that we can evaluate OV_{2s,d} on many pairs of groups fast. (In particular, we will represent OV_{2s,d} as a multivariate polynomial in a particular way, and use fast matrix multiplication to evaluate the function quickly on many points!) This will speed up step (2) of the self-reduction.

The ideal situation is that our new representation will let us compute step (2) in O~(n^2/s^2) time for *large* s. Then, applying the self-reduction, we'll get an O~(n^2/s^2) time algorithm for OV_{n,d}! (It turns out we'll be able to do this for s = n^{1/O(log c)}, which is how we get the n^{2-1/O(log c)} time in the Main Thm.)

OK, how do we represent OV_{2s,d}? (To keep the notation simple, we'll just look at OV_{s,d} in the following.)

First of all, we can think of OV_{s,d} as a Boolean logic expression: given s*d bits V = v_1[1],...,v_1[d],...,v_s[1],...,v_s[d] (encoding s vectors of d bits each), we can write

OV_{s,d}(V) = OR_{i,j in {1,...,s}, i != j} AND_{k=1,...,d} (not-v_i[k] OR not-v_j[k]).

==============================
Exercise: Convince yourself that this "OR of AND of ORs" formula indeed encodes the OV problem.
==============================

We want to evaluate this "OR of AND of ORs" on many pairs of inputs, quickly. To do this, we will use a RANDOMIZED representation of OV_{s,d} to get a nice polynomial representing it. The key idea is the marvelous XOR Trick [Razborov'87].

Given a clause C = (y_1 OR ... OR y_L) and a parameter k, the XOR trick randomly reduces C to a circuit G which is an OR of only k XORs of UNIFORM RANDOM subsets of y_1,...,y_L. That is, we take k RANDOM subsets R_1,...,R_k of {y_1,...,y_L}, compute the XOR of all bits in each R_i, then take the OR of the k outcomes.

XOR Trick Lemma: For every k and L, there is a distribution D of formulas, each of the form "OR_k XOR_L", such that for all y in {0,1}^L:
(1) If (y_1 OR ... OR y_L) is false, then for every G drawn from D, G(y_1,...,y_L) is false.
(2) If (y_1 OR ... OR y_L) is true, then Pr_{G ~ D}[G(y_1,...,y_L) is false] = 1/2^k.

Proof: (1) If all of y_1,...,y_L are 0, then clearly any XOR of a subset of y_1,...,y_L is also 0, and any OR of those XORs is also 0.
(2) If (y_1 OR ... OR y_L) is true, let S subsetof {1,...,L} be the (non-empty) set of i's such that y_i = 1. We claim that, for every non-empty S subsetof {1,...,L} and a uniformly random R subsetof {1,...,L}, there is probability 1/2 that R contains an odd number of elements of S, and probability 1/2 that R contains an even number of elements of S. Given the claim, each of the k XORs independently evaluates to 1 with probability 1/2, so all k of them are 0 (and G outputs false) with probability exactly 1/2^k. QED

==============================
Exercise: Try to prove the claim yourself!
==============================
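(Aside: a quick empirical check of the XOR Trick in Python; this is just an illustration I'm adding, and the function name is mine. A false clause always maps to false, and a true clause maps to false with probability about 1/2^k.)

    import random

    def draw_xor_trick_circuit(L, k):
        """Draw G ~ D: k uniform random subsets R_1,...,R_k of {0,...,L-1};
        G(y) is the OR of the XORs of the bits of y indexed by each R_i."""
        subsets = [[i for i in range(L) if random.random() < 0.5]
                   for _ in range(k)]
        return lambda y: any(sum(y[i] for i in R) % 2 == 1 for R in subsets)

    L, k, trials = 10, 3, 100000
    all_false = [0] * L                    # (y_1 OR ... OR y_L) is false
    some_true = [0] * L; some_true[4] = 1  # (y_1 OR ... OR y_L) is true
    # Property (1): on a false clause, G is always false.
    assert not any(draw_xor_trick_circuit(L, k)(all_false) for _ in range(100))
    # Property (2): on a true clause, G is false with probability 1/2^k.
    err = sum(not draw_xor_trick_circuit(L, k)(some_true) for _ in range(trials))
    print(f"Pr[G false | clause true] ~= {err/trials:.4f} (predicted {2**-k:.4f})")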
The XOR Trick Lemma shows how to reduce an L-clause to a k-clause of XORs. There is another version for AND. Namely:

XOR Trick Lemma, Part II: For every k and L, there is a distribution D of formulas of the form "AND_k XOR_L" such that, for all y in {0,1}^L:
(1) If (y_1 AND ... AND y_L) is true, then for every H drawn from D, H(y_1,...,y_L) is true.
(2) If (y_1 AND ... AND y_L) is false, then Pr_{H ~ D}[H(y_1,...,y_L) is true] = 1/2^k.

==============================
Exercise: Prove Part II.
==============================

Now we will use the XOR Trick to randomly reduce the formula for OV to a "sparse" polynomial over F_2 (the field of two elements, with arithmetic mod 2). The formal theorem we'll prove is:

OV Conversion Thm: For every s and d, there is a distribution D of polynomials over F_2, where each polynomial has s*d variables and at most M(s,d) := poly(s)*{2d choose O(log s)} monomials, such that for all inputs v_1,...,v_s in {0,1}^d to the OV_{s,d} problem,

Pr_{p ~ D}[OV_{s,d}(v_1,...,v_s) = p(v_1,...,v_s) mod 2] >= 3/4.

Moreover, we can construct a random p from the distribution D in poly(M(s,d)) time.

Proof: Recall we were working with

OV_{s,d}(V) = OR_{i,j in {1,...,s}, i != j} AND_{k=1,...,d} (not-v_i[k] OR not-v_j[k]).

Step 1: To the big OR over the {s choose 2} terms, we apply the XOR Trick with parameter k = 3.

Step 2: To each of the {s choose 2} ANDs in the formula, we apply the XOR Trick (Part II) with k = 3 + 2*log(s).

First, let's check that Pr_{p ~ D}[OV_{s,d}(v_1,...,v_s) = p(v_1,...,v_s)] >= 3/4. On any given input, the replacement of the top OR (the XOR Trick) contributes error at most 1/2^3 = 1/8 to the result, and the replacements of the {s choose 2} ANDs (the XOR Trick, Part II) each err with probability 1/2^{3+2*log(s)} = 1/(8*s^2), contributing error at most s^2 * 1/(8*s^2) <= 1/8 in total. Therefore (by the union bound) the total error is at most 1/4.

Now let's show that the object we construct by applying all these Tricks corresponds to a polynomial with at most poly(s)*{2d choose O(log s)} monomials. Remember that modulo 2, AND is the same as multiplication, XOR is the same as addition, and NOT(x) = 1 + x mod 2. Therefore, by DeMorgan's laws, the OR function on k inputs x_1,...,x_k can be written as the degree-k polynomial 1 + (1+x_1)*...*(1+x_k) mod 2.

Let's consider the effect of Step 1 on the OV formula. It replaces the big OR at the top with an OR of three XORs of O(s^2) inputs. If we rewrite this as a polynomial using the above observations, we obtain a degree-3 polynomial in O(s^2) variables, where each input to this polynomial is a formula from Step 2. This polynomial has poly(s) monomials.

For each of the O(s^2) ANDs in Step 2, we are applying the XOR Trick to replace each AND of d inputs with an AND of (3 + 2*log(s)) XORs of at most d inputs.

==============================
Exercise: From here, show that the formula we got from applying the XOR Tricks in Steps 1 and 2 corresponds to a polynomial with at most M(s,d) := poly(s)*{2d choose 2*(2*log(s)+3)} monomials. Also check that we can construct this polynomial in poly(M(s,d)) time.
==============================

QED
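(Aside: to see the conversion to a polynomial concretely, here is a small Python sketch, my own illustration. A multilinear F_2 polynomial is stored as a set of monomials, each monomial a frozenset of variable indices; on 0/1 inputs we may assume multilinearity, since x^2 = x. The identities below are exactly the ones used in the proof: NOT(x) = 1 + x, AND = product, XOR = sum, OR_k = 1 + (1+x_1)*...*(1+x_k) mod 2.)

    from functools import reduce

    ONE = frozenset()   # the constant-1 monomial (the empty product)

    def p_add(p, q):    # XOR: monomials appearing twice cancel mod 2
        return p ^ q

    def p_mul(p, q):    # AND: multiply out; x*x = x, duplicates cancel mod 2
        out = set()
        for m1 in p:
            for m2 in q:
                out ^= {m1 | m2}
        return frozenset(out)

    def p_not(p):       # NOT(x) = 1 + x mod 2
        return p_add(frozenset({ONE}), p)

    def p_or(ps):       # OR_k = 1 + (1+x_1)*...*(1+x_k) mod 2
        return p_not(reduce(p_mul, (p_not(p) for p in ps)))

    def var(i):         # the polynomial "x_i"
        return frozenset({frozenset({i})})

    def evaluate(p, x): # evaluate on a 0/1 assignment x
        return sum(all(x[i] for i in m) for m in p) % 2

    # Example: OR(x_0, x_1) = x_0 + x_1 + x_0*x_1 has 3 monomials.
    print(sorted(map(sorted, p_or([var(0), var(1)]))))

Expanding Step 2's AND of (3 + 2*log s) XORs with p_mul above is exactly the computation whose monomial count the exercise asks you to bound.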
What does this theorem mean? It shows that we can convert the OV function, which is an "OR of AND of ORs", into a polynomial over F_2 with at most M monomials, for a certain M. We can view such a polynomial as simply an XOR of at most M ANDs. So we are converting the OV function, a "depth-three" circuit, into a simpler type of logical expression, a "depth-two" circuit. But there are two caveats:

1. Our conversion only works in a randomized way, so there is some chance that our resulting expression gives a wrong answer on a given input.

2. The original OV problem was a formula of size O(s^2*d). Our new expression has size poly(s)*{2d choose O(log s)}, which could potentially be much larger. (Remember we are interested in the case where d = c*log(n) for a constant c.) For example, suppose we set s = n^{0.1}. Then the expression would have size poly(n)*{2c*log(n) choose O(log n)}, which could be a large polynomial, larger than n^2. That would mean even the conversion in the OV Conversion Thm would take a large polynomial running time... there is no way we'd get a subquadratic algorithm for OV in that case! (It turns out that if we set s a bit smaller, s = n^{1/O(log c)}, then problem 2 goes away.)

Let's turn to how we can use this representation to solve OV faster. First, suppose we had a *deterministic* reduction from OV_{s,d} to *one* polynomial P_{s,d}. Given an instance of OV with n vectors in d dimensions, we could then augment our original self-reduction to be:

(1) Divide the n vectors into O(n/s) groups, where each group has at most s vectors.
(2') Evaluate P_{s,d} on all (n/s)^2 pairs of groups, where for large s, we want this "batch evaluation" to run in O~((n/s)^2) time.
(3') If any evaluation returns true, then output "yes" (there's an OV pair); else output "no OV pair".

Since we have a randomized reduction instead, we have to be slightly more sophisticated, and run multiple trials of the evaluation process. Our modified self-reduction for OV (now turning into a real OV algorithm) becomes:

(1) Divide the n vectors into O(n/s) groups, where each group has at most s vectors.
(2a) For t = 60*log(n) independent trials, draw random polynomials P_1,...,P_t representing OV_{2s,d} from the OV Conversion Thm.
(2b) For i = 1,...,t, evaluate P_i on all (n/s)^2 pairs of groups (recall this is (n/s)^2 different inputs, each of length 2sd).
(2c) For each of the (n/s)^2 pairs of groups, with vectors v_1,...,v_{2s}, compute Majority(P_1(v_1,...,v_{2s}),...,P_t(v_1,...,v_{2s})).
(3'') If any Majority outputs true, then output "yes" (there's an OV pair); else output "no OV pair".

==============================
Exercise: Why do steps (2a), (2b), (2c) work, and give us a randomized algorithm that outputs the correct answer with high probability? Hint: Using a Chernoff bound on P_1,...,P_t, you can show that for every pair of groups, the probability that Majority(P_1,...,P_t) outputs the incorrect answer for that pair of groups is less than 1/n^3. Therefore, by a union bound over all pairs of groups, the probability that we get an incorrect answer from some pair of groups is less than 1/n. (Don't worry if you haven't seen such an analysis before! We'll go over it in the Q&A. It's extremely useful in randomized algorithms!)
==============================

Modulo the above exercise, our algorithm for OV gives the correct answer with high probability. But how on earth can it run faster than n^2 time? The main question is how to implement step (2b).
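(Aside: here is a schematic Python sketch of the algorithm so far, my own summary of steps (1)-(3''). The helpers draw_ov_polynomial (the OV Conversion Thm sampler) and batch_evaluate (step (2b), the subject of the rest of the lecture) are hypothetical stand-ins, passed in as parameters.)

    import math

    def ov_algorithm(vectors, d, s, draw_ov_polynomial, batch_evaluate):
        """Self-reduction + randomized polynomials + majority vote.
        Assumes draw_ov_polynomial(s, d) samples p ~ D as in the OV
        Conversion Thm, and batch_evaluate(P, groups) returns a dict
        mapping each pair (a, b) of group indices to P's value on the
        concatenation of groups a and b."""
        n = len(vectors)
        # (1) Divide the n vectors into O(n/s) groups of <= s vectors.
        groups = [vectors[i:i + s] for i in range(0, n, s)]
        # (2a) Draw t = 60*log(n) random polynomials representing OV_{2s,d}.
        t = math.ceil(60 * math.log(n))
        polys = [draw_ov_polynomial(2 * s, d) for _ in range(t)]
        # (2b) Evaluate each P_i on all pairs of groups; this is the
        # bottleneck that the Batch Evaluation Lemma will speed up.
        votes = [batch_evaluate(P, groups) for P in polys]
        # (2c) + (3''): majority vote per pair; "yes" if any pair wins.
        g = len(groups)
        for a in range(g):
            for b in range(a, g):  # b = a handles OV pairs inside one group
                if sum(v[(a, b)] for v in votes) > t / 2:
                    return True
        return False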
The most naive way of implementing step (2b) would require at least {2d choose O(log s)} * n^2/s^2 time, which (as far as we can tell) is never less than n^2...

We are finally ready to show how polynomials are useful here! The key is the following lemma, which will tell us how large we can set s so that step (2b) can be done efficiently.

Batch Evaluation Lemma [Coppersmith'82]: Given any F_2-polynomial P with 2m variables and at most N^{0.1} monomials, and given A, B subsetof {0,1}^m with |A| = |B| = N, we can evaluate P on all N^2 points in (A times B) in O~(N^{1.1}*m + N^2) time.

That is, given any polynomial that is "sparse enough", we can evaluate it on many pairs of points in nearly optimal running time!

For us, we'll choose the following parameters:

N = n/s (there are O(n/s) groups), and
m = s*d (each group consists of at most s vectors of d bits each).

If we can set the parameter s so that the number of monomials in each P_i is at most N^{0.1} = (n/s)^{0.1}, then we can apply the lemma. Remember that the number of monomials is at most poly(s)*{2c*log(n) choose D*log(s)}, for some constant D > 1.

==============================
Exercise: Verify that we can apply the Batch Evaluation Lemma for s = n^{delta/(log c)}, for some constant delta > 0.
==============================

Following the exercise and applying the Batch Evaluation Lemma, we can implement step (2b) in time O~((n/s)^{1.1}*s*d + n^2/s^2) <= O~(n^{2-2*delta/(log c)}). (The t = O(log n) trials only contribute another logarithmic factor, which is absorbed into the O~.) Therefore the entire algorithm takes at most this much time, and we are done!
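(Closing aside: to demystify the Batch Evaluation Lemma, here is a Python sketch, my own illustration, of the standard reduction from batch evaluation of a sparse polynomial to a single rectangular matrix product. Each monomial of P splits into the variables it reads from the A-point and those it reads from the B-point, so P(a,b) is an inner product of two "monomial value" vectors. Coppersmith's algorithm multiplies the resulting N by N^{0.1} and N^{0.1} by N matrices in N^2*poly(log N) time; the dense numpy product below only illustrates the structure, not the speed.)

    import numpy as np

    def batch_evaluate_sparse(monomials, A, B):
        """Evaluate P on all pairs (a, b) in A x B, mod 2.
        P is a list of monomials; each monomial is a pair (a_part, b_part)
        of tuples of variable indices, splitting the monomial's variables
        into those read from a and those read from b.
        Returns the |A| x |B| 0/1 matrix of values P(a, b)."""
        # Tabulate MA[x][j] = value of the a-part of monomial j at point x
        # (similarly MB); this takes O(N * #monomials * m) time in total.
        MA = np.array([[all(x[i] for i in ap) for (ap, _) in monomials]
                       for x in A], dtype=np.int64)
        MB = np.array([[all(y[i] for i in bp) for (_, bp) in monomials]
                       for y in B], dtype=np.int64)
        # P(a,b) = sum_j (a-part of monomial j)(a) * (b-part of monomial j)(b),
        # so one rectangular matrix product gives all N^2 values at once.
        return (MA @ MB.T) % 2

    # Example: P(a, b) = a_0*b_0 + a_1 + b_1 (mod 2).
    P = [((0,), (0,)), ((1,), ()), ((), (1,))]
    A = [(0, 1), (1, 1)]
    B = [(1, 0), (1, 1)]
    print(batch_evaluate_sparse(P, A, B))   # [[1 0] [0 1]]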