Bill Thies thies@mit.edu Office hours: Sat 1-3pm, 36-153 http://cag.lcs.mit.edu/~thies/6.046 Announcements: - PS 9 posted (NOT due). Solutions will be posted. - Final exam Mon 12/13, 9am-12pm Today: - Maximum matching - String matching (note: these topics are unrelated!) ---- Acknowledgements: - Parts of these notes were adapted from slides by Lee Wee Sun ---- Matching Def. Given undirected graph G=(V,E), a matching is a subset of edges M \in E s.t. for all vertices v \in V, at most one edge of M is incident on v Example: x----x x x----x x |\ | | |* | * | \ | | Matching | * | * | \ | | --------> | * | * | \| | | *| * x----x----x---x x----x----x---x |M| = 2 Def. A maximum matching is a matching of maximum cardinality i.e., a matching M such that for any matching M', we have |M| >= |M'| Example: x****x x |\ | * | \ | * | \ | * | \| * x****x----x---x |M| = 3 ---- Maximum Bipartite Matching Def. Graph G=(V,E) is bipartite if: - V can be partitioned into two disjoint sets, V = L U R - For all edges (u,v) \in E, either u \in L and v \in R or u \in R and v \in L Example: [edges run only from L to R] x x x x x x x L R (machines) (jobs) << Application: machines in L, jobs in R, to be performed simultaneously. Matching provides work for as many machines as possible. >> ---- Finding Bipartite matching - Can use Ford-Fulkerson to find max matching in undirected bipartite graph in O(VE) time x x x x x x s x t x x L R - unit capacities on all edges - label s and t Claim: max-flow on this graph corresponds to max-matching 1. integral flow f <--> matching M (|M| = |f|) 2. max flow is integral 3. O(VE) ---- Lemma. (==>) If M is matching in G, then \exists integral flow f in G' s.t. |f| = |M| - if (u,v) \in M, f(s, u) = f(u, v) = f(v, t) = 1 f(u, s) = f(v, u) = f(t, v) = -1 other edges: f(u, v) = 0 - is this a flow? 1. capacity constraints [ f(u, v) <= 1 ] 2. skew symmetry [ f(u, v) = - f(v, u) ] 3. flow conservation [ (u,v) is only edge incident to (u,v) f(s, u) - f(u, v) = 0 f(u, v) - f(v, t) = 0 ] - |f| = flow across cut {L U {s}, R U {t}} = |M| ---- Lemma. (<==) If f is integral flow in G', then \exists matching M s.t. |M| = |f| - let M = { {u, v}: u \in L, v \in R, and f(u,v) > 0 } - M is matching - if one unit of flow enters u \in L, one must leave (flow conservation) - because flow is integral, must leave on one edge << same argument for R >> - |M| = |f| << can also see with cut argument >> | Alternate argument (not given in class): | | |M| = f(L, R) use f(Z,X U Y) = f(Z,X) + f(Z,Y) | --> f(L, V') = f(L, s) + f(L, L) + f(L, R) + f(L, t) | | = f(L, V') - f(L, s) - f(L, L) - f(L, t) | 0 + f(s, L) 0 0 | | | | | | flow cons. skew symmetry no edges | | = f(s, L) | = f(s, V') [all edges out of s go to L] | = |f| ---- Theoremm (Integrality). If capacity function c takes on only integral values, then max flow f produced by Ford-Fulkerson satisfies: - |f| integral - f(u,v) integral (forall (u,v) \in E) Proof: induction on the number of iterations << each augmentation must augment by integer amount >> ---- On integral graph, runtime of Ford-Fulkderson is O(E|f*|) where f* is maximum flow. << justification: have O(f*) augmentations. Each takes O(E) time.>> |f*| = |M| = O(V) |E'| = |E| + |V| = Theta(E) thus overal runtime is O(VE) ---- String Matching: Find pattern P[1..m] in text T[1..n] Algorithm Preprocessing time Matching Time --------------------------------------------------------------------------------------- Naive 0 O((n-m+1)m) (lec) Rabin-Karp Theta(m) hashing takes Theta(n) time, but... << randomized >> hashing + checking takes O(n+cm), where c is # of hits = O(n) if # hits is small = O(nm) if worst case #hits Knuth-Morris-Pratt Theta(m) Theta(n) - worst case (now) Finite automaton O(m |Sigma|) Theta(n) << Finite automaton has same idea as Knuth-Morris-Pratt >> ---- Finite Automata (also "Finite State Machine") - set of states 1 "start state" >= 1 "accept states" - input causes state transitions - "accept" state indicates something about input Ex: detect binary strings with odd # of 1's 10011 --> 3 ones State transition diagram: 1 0 <--- || Odd --> Even <-- start || 1 0 State transition table: 0 1 odd odd even even even odd ---- FSM for String Matching T: yo yo, yo soy P: yoyos Naive algorithm: y o y o y o s o y 1 y o y o x 2 x 3 y o y o s 4 x 5 y o x 6 x 7 x 8 x 9 x Improved: y o y o y o s o y 1 y o y o x 2 o s 3 x 4 x ---- Idea: use finite automaton - when current match fails, fall back to suffix of current match that is prefix of P - 1 state for each matching prefix y ---------------------------------- | y | |--------------- | || | | y || o y | o s | e --> y --> yo --> yoy --> yoyo --> yoyos || | | y -------- y # state y o s -------------------------- 0 e y e e 1 y y yo e 2 yo yoy e e 3 yoy y yoyo e 4 yoyo yoy e yoyos 5 yoyos y e e [trace through parsing example] ---- Matching time: O(n) - each character of input is read at most once Time to build table: Naive: T(m,|E|) = O(m*E*(check if any of m states is suffix of entry)) T(m,|E|) = O(m*E*m*m) = O(m^3 * |E|) Clever: O(m|E|) << idea: compute overlap of string with itself in incremental way >> << Knuth-Morris-Pratt uses exact same idea to only store the overlap between P and itself. Upon finding non-matching character, backtracks to the last overlap and retries the character. >> ---- We did not cover this in recitation | | Extra time: can represent max-flow as linear program | | - given directed graph G=(V,E), capacities c(u,v)>=0 | - constraints on net flow: | capacity constraint: f(u,v) <= c(u,v) for each u,v \in V | skew symmetry: f(u,v) = -f(v,u) for each u,v \in V | flow conservation: \sum_{v \in V} f(u,v) = 0 for each u \in V - {s,t} | - objective function: \sum_{v \in V} f(s, v)