Previous: String Search, Up: Sorting and Searching


7.2.10 Sequence Comparison

(require 'diff)

diff:edit-length implements the algorithm:

The values returned by diff:edit-length can be used to gauge the degree of match between two sequences.

diff:edits and diff:longest-common-subsequence combine the algorithm with the divide-and-conquer method outlined in:

If the items being sequenced are text lines, then the computed edit-list is equivalent to the output of the diff utility program. If the items being sequenced are words, then it is like the lesser known spiff program.

— Function: diff:longest-common-subsequence array1 array2 p-lim
— Function: diff:longest-common-subsequence array1 array2

array1 and array2 are one-dimensional arrays.

The non-negative integer p-lim, if provided, is maximum number of deletions of the shorter sequence to allow. diff:longest-common-subsequence will return #f if more deletions would be necessary.

diff:longest-common-subsequence returns a one-dimensional array of length (quotient (- (+ len1 len2) (diff:edit-length array1 array2)) 2) holding the longest sequence common to both arrays.

— Function: diff:edits array1 array2 p-lim
— Function: diff:edits array1 array2

array1 and array2 are one-dimensional arrays.

The non-negative integer p-lim, if provided, is maximum number of deletions of the shorter sequence to allow. diff:edits will return #f if more deletions would be necessary.

diff:edits returns a vector of length (diff:edit-length array1 array2) composed of a shortest sequence of edits transformaing array1 to array2.

Each edit is an integer:

k > 0
Inserts (array-ref array1 (+ -1 j)) into the sequence.
k < 0
Deletes (array-ref array2 (- -1 k)) from the sequence.

— Function: diff:edit-length array1 array2 p-lim
— Function: diff:edit-length array1 array2

array1 and array2 are one-dimensional arrays.

The non-negative integer p-lim, if provided, is maximum number of deletions of the shorter sequence to allow. diff:edit-length will return #f if more deletions would be necessary.

diff:edit-length returns the length of the shortest sequence of edits transformaing array1 to array2.

     (diff:longest-common-subsequence "fghiejcklm" "fgehijkpqrlm")
     ⇒ "fghijklm"
     
     (diff:edit-length "fghiejcklm" "fgehijkpqrlm")
     ⇒ 6
     
     (diff:edits "fghiejcklm" "fgehijkpqrlm")
     ⇒ #A:fixZ32b(3 -5 -7 8 9 10)
            ; e  c  h p q  r