|
|
|
Description |
Is one text file a subset of the other?
Or is there some bit of new text that needs to be salvaged?
The basic Unix diff tool is sometimes incredibly unsatisfactory for
this purpose -- for example, when text has been moved around or when
there are widespread whitespace differences.
This program compares two files by treating them as unstructured
sets of word sequences. By default, a word is a run of characters satisfying isAlpha.
Run wordsetdiff with no arguments to print the help information.
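The comparison idea described above can be illustrated with a minimal sketch. It is not the program's actual implementation: it uses String and Data.Set instead of ByteString and HashMap, and the names wordsBy, ngramSet and onlyInFirst are hypothetical.

```haskell
import Data.Char (isAlpha)
import Data.List (tails)
import qualified Data.Set as Set

-- Split text into maximal runs of characters satisfying the predicate.
-- (Hypothetical helper, not part of the documented API.)
wordsBy :: (Char -> Bool) -> String -> [String]
wordsBy p s = case dropWhile (not . p) s of
  "" -> []
  s' -> let (w, rest) = span p s' in w : wordsBy p rest

-- The set of all consecutive n-word sequences found in a text.
ngramSet :: Int -> String -> Set.Set [String]
ngramSet n txt =
  Set.fromList [ take n t | t <- tails ws, length (take n t) == n ]
  where ws = wordsBy isAlpha txt

-- Word sequences present in the first text but absent from the second.
-- Order and whitespace are ignored because only set membership matters.
onlyInFirst :: Int -> String -> String -> Set.Set [String]
onlyInFirst n a b = ngramSet n a `Set.difference` ngramSet n b
```

An empty result from onlyInFirst would suggest that the first text is a subset of the second at the chosen sequence length.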
|
|
Synopsis |
|
|
|
Documentation |
|
type TupMap a = HashMap [ByteString] a |
|
type Window = [ByteString] |
|
pack_window :: [ByteString] -> Window |
|
toStrict :: ByteString -> ByteString |
|
data CmdFlag |
Command line option flags
Constructors |
NoColor |
NWords Int |
WithPunc |
AlphaOnly |
CaseInsensitive |
|
|
|
options :: [OptDescr CmdFlag] |
|
safeRead :: String -> Int |
|
data Loc |
Tracks simple source locations as (start, end) character indices, with start inclusive and end exclusive.
| Constructors | | Instances | |
|
|
words_wloc :: (Char -> Bool) -> ByteString -> [(ByteString, Loc)] |
Returns the words whose characters satisfy a predicate, along with their zero-based locations.
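As a rough sketch of this behaviour, the function below works on String rather than ByteString. The Span type is a hypothetical stand-in for Loc (whose constructors are not shown here), assumed to hold a start-inclusive, end-exclusive pair; wordsWithLoc is likewise an illustrative name.

```haskell
import Data.Char (isAlpha)

-- Hypothetical stand-in for Loc: (start, end), start inclusive, end exclusive.
type Span = (Int, Int)

-- Collect each maximal run of characters satisfying the predicate,
-- paired with its zero-based position in the input.
wordsWithLoc :: (Char -> Bool) -> String -> [(String, Span)]
wordsWithLoc p = go 0
  where
    go _ [] = []
    go i s@(c:cs)
      | p c       = let (w, rest) = span p s
                        j = i + length w
                    in (w, (i, j)) : go j rest
      | otherwise = go (i + 1) cs
```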
|
|
clump_regions :: [Loc] -> [Loc] |
Cluster regions together if they are almost touching.
Any regions within clump_distance characters of one another are joined.
The result should have no overlaps.
|
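One way to sketch the clumping step for clump_regions is shown below. The Span type stands in for Loc, the value of clumpDistance is an arbitrary placeholder (the real clump_distance is not stated here), and treating "within" as "gap less than or equal to" is also an assumption.

```haskell
import Data.List (sortOn)

-- Hypothetical stand-in for Loc: (start, end), start inclusive, end exclusive.
type Span = (Int, Int)

-- Placeholder threshold; the actual clump_distance value is not documented here.
clumpDistance :: Int
clumpDistance = 3

-- Sort spans by start, then merge each span into its predecessor whenever
-- the gap between them is at most clumpDistance (overlaps count as a gap <= 0).
clump :: [Span] -> [Span]
clump = go . sortOn fst
  where
    go ((a, b) : (c, d) : rest)
      | c - b <= clumpDistance = go ((a, max b d) : rest)
      | otherwise              = (a, b) : go ((c, d) : rest)
    go spans = spans
```

Merging left to right over the sorted list keeps the output sorted and free of overlaps, since a span is only emitted once the next one starts more than clumpDistance past its end.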
|
combine_locs :: [Loc] -> Loc |
Take the bounding box of a list of locations.
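A bounding box over one-dimensional locations could look like the following sketch, again using a hypothetical Span pair in place of Loc. It assumes a non-empty input list.

```haskell
-- Hypothetical stand-in for Loc: (start, end), start inclusive, end exclusive.
type Span = (Int, Int)

-- Smallest span that covers every input span.
-- Assumes a non-empty list; minimum and maximum fail on empty input.
boundingSpan :: [Span] -> Span
boundingSpan spans = (minimum (map fst spans), maximum (map snd spans))
```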
|
|
wordmapN :: (Char -> Bool) -> Int -> ByteString -> TupMap (Set Loc) |
Form a map from word sequences to the set of locations at which each occurs within the bytestring.
This version uses consecutive sequences of N words (represented as lists)
as the keys instead of individual words.
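The following sketch shows one way such a map could be built. It is an approximation under stated assumptions: Data.Map and String replace HashMap and ByteString, Span is a hypothetical stand-in for Loc, and wordsWithLoc, windows and ngramMap are illustrative names. Each key's location is taken to run from the start of its first word to the end of its last word.

```haskell
import Data.List (tails)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-in for Loc: (start, end), start inclusive, end exclusive.
type Span = (Int, Int)

-- Words satisfying the predicate, with zero-based positions.
wordsWithLoc :: (Char -> Bool) -> String -> [(String, Span)]
wordsWithLoc p = go 0
  where
    go _ [] = []
    go i s@(c:cs)
      | p c       = let (w, rest) = span p s
                        j = i + length w
                    in (w, (i, j)) : go j rest
      | otherwise = go (i + 1) cs

-- Every run of exactly n consecutive items.
windows :: Int -> [a] -> [[a]]
windows n xs = [ w | t <- tails xs, let w = take n t, length w == n ]

-- Map each n-word sequence to the set of spans where it occurs.
-- Empty windows (n <= 0) are skipped by the pattern match.
ngramMap :: (Char -> Bool) -> Int -> String -> Map.Map [String] (Set.Set Span)
ngramMap p n txt = Map.fromListWith Set.union
  [ (map fst w, Set.singleton (start, end))
  | w@((_, (start, _)) : _) <- windows n (wordsWithLoc p txt)
  , let (_, (_, end)) = last w ]
```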
|
|
sliding_win :: Int -> [(ByteString, Loc)] -> [[(ByteString, Loc)]] |
|
trim_separators :: (Char -> Bool) -> ByteString -> [Loc] -> [Loc] |
The region of interest will end up bloated with separator
characters around the edges. This will trim those down.
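A possible shape for the trimming of a single region is sketched below. It assumes the predicate marks separator characters; the real function may instead take the word-character predicate. Span and trimSpan are hypothetical names, and String is used in place of ByteString.

```haskell
-- Hypothetical stand-in for Loc: (start, end), start inclusive, end exclusive.
type Span = (Int, Int)

-- Shrink a span so that it neither starts nor ends on a separator.
-- A span made up entirely of separators collapses to an empty span.
trimSpan :: (Char -> Bool) -> String -> Span -> Span
trimSpan isSep txt (a, b) = (a + lead, b - trail)
  where
    region = take (b - a) (drop a txt)
    lead   = length (takeWhile isSep region)
    trail  = length (takeWhile isSep (reverse (drop lead region)))
```

Applying it to a list of regions is then a matter of map (trimSpan isSep txt).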
|
|
print_diff_regions :: Bool -> ByteString -> [Loc] -> IO () |
Print out results, i.e. the distinct regions of text within one file and not the other.
|
|
data Config |
Constructors |
Cfg |
color :: Bool |
word_sequence_size :: Int |
case_insensitive :: Bool |
with_punctuation :: Bool |
|
|
|
|
Produced by Haddock version 2.6.1 |