wordsetdiff
Main
Description

Is one text file a subset of the other? Or is there some bit of new text that needs to be salvaged?

The basic Unix diff tool is often unsatisfactory for this purpose, for example when text has been moved around or when there are widespread whitespace differences.

This program compares two files by treating each as an unstructured set of word sequences. By default, a word is a run of characters satisfying isAlpha.

Run wordsetdiff with no arguments to print the help information.
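
The comparison described above can be illustrated with a minimal sketch. It is not the program's actual implementation: it uses String and Data.Set rather than ByteString and HashMap, and the names wordsBy, windows and missingFrom are hypothetical helpers introduced only for this example.

```haskell
import Data.Char (isAlpha)
import Data.List (tails)
import qualified Data.Set as Set

-- Split text into maximal runs of characters satisfying the predicate.
wordsBy :: (Char -> Bool) -> String -> [String]
wordsBy isWordChar s =
  case dropWhile (not . isWordChar) s of
    []   -> []
    rest -> let (w, rest') = span isWordChar rest
            in w : wordsBy isWordChar rest'

-- All windows of exactly n consecutive items (none when n <= 0).
windows :: Int -> [a] -> [[a]]
windows n xs
  | n <= 0    = []
  | otherwise = [w | t <- tails xs, let w = take n t, length w == n]

-- Word sequences of length n that occur in the first text but not the second.
-- (Hypothetical helper; not part of the documented module.)
missingFrom :: Int -> String -> String -> Set.Set [String]
missingFrom n a b = seqs a `Set.difference` seqs b
  where
    seqs = Set.fromList . windows n . wordsBy isAlpha
```

Because both texts are reduced to sets of word sequences, reordered passages and whitespace changes do not show up as differences; only sequences absent from the second text remain.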

Synopsis
type TupMap a = HashMap [ByteString] a
type Window = [ByteString]
pack_window :: [ByteString] -> Window
toStrict :: ByteString -> ByteString
data CmdFlag
= NoColor
| NWords Int
| WithPunc
| AlphaOnly
| CaseInsensitive
options :: [OptDescr CmdFlag]
safeRead :: String -> Int
data Loc = Loc Int64 Int64
words_wloc :: (Char -> Bool) -> ByteString -> [(ByteString, Loc)]
clump_regions :: [Loc] -> [Loc]
combine_locs :: [Loc] -> Loc
wordmapN :: (Char -> Bool) -> Int -> ByteString -> TupMap (Set Loc)
sliding_win :: Int -> [(ByteString, Loc)] -> [[(ByteString, Loc)]]
trim_separators :: (Char -> Bool) -> ByteString -> [Loc] -> [Loc]
print_diff_regions :: Bool -> ByteString -> [Loc] -> IO ()
data Config = Cfg {
color :: Bool
word_sequence_size :: Int
case_insensitive :: Bool
with_punctuation :: Bool
}
Documentation
type TupMap a = HashMap [ByteString] a
type Window = [ByteString]
pack_window :: [ByteString] -> Window
toStrict :: ByteString -> ByteString
data CmdFlag
Command line option flags
Constructors
NoColor
NWords Int
WithPunc
AlphaOnly
CaseInsensitive
options :: [OptDescr CmdFlag]
safeRead :: String -> Int
data Loc
Tracks simple source locations as (start, end) character indices, where start is inclusive and end is exclusive.
Constructors
Loc Int64 Int64
words_wloc :: (Char -> Bool) -> ByteString -> [(ByteString, Loc)]
Returns the words whose characters satisfy a predicate, along with their zero-based locations.
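
As a rough illustration of what such a function might do, the sketch below works on String instead of ByteString. The name wordsWithLoc is hypothetical, the Loc type is redeclared locally as a stand-in, and the end index is assumed to be exclusive.

```haskell
import Data.Char (isAlpha)
import Data.Int (Int64)

-- Stand-in for the module's Loc: start index (inclusive), end index (exclusive).
data Loc = Loc Int64 Int64 deriving (Eq, Ord, Show)

-- Sketch of a words_wloc-like function over String.
-- Walks the input once, tracking the zero-based index of the current character.
wordsWithLoc :: (Char -> Bool) -> String -> [(String, Loc)]
wordsWithLoc isWordChar = go 0
  where
    go :: Int64 -> String -> [(String, Loc)]
    go _ [] = []
    go i s@(c:cs)
      | isWordChar c =
          let (w, rest) = span isWordChar s
              j = i + fromIntegral (length w)
          in (w, Loc i j) : go j rest
      | otherwise = go (i + 1) cs

-- Usage: the comma and space at indices 2 and 3 are skipped.
example :: [(String, Loc)]
example = wordsWithLoc isAlpha "ab, cd"
```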
clump_regions :: [Loc] -> [Loc]
Clusters regions together if they are almost touching: any regions within clump_distance characters of one another are joined. The result should contain no overlapping regions.
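
One common way to get this behaviour is to sort the regions and merge them in a single left-to-right pass. The sketch below follows that approach; it is an assumption about the technique, not the module's code. The value of clump_distance is not given on this page, so it appears here as the parameter dist, and clumpWithin is a hypothetical name.

```haskell
import Data.Int (Int64)
import Data.List (foldl', sort)

-- Stand-in for the module's Loc: start (inclusive), end (exclusive).
data Loc = Loc Int64 Int64 deriving (Eq, Ord, Show)

-- Merge regions whose gap is at most `dist` characters.
-- `dist` stands in for clump_distance, whose actual value is not documented here.
clumpWithin :: Int64 -> [Loc] -> [Loc]
clumpWithin dist = reverse . foldl' step [] . sort
  where
    -- The head of the accumulator is the region currently being grown.
    step (Loc s e : acc) (Loc s' e')
      | s' - e <= dist = Loc s (max e e') : acc
    -- Otherwise (empty accumulator or gap too large) start a new region.
    step acc loc = loc : acc
```

Sorting first means each incoming region only ever needs to be compared with the most recent merged region, which is what keeps the output free of overlaps.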
combine_locs :: [Loc] -> Loc
Take the bounding box of a list of locations.
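
For one-dimensional regions, the bounding box is simply the smallest start and the largest end. A minimal sketch follows, with the hypothetical name boundingLoc; it returns Maybe to handle the empty list, which is a choice made for this example, since the documented signature returns a plain Loc.

```haskell
import Data.Int (Int64)

-- Stand-in for the module's Loc: start (inclusive), end (exclusive).
data Loc = Loc Int64 Int64 deriving (Eq, Show)

-- Bounding region of a list of locations: smallest start, largest end.
-- Nothing for the empty list (an assumption of this sketch).
boundingLoc :: [Loc] -> Maybe Loc
boundingLoc []   = Nothing
boundingLoc locs = Just (Loc (minimum starts) (maximum ends))
  where
    starts = [s | Loc s _ <- locs]
    ends   = [e | Loc _ e <- locs]
```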
wordmapN :: (Char -> Bool) -> Int -> ByteString -> TupMap (Set Loc)
Forms a map from word sequences to the set of locations at which they occur within the bytestring. The keys are consecutive sequences of N words (represented as lists) rather than individual words.
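
The sketch below shows one way such a map could be built. It makes several assumptions: String and Data.Map replace ByteString and HashMap, the location recorded for a sequence spans from the start of its first word to the end of its last, and the names wordsWithLoc, windows and seqMap are hypothetical.

```haskell
import Data.Char (isAlpha)
import Data.Int (Int64)
import Data.List (tails)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Stand-in for the module's Loc: start (inclusive), end (exclusive).
data Loc = Loc Int64 Int64 deriving (Eq, Ord, Show)

-- Words (maximal runs satisfying the predicate) with zero-based locations.
wordsWithLoc :: (Char -> Bool) -> String -> [(String, Loc)]
wordsWithLoc isWordChar = go 0
  where
    go :: Int64 -> String -> [(String, Loc)]
    go _ [] = []
    go i s@(c:cs)
      | isWordChar c =
          let (w, rest) = span isWordChar s
              j = i + fromIntegral (length w)
          in (w, Loc i j) : go j rest
      | otherwise = go (i + 1) cs

-- All windows of exactly n consecutive items (none when n <= 0).
windows :: Int -> [a] -> [[a]]
windows n xs
  | n <= 0    = []
  | otherwise = [w | t <- tails xs, let w = take n t, length w == n]

-- Map each n-word sequence to the set of regions where it occurs.
seqMap :: (Char -> Bool) -> Int -> String -> Map.Map [String] (Set.Set Loc)
seqMap isWordChar n text =
  Map.fromListWith Set.union
    [ (map fst win, Set.singleton (spanOf win))
    | win <- windows n (wordsWithLoc isWordChar text) ]
  where
    -- Windows are non-empty here because n >= 1 whenever any are produced.
    spanOf win = Loc (minimum [s | (_, Loc s _) <- win])
                     (maximum [e | (_, Loc _ e) <- win])
```

Keying on sequences rather than single words means a common word only counts as shared when it also appears in the same local context in both files.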
sliding_win :: Int -> [(ByteString, Loc)] -> [[(ByteString, Loc)]]
trim_separators :: (Char -> Bool) -> ByteString -> [Loc] -> [Loc]
The regions of interest end up bloated with separator characters around the edges. This function trims them down.
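
A minimal sketch of trimming a single region is given below. It assumes the predicate identifies word characters, so everything else counts as a separator; the module's trim_separators may interpret its predicate differently. The name trimLoc is hypothetical and String stands in for ByteString.

```haskell
import Data.Char (isAlpha)
import Data.Int (Int64)
import Data.List (dropWhileEnd)

-- Stand-in for the module's Loc: start (inclusive), end (exclusive).
data Loc = Loc Int64 Int64 deriving (Eq, Show)

-- Shrink a region until its first and last characters satisfy the predicate.
-- Assumes the predicate marks word characters (an assumption of this sketch).
trimLoc :: (Char -> Bool) -> String -> Loc -> Loc
trimLoc isWordChar text (Loc s e) = Loc s' e'
  where
    region  = take (fromIntegral (e - s)) (drop (fromIntegral s) text)
    leading = length (takeWhile (not . isWordChar) region)
    kept    = dropWhileEnd (not . isWordChar) (drop leading region)
    s'      = s + fromIntegral leading
    e'      = s' + fromIntegral (length kept)

-- Usage: trim every region in a list against the same source text.
trimAll :: (Char -> Bool) -> String -> [Loc] -> [Loc]
trimAll isWordChar text = map (trimLoc isWordChar text)
```

Separators inside the region are left alone; only the leading and trailing ones are removed, so a region made up entirely of separators collapses to an empty one.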
print_diff_regions :: Bool -> ByteString -> [Loc] -> IO ()
Prints the results, i.e. the distinct regions of text that occur in one file but not the other.
data Config
Constructors
Cfg
color :: Bool
word_sequence_size :: Int
case_insensitive :: Bool
with_punctuation :: Bool
Produced by Haddock version 2.6.1