netcom.util
Class BloomFilter

java.lang.Object
  extended bynetcom.util.BloomFilter
All Implemented Interfaces:
java.lang.Cloneable

public class BloomFilter
extends java.lang.Object
implements java.lang.Cloneable

Title: Bloom Filter

Description: A Bloom filter is basically a bit array which has its bits modified from a one-way hash function of an input string. (Each string is hashed to 5 different indexes.) This allows the filter to tell whether certain string 'keys' have been entered into it or not with a very low memory cost. It also allows for a very small false-positive ratio, see this (where k=5) for more details on the false-positive ratio

Copyright: Copyright (c) 2002

Company: NETCOM

See Also:
MD5, BloomUpdate

Field Summary
static int DEFAULT_TABLE_SIZE
          The table size is critical to the false-positive ratio of the filter The size can be loaded from a file under the key "BLOOMFILTER_TABLE_SIZE" but if that is not supplied, this default is used Ideally we want a size-to-keys ratio greater than 10-1 which results in less than 1% false-positives, however, even a 7-1 ratio still only gives 3% false positives see this (k=5) for exact numbers As a sidenote, a rehash upon size flucuations might be considered, however this would require a separate storage of all origional indexes (before modulus operations) and thus would use much more memory than simply making a larger table Also, this must be constant across ALL MDS nodes for proper hasing, so cannot make this a netcom.util.Properties loaded constant
static int HASH_FUNCTIONS
          This is the number of hash functions used for every key It should always be 5 (any changes would require modifying the algorithm, so this is not a properties-loaded constant)
 
Constructor Summary
BloomFilter()
          The constructor for a BloomFilter
 
Method Summary
 BloomUpdate addFullString(java.lang.String keys)
          This method will parse an input string, and add every 'word' to the bloom filter, separating 'words' by
 void addOneKey(java.lang.String key)
          This method will add a single String to the bloom filter, whether spaces exist in it or not
 boolean checkAllWords(java.lang.String keys)
          This method checks to see if 'all' of the 'words' in a string are in the bloom filter, separating 'words' by
 boolean checkOneKey(java.lang.String key)
          This method checks to see if an individual key is located in the bloom filter
 BloomUpdate getBloomUpdate(BloomFilter input_filter)
          Method to return a BloomUpdate object representing the differences between 'this' object and an input Bloomfilter Does not effect the state of any of the BloomFilter objects
 int getSize()
          Getter method to return the number of keys currently held in the filter
 void updateFilter(BloomUpdate update_obj)
          This method updates this bloom filter according to a supplied BloomUpdate object (ie flips the appropriate bits)
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

HASH_FUNCTIONS

public static final int HASH_FUNCTIONS
This is the number of hash functions used for every key It should always be 5 (any changes would require modifying the algorithm, so this is not a properties-loaded constant)

See Also:
Constant Field Values

DEFAULT_TABLE_SIZE

public static final int DEFAULT_TABLE_SIZE
The table size is critical to the false-positive ratio of the filter The size can be loaded from a file under the key "BLOOMFILTER_TABLE_SIZE" but if that is not supplied, this default is used Ideally we want a size-to-keys ratio greater than 10-1 which results in less than 1% false-positives, however, even a 7-1 ratio still only gives 3% false positives see this (k=5) for exact numbers As a sidenote, a rehash upon size flucuations might be considered, however this would require a separate storage of all origional indexes (before modulus operations) and thus would use much more memory than simply making a larger table Also, this must be constant across ALL MDS nodes for proper hasing, so cannot make this a netcom.util.Properties loaded constant

See Also:
Constant Field Values
Constructor Detail

BloomFilter

public BloomFilter()
The constructor for a BloomFilter

Method Detail

getSize

public int getSize()
Getter method to return the number of keys currently held in the filter

Returns:
the number of keys in the filter

addFullString

public BloomUpdate addFullString(java.lang.String keys)
This method will parse an input string, and add every 'word' to the bloom filter, separating 'words' by

Parameters:
keys - String with all words wished to be added
Returns:
a BloomUpdate object signifying the changes made to the BitSet

addOneKey

public void addOneKey(java.lang.String key)
This method will add a single String to the bloom filter, whether spaces exist in it or not

Parameters:
key - String with or without spaces to be inserted as a single key
Returns:
A BloomUpdate object signifying the changes made to the BitSet

checkAllWords

public boolean checkAllWords(java.lang.String keys)
This method checks to see if 'all' of the 'words' in a string are in the bloom filter, separating 'words' by

Parameters:
keys - String with all words wished to be checked
Returns:
A Boolean value of whether 'all' of the words are located in the filter

checkOneKey

public boolean checkOneKey(java.lang.String key)
This method checks to see if an individual key is located in the bloom filter

Parameters:
key - String with exact key to check in the filter
Returns:
A Boolean value of whether the key was in the filter or not

updateFilter

public void updateFilter(BloomUpdate update_obj)
This method updates this bloom filter according to a supplied BloomUpdate object (ie flips the appropriate bits)

Parameters:
update_obj - BloomUpdate object representing changes to the filter

getBloomUpdate

public BloomUpdate getBloomUpdate(BloomFilter input_filter)
Method to return a BloomUpdate object representing the differences between 'this' object and an input Bloomfilter Does not effect the state of any of the BloomFilter objects

Parameters:
input_filter - the filter to compare against
Returns:
a BloomUpdate object representing the differences