Introduction to Bloom filters


Introduction to Bloom filters

Last week, I learned what bloom filters are and implemented one.

One of the Hackers-in-Residence at Hack Reactor wrote a very helpful article that introduces the concept of Bloom Filter that I would recommend reading: 
http://kiafathi.azurewebsites.net/quick-thoughts-bloom-filters/

Bloom filters overview

In a nutshell, bloom filters allow you to filter out requests to access and manipulate a database by telling you whether a key is potentially stored or is definitely not stored. This way, you can avoid doing an expensive operation by ruling out keys that are definitely not in a database.

An instance of a bloom filter is an object with a storage array that is of a pre-determined size (note: bloom filters cannot be re-sized). At the different indexes, the value can be either 1 (truthy value) or undefined( (falsey value). When you create a bloom filter, you determine the total size of the array and the number of hashing functions that will be used to generate a set of marks. For each key, they will have a unique pattern of marks (the element is set to the value of 1).

Once you have added keys to a bloom filter, you can look up whether the key potentially exists in the bloom filter by using the potential keys and using the hashing functions to generate a pattern of marks. If each spot in that pattern is set to a truthy value of 1, then that means the key potentially exists. Otherwise, if any of the spots in that pattern is set to 0, that means the key definitely does not exist otherwise all of those spots would have a truthy value.

Pros and Cons of Bloom Filters:

  • Pro: Bloom filters allow you to avoid trying to do expensive operations by ruling out keys that definitely do not exist in a database.
  • Pro: By appropriately sizing the bloom filter relative to the number of keys stored, you can keep the false positive rate reasonably low (calculation for false positive rate).
  • Con: Bloom filters do not allow you to definitively conclude whether a key is stored in a filter.
  • Con: Because of the above drawback, bloom filters cannot be automatically re-sized without creating a brand new bloom filter and re-inserting all of the keys through a source of truth.
views