4 posts tagged with "data structures"

Bloom Filter

· One min read

A Bloom filter is a probabilistic data structure that tests whether an element is a member of a set in a time- and space-efficient way.

False positive matches are possible, but false negatives are not. In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set but not removed (though a "counting" Bloom filter addresses this); the more elements are added to the set, the higher the probability of false positives.
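The behavior described above can be illustrated with a minimal sketch. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of SHA-256 with double hashing are all assumptions chosen for illustration; production systems such as Cassandra or HBase use their own hash functions and sizing rules.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array probed at k positions per item."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Derive k positions from one SHA-256 digest via double hashing
        # (an assumption of this sketch, not a requirement of Bloom filters).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in set"; True means "possibly in set".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Because `add` only ever sets bits, an item that was added always reports `True`, which is why false negatives cannot occur. A `True` for an item that was never added is a false positive, and it becomes more likely as more bits are set.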

Use Cases

  • Cassandra uses Bloom filters to determine whether an SSTable has data for a particular row.
  • An HBase Bloom filter is an efficient mechanism to test whether a StoreFile contains a specific row or row-column cell.
  • A website's anti-fraud system can use Bloom filters to reject banned users efficiently.
  • The Google Chrome web browser used to use a Bloom filter to identify malicious URLs.

Skiplist

· One min read

A skip list is essentially a sorted linked list that supports binary-search-like lookups. It accomplishes this by adding extra nodes that let you "skip" sections of the linked list. When a random coin toss decides which extra nodes are created, searches, inserts, and deletes take O(log n) expected time.
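A compact sketch may make the coin-toss idea more concrete. Everything named here (`SkipList`, `_Node`, `MAX_LEVEL`, the `seed` parameter, and the 0.5 promotion probability) is an assumption of this example and does not reflect how LevelDB, Redis, or Lucene implement their skip lists.

```python
import random


class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    MAX_LEVEL = 16  # hypothetical cap on tower height

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self._level = 1  # number of levels currently in use

    def _random_level(self):
        # Keep flipping a fair coin; each heads adds one more level.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _find_predecessors(self, key):
        # For each level, find the last node whose key is smaller than `key`.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]  # skip ahead on this level
            update[i] = node
        return update

    def contains(self, key):
        candidate = self._find_predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def insert(self, key):
        update = self._find_predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            return  # already present; this sketch stores unique keys
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def delete(self, key):
        update = self._find_predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        # Unlink the node from every level it appears on.
        for i in range(len(target.forward)):
            if update[i].forward[i] is target:
                update[i].forward[i] = target.forward[i]
        return True

    def __iter__(self):
        # Level 0 is an ordinary sorted linked list of all keys.
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Each higher level holds roughly half the nodes of the level below it, so a search that starts at the top and drops down skips about half of the remaining candidates at each step. That is where the expected O(log n) cost comes from.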

Use Cases

  • LevelDB MemTable
  • Redis Sorted Set
  • Lucene inverted index

Skip List

· One min read

A skip list is essentially a sorted linked list that supports binary-search-like lookups. It achieves this by adding extra nodes that enable you to "skip" parts of the linked list. When a random number generator decides which extra nodes are created, a skip list has expected O(log n) complexity for search, insert, and delete operations.

Use Cases

  • LevelDB MemTable
  • Redis Sorted Set
  • Lucene Inverted Index

Bloom Filter

· One min read

A Bloom filter is a data structure used to determine whether an element is a member of a set, with much better space and time efficiency than general-purpose alternatives that store the elements themselves.

A Bloom filter may yield false positive matches, but never false negatives. In other words, a query returns either "possibly present" or "definitely not present". Elements can be added to the set, but cannot be removed (although a "counting" Bloom filter addresses this); the more elements added to the set, the greater the likelihood of false positives.
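The "counting" variant mentioned above can be sketched by replacing each bit with a small counter, so that removal becomes a decrement. The names `CountingBloomFilter`, `num_counters`, and `num_hashes`, along with the SHA-256 double-hashing scheme, are assumptions made for this illustration only.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter: counters instead of single bits."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 3):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item: str):
        # Same double-hashing idea as a plain Bloom filter (sketch assumption).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_counters for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def might_contain(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def remove(self, item: str) -> bool:
        # Only safe for items that were really added: removing an item that
        # merely looks present (a false positive) can cause false negatives.
        if not self.might_contain(item):
            return False
        for pos in self._positions(item):
            self.counters[pos] -= 1
        return True
```

The trade-off is space: each slot now needs several bits rather than one, and the caller is responsible for only removing items that were previously added.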

Use Cases

  • Cassandra uses Bloom filters to determine if an SSTable contains data for a specific row.
  • HBase Bloom filters are an effective mechanism for testing whether a StoreFile contains a specific row or row-column cell.
  • With Bloom filters, a website's anti-cheat system can effectively deny access to banned users.
  • Google's Chrome browser once used Bloom filters to identify malicious links.