Bloom filters are space-efficient probabilistic data structures for testing whether an element is a member of a set. Queries are fast, but the answer can be a false positive: the filter may report an element as present when it was never added, although it never reports an added element as absent. A Bloom filter consists of a bit vector and several hash functions; the size of the bit vector and the number of hash functions can be tuned to the expected number of elements and the acceptable false-positive rate. The article also surveys implementations and use cases of Bloom filters across different technologies.
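To make the bit-vector-plus-hash-functions idea concrete, here is a minimal sketch in Python. It is not taken from the article. It assumes the standard sizing formulas (m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n expected elements and false-positive rate p) and derives the k bit positions from a single SHA-256 digest by double hashing, which is one common choice among many. The class and method names are illustrative only.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; all names here are hypothetical."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits.
        ln2 = math.log(2)
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2**2)
        )
        # Standard hash count: k = (m / n) * ln 2, at least one.
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: double hashing, position_i = h1 + i * h2 (mod m),
        # with h1 and h2 taken from two halves of a SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Present only if all of the item's bits are set; a clear bit
        # proves the item was never added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bloom = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for n in range(1000):
    bloom.add(f"user-{n}")
```

After this runs, `"user-42" in bloom` is true, while a string that was never added is reported as absent in most cases and as present only with roughly the configured false-positive probability.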
Steinar H. Gunderson discusses modern perfect hashing techniques for mapping a predefined set of strings to integers, with a focus on performance for small sets. He critiques existing methods, particularly those relying on the PEXT instruction, and presents a collision-free string-hashing scheme inspired by a technique from the chess programming community. The article includes code examples showing how his methods handle specific string lengths efficiently.
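As a rough illustration of what a perfect hash for a small, fixed set of strings can look like, the sketch below uses a multiply-and-shift search of the kind familiar from chess programming's "magic bitboards": pack each key into a 64-bit integer, then try random multipliers until the top bits of the product are distinct for every key. This is an assumption about the general family of techniques, not a reproduction of the article's code; the function names, the example key set, and the table size are all hypothetical.

```python
import random

MASK64 = (1 << 64) - 1


def pack(key: str) -> int:
    # Pack up to the first 8 ASCII bytes into an integer (little-endian).
    # Assumption: every key in the set differs within its first 8 bytes.
    return int.from_bytes(key.encode("ascii")[:8], "little")


def slot(key: str, magic: int, table_bits: int) -> int:
    # Multiply-and-shift: keep the top `table_bits` bits of the 64-bit product.
    return ((pack(key) * magic) & MASK64) >> (64 - table_bits)


def find_magic(keys, table_bits, seed=1, max_tries=1_000_000):
    # Brute-force search for a multiplier that maps every key to its own slot.
    rng = random.Random(seed)
    for _ in range(max_tries):
        magic = rng.getrandbits(64)
        if len({slot(k, magic, table_bits) for k in keys}) == len(keys):
            return magic
    raise RuntimeError("no collision-free multiplier found; try more table bits")


KEYS = ["GET", "PUT", "POST", "HEAD", "DELETE", "PATCH"]  # hypothetical key set
TABLE_BITS = 3  # 8 slots for 6 keys

MAGIC = find_magic(KEYS, TABLE_BITS)
TABLE = [None] * (1 << TABLE_BITS)
for index, key in enumerate(KEYS):
    TABLE[slot(key, MAGIC, TABLE_BITS)] = index


def lookup(s: str):
    # One multiply, one shift, one table read, then a verifying comparison
    # so that strings outside the set are rejected rather than misidentified.
    index = TABLE[slot(s, MAGIC, TABLE_BITS)]
    if index is not None and KEYS[index] == s:
        return index
    return None
```

With this table built, `lookup("POST")` returns the position of `"POST"` in `KEYS`, and a string outside the set returns `None`. The search runs once, ahead of time; in a compiled language the resulting multiplier and table would typically be baked in as constants.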