Show HN: cuSBF – Faster GPU Bloom Filter for Sequence Data
cuSBF is a high-performance GPU implementation of the Super Bloom filter for sequence data. It targets high-throughput batch k-mer insertion and querying and reports large speedups over existing GPU filters such as Cuckoo-GPU. The tool is optimized for NVIDIA GPUs and supports nucleotide (DNA) and protein sequences, or any other sequence type for which a valid alphabet is provided.
- cuSBF is reported to be up to 3427× faster than Cuckoo-GPU for large filters.
- The implementation uses the findere scheme to reduce false positives by inserting overlapping s-mers and requiring a full run of consecutive s-mer matches.
- It is developed exclusively for Linux and requires specific NVIDIA GPU capabilities.
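
The findere idea from the highlights can be illustrated on the CPU. The following is a minimal Python sketch, not cuSBF's code: `TinyBloom`, `insert_sequence`, and `query_kmers` are hypothetical names, and the hash choice (BLAKE2b) and filter size are assumptions made only for illustration. It inserts every s-mer of a sequence and reports a k-mer as present only when all of its k - s + 1 overlapping s-mers hit the filter.

```python
import hashlib


class TinyBloom:
    """A small CPU Bloom filter used only to illustrate the scheme."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits          # must be a multiple of 8 here
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


def insert_sequence(bloom, seq, s):
    """Insert every overlapping s-mer of seq."""
    for i in range(len(seq) - s + 1):
        bloom.add(seq[i:i + s])


def query_kmers(bloom, seq, k, s):
    """Return one bool per k-mer of seq (left to right).

    A k-mer counts as present only if all k - s + 1 of its
    consecutive s-mers are found in the filter.
    """
    hits = [seq[i:i + s] in bloom for i in range(len(seq) - s + 1)]
    need = k - s + 1
    results = []
    run = 0  # length of the current run of consecutive s-mer hits
    for i, hit in enumerate(hits):
        run = run + 1 if hit else 0
        if i >= need - 1:  # the s-mer at i closes the k-mer starting at i - need + 1
            results.append(run >= need)
    return results
```

Because a false positive now requires k - s + 1 consecutive s-mer false positives, the k-mer false-positive rate drops sharply compared with inserting k-mers directly, while every k-mer of an inserted sequence is still reported present.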
Opening excerpt (first ~120 words)
cuSBF Overview

cuSBF is a high-performance GPU implementation of the Super Bloom filter, optimized for high-throughput batch k-mer insertion and query on nucleotide (DNA) and protein sequences (or any other sequence type, as long as a valid alphabet is provided). It exploits the streaming nature of sequence-derived k-mers: minimizers group consecutive k-mers that share the same minimizer into super-k-mers, and all k-mers of a super-k-mer are assigned to the same 256-bit memory shard. This amortizes random memory accesses across consecutive k-mer queries, reducing memory-bandwidth pressure. The findere scheme further reduces false positives dramatically by inserting overlapping s-mers and requiring a full run of consecutive s-mer matches.
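
The minimizer grouping described in the excerpt can be sketched in a few lines of Python. This is an illustration under stated assumptions, not cuSBF's implementation: the function names, the minimizer length `m`, the BLAKE2b-based ordering, and grouping by minimizer string (rather than by minimizer position) are all hypothetical choices. Only the 256-bit shard size comes from the excerpt.

```python
import hashlib

SHARD_BITS = 256  # shard size stated in the excerpt


def _hash64(text):
    """64-bit hash of a string (assumed ordering for minimizers)."""
    digest = hashlib.blake2b(text.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minimizer(kmer, m):
    """Return the m-mer of kmer with the smallest hash (leftmost on ties)."""
    return min((kmer[i:i + m] for i in range(len(kmer) - m + 1)), key=_hash64)


def super_kmers(seq, k, m):
    """Group consecutive k-mers of seq that share a minimizer.

    Returns a list of (minimizer, [k-mers]) pairs in sequence order.
    """
    groups = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        mz = minimizer(kmer, m)
        if groups and groups[-1][0] == mz:
            groups[-1][1].append(kmer)   # same minimizer: extend the run
        else:
            groups.append((mz, [kmer]))  # minimizer changed: start a new run
    return groups


def shard_index(mz, num_shards):
    """Map a minimizer to one shard; every k-mer in its group shares it."""
    return _hash64(mz) % num_shards


def shard_bit_offset(mz, num_shards):
    """First bit of the 256-bit shard that the minimizer selects."""
    return shard_index(mz, num_shards) * SHARD_BITS
```

Since the shard depends only on the minimizer, a run of consecutive k-mers touches a single 256-bit region of memory instead of one random location per k-mer, which is the amortization the excerpt refers to.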
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.