fastq_to_fasta
A template for creation of SeqAn3 apps, with a FASTQ to FASTA example app.
raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ > Class Template Reference

The HIBF binning directory. A data structure that efficiently answers set-membership queries for multiple bins.

#include <raptor/hierarchical_interleaved_bloom_filter.hpp>

Classes

class  counting_agent_type
 Manages counting ranges of values for the hibf::hierarchical_interleaved_bloom_filter.
 
class  membership_agent
 Manages membership queries for the hibf::hierarchical_interleaved_bloom_filter.
 
class  user_bins
 Bookkeeping for user and technical bins.
 

Public Types

using ibf_t = seqan3::interleaved_bloom_filter< data_layout_mode_ >
 The type of an individual Bloom filter.
 

Public Member Functions

membership_agent membership_agent () const
 Returns a membership_agent to be used for membership queries.
 
Constructors, destructor and assignment
 hierarchical_interleaved_bloom_filter ()=default
 Defaulted.
 
 hierarchical_interleaved_bloom_filter (hierarchical_interleaved_bloom_filter const &)=default
 Defaulted.
 
hierarchical_interleaved_bloom_filter & operator= (hierarchical_interleaved_bloom_filter const &)=default
 Defaulted.
 
 hierarchical_interleaved_bloom_filter (hierarchical_interleaved_bloom_filter &&)=default
 Defaulted.
 
hierarchical_interleaved_bloom_filter & operator= (hierarchical_interleaved_bloom_filter &&)=default
 Defaulted.
 
 ~hierarchical_interleaved_bloom_filter ()=default
 Defaulted.
 

Public Attributes

std::vector< ibf_t > ibf_vector
 The individual interleaved Bloom filters.
 
std::vector< std::vector< int64_t > > next_ibf_id
 Stores for each bin in each IBF of the HIBF the ID of the next IBF.
 
user_bins user_bins
 The underlying user bins.
 

Static Public Attributes

static constexpr seqan3::data_layout data_layout_mode = data_layout_mode_
 Indicates whether the Interleaved Bloom Filter is compressed.
 

Detailed Description

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
class raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >

The HIBF binning directory. A data structure that efficiently answers set-membership queries for multiple bins.

Template Parameters
data_layout_mode_  Indicates whether the underlying data type is compressed. See seqan3::data_layout.
See also
seqan3::interleaved_bloom_filter

This class extends the seqan3::interleaved_bloom_filter with additional bookkeeping that makes it possible to establish a hierarchical structure. This structure can then be used to split or merge user bins and distribute them over a variable number of technical bins. In the seqan3::interleaved_bloom_filter, the number of user bins and technical bins is always the same, which degrades performance when there are many user bins or when the user bins are unevenly distributed.

Terminology

Technical Bin

A Technical Bin represents an actual bin in the binning directory. In the IBF, it stores its k-mers in a single Bloom filter (which is interleaved with all the other Bloom filters).

User Bin

The user may impose a structure on their sequence data in the form of logical groups (e.g. species). When querying the IBF, the user is interested in an answer that differentiates between these groups.

Hierarchical Interleaved Bloom Filter (HIBF)

In contrast to the seqan3::interleaved_bloom_filter, a user bin may be split across multiple technical bins, or multiple user bins may be merged into one technical bin. When merging multiple user bins, the HIBF stores another IBF that is built over the user bins constituting the merged bin. This lower-level IBF can then be used to distinguish between the user bins that were merged.

In this example, user bin 1 was split into two technical bins. Bins 3, 4, and 5 were merged into a single technical bin, and another IBF was added for the merged bin.

The individual IBFs may have a different number of technical bins and differ in their sizes, allowing an efficient distribution of the user bins.

Querying

To query the Hierarchical Interleaved Bloom Filter for values, call hibf::hierarchical_interleaved_bloom_filter::membership_agent() and use the returned hibf::hierarchical_interleaved_bloom_filter::membership_agent. In contrast to the seqan3::interleaved_bloom_filter, the result will consist of indices of user bins.

To count the occurrences in each user bin of a range of values in the Hierarchical Interleaved Bloom Filter, call hibf::hierarchical_interleaved_bloom_filter::counting_agent() and use the returned hibf::hierarchical_interleaved_bloom_filter::counting_agent_type.
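The following is a minimal, self-contained sketch of what a hierarchical query and a per-user-bin count return. It is not the raptor or SeqAn3 API: exact `std::set`s stand in for Bloom filters (so there are no false positives), and every name in it (`toy_technical_bin`, `toy_hibf`, `user_bins_containing`, `count_per_user_bin`, `make_example_hibf`) is hypothetical. The layout is an assumed version of the example above: user bin 1 is split over two technical bins, and user bins 3, 4, and 5 are merged into one technical bin that points to a lower-level IBF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Toy model only: exact sets stand in for Bloom filters (no false positives).
struct toy_technical_bin
{
    std::set<std::uint64_t> values; // values stored in this technical bin
    std::int64_t next_ibf;          // own IBF ID if not merged, else ID of the lower-level IBF
    std::int64_t user_bin;          // user bin index, or -1 for a merged bin
};

using toy_ibf = std::vector<toy_technical_bin>;

struct toy_hibf
{
    std::vector<toy_ibf> ibfs; // ibfs[0] is the top-level IBF
};

// Recursively collects the user bins whose technical bins contain `value`.
inline void query_ibf(toy_hibf const & hibf, std::size_t const ibf_idx, std::uint64_t const value,
                      std::set<std::int64_t> & result)
{
    for (toy_technical_bin const & bin : hibf.ibfs[ibf_idx])
    {
        if (bin.values.count(value) == 0)
            continue;
        if (bin.next_ibf != static_cast<std::int64_t>(ibf_idx))
            query_ibf(hibf, static_cast<std::size_t>(bin.next_ibf), value, result); // merged bin: descend
        else
            result.insert(bin.user_bin); // a split user bin is reported only once
    }
}

// Returns the indices of all user bins that contain `value`.
inline std::set<std::int64_t> user_bins_containing(toy_hibf const & hibf, std::uint64_t const value)
{
    std::set<std::int64_t> result;
    query_ibf(hibf, 0, value, result);
    return result;
}

// Counts, for each user bin, how many of the given values it contains.
inline std::vector<std::size_t> count_per_user_bin(toy_hibf const & hibf,
                                                   std::vector<std::uint64_t> const & values,
                                                   std::size_t const num_user_bins)
{
    std::vector<std::size_t> counts(num_user_bins, 0);
    for (std::uint64_t const value : values)
    {
        for (std::int64_t const user_bin : user_bins_containing(hibf, value))
        {
            assert(user_bin >= 0 && static_cast<std::size_t>(user_bin) < num_user_bins);
            ++counts[static_cast<std::size_t>(user_bin)];
        }
    }
    return counts;
}

// Assumed layout: six user bins (0..5) over two IBFs.
inline toy_hibf make_example_hibf()
{
    toy_ibf top;
    top.push_back(toy_technical_bin{{1, 2}, 0, 0});     // user bin 0
    top.push_back(toy_technical_bin{{3}, 0, 1});        // user bin 1, first part
    top.push_back(toy_technical_bin{{4}, 0, 1});        // user bin 1, second part
    top.push_back(toy_technical_bin{{2, 5}, 0, 2});     // user bin 2
    top.push_back(toy_technical_bin{{6, 7, 8}, 1, -1}); // merged bin for user bins 3, 4, 5

    toy_ibf lower;
    lower.push_back(toy_technical_bin{{6}, 1, 3});    // user bin 3
    lower.push_back(toy_technical_bin{{6, 7}, 1, 4}); // user bin 4
    lower.push_back(toy_technical_bin{{8}, 1, 5});    // user bin 5

    toy_hibf hibf;
    hibf.ibfs.push_back(top);
    hibf.ibfs.push_back(lower);
    return hibf;
}
```

In this toy layout, `user_bins_containing(make_example_hibf(), 6)` first hits the merged technical bin, descends into the lower-level IBF, and yields user bins 3 and 4. Values 3 and 4 sit in different technical bins but both resolve to user bin 1, which is why results are expressed as user bin indices rather than technical bin indices.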

Thread safety

The Interleaved Bloom Filter promises the basic thread safety of the STL: all calls to const member functions are safe from multiple threads, as long as no thread calls a non-const member function at the same time.

Member Typedef Documentation

◆ ibf_t

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
using raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::ibf_t = seqan3::interleaved_bloom_filter<data_layout_mode_>

The type of an individual Bloom filter.

Constructor & Destructor Documentation

◆ hierarchical_interleaved_bloom_filter() [1/3]

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::hierarchical_interleaved_bloom_filter ( )
default

Defaulted.

◆ hierarchical_interleaved_bloom_filter() [2/3]

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::hierarchical_interleaved_bloom_filter ( hierarchical_interleaved_bloom_filter< data_layout_mode_ > const &  )
default

Defaulted.

◆ hierarchical_interleaved_bloom_filter() [3/3]

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::hierarchical_interleaved_bloom_filter ( hierarchical_interleaved_bloom_filter< data_layout_mode_ > &&  )
default

Defaulted.

◆ ~hierarchical_interleaved_bloom_filter()

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::~hierarchical_interleaved_bloom_filter ( )
default

Defaulted.

Member Function Documentation

◆ membership_agent()

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
membership_agent raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::membership_agent ( ) const
inline

Returns a membership_agent to be used for membership queries.

◆ operator=() [1/2]

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
hierarchical_interleaved_bloom_filter& raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::operator= ( hierarchical_interleaved_bloom_filter< data_layout_mode_ > &&  )
default

Defaulted.

◆ operator=() [2/2]

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
hierarchical_interleaved_bloom_filter& raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::operator= ( hierarchical_interleaved_bloom_filter< data_layout_mode_ > const &  )
default

Defaulted.

Member Data Documentation

◆ data_layout_mode

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
constexpr seqan3::data_layout raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::data_layout_mode = data_layout_mode_
static constexpr

Indicates whether the Interleaved Bloom Filter is compressed.

◆ ibf_vector

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
std::vector<ibf_t> raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::ibf_vector

The individual interleaved Bloom filters.

◆ next_ibf_id

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
std::vector<std::vector<int64_t> > raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::next_ibf_id

Stores for each bin in each IBF of the HIBF the ID of the next IBF.

Assume we look up bin b in IBF i, i.e. next_ibf_id[i][b]. If the result is i, there is no lower-level IBF and bin b is therefore not a merged bin. If the result is j != i, there is a lower-level IBF, bin b is a merged bin, and j is the ID of that lower-level IBF in ibf_vector.
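The lookup rule above can be sketched with standard-library types only. This is an illustration under stated assumptions, not code from the library: `next_ibf_table`, `is_merged_bin`, `collect_reachable_ibfs`, and `example_table` are hypothetical names, and the table contents are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the next_ibf_id member: table[i][b] is the ID of the IBF
// that bin b of IBF i points to.
using next_ibf_table = std::vector<std::vector<std::int64_t>>;

// A bin is merged exactly when its entry differs from the ID of its own IBF.
inline bool is_merged_bin(next_ibf_table const & table, std::size_t const ibf_idx, std::size_t const bin_idx)
{
    assert(ibf_idx < table.size() && bin_idx < table[ibf_idx].size());
    return table[ibf_idx][bin_idx] != static_cast<std::int64_t>(ibf_idx);
}

// Appends ibf_idx and the IDs of all lower-level IBFs reachable from it, depth-first.
inline void collect_reachable_ibfs(next_ibf_table const & table, std::size_t const ibf_idx,
                                   std::vector<std::size_t> & out)
{
    out.push_back(ibf_idx);
    for (std::size_t bin = 0; bin < table[ibf_idx].size(); ++bin)
    {
        if (is_merged_bin(table, ibf_idx, bin))
            collect_reachable_ibfs(table, static_cast<std::size_t>(table[ibf_idx][bin]), out);
    }
}

// Hypothetical layout: IBF 0 has four bins and its bin 2 is merged (points to IBF 1);
// IBF 1 has three bins, none of them merged.
inline next_ibf_table example_table()
{
    return next_ibf_table{{0, 0, 1, 0}, {1, 1, 1}};
}
```

Using the own IBF ID as the "no lower level" marker means no separate sentinel value is needed; the check is a single comparison against the index already at hand.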

◆ user_bins

template<seqan3::data_layout data_layout_mode_ = seqan3::data_layout::uncompressed>
user_bins raptor::hierarchical_interleaved_bloom_filter< data_layout_mode_ >::user_bins

The underlying user bins.


The documentation for this class was generated from the following file: