The Burrows-Wheeler Transform
- #Sequence Alignment
- #Data Compression
- #Burrows-Wheeler Transform
- The Burrows-Wheeler Transform (BWT) is a key algorithm used in data compression (e.g., bzip2) and sequence alignment tools (e.g., bowtie, bwa).
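- BWT has three main properties: its output is not intuitive at first glance, it tends to cluster identical characters into runs, and it is reversible when an end-of-string marker such as '$' is appended.
- The BWT process involves three steps: writing all rotations of the string (with '$' appended), sorting these rotations lexicographically, and taking the last column of the sorted rotations as the BWT string.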
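- The '$' marker makes the transform reversible: it occurs exactly once and, by convention, sorts before every other character, so the rotation that ends in '$' is the original string.
- BWT's sorting step groups rotations that begin with the same context, so the characters that precede that context, which end up in the last column, tend to repeat. This makes it useful for repetitive sequences in English or DNA.
- Decoding BWT reconstructs the original string by repeatedly prepending the BWT string as a new first column of a table and re-sorting the rows. After as many rounds as the string has characters, the row ending in '$' is the original string.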
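- BWT enables efficient sequence alignment using the Last-to-First (LF) Mapping property: the i-th occurrence of a character in the last column corresponds to the i-th occurrence of that character in the first column, so the relative order of equal characters is preserved between the two columns.
- LF Mapping allows pattern searching by matching the pattern from its last character to its first, narrowing a range of rows in the BWT matrix as each character is tracked between the first and last columns.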
- Advanced topics include using Suffix Arrays for efficient BWT generation and applications in bioinformatics tools like bowtie2.
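
The three encoding steps and the prepend-and-sort decoding can be sketched in a few lines of Python. This is a minimal, deliberately naive illustration rather than how bzip2 or bwa implement it: the function names `bwt` and `inverse_bwt` are made up for this example, and it assumes the input does not contain '$' and that '$' sorts before every other character (true for letters under Python's default string ordering).

```python
def bwt(text, marker="$"):
    """Naive BWT: build all rotations, sort them, read the last column."""
    if marker in text:
        raise ValueError("marker must not occur in the input")
    s = text + marker
    # Step 1: all rotations. Step 2: sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # Step 3: the last character of each sorted rotation.
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last, marker="$"):
    """Naive inverse: prepend the BWT column and re-sort, len(last) times."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(ch + row for ch, row in zip(last, table))
    # The row that ends with the marker is the original string.
    original = next(row for row in table if row.endswith(marker))
    return original[: -len(marker)]


print(bwt("banana"))           # annb$aa
print(inverse_bwt("annb$aa"))  # banana
```

Both functions take quadratic space or worse, so they are only suitable for short strings; they are meant to mirror the description above, not to be efficient.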
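
Pattern searching with LF Mapping is easier to follow with a small example. The sketch below counts occurrences of a pattern using only the BWT string, processing the pattern from right to left. The name `count_occurrences` and the table layout are assumptions made for illustration; real aligners such as bowtie and bwa use compressed, sampled versions of these tables.

```python
from collections import Counter


def count_occurrences(last, pattern):
    """Count matches of `pattern` in the original text, given its BWT `last`."""
    counts = Counter(last)

    # first_row[c]: index of the first row of the sorted matrix starting with c.
    first_row = {}
    total = 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]

    # occ[c][i]: number of occurrences of c in last[:i].
    occ = {c: [0] * (len(last) + 1) for c in counts}
    for i, ch in enumerate(last):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    lo, hi = 0, len(last)  # half-open range of candidate rows
    for c in reversed(pattern):
        if c not in counts:
            return 0
        # LF Mapping: the k-th c in the last column is the k-th c in the first.
        lo = first_row[c] + occ[c][lo]
        hi = first_row[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


# "annb$aa" is the BWT of "banana$".
print(count_occurrences("annb$aa", "ana"))  # 2
print(count_occurrences("annb$aa", "nan"))  # 1
```

Each pattern character costs two table lookups, so the search time depends on the pattern length rather than on the length of the text.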
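
The link between Suffix Arrays and BWT can also be shown briefly. Because '$' is unique and smallest, sorting the suffixes of the string gives the same row order as sorting its rotations, and each BWT character is simply the one that precedes the corresponding suffix. The sketch below uses a simple sort-based suffix array for clarity; the names `suffix_array` and `bwt_from_suffix_array` are hypothetical, and production tools use much faster construction algorithms.

```python
def suffix_array(s):
    """Start positions of the suffixes of s in sorted order (simple, not fast)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt_from_suffix_array(text, marker="$"):
    """BWT via the suffix array: take the character before each sorted suffix."""
    s = text + marker
    sa = suffix_array(s)
    # For i == 0, s[i - 1] is s[-1], i.e. the marker, which wraps around.
    return "".join(s[i - 1] for i in sa)


print(suffix_array("banana$"))          # [6, 5, 3, 1, 0, 4, 2]
print(bwt_from_suffix_array("banana"))  # annb$aa
```

This avoids materializing every rotation, which is the main reason suffix arrays are preferred for long inputs such as genomes.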