Duplicates in BAM/SAM files
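Identifying duplicate reads in BAM or SAM files is an important step when analyzing sequencing data, including applications such as ChIP-seq and RNA-seq. The MarkDuplicates tool locates and tags two kinds of duplicates: PCR duplicates, which are multiple reads derived from the same original fragment through amplification during library preparation, and optical duplicates, which arise when the sequencer's imaging step reports a single cluster as more than one. Duplicates are identified from alignment information (the 5' positions and orientation of a read and its mate, with the CIGAR used to account for clipping) rather than from sequence identity alone. The tool cannot prove that two reads with matching coordinates did not arise by coincidence, so it keeps one representative from each duplicate set, by default the one with the highest sum of base qualities, and by default flags the others rather than deleting them. Downstream tools can then skip the flagged reads so that the same fragment is not counted more than once.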
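As a rough illustration of how the result of duplicate marking can be inspected, the sketch below counts reads whose SAM FLAG field has the duplicate bit (0x400) set. It assumes the input is SAM text that has already been processed by a duplicate-marking tool; the function names and the example records are hypothetical and are not part of MarkDuplicates itself.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit meaning "PCR or optical duplicate"


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def count_duplicates(sam_lines):
    """Count alignment records and duplicate-flagged records in SAM text.

    Header lines (starting with '@') and blank lines are skipped.
    Returns a (total_records, duplicate_records) tuple.
    """
    total = 0
    duplicates = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second mandatory SAM column
        total += 1
        if is_duplicate(flag):
            duplicates += 1
    return total, duplicates


# Hypothetical records for illustration only; read2 carries the 0x400 bit.
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
    "read2\t1123\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
    "read3\t0\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

if __name__ == "__main__":
    total, duplicates = count_duplicates(EXAMPLE_SAM)
    print(f"{duplicates} of {total} records are flagged as duplicates")
```

For a real BAM file the records would first need to be decoded to SAM text (or read with a BAM library); this sketch only shows the flag test that downstream tools apply.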
