Hi all!
I have a new sequencing library to evaluate; it is a long-range library intended for scaffolding. I mapped it against our genome assembly and got a nice insert length distribution.
Now I would like to know how many of the read pairs are informative for scaffolding. Looking at the alignments, I noticed that the reads are not distributed randomly but cluster in some places. Two read pairs that map to nearly the same positions, where one read is shifted by just 1 or 2 bp, are effectively "duplicates" in terms of how informative they are for scaffolding, yet they will not be removed by tools like samtools rmdup.
How can I remove read pairs in which both reads overlap both reads of another pair but do not sit at exactly the same positions? Is there a function for this, for example in bedtools, that I missed?
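To make the criterion concrete, here is a minimal Python sketch of the filtering I have in mind. It is only an illustration under my own assumptions: each pair is a BEDPE-like tuple (chrom1, start1, end1, chrom2, start2, end2) with 0-based half-open coordinates and the leftmost mate first, and the function names `overlaps` and `drop_near_duplicates` are made up for this example. It works greedily, so a pair is dropped only if it overlaps an already-kept pair at both mates.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) share at least 1 bp."""
    return a_start < b_end and b_start < a_end


def drop_near_duplicates(pairs):
    """Greedily keep a pair only if no already-kept pair overlaps it at BOTH mates.

    Each pair is (chrom1, start1, end1, chrom2, start2, end2); this layout is an
    assumption for the sketch, not the output format of any particular tool.
    """
    kept_by_chroms = defaultdict(list)  # (chrom1, chrom2) -> kept pairs on those chromosomes
    kept = []
    for pair in sorted(pairs):
        c1, s1, e1, c2, s2, e2 = pair
        bucket = kept_by_chroms[(c1, c2)]
        is_near_duplicate = any(
            overlaps(s1, e1, k[1], k[2]) and overlaps(s2, e2, k[4], k[5])
            for k in bucket
        )
        if not is_near_duplicate:
            bucket.append(pair)
            kept.append(pair)
    return kept


example_pairs = [
    ("chr1", 100, 200, "chr1", 5000, 5100),
    ("chr1", 101, 201, "chr1", 5002, 5102),  # shifted by 1-2 bp: near-duplicate of the first pair
    ("chr1", 100, 200, "chr1", 9000, 9100),  # same first mate, different second mate: kept
    ("chr2", 100, 200, "chr2", 5000, 5100),  # different chromosome: kept
]
filtered_pairs = drop_near_duplicates(example_pairs)
```

This compares every pair against all kept pairs on the same chromosomes, so it would be slow on a full library; I am hoping an existing tool does the same thing more efficiently.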
Cheers