SEQanswers

Old 07-14-2013, 11:53 PM   #1
ShellfishGene
Member
 
Location: Germany

Join Date: Mar 2009
Posts: 14
Removing pairs that align to almost the same positions from bam

Hi all!

I have a new sequencing library to evaluate: a long-range library intended for scaffolding. I mapped it against our genome assembly and got a nice insert length distribution.
Now I would like to know how many of the read pairs are informative for scaffolding. Looking at the alignments, I noticed that the reads are not distributed randomly but cluster in some places. Two read pairs that map to the same positions, with one read shifted by just 1 or 2 bp, are effectively "duplicates" in terms of how informative they are for scaffolding, but they will not be removed by tools like samtools rmdup.
How can I remove read pairs where both reads overlap both reads of another pair but are not at exactly the same positions? Is there a function for this, for example in bedtools, that I missed?

Cheers
Old 07-15-2013, 12:10 PM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Unless someone more knowledgeable than I replies, I suspect the correct answer will be "write some code". This seems to be a unique requirement.
Old 07-16-2013, 02:00 AM   #3
ShellfishGene
Member
 
Location: Germany

Join Date: Mar 2009
Posts: 14

Quote:
Originally Posted by westerman View Post
Unless someone more knowledgeable than I replies, I suspect the correct answer will be "write some code". This seems to be a unique requirement.
Hmm, is that really so unique? I just want to know how good a mate pair (or fosmid) library is for scaffolding compared to other libraries. Looking at the mapping, it often seems that read pairs overlap by 90%, but not always by 100%. I think this would be misleading when estimating the number of informative pairs.
Maybe I'll write a script if I find the time. Thinking about it, maybe the coverage gives the same information when analyzed correctly.
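
A script along these lines could look roughly like the sketch below. It is not from the thread and makes several assumptions: the mate positions have already been extracted from the BAM file into plain tuples (that step is not shown), the function name collapse_near_duplicates and the tolerance parameter tol are hypothetical, and "almost the same position" is taken to mean that both mates start within tol bp of the mates of a pair that was already kept. Setting tol close to the read length would approximate the "both reads overlap" criterion from the first post.

```python
from collections import defaultdict, deque


def collapse_near_duplicates(pairs, tol=2):
    """Return names of pairs kept after dropping near-duplicates.

    pairs: iterable of (name, chrom1, pos1, chrom2, pos2), one per read pair.
    tol:   maximum shift in bp (hypothetical parameter) at BOTH mates for two
           pairs to count as the same scaffolding evidence.
    """
    groups = defaultdict(list)
    for name, c1, p1, c2, p2 in pairs:
        # Order the two mates so the comparison does not depend on
        # which mate happens to be listed first.
        if (c1, p1) > (c2, p2):
            c1, p1, c2, p2 = c2, p2, c1, p1
        groups[(c1, c2)].append((p1, p2, name))

    kept = []
    for key in sorted(groups):
        # Kept representatives whose first-mate position is still
        # within tol of the pair currently being examined.
        window = deque()
        for p1, p2, name in sorted(groups[key]):
            while window and p1 - window[0][0] > tol:
                window.popleft()
            # Greedy rule: drop the pair if a kept pair matches at both ends.
            if any(abs(p2 - q2) <= tol for _, q2 in window):
                continue
            window.append((p1, p2))
            kept.append(name)
    return kept


if __name__ == "__main__":
    example = [
        ("a", "s1", 100, "s2", 5000),
        ("b", "s1", 101, "s2", 5002),  # shifted 1-2 bp relative to "a"
        ("c", "s1", 100, "s2", 9000),  # same first mate, different second mate
        ("d", "s2", 5001, "s1", 99),   # same evidence, mates listed in reverse
    ]
    kept = collapse_near_duplicates(example, tol=2)
    print(len(kept), "of", len(example), "pairs kept:", kept)
```

Because each pair is compared only against pairs that were already kept, a long chain of small shifts is not collapsed into a single pair; whether that is the desired behaviour depends on how the library is being evaluated. Strand and mapping quality are ignored here and would probably need to be taken into account for real data.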