SEQanswers




Old 01-21-2016, 01:56 AM   #1
krapulaxdoctor
Member
 
Location: Netherlands

Join Date: May 2015
Posts: 20
Samtools filter reads based on CIGAR values

Dear all,

I would like to ask for some help with samtools:

I found many threads about using samtools to filter reads on various criteria, but nothing on how to filter reads based on the CIGAR value.

How can I select the reads in an RNA-seq BAM file that are non-split / non-gapped?

From one BAM, I would like to produce two BAMs: one with gapped reads and one with non-gapped reads.

Could someone write an example?
Thank you for the help in advance,
Old 01-21-2016, 06:24 AM   #2
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Gapped reads are those containing the N operator in the CIGAR string, right? You could do this:

Code:
samtools view -h in.bam \
| awk '{if($0 ~ /^@/ || $6 ~ /N/) {print $0}}' \
| samtools view -Sb - > gapped.bam
And for ungapped reads:

Code:
samtools view -h in.bam \
| awk '{if($0 ~ /^@/ || $6 !~ /N/) {print $0}}' \
| samtools view -Sb - > ungapped.bam
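If reading the BAM twice is a concern, the two filters above can be combined into a single pass. The following is only a sketch: the function name `split_by_n` and the output file names are made up for illustration, and it writes intermediate SAM text files rather than BAM. Header lines are copied to both outputs, and each record is routed by whether its CIGAR (column 6) contains an N.

```shell
# Split SAM text on stdin into two SAM files in one pass.
# split_by_n is a hypothetical helper name; $1 and $2 are output paths.
split_by_n() {
    gapped_out=$1
    ungapped_out=$2
    awk -F'\t' -v g="$gapped_out" -v u="$ungapped_out" '
        BEGIN     { printf "" > g; printf "" > u }   # create both files even if one stays empty
        /^@/      { print > g; print > u; next }     # header lines go to both outputs
        $6 ~ /N/  { print > g; next }                # CIGAR contains N: gapped
                  { print > u }                      # everything else: ungapped
    '
}

# Possible usage, assuming samtools is on the PATH:
#   samtools view -h in.bam | split_by_n gapped.sam ungapped.sam
#   samtools view -Sb gapped.sam > gapped.bam
#   samtools view -Sb ungapped.sam > ungapped.bam
```

As with the two-pass version, unmapped reads (CIGAR `*`) end up in the ungapped output, so they may need to be filtered out separately.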
Old 01-21-2016, 06:50 AM   #3
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

Be aware that deletions (CIGAR string D) also give rise to gapped alignments, and the representation as N vs. D depends on the gap length and the aligner.
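One way to see how a particular aligner represented gaps is to tally the records by which gap operators appear in the CIGAR column before filtering. This is a rough sketch, with `count_gap_ops` as a made-up helper name; it reads SAM text on stdin and only checks for the presence of N and D in column 6, not their lengths.

```shell
# Tally SAM records by the gap operators present in the CIGAR (column 6).
# count_gap_ops is a hypothetical helper name.
count_gap_ops() {
    awk -F'\t' '
        /^@/ { next }                        # skip header lines
        {
            has_n = ($6 ~ /N/)               # skipped region (e.g. intron)
            has_d = ($6 ~ /D/)               # deletion relative to the reference
            if (has_n && has_d)  both++
            else if (has_n)      n_only++
            else if (has_d)      d_only++
            else                 neither++
        }
        END {
            printf "N_only\t%d\nD_only\t%d\nN_and_D\t%d\nneither\t%d\n", n_only, d_only, both, neither
        }
    '
}

# Possible usage, assuming samtools is on the PATH:
#   samtools view in.bam | count_gap_ops
```

If N_only and N_and_D both come out as zero for RNA-seq data, filtering on N will put every read in the ungapped file, which would suggest the aligner is not reporting splices as N.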
Old 01-21-2016, 07:56 AM   #4
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by HESmith View Post
Be aware that deletions (CIGAR string D) also give rise to gapped alignments, and the representation as N vs. D depends on the gap length and the aligner.
True, but I surmise the OP wants to select reads spanning different exons as opposed to those assigned to only one exon. If this is the case, I think TopHat uses N to mark gaps between exons (I don't know about other aligners).
Old 01-22-2016, 12:00 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Quote:
Originally Posted by dariober View Post
True, but I surmise the OP wants to select reads spanning different exons as opposed to those assigned to only one exon. If this is the case, I think TopHat uses N to mark gaps between exons (I don't know about other aligners).
Realistically, any aligner that advertises the ability to handle spliced reads will do this too. If not, it shouldn't be used.
Old 01-22-2016, 12:07 AM   #6
krapulaxdoctor
Member
 
Location: Netherlands

Join Date: May 2015
Posts: 20

Thank you for all the quick responses.
It is a great help for me.

dariober:
Quote:
True, but I surmise the OP wants to select reads spanning different exons as opposed to those assigned to only one exon. If this is the case...
Yes, that would be the main goal. Thanks.
Old 01-22-2016, 04:41 AM   #7
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

Quote:
Originally Posted by dpryan View Post
Realistically, any aligner that advertises the ability to handle spliced reads will do this too. If not, it shouldn't be used.
I agree 100%, but it would not be the first time that someone used an inappropriate tool for the job. The alert was directed at the OP, in case the awk filter did not produce the expected results.