SEQanswers

Old 05-10-2017, 09:03 PM   #1
jorvis
Junior Member
 
Location: Tulsa, OK

Join Date: Feb 2017
Posts: 3
Read direction lost with BWA in SAM output?

I've tried two different header styles in my input FASTQ files when running BWA:

@SN7001163:162:C4A1UACXX:1:1101:1062:2076/1

and

@SN7001163:162:C4A1UACXX:1:1101:1062:2076 1:N:0:GTCCGCA

My goal is to be able to tell which mate from the FASTQ file I'm looking at, but the mate identifier seems to get stripped in the SAM output. From "bwa sampe" I get lines like this:

Code:
SN7001163:162:C4A1UACXX:1:1101:1062:2076	77	*	0	0	*	*	0	0	GTTTGCTTGGCTGTGAGCTTGTCCGACACGGGCCACCAGGAGAGTGAGATACACCGAGACGAGCATCCTGTCTTTCTCTCGGACGGTTCCACAACAAATAA	@[email protected]?;<;F>?<2A<E<FFC9:FE8):[email protected][email protected]=;=;D;).).7>77==EB'93;;[email protected]@:@(:3,+(4::@B>@5?-<@B<?<34>ABB1<8:43
SN7001163:162:C4A1UACXX:1:1101:1062:2076	141	*	0	0	*	*	0	0	GCCATGTTGAGTGAGAATTTATTATTTGTTGTGGAACC	;<;;([email protected])@)84):46=69416)[email protected]:@:<=1([email protected]?
How can I tell which of these alignment lines refers to which input mate?
Old 05-10-2017, 09:12 PM   #2
jorvis
Junior Member
 
Location: Tulsa, OK

Join Date: Feb 2017
Posts: 3

I realize that those two reads didn't actually align, so the SAM lines were pretty minimal. Here is a pair that did:

Code:
SN7001163:162:C4A1UACXX:1:1101:1174:2116        81      Locus_14841_Transcript_1__1_Confidence_0.750_Length_603 292     37      101M    =       294     -99     CTCGTCATTTCAATGCCCCCTCTCATATCAGAAGGAAAATCATGAGTGCTCCTTTGTCAAAAGAGCTGAGAGCAAAGTACAATGTGAGAAGTATGCCCATT   >BBDDDDDDDDDDDBDFFHHHHIIHJJJJJJJJJJJJJJIIJJJJJJJJJJJJJJIIIJJJJIJJJJJJJIJJJJJJJHJJJIJJJJJHHHHHFFFFFCCC   XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101
SN7001163:162:C4A1UACXX:1:1101:1174:2116        161     Locus_14841_Transcript_1__1_Confidence_0.750_Length_603 294     37      101M    =       292     99      CGTCATTTCAATGCCCCCTCTCATATCAGAAGGAAAATCATGAGTGCTCCTTTGTCAAAAGAGCTGAGAGCAAAGTACAATGTGAGAAGTATGCCCATTAG   BCBFFFFFHHHH?HIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHIJJJJGHIIHHIJJJJJJJJJIJJJJJHHHHHHFFFFFFFEEEEEEEEDDDDDDDC   XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:99C1
Old 05-10-2017, 11:56 PM   #3
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215

[original post deleted because I misunderstood the question]

gringer's answer below is correct. You can confirm this by swapping the R1 and R2 reads.

Last edited by WhatsOEver; 05-11-2017 at 01:36 AM.
Old 05-11-2017, 01:11 AM   #4
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799

The flag field in the SAM file has bits that mark the first and the last read attached to a template sequence. If a bitwise AND of the flag field with 0x40 is non-zero, the read is the first read of its template; 0x80 marks the last read in the same way. For the two example alignments you posted, here is the full flag breakdown:

Code:
81 = 0101 0001
             Paired
        Reverse-complemented
      First read in the template

161 = 1010 0001
              Paired
        Other read is reverse-complemented
      Last read in the template
See https://samtools.github.io/hts-specs/SAMv1.pdf
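
The same breakdown can be reproduced programmatically. Here is a minimal Python sketch that only assumes the bit values listed in the SAM specification; the names `SAM_FLAG_BITS` and `decode_flag` and the label strings are made up for illustration and are not part of any library:

```python
# Bit values and meanings as listed for the FLAG field in the SAM specification.
# The label wording is my own shorthand.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "properly paired"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse-complemented"),
    (0x20, "mate reverse-complemented"),
    (0x40, "first read in the template"),
    (0x80, "last read in the template"),
    (0x100, "secondary alignment"),
    (0x200, "failed quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [label for bit, label in SAM_FLAG_BITS if flag & bit]


# The four flag values that appear in this thread.
for flag in (81, 161, 77, 141):
    print(flag, format(flag, "08b"), ", ".join(decode_flag(flag)))
```

Applied to the unaligned pair from the first post, 77 has the 0x40 bit set (first read) and 141 has the 0x80 bit set (last read), so the mate information is present there as well.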

These flags can be filtered using samtools view:

Code:
samtools view -b -f 0x40 in.bam > out_FirstRead.bam
samtools view -b -F 0x40 in.bam > out_notFirst.bam
The distinction between "last" and "second" is not important for most purposes, but there are some situations where more than two reads can be associated with the same template sequence.
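
If you would rather have the /1 and /2 suffixes back on the read names, the flag can be used to restore them. This is only a sketch, assuming plain tab-separated SAM text; `mate_suffix` and `tag_sam_line` are hypothetical helper names, and the example record is made up rather than taken from a real alignment:

```python
def mate_suffix(flag):
    """Map a SAM FLAG value to '/1', '/2' or '' (hypothetical helper)."""
    first = bool(flag & 0x40)
    last = bool(flag & 0x80)
    if first and not last:
        return "/1"
    if last and not first:
        return "/2"
    # Both bits set (a middle read of a longer template) or neither set
    # (unpaired read, or position in the template unknown): add nothing.
    return ""


def tag_sam_line(line):
    """Append the mate suffix to the read name of one SAM alignment line."""
    if line.startswith("@"):
        return line  # header line, leave untouched
    fields = line.rstrip("\n").split("\t")
    fields[0] += mate_suffix(int(fields[1]))
    return "\t".join(fields) + "\n"


# Made-up minimal record, for illustration only.
example = "read_A\t81\tref1\t292\t37\t4M\t=\t294\t-6\tACGT\tIIII\n"
print(tag_sam_line(example), end="")
```

Returning an empty suffix when both bits or neither bit is set keeps reads from templates with more than two reads from being mislabelled.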

Whether or not a read is first or last is particularly important for strand-specific sequencing, because it allows you to distinguish between templates that are oriented in the same direction as the primary transcript, and those that are not (e.g. siRNA).
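
As a rough illustration of that point, the first/last bit can be combined with the reverse-complement bit (0x10) to guess the strand of the template. This is a sketch under stated assumptions: it assumes ordinary paired-end reads where the two mates map to opposite strands, and which mate matches the transcript orientation depends on the library protocol, so `first_read_is_sense` is a hypothetical parameter you would have to set yourself:

```python
def template_strand(flag, first_read_is_sense=True):
    """Guess the template strand ('+' or '-') from a paired read's SAM FLAG.

    first_read_is_sense is a hypothetical, protocol-dependent parameter:
    True if the first read has the same orientation as the template.
    Any read without the 0x40 bit is treated as the last read.
    Returns None for unmapped reads.
    """
    if flag & 0x4:
        return None  # unmapped read, no strand to report
    read_is_reverse = bool(flag & 0x10)
    is_first = bool(flag & 0x40)
    # Does this read share the template's orientation, or is it the mate
    # that maps to the opposite strand?
    read_defines_sense = (is_first == first_read_is_sense)
    on_minus = read_is_reverse if read_defines_sense else not read_is_reverse
    return "-" if on_minus else "+"


# Both reads of the aligned pair from this thread.
for flag in (81, 161):
    print(flag, template_strand(flag))
```

Under either setting of the parameter, the two reads of the aligned pair (flags 81 and 161) agree on a single strand, which is the consistency you would expect from one template.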

Last edited by gringer; 05-11-2017 at 01:14 AM.
Tags
bwa, sam

All times are GMT -8.