  • Read direction lost with BWA in SAM output?

    I've tried two different read header styles in my input FASTQ files when running BWA:

    @SN7001163:162:C4A1UACXX:1:1101:1062:2076/1

    and

    @SN7001163:162:C4A1UACXX:1:1101:1062:2076 1:N:0:GTCCGCA

    My goal is to be able to tell which mate from the FASTQ files I'm looking at, but the mate identifier seems to get stripped from the SAM output. From "bwa sampe" I get lines like this:

    Code:
    SN7001163:162:C4A1UACXX:1:1101:1062:2076	77	*	0	0	*	*	0	0	GTTTGCTTGGCTGTGAGCTTGTCCGACACGGGCCACCAGGAGAGTGAGATACACCGAGACGAGCATCCTGTCTTTCTCTCGGACGGTTCCACAACAAATAA	@?@DDD?;<;F>?<2A<E<FFC9:FE8):8@?FFF@FF=;=;D;).).7>77==EB'93;;3=@@:@(:3,+(4::@B>@5?-<@B<?<34>ABB1<8:43
    SN7001163:162:C4A1UACXX:1:1101:1062:2076	141	*	0	0	*	*	0	0	GCCATGTTGAGTGAGAATTTATTATTTGTTGTGGAACC	;<;;(42@9)@)84):46=69416)2@:@:<=1(66@?
    How can I tell which of these alignment lines refers to which input mate?

  • #2
    I realize that those two reads didn't actually align, so the SAM lines were pretty minimal. Here is a pair that did:

    Code:
    SN7001163:162:C4A1UACXX:1:1101:1174:2116        81      Locus_14841_Transcript_1__1_Confidence_0.750_Length_603 292     37      101M    =       294     -99     CTCGTCATTTCAATGCCCCCTCTCATATCAGAAGGAAAATCATGAGTGCTCCTTTGTCAAAAGAGCTGAGAGCAAAGTACAATGTGAGAAGTATGCCCATT   >BBDDDDDDDDDDDBDFFHHHHIIHJJJJJJJJJJJJJJIIJJJJJJJJJJJJJJIIIJJJJIJJJJJJJIJJJJJJJHJJJIJJJJJHHHHHFFFFFCCC   XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101
    SN7001163:162:C4A1UACXX:1:1101:1174:2116        161     Locus_14841_Transcript_1__1_Confidence_0.750_Length_603 294     37      101M    =       292     99      CGTCATTTCAATGCCCCCTCTCATATCAGAAGGAAAATCATGAGTGCTCCTTTGTCAAAAGAGCTGAGAGCAAAGTACAATGTGAGAAGTATGCCCATTAG   BCBFFFFFHHHH?HIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHIJJJJGHIIHHIJJJJJJJJJIJJJJJHHHHHHFFFFFFFEEEEEEEEDDDDDDDC   XT:A:U  NM:i:1  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:99C1


    • #3
      [original post deleted because I misunderstood the question]

      gringer's answer below is correct. You can confirm this by swapping the R1 and R2 reads.
      Last edited by WhatsOEver; 05-11-2017, 12:36 AM.


      • #4
        The flag field of each SAM line has bits that mark the first (0x40) and last (0x80) read of a template sequence. If a bitwise AND of the flag field with 0x40 returns non-zero, then the read is the first read of its template. For the two examples you have, here is the full flag breakdown:

        Code:
        81 = 0101 0001
             0000 0001  Paired (0x1)
             0001 0000  Reverse-complemented (0x10)
             0100 0000  First read in the template (0x40)

        161 = 1010 0001
              0000 0001  Paired (0x1)
              0010 0000  Other read is reverse-complemented (0x20)
              1000 0000  Last read in the template (0x80)
        See https://samtools.github.io/hts-specs/SAMv1.pdf
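
If you'd rather not do the bit arithmetic by hand, here is a minimal Python sketch of the same breakdown. The bit values come from the FLAG table in the SAM spec linked above; the names `SAM_FLAGS` and `decode_flag` are just made up for this example, and the descriptions are my own short paraphrases.

```python
# Bit values from the FLAG table in the SAM specification.
# SAM_FLAGS and decode_flag are illustrative names, not part of any library.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "properly paired"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse-complemented"),
    (0x20, "mate reverse-complemented"),
    (0x40, "first read in template"),
    (0x80, "last read in template"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the descriptions of every bit that is set in a SAM flag."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


# The four flag values that appear in this thread.
for flag in (81, 161, 77, 141):
    print(flag, decode_flag(flag))
```

Run on the flags from the first post, this should show that 77 is the first read and 141 is the last read of the unaligned pair, so the mate information is still there even though the "/1" suffix is gone from the read name.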

        These flags can be filtered using samtools view:

        Code:
        samtools view -b -f 0x40 in.bam > out_FirstRead.bam
        samtools view -b -F 0x40 in.bam > out_notFirst.bam
        The distinction between "last" and "second" is not important for most purposes, but there are some situations where more than two reads can be associated with the same template sequence.
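
For plain-text SAM, the same split can be sketched in a few lines of Python without samtools. This is only a rough equivalent of the two commands above, assuming tab-separated SAM lines; `split_by_mate` is a made-up name and the example records are placeholders, not real alignments. As with `-F 0x40`, the second list holds everything without the 0x40 bit, which is not necessarily only reads flagged 0x80.

```python
def split_by_mate(sam_lines):
    """Split SAM alignment lines into (first, not_first) on the 0x40 bit.

    Header lines (starting with '@') and blank lines are skipped.
    """
    first, not_first = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        # The flag is the second tab-separated field of an alignment line.
        flag = int(line.split("\t")[1])
        if flag & 0x40:
            first.append(line)
        else:
            not_first.append(line)
    return first, not_first


# Placeholder records (made-up names and sequences) using flags from this thread.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "pairA\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "pairA\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
first, not_first = split_by_mate(example)
print(len(first), "first reads;", len(not_first), "other reads")
```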

        Whether or not a read is first or last is particularly important for strand-specific sequencing, because it allows you to distinguish between templates that are oriented in the same direction as the primary transcript, and those that are not (e.g. siRNA).
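
As a rough illustration of how the first/last bit feeds into strand inference, here is a small Python sketch. It assumes a standard paired-end layout where the two mates map to opposite strands, and it takes the protocol's orientation as a parameter because that depends on the library prep. `template_strand` and `read1_is_sense` are hypothetical names for this example only; check your kit's documentation for which setting applies.

```python
def template_strand(flag, read1_is_sense=True):
    """Guess the strand of the original template from one read's SAM flag.

    read1_is_sense is an assumption about the library protocol: True if
    read 1 has the same orientation as the template, False if read 1 is
    antisense. Returns '+', '-', or None for unmapped reads.
    """
    if flag & 0x4:
        return None  # unmapped reads carry no strand information
    is_first = bool(flag & 0x40)
    is_last = bool(flag & 0x80)
    if is_first == is_last:
        raise ValueError("flag must mark the read as exactly one of first/last")
    read_strand = "-" if flag & 0x10 else "+"
    # The read that matches the protocol's sense mate reports the template
    # strand directly; the other mate reports the opposite strand.
    if is_first == read1_is_sense:
        return read_strand
    return "+" if read_strand == "-" else "-"


# Both mates of the aligned pair from post #2 should agree with each other.
print(template_strand(81), template_strand(161))
```

The point is that without the 0x40/0x80 bits the two mates of a pair would give contradictory strand calls, since one maps forward and the other maps reverse.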
        Last edited by gringer; 05-11-2017, 12:14 AM.
