Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bismark paired-end read orientation

    Hi
    I have directional paired-end library for BS-seq and use bismark to analyze the data. The program runs fine. However, I notice that the read orientation is not correct. The paired reads should have opposite orientation, but bismark output shows same orientation for a pair, either on the + strand or on the - strand. I think it won't affect methylation estimation, but the visualization is not correct. Does anyone have idea how to fix it?

    Here is two pairs of reads:
    ID1 2/1 67 chr10 127850053 ...
    ID1 2/2 131 chr10 127850058 ...
    ID2 2/1 115 chr1 177168854 ...
    ID2 2/2 179 chr1 177168819 ...

    Thanks

  • #2
    This is indeed somewhat of a problem because the FLAG value does not consider that there are four different bisulfite strands of DNA (which is the case for paired-end files). We are therefore trying to use FLAG tags which take the strand identity into account (this is taken directly from Bismark):

    Strands OT and CTOT will be treated as aligning to the top strand (both sequences are scored as aligning to the top strand, and strands OB and CTOB will be treated as aligning to the bottom strand (both sequences are scored as reverse complemented sequences)

    Code:
     my $flag_1;                                                          # FLAG variable used for SAM format
      my $flag_2;
    
      if ($index == 0){       # OT
        $flag_1 = 67;                                                      # Read 1 is on the + strand  (1+2+64) (Read 2 is technically reverse-complemented, but we do not score it)
        $flag_2 = 131;                                                     # Read 2 is on - strand but informative for the OT        (1+2+128)
      }
      elsif ($index == 1){    # CTOB
        $flag_1 = 115;                                                     # Read 1 is on the + strand, we score for OB  (1+2+16+32+64)
        $flag_2 = 179;                                                     # Read 2 is on the - strand  (1+2+16+32+128)
      }
      elsif ($index == 2){    # CTOT
        $flag_1 = 67;                                                      # Read 1 is on the - strand (CTOT) strand, but we score it for OT (1+2+64)
        $flag_2 = 131;                                                     # Read 2 is on the + strand, score it for OT (1+2+128)
      }
      elsif ($index == 3){    # OB
        $flag_1 = 115;                                                     # Read 1 is on the - strand, we score for OB  (1+2+16+32+64)
        $flag_2 = 179;                                                     # Read 2 is on the - strand  (1+2+16+32+128)
      }
    If you need to change this somehow then you should probably just find the paired-end SAM output subroutine towards the end of Bismark and change the FLAG tags to what you need them to be. If you want to discuss this further you can always drop me an email.

    Best,
    Felix

    Comment


    • #3
      Thanks for the prompt response. I didn't locate the code you showed before I posted the question. It is helpful to know. For now, it is OK with me as long as I know that is the reason I get such orientation arrangement. I will definitely contact you if I need help. Thanks.

      Comment


      • #4
        I'm also working my way through the details of paired-end mode. I'll just add another point I found useful in understanding the paired-end Bismark mode.

        I was confused how situations OT and CTOB in Felix's code could be distinguished given that in both cases read 1 aligns to the + strand and read 2 aligns to the - strand. But if you look elsewhere in the code you'll see the $index variable is defined by which version of "in silico bisulfite-conversion" the reads and genome have undergone - either CT or GA conversion. See the following:
        Code:
          if ($read_conversion_1 eq 'CT' and $genome_conversion eq 'CT'){
            $index = 0; ## this is OT   (original top strand)
          }	
          elsif ($read_conversion_1 eq 'GA' and $genome_conversion eq 'GA'){
            $index = 1; ## this is CTOB (complementary to OB)
          }
          elsif ($read_conversion_1 eq 'GA' and $genome_conversion eq 'CT'){
            $index = 2; ## this is CTOT (complementary to OT)
          }
          elsif ($read_conversion_1 eq 'CT' and $genome_conversion eq 'GA'){
            $index = 3; ## this is OB   (original bottom)
          }
        Felix - I'm confused about the -/- orientation for when $index==3. I understand the Bismark FLAG values but I would've thought that your comment in the code should be:

        Code:
          elsif ($index == 3){    # OB
            $flag_1 = 115;                                                     # Read 1 is on the - strand, we score for OB  (1+2+16+32+64)
            $flag_2 = 179;                                                     # Read 2 is on the [B]+[/B] strand  (1+2+16+32+128)
          }
        i.e. that while the paired-read is informative for the OB strand, the individual reads are in the -/+ orientation (as they in the CTOT case).

        My understand of "normal" Illumina paired-end reads are always oriented ------> <------- and thus they were always scored as having opposing orientations. There's a better picture of what I mean here http://www.broadinstitute.org/softwa...r_orientations

        I think I understand the other 3 cases but I'd like to make sure my thinking isn't wrong on the fourth.
        Thanks,
        Pete
        Last edited by PeteH; 03-15-2012, 10:18 PM.

        Comment


        • #5
          Hi Pete,

          you are right in that the comment for alignments to the OB strand (index = 3) should be -/+ instead of -/-. Read 1 corresponds to a <------- direction OB read, while read 2 is the ------> direction CTOB read. The strand assignment is based on the read 1 alignment, therefore the entire alignment would be scored as original bottom (OB) alignment.

          Thanks for your comment, the next release will have this flaw in the documentation corrected.

          Felix

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM
          • seqadmin
            Techniques and Challenges in Conservation Genomics
            by seqadmin



            The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

            Avian Conservation
            Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
            03-08-2024, 10:41 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 03-27-2024, 06:37 PM
          0 responses
          12 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-27-2024, 06:07 PM
          0 responses
          11 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-22-2024, 10:03 AM
          0 responses
          53 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 03-21-2024, 07:32 AM
          0 responses
          68 views
          0 likes
          Last Post seqadmin  
          Working...
          X