Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bismark paired-end read orientation

    Hi
    I have directional paired-end library for BS-seq and use bismark to analyze the data. The program runs fine. However, I notice that the read orientation is not correct. The paired reads should have opposite orientation, but bismark output shows same orientation for a pair, either on the + strand or on the - strand. I think it won't affect methylation estimation, but the visualization is not correct. Does anyone have idea how to fix it?

    Here is two pairs of reads:
    ID1 2/1 67 chr10 127850053 ...
    ID1 2/2 131 chr10 127850058 ...
    ID2 2/1 115 chr1 177168854 ...
    ID2 2/2 179 chr1 177168819 ...

    Thanks

  • #2
    This is indeed somewhat of a problem because the FLAG value does not consider that there are four different bisulfite strands of DNA (which is the case for paired-end files). We are therefore trying to use FLAG tags which take the strand identity into account (this is taken directly from Bismark):

    Strands OT and CTOT will be treated as aligning to the top strand (both sequences are scored as aligning to the top strand, and strands OB and CTOB will be treated as aligning to the bottom strand (both sequences are scored as reverse complemented sequences)

    Code:
     my $flag_1;                                                          # FLAG variable used for SAM format
      my $flag_2;
    
      if ($index == 0){       # OT
        $flag_1 = 67;                                                      # Read 1 is on the + strand  (1+2+64) (Read 2 is technically reverse-complemented, but we do not score it)
        $flag_2 = 131;                                                     # Read 2 is on - strand but informative for the OT        (1+2+128)
      }
      elsif ($index == 1){    # CTOB
        $flag_1 = 115;                                                     # Read 1 is on the + strand, we score for OB  (1+2+16+32+64)
        $flag_2 = 179;                                                     # Read 2 is on the - strand  (1+2+16+32+128)
      }
      elsif ($index == 2){    # CTOT
        $flag_1 = 67;                                                      # Read 1 is on the - strand (CTOT) strand, but we score it for OT (1+2+64)
        $flag_2 = 131;                                                     # Read 2 is on the + strand, score it for OT (1+2+128)
      }
      elsif ($index == 3){    # OB
        $flag_1 = 115;                                                     # Read 1 is on the - strand, we score for OB  (1+2+16+32+64)
        $flag_2 = 179;                                                     # Read 2 is on the - strand  (1+2+16+32+128)
      }
    If you need to change this somehow then you should probably just find the paired-end SAM output subroutine towards the end of Bismark and change the FLAG tags to what you need them to be. If you want to discuss this further you can always drop me an email.

    Best,
    Felix

    Comment


    • #3
      Thanks for the prompt response. I didn't locate the code you showed before I posted the question. It is helpful to know. For now, it is OK with me as long as I know that is the reason I get such orientation arrangement. I will definitely contact you if I need help. Thanks.

      Comment


      • #4
        I'm also working my way through the details of paired-end mode. I'll just add another point I found useful in understanding the paired-end Bismark mode.

        I was confused how situations OT and CTOB in Felix's code could be distinguished given that in both cases read 1 aligns to the + strand and read 2 aligns to the - strand. But if you look elsewhere in the code you'll see the $index variable is defined by which version of "in silico bisulfite-conversion" the reads and genome have undergone - either CT or GA conversion. See the following:
        Code:
          if ($read_conversion_1 eq 'CT' and $genome_conversion eq 'CT'){
            $index = 0; ## this is OT   (original top strand)
          }	
          elsif ($read_conversion_1 eq 'GA' and $genome_conversion eq 'GA'){
            $index = 1; ## this is CTOB (complementary to OB)
          }
          elsif ($read_conversion_1 eq 'GA' and $genome_conversion eq 'CT'){
            $index = 2; ## this is CTOT (complementary to OT)
          }
          elsif ($read_conversion_1 eq 'CT' and $genome_conversion eq 'GA'){
            $index = 3; ## this is OB   (original bottom)
          }
        Felix - I'm confused about the -/- orientation for when $index==3. I understand the Bismark FLAG values but I would've thought that your comment in the code should be:

        Code:
          elsif ($index == 3){    # OB
            $flag_1 = 115;                                                     # Read 1 is on the - strand, we score for OB  (1+2+16+32+64)
            $flag_2 = 179;                                                     # Read 2 is on the [B]+[/B] strand  (1+2+16+32+128)
          }
        i.e. that while the paired-read is informative for the OB strand, the individual reads are in the -/+ orientation (as they in the CTOT case).

        My understand of "normal" Illumina paired-end reads are always oriented ------> <------- and thus they were always scored as having opposing orientations. There's a better picture of what I mean here http://www.broadinstitute.org/softwa...r_orientations

        I think I understand the other 3 cases but I'd like to make sure my thinking isn't wrong on the fourth.
        Thanks,
        Pete
        Last edited by PeteH; 03-15-2012, 10:18 PM.

        Comment


        • #5
          Hi Pete,

          you are right in that the comment for alignments to the OB strand (index = 3) should be -/+ instead of -/-. Read 1 corresponds to a <------- direction OB read, while read 2 is the ------> direction CTOB read. The strand assignment is based on the read 1 alignment, therefore the entire alignment would be scored as original bottom (OB) alignment.

          Thanks for your comment, the next release will have this flaw in the documentation corrected.

          Felix

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Current Approaches to Protein Sequencing
            by seqadmin


            Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
            04-04-2024, 04:25 PM
          • seqadmin
            Strategies for Sequencing Challenging Samples
            by seqadmin


            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
            03-22-2024, 06:39 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 04-11-2024, 12:08 PM
          0 responses
          25 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 10:19 PM
          0 responses
          28 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-10-2024, 09:21 AM
          0 responses
          24 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 04-04-2024, 09:00 AM
          0 responses
          52 views
          0 likes
          Last Post seqadmin  
          Working...
          X