
  • BFAST + GATK -> strand bias?

    Hi All,

    I have HiSeq exome data, 75 bp paired-end. I've used bfast to align, which, if I am not mistaken, converts one end to its reverse complement so that both members of a pair are attributed to the same strand, the positive one, in the resulting BAM file.

    If this is true, it causes a problem in GATK when filtering calls: strand bias tests cannot be used, because they all come out highly significant (all reads are on the same strand). It would also cause a problem in the read position rank tests, as the 'best' end of every other read is annotated as its 'good' end.

    So the question is: Am I missing something, or are the above conclusions correct?
    Also, is there a way to circumvent this?

    Thanks a bunch,
    Boel
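One way to check whether all reads really sit on one strand is to tally the strand bit straight from the alignments. The sketch below assumes SAM text as input (for example, the output of `samtools view`) and relies only on the FLAG field: bit 0x10 marks a read aligned to the reverse strand, bit 0x4 an unmapped read, and bits 0x40/0x80 the first/second read of a pair. The function name `strand_counts` is made up for illustration.

```python
from collections import Counter

# SAM FLAG bits used here
UNMAPPED = 0x4   # read is unmapped
REVERSE = 0x10   # read aligned to the reverse strand
FIRST = 0x40     # first read in the pair
SECOND = 0x80    # second read in the pair


def strand_counts(sam_lines):
    """Count mapped reads per (pair end, strand) from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.rstrip("\n").split("\t")[1])
        if flag & UNMAPPED:
            continue
        if flag & FIRST:
            end = "read1"
        elif flag & SECOND:
            end = "read2"
        else:
            end = "unpaired"
        strand = "-" if flag & REVERSE else "+"
        counts[(end, strand)] += 1
    return counts
```

If every mapped read lands in the `"+"` bins for both `read1` and `read2`, that would support the concern raised above; a mix of `"+"` and `"-"` for each end would suggest the strand information is intact.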

  • #2
    With the newest version of BFAST, you can have the input reads be on opposite strands to properly represent paired-end reads. BFAST will not reverse complement one end, but the conversion to FASTQ may do so (a new "-k" option avoids this).

    What is your input data, FASTQ files?

    You may get more traction at [email protected].



    • #3
      Originally posted by nilshomer
      With the newest version of BFAST, you can have the input reads be on opposite strands to properly represent paired-end reads. BFAST will not reverse complement one end, but the conversion to FASTQ may do so (a new "-k" option avoids this).

      What is your input data, FASTQ files?

      You may get more traction at [email protected].

      Thanks.

      I started with qseq files, converted them with ill2fastq.pl and then aligned with bfast (bfast+bwa-0.6.5). I did not use the -k option.

      I'm not sure that this creates a bias; that's what I am trying to figure out. Do you know?
      According to the SAM format, all reads are on the same strand as the reference, which might make this a non-issue. But I must say that I am confused right now.
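On the SAM point: the format stores SEQ in the orientation of the forward reference strand, and bit 0x10 of FLAG records whether the read had to be reverse complemented to get there. Strand is therefore carried by the flag, not by the sequence, and the flag describes the orientation of the read as it was handed to the aligner. A minimal sketch of recovering that input read from a SAM record (the helper names are hypothetical):

```python
# Translation table for complementing DNA bases (N stays N)
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def as_given_to_aligner(flag, seq, qual):
    """Undo the reverse complement that SAM applies to reverse-strand reads.

    flag, seq and qual are the FLAG, SEQ and QUAL fields of one SAM record.
    """
    if flag & 0x10:
        # Reverse-strand alignment: SEQ was reverse complemented and QUAL reversed
        return reverse_complement(seq), qual[::-1]
    return seq, qual
```

If the FASTQ conversion had already reverse complemented one end before alignment, this recovers the converted read, not the read as it came off the instrument.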



      • #4
        The ill2fastq.pl script will reverse complement one of the ends, so they are mapped onto the same strand. You can try with the newest release and the "-k" option, as well as supplying the proper pairing information (there is a new "postprocess" pairing option).



        • #5
          Originally posted by nilshomer
          The ill2fastq.pl script will reverse complement one of the ends, so they are mapped onto the same strand. You can try with the newest release and the "-k" option, as well as supplying the proper pairing information (there is a new "postprocess" pairing option).

          Am I correct in assuming that using bfast in the way I have does create a bias?

          Further, when looking at my reads in IGV, most (70%) have an insert size that is positive, while my correct mean insert size should be around -12 (overlapping). Does this suggest that something has gone wrong in the mapping?
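Two conventions are easy to mix up here. An "insert size" of -12 for overlapping 75 bp reads is an inner distance (the gap between the two reads), whereas the SAM TLEN field is a signed outer template length: positive for the leftmost read of a pair and negative for the rightmost. The sketch below works through that arithmetic and tallies TLEN signs from SAM text. It assumes the value shown in IGV corresponds to TLEN, which is not confirmed in this thread, and both function names are made up for illustration.

```python
def fragment_length(read_length, inner_distance):
    """Outer fragment length for a pair of equal-length reads.

    A negative inner distance means the two reads overlap.
    """
    return 2 * read_length + inner_distance


def tlen_sign_fractions(sam_lines):
    """Fraction of SAM records with positive, negative and zero TLEN."""
    pos = neg = zero = 0
    for line in sam_lines:
        # Skip blank lines and header lines
        if not line.strip() or line.startswith("@"):
            continue
        tlen = int(line.split("\t")[8])  # TLEN is the 9th SAM column
        if tlen > 0:
            pos += 1
        elif tlen < 0:
            neg += 1
        else:
            zero += 1
    total = pos + neg + zero
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "zero": 0.0}
    return {"positive": pos / total, "negative": neg / total, "zero": zero / total}
```

For example, `fragment_length(75, -12)` gives 138. In a file that follows the SAM convention, each pair with both ends mapped to the same reference contributes one positive and one negative TLEN, so the two fractions should come out roughly equal.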



          • #6
            Originally posted by Boel
            Am I correct in assuming that using bfast in the way I have does create a bias?

            Further, when looking at my reads in IGV, most (70%) have an insert size that is positive, while my correct mean insert size should be around -12 (overlapping). Does this suggest that something has gone wrong in the mapping?

            Not sure, as I don't have enough information. I have mapped pairs that should have a mean insert of zero without problems. Please try the newest version of BFAST and post your results, along with enough information for us to debug.
