  • Tophat2 high discordant alignments

    Hi,

    I am mapping paired-end RNA-seq data using tophat2, but the alignment summary shows a very high discordant alignment rate. The only tophat options I am specifying are -p 16 and -o "DIR". Below is the output from tophat2:


    PHP Code:
    Left reads:
              Input     :  88556961
               Mapped   :  76938162 (86.9% of input)
                of these:  20429665 (26.6%) have multiple alignments (622137 have >20)
    Right reads:
              Input     :  88556961
               Mapped   :  75252663 (85.0% of input)
                of these:  20114304 (26.7%) have multiple alignments (621700 have >20)
    Unpaired reads:
              Input     :     68008
               Mapped   :     56927 (83.7% of input)
                of these:      8045 (14.1%) have multiple alignments (9 have >20)
    85.9% overall read mapping rate.

    Aligned pairs:  65389463
         of these:  18622479 (28.5%) have multiple alignments
                    61775607 (94.5%) are discordant alignments
     4.1% concordant pair alignment rate
    The flagstat output I get is also below:

    PHP Code:
    341377625 + 0 in total (QC-passed reads + QC-failed reads)
    189129873 + 0 secondary
    0 + 0 supplimentary
    0 + 0 duplicates
    341377625 + 0 mapped (100.00%:-nan%)
    152190825 + 0 paired in sequencing
    76938162 + 0 read1
    75252663 + 0 read2
    263998 + 0 properly paired (0.17%:-nan%)
    130778926 + 0 with itself and mate mapped
    21411899 + 0 singletons (14.07%:-nan%)
    116104620 + 0 with mate mapped to a different chr
    80799382 + 0 with mate mapped to a different chr (mapQ>=5)
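
As a quick sanity check, the percentages in the two reports can be recomputed from the raw counts. The sketch below only redoes the arithmetic on the numbers posted above. The choice of denominators is an assumption: in particular, the 4.1% concordant rate appears consistent with concordant pairs divided by input pairs, but that is inferred from the numbers rather than taken from the tophat2 documentation.

```python
def pct(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

# Counts copied from the tophat2 alignment summary above.
input_pairs = 88556961
aligned_pairs = 65389463
discordant_pairs = 61775607
concordant_pairs = aligned_pairs - discordant_pairs

# Counts copied from the flagstat output above.
paired_in_sequencing = 152190825
properly_paired = 263998
singletons = 21411899

# Assumed denominators; see the note above.
discordant_rate = pct(discordant_pairs, aligned_pairs)
concordant_rate = pct(concordant_pairs, input_pairs)
properly_paired_rate = pct(properly_paired, paired_in_sequencing)
singleton_rate = pct(singletons, paired_in_sequencing)

print(f"discordant of aligned pairs: {discordant_rate:.1f}%")
print(f"concordant of input pairs:   {concordant_rate:.1f}%")
print(f"properly paired:             {properly_paired_rate:.2f}%")
print(f"singletons:                  {singleton_rate:.2f}%")
```

Both reports agree that almost no pairs are aligning as proper pairs, which fits mates being matched with the wrong partner rather than a general mapping problem.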

    I am using cutadapt to remove adapters and low-quality reads, and that is running fine. But when I pass the paired files on to tophat, the results don't look good. From what I have read, it is to do with the mate pairs no longer being in sync in the two fastq files. Is there a way around this and to get the number of discordant alignments down?

    When I align the fastq files with tophat2 without passing them through cutadapt first, the alignment is fine and the discordant alignment rate is very low. So I'm guessing the fastq files are good and something is going wrong at the cutadapt step.
    Just as a note, I am not using Galaxy for the analysis.
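
One way to confirm the "out of sync" suspicion before re-running anything is to walk both trimmed files in parallel and compare read names record by record. This is a minimal sketch, assuming standard 4-line FASTQ records and read names that match between mates once a trailing /1 or /2 is removed. The function names `read_ids` and `first_mismatch` are hypothetical, not part of any tool mentioned in this thread.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read name of every 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only header lines carry the read name
            fields = line.strip()[1:].split()  # drop '@' and any comment
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # old-style mate suffix
            yield name


def first_mismatch(r1_path, r2_path):
    """Return (record_number, r1_name, r2_name) for the first record where
    the two files disagree, or None if they are in sync throughout."""
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for record_number, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return record_number, name1, name2
    return None
```

If `first_mismatch` returns a record number, every pair from that point on is likely being matched with the wrong mate, which would explain a discordant rate this high. A `None` in one of the returned names means that file ran out of records first.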

    Thanks

  • #2
    Originally posted by ea11
    From what I have read, it is to do with the mate pairs no longer being in sync in the two fastq files. Is there a way around this and to get the number of discordant alignments down?
    Thanks
    Use a paired-end-aware trimmer such as Trimmomatic or BBDuk (from BBMap), which keeps the paired-end files in sync after trimming.

    That said, if you are happy with the cutadapt results and just want to fix the PE read order you can do so by using repair.sh from BBMap (paired end reads in two files example): http://seqanswers.com/forums/showpos...0&postcount=45
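
To make the idea of re-pairing concrete, here is a minimal sketch of what fixing the read order means: keep only reads whose names occur in both files, write them in the same order, and set orphans aside. This is an illustration only, not how repair.sh is implemented. It assumes uncompressed 4-line FASTQ and holds all of R2 in memory, so it would not scale to files of ~88 million reads; the function names are hypothetical.

```python
def parse_fastq(path):
    """Yield (read_name, record_text) for each 4-line FASTQ record."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            fields = record[0].strip()[1:].split()
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # strip old-style mate suffix
            yield name, "".join(record)


def repair_pairs(r1_in, r2_in, r1_out, r2_out, singles_out):
    """Write reads found in both inputs to r1_out/r2_out in R1 order.
    Reads without a mate go to singles_out. Returns the number of pairs."""
    r2_records = dict(parse_fastq(r2_in))  # whole R2 file in memory
    paired = 0
    with open(r1_out, "w") as out1, open(r2_out, "w") as out2, \
            open(singles_out, "w") as singles:
        for name, record in parse_fastq(r1_in):
            mate = r2_records.pop(name, None)
            if mate is None:
                singles.write(record)  # R1 read whose mate was dropped
            else:
                out1.write(record)
                out2.write(mate)
                paired += 1
        for record in r2_records.values():
            singles.write(record)  # R2 reads whose mate was dropped
    return paired
```

For real data, the dedicated tool is the better choice; the sketch is mainly useful for seeing why dropping a read from only one file shifts every later pair.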


    • #3
      Thanks for the reply. I thought cutadapt did that with the -p option to specify paired-end data.
      I shall give BBDuk a try and see the results. I was not happy with the results of trimmomatic on my data, so staying away from that trimmer for now.

      Thanks


      • #4
        Just checking: you are not switching the R1/R2 files by mistake when you use them as input for tophat? That will produce discordant results for obvious reasons.
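
A swapped R1/R2 can sometimes be spotted from the first header line of each file. This is a rough sketch, assuming the headers still carry a mate number, either as a trailing /1 or /2 on the read name or as a comment field starting with 1: or 2: (the newer Illumina style). Trimmers may rewrite headers, so a result of None only means the header gave no information. `mate_number` and `first_header` are hypothetical helper names.

```python
def mate_number(header):
    """Return 1 or 2 if the FASTQ header states the mate, else None."""
    fields = header.strip().split()
    if not fields:
        return None
    # Newer Illumina style: "@name 1:N:0:BARCODE"
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        return int(fields[1][0])
    # Older style: "@name/1"
    if fields[0].endswith(("/1", "/2")):
        return int(fields[0][-1])
    return None


def first_header(path):
    """Return the first line of an uncompressed FASTQ file."""
    with open(path) as handle:
        return handle.readline()
```

Calling `mate_number(first_header(path))` on the file passed first to tophat should give 1, and 2 for the file passed second.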


        • #5
          Nope, I am not. The R1 files come before the R2 files in the script.


          • #6
            BBMap will do spliced alignments, so after you use BBDuk you may want to give BBMap a try on the side while you do your TopHat2 runs.


            • #7
              Thanks, I shall have a read and see what the results look like with BBDuk/BBMap while my tophat jobs are running.
