  • Getting much higher coverage with bowtie2 than tophat2

    Hello,


    I ran an analysis on paired-end reads through tophat2 using:
    tophat2 -p12 -o <tophat_dir> --no-coverage-search <reference genome> R1.fq R2.fq
    and the results gave 1.2% coverage.

    I ran the same data through bowtie2:
    bowtie2 -x <index> -1 <R1.fq> -2 <R2.fq> -S <output.sam>
    and got a 42.75% overall alignment rate.

    Why such a big discrepancy? I tried --coverage-search as well and got the same results.

    I checked the TopHat run.log, and it's passing this to bowtie2:
    bowtie2 -k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 12 --sam-no-hd -x
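One way to see what the logged `--score-min C,-14,0` means for 251 bp reads (the read length given later in the thread) is to evaluate bowtie2's function syntax directly and compare it with bowtie2's documented end-to-end default, `L,-0.6,-0.6`. This is a minimal sketch, assuming the F,B,A convention from the bowtie2 manual (f(x) = B + A * g(x), with g constant, linear, square root or natural log) and, for the mismatch estimate, ignoring gaps and charging every mismatch the maximum `--mp` penalty of 6. The helper names `eval_bt2_function` and `max_mismatches` are hypothetical, and the thread never established whether this threshold explains the gap.

```python
import math

def eval_bt2_function(spec: str, read_len: int) -> float:
    """Evaluate a bowtie2 function option such as 'C,-14,0' or 'L,-0.6,-0.6'.

    Assumes the F,B,A convention from the bowtie2 manual: f(x) = B + A * g(x),
    where g is constant (C), linear (L), square root (S) or natural log (G).
    For C, g is treated here as 0 (an assumption; irrelevant for C,-14,0
    because its coefficient A is 0 anyway).
    """
    if read_len <= 0:
        raise ValueError("read length must be positive")
    kind, const, coef = spec.split(",")
    transforms = {
        "C": lambda x: 0.0,
        "L": float,
        "S": math.sqrt,
        "G": math.log,
    }
    return float(const) + float(coef) * transforms[kind](read_len)

def max_mismatches(min_score: float, mismatch_penalty: int = 6) -> int:
    """Rough upper bound on mismatches at the maximum --mp penalty, ignoring gaps."""
    return math.floor(-min_score / mismatch_penalty)

if __name__ == "__main__":
    read_len = 251  # read length reported later in the thread
    for label, spec in [("TopHat", "C,-14,0"), ("bowtie2 default", "L,-0.6,-0.6")]:
        floor_score = eval_bt2_function(spec, read_len)
        print(f"{label}: min score {floor_score:.1f}, "
              f"~{max_mismatches(floor_score)} max-penalty mismatches")
```

For 251 bp reads this gives a floor of -14.0 for the TopHat setting (room for about 2 high-quality mismatches) against -151.2 for the bowtie2 default (about 25).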

    Any idea what's going on?

    P.S. I've been getting a silent error in my tophat.log
    bam2fastx: /usr/lib64/libz.so.1: no version information available
    and
    fix_map_ordering: /usr/lib64/libz.so.1: no version information available
    I do have libz.so.1.2.3

    The process still runs fine and the bam file output can be used for differential analysis... I just have terrible coverage. Any ideas?
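The 1.2% and 42.75% figures come from two different summaries: bowtie2 prints an "overall alignment rate" line to stderr, while TopHat2 writes its mapping summary to align_summary.txt in the output directory. A minimal sketch for pulling the headline percentage out of either log so the two are compared like for like; the TopHat wording matched here ("overall read mapping rate") is an assumption to check against your own align_summary.txt, and `overall_rate` is a hypothetical helper name.

```python
import re
import sys
from typing import Optional

# Matches e.g. "42.75% overall alignment rate" (bowtie2 stderr) and
# "1.2% overall read mapping rate." (assumed TopHat2 align_summary.txt wording).
RATE_RE = re.compile(r"(\d+(?:\.\d+)?)%\s+overall\s+(?:read\s+mapping|alignment)\s+rate")

def overall_rate(summary_text: str) -> Optional[float]:
    """Return the first 'overall ... rate' percentage in a log, or None if absent."""
    match = RATE_RE.search(summary_text)
    return float(match.group(1)) if match else None

if __name__ == "__main__":
    # Usage: python rates.py <tophat_dir>/align_summary.txt bowtie2_stderr.log
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, overall_rate(handle.read()))
```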

  • #2
    The libz.so.1 message is a warning, so it is unrelated to your observation (see #7 in http://seqanswers.com/forums/showthread.php?t=39873).

    • #3
      I'm sorry, but which #7 are you referring to? I didn't see anything relevant in that thread.

      I'm just curious why TopHat is missing 40% of the alignments that bowtie2 is finding.

      • #4
        Since you had noted the silent errors in your post I was only pointing out that they are warnings and are not related to the difference you are seeing.

        Have you tried running bowtie2 with the same parameters as TopHat? Since TopHat runs bowtie2 with different parameters, it is not surprising that the results differ.

        You can try BBMap as an alternative to tophat.
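Following the suggestion above to rerun bowtie2 with TopHat's parameters, here is a minimal sketch that assembles that command from the options in the first post's run.log excerpt. Assumptions: `--sam-no-hd` is dropped so the SAM keeps its header, the paired `-1/-2` layout from the first post's direct bowtie2 run is kept (the log excerpt does not show how TopHat feeds the mates to bowtie2), and `tophat_like_bowtie2` and the file names are hypothetical placeholders.

```python
import shlex
from typing import List

# bowtie2 options as logged by TopHat in the first post, without --sam-no-hd and -x.
TOPHAT_BT2_OPTS = ("-k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 "
                   "--np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 12")

def tophat_like_bowtie2(index: str, r1: str, r2: str, out_sam: str) -> List[str]:
    """Build (but do not run) a bowtie2 argv that reuses TopHat's logged settings."""
    return ["bowtie2", *shlex.split(TOPHAT_BT2_OPTS),
            "-x", index, "-1", r1, "-2", r2, "-S", out_sam]

if __name__ == "__main__":
    # Placeholder paths: substitute your own index prefix and FASTQ files.
    cmd = tophat_like_bowtie2("genome_index", "R1.fq", "R2.fq", "tophat_params.sam")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Comparing the alignment rate from this run with the 42.75% from the default run would show how much of the gap comes from the options alone.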

        • #5
          The interesting thing is that I looked at bowtie2's defaults, and TopHat was running pretty much exactly those.

          • #6
            But the command options you included for the bowtie2 example that you ran directly are not the same as what TopHat used.

            You were saying that the parameters TopHat uses are the bowtie2 defaults. My apologies.
            Last edited by GenoMax; 10-24-2014, 05:58 PM.
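Whether TopHat's bowtie2 options match the bowtie2 defaults can be checked mechanically. A minimal sketch, assuming the end-to-end defaults documented in the bowtie2 manual (the `--sensitive` preset plus the default scoring options); defaults can change between versions, so confirm them against `bowtie2 --help` for the version you run. `-p` and `--sam-no-hd` are left out because they control threads and output headers rather than alignment settings, and `option_diff` is a hypothetical helper name.

```python
import shlex
from typing import Dict, Optional, Tuple

# Assumed bowtie2 end-to-end defaults (--sensitive preset plus scoring options).
# None means the option is unset by default (bowtie2 then reports the best alignment).
BT2_DEFAULTS: Dict[str, Optional[str]] = {
    "-k": None, "-D": "15", "-R": "2", "-N": "0", "-L": "22", "-i": "S,1,1.15",
    "--gbar": "4", "--mp": "6,2", "--np": "1", "--rdg": "5,3", "--rfg": "5,3",
    "--score-min": "L,-0.6,-0.6",
}

# Alignment options from TopHat's logged bowtie2 call in the first post.
TOPHAT_OPTS = ("-k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 "
               "--np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0")

def option_diff(opts: str,
                defaults: Dict[str, Optional[str]]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {option: (default, used)} for every option whose value differs."""
    tokens = shlex.split(opts)
    used = dict(zip(tokens[::2], tokens[1::2]))  # every option here takes one value
    return {flag: (defaults.get(flag), value)
            for flag, value in used.items() if defaults.get(flag) != value}

if __name__ == "__main__":
    for flag, (default, value) in option_diff(TOPHAT_OPTS, BT2_DEFAULTS).items():
        print(f"{flag}: default {default}, TopHat {value}")
```

Run against the options quoted in this thread, it reports `-k`, `-L`, `-i` and `--score-min` as differing from those defaults.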

            • #7
              Have you done QC with these reads? Have they been trimmed in parallel (R1 and R2)?

              • #8
                Yeah, I did QC before and after checking for adapter contamination and trimming, using scythe and sickle respectively.

                This is really puzzling me. There's no reason for tophat to give me different results than bowtie2 would...

                • #9
                  In my experience, TopHat can be quite sensitive to a few parameters, especially insert size. That wouldn't account for the discrepancy here, however.

                  I would try another aligner - and can highly recommend STAR for speed and accuracy.
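On the insert-size point: TopHat's `-r/--mate-inner-dist` expects the inner distance between mates, not the fragment length (the TopHat manual's example is 300 bp fragments with 50 bp ends giving `-r 200`; the default is 50). A minimal sketch of the conversion; `mate_inner_dist` is a hypothetical helper, and the 400 bp fragment size in the usage line is an illustrative value, not one reported in the thread.

```python
def mate_inner_dist(fragment_len: int, read_len: int) -> int:
    """Inner distance between mates: fragment length minus both read lengths.

    A negative result means the two mates overlap in the middle of the fragment.
    """
    if fragment_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    return fragment_len - 2 * read_len

if __name__ == "__main__":
    # Hypothetical 400 bp library with the thread's 251 bp reads: the mates overlap.
    print(mate_inner_dist(400, 251))
```

With reads as long as 251 bp, many realistic fragment sizes give a negative inner distance, far from TopHat's default of 50.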

                  • #10
                    There's no reason Tophat should be failing like this though. Any idea what parameters I can try to change in Tophat to fix the issue?

                    • #11
                      To cut down on the possible complexity of what is going wrong, try aligning only one of the two mate files with Tophat (i.e. as a single-end alignment) and see if Tophat manages to align more data.

                      Also, how long are your reads and what are you aligning to?
                      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                      Salk Institute for Biological Studies, La Jolla, CA, USA */

                      • #12
                        My reads average around 1,000,000 base pairs. I aligned them to the Mmul_1 Rhesus build from Ensembl. I also tried the rheMac3 build to compare. I tried with and without a transcriptome index and with and without a reference GTF file. Nothing made a difference. This is blowing my mind.

                        I can align the paired ends with bowtie2 and get ~40-60% per sample, but with TopHat I get between 0.5% and 4% per sample.

                        I was told by another lab working on this that they were able to get the alignment I got with bowtie2 using gsnap. It doesn't make any sense to me why Tophat is the only tool doing this. That means it's unreliable for other alignments in my mind and that bothers me a lot.

                        • #13
                          You are not using original reads? You have reads/contigs that average a megabase each?

                          TopHat is designed for reads that are a kb or shorter.

                          • #14
                            Wait, I'm sorry, I meant that the total reads for each strand come to 1 MB. Each read in each file is 251 bp.

                            • #15
                              Yeah, did I read that right? 1,000,000 base paired-end reads? No wonder bowtie2 returns a different result. Are you able to run the alignments with the original RNA-seq reads whatever they were (i.e. PE 100 or whatever)? Then you'll see Tophat actually function.
