  • Tophat 1.0.11 crashing

    Has anybody had any trouble with Tophat 1.0.11 crashing while analyzing data that completes normally under Tophat 1.0.10? I've got 8 FASTQ files of data where this happens.

    This is from a 2.5GB FASTQ file with solexa 1.3 qualities. The run looks like this:

    $ tophat -o ./lane1-crash --GFF /media/HD_2/tophat_project/mouse.gff --solexa1.3-quals -p4 m_musculus s_1_sequence.txt

    [Fri Oct 16 10:11:24 2009] Beginning TopHat run (v1.0.11)
    -----------------------------------------------
    [Fri Oct 16 10:11:24 2009] Preparing output location ./lane1-crash/
    [Fri Oct 16 10:11:24 2009] Checking for Bowtie index files
    [Fri Oct 16 10:11:24 2009] Checking for reference FASTA file
    [Fri Oct 16 10:11:24 2009] Checking for Bowtie
    Bowtie version: 0.11.2.0
    [Fri Oct 16 10:11:24 2009] Checking reads
    seed length: 43bp
    format: fastq
    quality scale: --solexa1.3-quals
    [Fri Oct 16 10:15:19 2009] Reading known junctions from GFF file
    [Fri Oct 16 10:16:57 2009] Mapping reads against m_musculus with Bowtie
    [Fri Oct 16 10:40:34 2009] Joining segment hits
    [Fri Oct 16 10:45:03 2009] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err = -11

    If I revert to 1.0.10, this same run completes normally and I get good-looking output.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */

  • #2
    Can you send me the logs?


    • #3
      sure, where should I send them?
      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
      Salk Institute for Biological Studies, La Jolla, CA, USA */


      • #4
        You can just email them to me at [email protected]


        • #5
          Done. Thanks for taking a look.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */


          • #6
            Ah...as it turns out this was due to my own error. Looks like I was using a bad bowtie index. Thanks Cole for pointing out the issue. I'm going to load up the index that I have that works and re-run this stuff. That should do it.
            /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
            Salk Institute for Biological Studies, La Jolla, CA, USA */
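
Since a bad Bowtie index was the culprit here, a quick file check before rerunning can at least rule out an incomplete index. The sketch below assumes a standard (small) Bowtie 1 index, which consists of six files sharing one base name. The function name `missing_index_files` is hypothetical, and the check only catches missing or empty files; it cannot tell whether an index was built from the wrong reference.

```python
import os

# Suffixes of the six files that make up a standard (small) Bowtie 1 index.
EBWT_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_index_files(base):
    """Return the expected index files for `base` that are absent or empty.

    `base` is the index base name as passed to Bowtie/TopHat,
    e.g. "m_musculus" or "/path/to/indexes/m_musculus".
    """
    missing = []
    for suffix in EBWT_SUFFIXES:
        path = base + suffix
        # A zero-byte file usually means an interrupted build or copy.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(path)
    return missing
```

Calling `missing_index_files("m_musculus")` from the directory that holds the index should return an empty list if all six files are in place.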


            • #7
              Dear Cole,

              I tried to run TopHat on 454 data, where fragments can have different lengths (from ~20 to ~1000), and I did not get any results.

              Should all reads be about the same length? If so, could you change it so that TopHat can be used on 454 data?

              Thank you very much in advance,

              Valentina


              • #8
                TopHat currently requires that reads be the same length, and is NOT designed for 454 reads. You may find after trimming that it works well, but I will not be making specific changes to TopHat to support 454 any time soon. Most of my development time is now spent on Cufflinks, which has a long list of planned features that I want to get to before I graduate.
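
A minimal sketch of the trimming idea, in Python: cut every read to one fixed length and drop reads shorter than that, so that all reads handed to TopHat have the same length. The function name `trim_fastq` and the default length of 50 are illustrative assumptions, not TopHat options, and the sketch assumes the 454 reads have already been converted to plain four-line FASTQ records.

```python
def trim_fastq(in_handle, out_handle, length=50):
    """Write reads trimmed to `length` bases and skip shorter reads.

    Assumes four-line FASTQ records (header, sequence, '+', qualities).
    Returns a (kept, dropped) tuple of read counts.
    """
    kept = dropped = 0
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of file
        seq = in_handle.readline().rstrip("\r\n")
        in_handle.readline()  # the '+' separator line, rewritten below
        qual = in_handle.readline().rstrip("\r\n")
        if len(seq) < length:
            dropped += 1
            continue
        # Trim sequence and qualities identically so they stay in sync.
        out_handle.write(header.rstrip("\r\n") + "\n")
        out_handle.write(seq[:length] + "\n")
        out_handle.write("+\n")
        out_handle.write(qual[:length] + "\n")
        kept += 1
    return kept, dropped
```

It can be used with two open text files, for example `trim_fastq(open("reads.fastq"), open("reads.trimmed.fastq", "w"), length=50)`, where both file names are placeholders. Trimming 3' ends to a short common length discards much of each 454 read, so whether the result is useful depends on the data.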


                • #9
                  I've gotten good data out of Tophat for 23 out of 24 total lanes of data (>3GB FASTQ files each lane). For some reason that 24th lane (87bp reads, mouse) produces almost nil. The output from prep_reads.log looks like this:

                  prep_reads v1.0.11
                  ---------------------------
                  15796133 out of 15917477 reads have been filtered out

                  Is this most likely due to really poor qualities? What exactly is going on during prep_reads that filters out reads and is there a way to tweak that?

                  The most confusing thing is that the people who run the sequencing machines run our outputs through Eland in order to make sure the data is good and it was able to align >60% of the reads from this same data that produces almost nothing through Tophat. I guess what I'm looking for is an idea of what the issue could be with this data. Any ideas?
                  /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                  Salk Institute for Biological Studies, La Jolla, CA, USA */


                  • #10
                    Thank you for the answer, Cole!


                    • #11
                      Originally posted by sdriscoll (post #9, above):
                      TopHat filters out two types of reads here: those with lots of N's and those which are nearly all the same character. Since TopHat sometimes chooses algorithms that require indexing the unmappable reads, keeping around all the polyA reads, for example, will just bloat the unmapped read index and generate false positive splices between real exons and downstream low complexity repeats.

                      How many N's do these reads have on average? Is there some systematic problem with that lane that you could trim away?
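
One way to answer that question is to scan the FASTQ file directly. The Python sketch below reports the mean number of N's per read and counts reads that look N-rich or dominated by a single base. The function names and both thresholds (`max_n_frac`, `max_top_frac`) are assumptions chosen for illustration; they are not the cutoffs prep_reads actually uses.

```python
from collections import Counter


def read_stats(seq):
    """Return (fraction of N's, fraction taken by the most common character)."""
    seq = seq.upper()
    if not seq:
        return 0.0, 0.0
    counts = Counter(seq)
    n_frac = counts.get("N", 0) / len(seq)
    top_frac = max(counts.values()) / len(seq)
    return n_frac, top_frac


def summarize_fastq(handle, max_n_frac=0.1, max_top_frac=0.9):
    """Summarize N content and low-complexity reads in a four-line FASTQ.

    The thresholds are illustrative guesses, not TopHat's own values.
    """
    total = n_rich = low_complexity = n_bases = 0
    for i, line in enumerate(handle):
        if i % 4 != 1:  # the sequence is the second line of each record
            continue
        seq = line.strip()
        total += 1
        n_bases += seq.upper().count("N")
        n_frac, top_frac = read_stats(seq)
        if n_frac > max_n_frac:
            n_rich += 1
        elif top_frac > max_top_frac:
            low_complexity += 1  # e.g. polyA reads
    return {
        "reads": total,
        "mean_N_per_read": n_bases / total if total else 0.0,
        "n_rich": n_rich,
        "low_complexity": low_complexity,
    }
```

Running `summarize_fastq(open("s_1_sequence.txt"))` on the problem lane and on one of the good lanes, then comparing the two summaries, should show whether N's or low-complexity reads account for the difference.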


                      • #12
                        I'll take a look. Getting this set of reads through Tophat and Cufflinks is for sure something we want to get working.
                        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                        Salk Institute for Biological Studies, La Jolla, CA, USA */
