  • Segfault in tophat_reports

    With tophat 2.0.3, I get a segfault from tophat_reports:

    [...clipped output for clarity...]
    [2012-06-02 18:19:12] Mapping left_kept_reads_seg28 to genome segment_juncs with Bowtie2 (28/28)
    [2012-06-02 18:19:13] Joining segment hits
    [2012-06-02 18:21:29] Reporting output tracks
    [FAILED]
    Error running /u2/home/miseiler/Desktop/tophat-2.0.3.Linux_x86_64/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir bowtie2_unaligned_100/ --max-multihits 20 --max-seg-multihits 40 --segment-length 100 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --max-mismatches 2 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --no-closure-search --no-coverage-search --no-microexon-search --sam-header bowtie2_unaligned_100/tmp/hg19_genome.bwt.samheader.sam --samtools=/home/miseiler/bin/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /home/miseiler/Projects/mssm/data/seq/hg19.fa bowtie2_unaligned_100/junctions.bed bowtie2_unaligned_100/insertions.bed bowtie2_unaligned_100/deletions.bed bowtie2_unaligned_100/fusions.out bowtie2_unaligned_100/tmp/accepted_hits bowtie2_unaligned_100/tmp/left_kept_reads.candidates_and_unspl.bam bowtie2_unaligned_100/tmp/left_kept_reads.bam
    Loading ...done

    Running that command by itself gives the following:
    tophat_reports v2.0.3 (3443S)
    ---------------------------------------
    [samopen] SAM header is present: 25 sequences.
    Loading chr1...done
    [...more chromosomes...]
    Loading chrM...done
    Loading ...done
    zsh: segmentation fault (core dumped) /u2/home/miseiler/Desktop/tophat-2.0.3.Linux_x86_64/tophat_reports 8 0 50

    Unfortunately, I have tried running earlier tophat_reports versions from the tophat2 series, and they simply segfault without even opening hg19.fa.

    Anyone seeing this?

  • #2
    Yes, I'm getting this also with tophat 2.0.3. I've searched the log files, but no helpful hints to be found.

    Any hints here on how to fix this?


    • #3
      Apparently what's going on here is that tophat doesn't support reads over 1024 bp. I've never seen this in any of the documentation, but did manage to get a reply noting this.

      Bowtie2, however, does, and the way I've gotten "around" the problem is to align long contigs in two stages: a bowtie2 end-to-end stage, and then a spliced tophat stage which is split.

      Here's a script I use to split them. It's python2, and requires BioPython:
      Code:
      from Bio import SeqIO
      from Bio.SeqRecord import SeqRecord
      import sys

      # Expect exactly two arguments: the input FASTA and the maximum contig length.
      if len(sys.argv) != 3:
          sys.exit('USAGE: python splitfa.py file.fa MAXLEN > [NEW FILE]')

      MAXLEN = int(sys.argv[2])

      with open(sys.argv[1], 'r') as f:
          recs = list(SeqIO.parse(f, 'fasta'))

      def split_indices(size, maxlen=MAXLEN):
          # Split into floor(size/maxlen) + 1 near-equal parts so that no part exceeds maxlen
          parts = size // maxlen + 1
          # All the boundary indices, including 0 and size, of the parts we will cut
          bounds = [i * size // parts for i in range(parts + 1)]
          return [(bounds[i], bounds[i + 1]) for i in range(parts)]

      keep = []     # records short enough to pass through unchanged
      newrecs = []  # pieces cut from records longer than MAXLEN

      for rec in recs:
          if len(rec) <= MAXLEN:
              keep.append(rec)
              continue
          for num, (i, j) in enumerate(split_indices(len(rec)), 1):
              newrecs.append(SeqRecord(rec.seq[i:j], id='%s_split_%s' % (rec.id, num), description=rec.description))

      # Unsplit records are written first, followed by the split pieces
      SeqIO.write(keep + newrecs, sys.stdout, 'fasta')
      As the usage message suggests, you just run python <thisfile> <long contigs.fa> <new max length> > <new file>, and it will split any contig over the max length into evenly sized new ones. For example, at a max length of 800, a contig of 900 would become two contigs of 450. The new ones have _split_# appended to their ID so you can tell them apart (and perhaps perform some magic to join them later...)
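
The split arithmetic can be checked on its own, without Biopython or an input file. This is only a minimal sketch: the function name `split_points` is made up for illustration, and it assumes the goal is near-equal pieces that never exceed the max length.

```python
def split_points(size, maxlen):
    """Return (start, end) index pairs that cut a contig into near-equal pieces."""
    if size <= maxlen:
        return [(0, size)]  # short enough: keep as a single piece
    parts = size // maxlen + 1  # number of pieces to cut
    # Boundaries are spread evenly, so piece lengths differ by at most 1
    bounds = [i * size // parts for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]


print(split_points(900, 800))  # [(0, 450), (450, 900)]
```

Each pair can be used directly as a slice, e.g. `seq[start:end]`.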

      It's pretty undesirable, but it gets the job done.
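
As a possible first step toward joining the pieces later, the `_split_#` suffix can be parsed to recover which original contig each piece came from. This is a rough sketch only: `parse_split_id` and `group_by_parent` are hypothetical helper names, and it assumes no original ID already ends in `_split_` followed by a number. What to do with the grouped alignments afterwards is left open.

```python
import re
from collections import defaultdict

# Matches IDs such as "contig7_split_3" (suffix format as described above)
SPLIT_SUFFIX = re.compile(r"^(?P<parent>.+)_split_(?P<num>\d+)$")


def parse_split_id(record_id):
    """Return (parent_id, piece_number), or (record_id, None) for unsplit IDs."""
    match = SPLIT_SUFFIX.match(record_id)
    if match is None:
        return record_id, None
    return match.group("parent"), int(match.group("num"))


def group_by_parent(record_ids):
    """Map each parent ID to the sorted piece numbers found for it."""
    groups = defaultdict(list)
    for record_id in record_ids:
        parent, num = parse_split_id(record_id)
        if num is not None:  # unsplit contigs are skipped
            groups[parent].append(num)
    return {parent: sorted(nums) for parent, nums in groups.items()}
```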

      One thing I suggest is trying multiple contig lengths. I've found that in some cases tophat makes vastly better alignments at some lengths than at others, and that this is dataset-dependent. The best length isn't necessarily the one that lets it align the largest amount of bp to the genome, which for me was 300bp.

      I think that's enough for one post.


      • #4
        Thanks a lot for your reply!

        Now I'm a bit confused. I've only got regular old RNASeq data - 50bp long, mapping to dros genome. refGene.gff created from flybase file dmel-all-r5.45.gff (I've filtered this file to only contain CDS and exon features)

        There should be no reads longer than 1024bp
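
One quick way to confirm that is to scan the read file for its longest sequence and compare it with the 1024 bp limit mentioned above. A minimal sketch, assuming a standard FASTQ file with exactly four lines per record (no wrapped sequences); `max_read_length` is an illustrative name:

```python
import gzip


def max_read_length(path):
    """Return the length of the longest read in a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    longest = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the sequence is the second line of each record
                longest = max(longest, len(line.rstrip("\r\n")))
    return longest
```

Calling `max_read_length("reads.fastq")` (the file name is a placeholder) returns a single number; anything above 1024 would point to the long-read problem, anything below suggests a different cause.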

        I wonder what I am not getting? Everything works with tophat v1.4.1.


        • #5
          It seems you have a different problem, sorry. Maybe it's worth noting that tophat2 doesn't even find alignments for me with the method I posted, and that I am indeed also using tophat 1.4.1.

          If it's not huge, maybe you can try running it again with --keep-tmp and sending the temporary files to the developers as a bug report.


          • #6
            Hi,

            I am also getting the same error while using TopHat version 2.0.3. Has anyone figured out the reason?

            Pinki


            • #7
              Yes - I have the same error with a FAILED at tophat_reports. Tried on 2 different genomes (human and rat). I have standard 100bp paired reads. Also tried the unannounced tophat 2.0.4 (http://tophat.cbcb.umd.edu/downloads/) and get the same error.

              Is it just me or is tophat2 a giant waste of time?


              • #8
                I have the same fail with Tophat 2.0.3 and 2.0.4. It fails running tophat_reports and gives me: terminate called after throwing an instance of 'terminate called recursively
                Running the last command alone fails with:

                [samopen] no @SQ lines in the header.
                Loading 1...done
                Loading 2...done
                Loading 3...done
                Loading 4...done
                Loading 5...done
                Loading 6...done
                Loading 7...done
                Loading 8...done
                Loading 9...done
                Loading 10...done
                Loading 11...done
                Loading 12...done
                Loading 13...done
                Loading 14...done
                Loading 15...done
                Loading 16...done
                Loading 17...done
                Loading 18...done
                Loading X...done
                Loading MT...done
                Loading ...done
                Loaded 135171 GFF junctions from ./tophat_out/tmp/ssct9.juncs.
                terminate called after throwing an instance of 'std::logic_error'
                terminate called recursively
                terminate called recursively
                Aborted (core dumped)

                My reads are paired-end 90bps long, pig genome. Platform Ubuntu Server.


                • #9
                  Same problem
                  50 bp PE reads

                  Can anyone please comment if you have any insights or any way to bypass it so far.
                  Last edited by epi; 06-21-2012, 10:27 AM.


                  • #10
                    I am also getting the same error in Tophat 2.0.3 & 2.0.4

                    *****[sjohn@n002 logs]$ tail /home/sjohn/tophat_out/logs/reports.log*****

                    Loading chrUn_gl000245...done
                    Loading chrUn_gl000246...done
                    Loading chrUn_gl000247...done
                    Loading chrUn_gl000248...done
                    Loading chrUn_gl000249...done
                    Loading ...done
                    Loaded 216660 GFF junctions from ./tophat_out/tmp/refGene.juncs.
                    terminate called after throwing an instance of 'std::logic_error'
                    what(): basic_string::_S_construct NULL not valid
                    terminate called recursively
                    terminate called recursively
                    terminate called recursively
                    ********************

                    Any help?


                    • #11
                      I am having the same problem with tophat 2.0.4

                      [2012-06-24 20:24:31] Reporting output tracks
                      [FAILED]
                      Error running /opt/tophat-2.0.4.Linux_x86_64/tophat_reports ...

                      what(): basic_string::_S_construct NULL not valid

                      Does anyone know how to fix this??


                      • #12
                        I also have the same problem with tophat 2.0.4. However, I had previously processed the same library with tophat 2.0.0 without hitting this problem, so I tried tophat 2.0.0 again and found that it works.


                        • #13
                          Compile it from source

                          These errors go away when you compile it from source.



                          Originally posted by shibujohn
                          I am also getting the same error in Tophat 2.0.3 & 2.0.4

                          *****[sjohn@n002 logs]$ tail /home/sjohn/tophat_out/logs/reports.log*****

                          Loading chrUn_gl000245...done
                          Loading chrUn_gl000246...done
                          Loading chrUn_gl000247...done
                          Loading chrUn_gl000248...done
                          Loading chrUn_gl000249...done
                          Loading ...done
                          Loaded 216660 GFF junctions from ./tophat_out/tmp/refGene.juncs.
                          terminate called after throwing an instance of 'std::logic_error'
                          what(): basic_string::_S_construct NULL not valid
                          terminate called recursively
                          terminate called recursively
                          terminate called recursively
                          ********************

                          Any help?


                          • #14
                            Same problem ("tophat_reports [FAILED]") with tophat 2.0.3.

                            * Same problem for me with Tophat 2.0.3.

                            * I am using 2x100bp paired-end sequences, aligning to mm9. This is on Ubuntu 10.04.

                            * I have only encountered this problem for a small SUBSET of the total number of aligned sequences.

                            * I'm going to try Tophat 2.0.4.

                            * I can't think of a good reason to expect that building from source would actually fix this, but I will probably do it anyway "just in case."

                            Error message:
                            [2012-06-29 04:09:46] Joining segment hits
                            [2012-06-29 05:25:22] Reporting output tracks
                            [FAILED]
                            Error running tophat_reports

                            The log has no more useful information than that, which is a bit disappointing.


                            • #15
                              Sorry, just thought I'd post that the Tophat 2.0.4 changelog indicates that the
                              * "reporting output tracks . . . [FAILED]" issue

                              is SOLVED by an update to Tophat 2.0.4.

                              I'm re-running my sequences now. This was a known issue, at least!
