  • miseiler
    Junior Member
    • May 2012
    • 4

    Segfault in tophat_reports

    With tophat 2.0.3, I get a segfault from tophat_reports:

    [...clipped output for clarity...]
    [2012-06-02 18:19:12] Mapping left_kept_reads_seg28 to genome segment_juncs with Bowtie2 (28/28)
    [2012-06-02 18:19:13] Joining segment hits
    [2012-06-02 18:21:29] Reporting output tracks
    [FAILED]
    Error running /u2/home/miseiler/Desktop/tophat-2.0.3.Linux_x86_64/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir bowtie2_unaligned_100/ --max-multihits 20 --max-seg-multihits 40 --segment-length 100 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --max-mismatches 2 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --no-closure-search --no-coverage-search --no-microexon-search --sam-header bowtie2_unaligned_100/tmp/hg19_genome.bwt.samheader.sam --samtools=/home/miseiler/bin/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /home/miseiler/Projects/mssm/data/seq/hg19.fa bowtie2_unaligned_100/junctions.bed bowtie2_unaligned_100/insertions.bed bowtie2_unaligned_100/deletions.bed bowtie2_unaligned_100/fusions.out bowtie2_unaligned_100/tmp/accepted_hits bowtie2_unaligned_100/tmp/left_kept_reads.candidates_and_unspl.bam bowtie2_unaligned_100/tmp/left_kept_reads.bam
    Loading ...done

    Running that command by itself gives the following:
    tophat_reports v2.0.3 (3443S)
    ---------------------------------------
    [samopen] SAM header is present: 25 sequences.
    Loading chr1...done
    [...more chromosomes...]
    Loading chrM...done
    Loading ...done
    zsh: segmentation fault (core dumped) /u2/home/miseiler/Desktop/tophat-2.0.3.Linux_x86_64/tophat_reports 8 0 50

    Unfortunately, I have tried running earlier tophat_reports versions from the tophat2 series, and they simply segfault without even opening hg19.fa.

    Anyone seeing this?
  • chubbchubb
    Junior Member
    • Jul 2009
    • 3

    #2
    Yes, I'm getting this also with tophat 2.0.3. I've searched the log files, but no helpful hints to be found.

    Any hints here on how to fix this?

    Comment

    • miseiler
      Junior Member
      • May 2012
      • 4

      #3
      Apparently what's going on here is that tophat doesn't support reads over 1024 bp. I've never seen this in any of the documentation, but did manage to get a reply noting this.
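
If you want to check whether this limit applies to your own input, here is a minimal sketch that reports the longest record and how many records exceed 1024 bp. It uses only the standard library; the function names are mine, and it assumes plain 4-line FASTQ records or ordinary multi-line FASTA, nothing more exotic.

```python
def fastq_lengths(lines):
    """Yield the read length of each record in 4-line FASTQ input."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of every record is the sequence
            yield len(line.strip())


def fasta_lengths(lines):
    """Yield the sequence length of each record in (multi-line) FASTA input."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def count_over_limit(lengths, limit=1024):
    """Return (longest length seen, number of records longer than limit)."""
    longest, over = 0, 0
    for n in lengths:
        longest = max(longest, n)
        if n > limit:
            over += 1
    return longest, over
```

Usage would be something like `with open('contigs.fa') as f: print(count_over_limit(fasta_lengths(f)))`, where the file name is only a placeholder.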

      Bowtie2, however, does, and the way I've gotten "around" the problem is to align long contigs in two stages: a bowtie2 end-to-end stage, and then a spliced tophat stage in which the contigs are split.

      Here's a script I use to split them. It's python2, and requires BioPython:
      Code:
      from Bio import SeqIO
      from Bio.SeqRecord import SeqRecord
      import sys
      
      if len(sys.argv) != 3:
          # Usage goes to stderr so it never ends up in the redirected FASTA output
          sys.stderr.write('USAGE: python splitfa.py file.fa MAXLEN > [NEW FILE]\n')
          sys.exit(1)
      
      MAXLEN = int(sys.argv[2])
      
      with open(sys.argv[1], 'r') as f:
          recs = list(SeqIO.parse(f, 'fasta'))
      
      def split_indices(size, maxlen=MAXLEN):
          # Split into floor(size/maxlen) + 1 pieces of near-equal length
          pieces = size // maxlen + 1
          # Boundaries including 0 and size; spreading the remainder over the
          # pieces keeps every piece at or below maxlen
          bounds = [ (i * size) // pieces for i in range(pieces + 1) ]
          return [ (bounds[i], bounds[i+1]) for i in range(pieces) ]
      
      keep = []     # records short enough to pass through unchanged
      newrecs = []  # pieces of records longer than MAXLEN
      
      for rec in recs:
          if len(rec) <= MAXLEN:
              keep.append(rec)
              continue
          for num, (i, j) in enumerate(split_indices(len(rec)), 1):
              newrecs.append(SeqRecord(rec.seq[i:j], id=rec.id + '_split_%s' % num, description=rec.description))
      
      # Unsplit records first, then the split pieces
      SeqIO.write(keep + newrecs, sys.stdout, 'fasta')
      As the usage message suggests, you run python <thisfile> <long contigs.fa> <new max length> > <new file> and it will split any contig over the max length into evenly sized new ones. For example, at a max length of 800, a contig of 900 would become two contigs of 450. New ones have _split_# appended to their ID so you can separate them (and perhaps perform some magic to join them later...).
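
To make the arithmetic concrete, here is a standalone sketch of the boundary calculation, with no BioPython needed. It spreads any remainder across the pieces so that none exceeds the maximum; that detail is my assumption about the intended behaviour. `parent_id` is a hypothetical helper for getting back to the original contig ID when grouping pieces later.

```python
import re


def split_indices(size, maxlen):
    """Return (start, end) pairs cutting `size` into near-equal pieces <= maxlen."""
    pieces = size // maxlen + 1
    bounds = [(i * size) // pieces for i in range(pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def parent_id(split_id):
    """Strip a trailing _split_N suffix to recover the original contig ID."""
    return re.sub(r'_split_\d+$', '', split_id)


print(split_indices(900, 800))   # [(0, 450), (450, 900)]
print(split_indices(2500, 800))  # [(0, 625), (625, 1250), (1250, 1875), (1875, 2500)]
print(parent_id('contig7_split_2'))  # contig7
```

Grouping records by `parent_id` and sorting by the split number is one way to start on the "join them later" step, though merging the alignments themselves is a separate problem.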

      It's pretty undesirable, but it gets the job done.

      One thing I suggest is trying multiple contig lengths. I've found that in some cases tophat makes vastly better alignments at some lengths than at others, and that this is dataset-dependent. The best length isn't necessarily the one with the largest amount of bp that it can align to the genome, which for me was 300bp.
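
A sweep over several lengths could be planned along these lines. This is only a sketch that builds the command lines without running them. The script name splitfa.py comes from the usage message above; the file names, the output directory names, and the index base name `hg19` are placeholders I made up. The only tophat option used is `-o` for the output directory.

```python
def plan_length_sweep(contigs_fa, index_base, max_lengths):
    """Return one (split_cmd, split_fa, tophat_cmd) tuple per candidate length."""
    plan = []
    for maxlen in max_lengths:
        split_fa = 'contigs_split_%d.fa' % maxlen  # placeholder file name
        out_dir = 'tophat_out_%d' % maxlen         # placeholder output directory
        split_cmd = ['python', 'splitfa.py', contigs_fa, str(maxlen)]
        tophat_cmd = ['tophat', '-o', out_dir, index_base, split_fa]
        plan.append((split_cmd, split_fa, tophat_cmd))
    return plan


for split_cmd, split_fa, tophat_cmd in plan_length_sweep('contigs.fa', 'hg19', [200, 300, 500]):
    # The split script writes FASTA to stdout, so its output is redirected
    print(' '.join(split_cmd) + ' > ' + split_fa)
    print(' '.join(tophat_cmd))
```

Keeping one output directory per length makes it easy to compare the runs afterwards.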

      I think that's enough for one post.

      Comment

      • chubbchubb
        Junior Member
        • Jul 2009
        • 3

        #4
        Thanks a lot for your reply!

        Now I'm a bit confused. I've only got regular old RNA-Seq data: 50bp reads, mapping to the Drosophila genome. refGene.gff was created from the FlyBase file dmel-all-r5.45.gff (I've filtered this file to contain only CDS and exon features).

        There should be no reads longer than 1024bp.

        I wonder what I'm not getting. Everything works with tophat v1.4.1.

        Comment

        • miseiler
          Junior Member
          • May 2012
          • 4

          #5
          It seems you have a different problem, sorry. Maybe it's worth noting that tophat2 doesn't even find alignments for me with the method I posted, and that I am indeed also using tophat 1.4.1.

          If it's not huge, maybe you can try running it again with --keep-tmp and sending the temporary files to the developers as a bug report.

          Comment

          • pinki999
            Member
            • Oct 2010
            • 37

            #6
            Hi,

            I am also getting the same error with TopHat version 2.0.3. Has anyone figured out the reason?

            Pinki

            Comment

            • caddymob
              Member
              • Apr 2009
              • 36

              #7
              Yes - I have the same error with a FAILED at tophat_reports. Tried on 2 different genomes (human and rat). I have standard 100bp paired reads. Also tried the unannounced tophat 2.0.4 (http://tophat.cbcb.umd.edu/downloads/) and get the same error.

              Is it just me or is tophat2 a giant waste of time?

              Comment

              • Enraico
                Junior Member
                • Feb 2012
                • 5

                #8
                I have the same failure with Tophat 2.0.3 and 2.0.4. It fails running tophat_reports and gives me: terminate called after throwing an instance of 'std::logic_error', followed by 'terminate called recursively'.
                Running the last command alone fails with:

                [samopen] no @SQ lines in the header.
                Loading 1...done
                Loading 2...done
                Loading 3...done
                Loading 4...done
                Loading 5...done
                Loading 6...done
                Loading 7...done
                Loading 8...done
                Loading 9...done
                Loading 10...done
                Loading 11...done
                Loading 12...done
                Loading 13...done
                Loading 14...done
                Loading 15...done
                Loading 16...done
                Loading 17...done
                Loading 18...done
                Loading X...done
                Loading MT...done
                Loading ...done
                Loaded 135171 GFF junctions from ./tophat_out/tmp/ssct9.juncs.
                terminate called after throwing an instance of 'std::logic_error'
                terminate called recursively
                terminate called recursively
                Aborted (core dumped)

                My reads are 90bp paired-end, pig genome. Platform: Ubuntu Server.

                Comment

                • epi
                  Member
                  • Jan 2012
                  • 38

                  #9
                  Same problem
                  50 bp PE reads

                  Can anyone please comment if you have any insights, or any way to bypass it so far?
                  Last edited by epi; 06-21-2012, 10:27 AM.

                  Comment

                  • shibujohn
                    Junior Member
                    • Oct 2009
                    • 6

                    #10
                    I am also getting the same error in Tophat 2.0.3 and 2.0.4.

                    *****[sjohn@n002 logs]$ tail /home/sjohn/tophat_out/logs/reports.log*****

                    Loading chrUn_gl000245...done
                    Loading chrUn_gl000246...done
                    Loading chrUn_gl000247...done
                    Loading chrUn_gl000248...done
                    Loading chrUn_gl000249...done
                    Loading ...done
                    Loaded 216660 GFF junctions from ./tophat_out/tmp/refGene.juncs.
                    terminate called after throwing an instance of 'std::logic_error'
                    what(): basic_string::_S_construct NULL not valid
                    terminate called recursively
                    terminate called recursively
                    terminate called recursively
                    ********************

                    Any help?

                    Comment

                    • acervera
                      Junior Member
                      • Feb 2012
                      • 6

                      #11
                      I am having the same problem with tophat 2.0.4

                      [2012-06-24 20:24:31] Reporting output tracks
                      [FAILED]
                      Error running /opt/tophat-2.0.4.Linux_x86_64/tophat_reports ...

                      what(): basic_string::_S_construct NULL not valid

                      Does anyone know how to fix this??

                      Comment

                      • drunk_coder
                        Junior Member
                        • Dec 2011
                        • 9

                        #12
                        I also have the same problem with tophat 2.0.4. I had previously run the same library through tophat 2.0.0 without this problem, so I tried tophat 2.0.0 again, and it works.

                        Comment

                        • shibujohn
                          Junior Member
                          • Oct 2009
                          • 6

                          #13
                          compile it from source

                          These errors go away when you compile it from source.




                          Comment

                          • catbus
                            Member
                            • Feb 2011
                            • 21

                            #14
                            Same problem ("tophat_reports [FAILED]") with tophat 2.0.3.

                            * Same problem for me with Tophat 2.0.3.

                            * I am using 2x100bp paired-end sequences, aligning to mm9. This is on Ubuntu 10.04.

                            * I have only encountered this problem for a small SUBSET of the total number of aligned sequences.

                            * I'm going to try Tophat 2.0.4.

                            * I can't think of a good reason to expect that building from source would actually fix this, but I will probably do it anyway "just in case."

                            Error message:
                            [2012-06-29 04:09:46] Joining segment hits
                            [2012-06-29 05:25:22] Reporting output tracks
                            [FAILED]
                            Error running tophat_reports

                            The log has no more useful information than that, which is a bit disappointing.

                            Comment

                            • catbus
                              Member
                              • Feb 2011
                              • 21

                              #15
                              Sorry, just thought I'd post that the Tophat 2.0.4 changelog indicates that the "reporting output tracks . . . [FAILED]" issue is SOLVED by an update to Tophat 2.0.4.

                              I'm re-running my sequences now. This was a known issue, at least!

                              Comment
