SEQanswers

Old 06-02-2012, 03:00 PM   #1
miseiler
Junior Member
 
Location: New Brunswick, NJ

Join Date: May 2012
Posts: 4
Segfault in tophat_reports

With tophat 2.0.3, I get a segfault from tophat_reports:

[...clipped output for clarity...]
[2012-06-02 18:19:12] Mapping left_kept_reads_seg28 to genome segment_juncs with Bowtie2 (28/28)
[2012-06-02 18:19:13] Joining segment hits
[2012-06-02 18:21:29] Reporting output tracks
[FAILED]
Error running /u2/home/miseiler/Desktop/tophat-2.0.3.Linux_x86_64/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir bowtie2_unaligned_100/ --max-multihits 20 --max-seg-multihits 40 --segment-length 100 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --max-mismatches 2 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p8 --no-closure-search --no-coverage-search --no-microexon-search --sam-header bowtie2_unaligned_100/tmp/hg19_genome.bwt.samheader.sam --samtools=/home/miseiler/bin/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /home/miseiler/Projects/mssm/data/seq/hg19.fa bowtie2_unaligned_100/junctions.bed bowtie2_unaligned_100/insertions.bed bowtie2_unaligned_100/deletions.bed bowtie2_unaligned_100/fusions.out bowtie2_unaligned_100/tmp/accepted_hits bowtie2_unaligned_100/tmp/left_kept_reads.candidates_and_unspl.bam bowtie2_unaligned_100/tmp/left_kept_reads.bam
Loading ...done

Running that command by itself gives the following:
tophat_reports v2.0.3 (3443S)
---------------------------------------
[samopen] SAM header is present: 25 sequences.
Loading chr1...done
[...more chromosomes...]
Loading chrM...done
Loading ...done
zsh: segmentation fault (core dumped) /u2/home/miseiler/Desktop/tophat-2.0.3.Linux_x86_64/tophat_reports 8 0 50

Unfortunately, I have also tried running earlier tophat_reports versions from the tophat2 series, and they simply segfault without even opening hg19.fa.

Anyone seeing this?
Old 06-05-2012, 12:33 AM   #2
chubbchubb
Junior Member
 
Location: Germany

Join Date: Jul 2009
Posts: 3

Yes, I'm also getting this with tophat 2.0.3. I've searched the log files but found nothing helpful.

Any hints on how to fix this?
Old 06-05-2012, 05:59 AM   #3
miseiler
Junior Member
 
Location: New Brunswick, NJ

Join Date: May 2012
Posts: 4

Apparently what's going on here is that tophat doesn't support reads longer than 1024 bp. I've never seen this mentioned in the documentation, but I did manage to get a reply confirming it.

Bowtie2, however, does support them, so the way I've gotten "around" the problem is to align long contigs in two stages: a bowtie2 end-to-end stage, followed by a spliced tophat stage run on the split contigs.
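
Roughly, the two stages can be wired together like this. This is only a sketch of the idea rather than my exact commands: it assumes bowtie2 and tophat are on your PATH, uses placeholder names for the index prefix, contig file and output files, and calls the split script shown further down this post (saved as splitfa.py).
Code:
# Sketch of the two-stage workaround; all file names are placeholders.
import subprocess

INDEX = 'hg19'          # bowtie2 index prefix (placeholder)
CONTIGS = 'contigs.fa'  # long contigs in FASTA format (placeholder)

# Stage 1: end-to-end bowtie2 alignment of the full-length contigs.
# Contigs that fail to align are written, still as FASTA, to unaligned.fa.
subprocess.check_call(['bowtie2', '--end-to-end', '-f', '-x', INDEX,
                       '-U', CONTIGS, '-S', 'bt2_end_to_end.sam',
                       '--un', 'unaligned.fa'])

# Stage 2: split the unaligned contigs to stay under tophat's 1024 bp limit,
# then run a spliced tophat alignment on the pieces.
with open('unaligned_split.fa', 'w') as out:
    subprocess.check_call(['python', 'splitfa.py', 'unaligned.fa', '1000'],
                          stdout=out)
subprocess.check_call(['tophat', '-o', 'tophat_split_out', INDEX,
                       'unaligned_split.fa'])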

Here's a script I use to split them. It's python2, and requires BioPython:
Code:
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from StringIO import StringIO
import sys

if len(sys.argv) != 3:
    print('USAGE: python splitfa.py file.fa MAXLEN > [NEW FILE]')
    sys.exit(1)

f = open(sys.argv[1], 'r')
recs = list(SeqIO.parse(f, 'fasta'))
f.close()

keep = []
tosplit = []
newrecs = []

MAXLEN = int(sys.argv[2])

def split_indices(size, maxlen=MAXLEN):
    sizefactor = size / maxlen + 1 # Split into floor(size/maxlen) + 1 even groups
    contigsize = size / sizefactor # Size of new contigs
    
    sizes = [ (i * contigsize) for i in xrange(sizefactor) ] + [size] # A list of all the indices, including 0 and the max, of parts of our contig we will be splitting
    return [ (sizes[i], sizes[i+1]) for i in xrange(sizefactor) ]

for rec in recs:
    if len(rec) <= MAXLEN:
        keep.append(rec)
    else:
        tosplit.append(rec)

for rec in tosplit:
    num = 1
    for i, j in split_indices(len(rec)):
        newrecs.append(SeqRecord(rec.seq[i:j], id=rec.id + '_split_%s' % num, name=rec.name, description=rec.description))
        num += 1

stdout_handle = StringIO()
SeqIO.write(keep + newrecs, stdout_handle, 'fasta')
print(stdout_handle.getvalue())
As the usage line suggests, you just run python <thisfile> <long contigs.fa> <new max length> > <new file> and it will split any contig over the max length into evenly sized new ones. For example, at a max length of 800, a contig of 900 becomes two contigs of 450. The new contigs have _split_# appended to their IDs so you can tell them apart (and perhaps perform some magic to join them back together later...)
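
If you want to sanity-check the splitting arithmetic without BioPython, the index calculation boils down to the following. This is a standalone re-implementation just for checking, not part of the script above:
Code:
# Same splitting arithmetic as split_indices above, with integer division
# written as // so it behaves the same on Python 2 and 3.
def split_indices(size, maxlen):
    pieces = size // maxlen + 1   # floor(size/maxlen) + 1 pieces
    step = size // pieces         # piece size; the last piece absorbs the remainder
    cuts = [i * step for i in range(pieces)] + [size]
    return list(zip(cuts, cuts[1:]))

print(split_indices(900, 800))    # [(0, 450), (450, 900)]  -> two 450 bp contigs
print(split_indices(2500, 1000))  # [(0, 833), (833, 1666), (1666, 2500)]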

It's a pretty undesirable workaround, but it gets the job done.

One thing I suggest is trying multiple contig lengths. I've found that in some cases tophat makes vastly better alignments at some lengths than at others, and that the best length is dataset-dependent. It isn't necessarily the length that aligns the largest number of bp to the genome, which for me was 300bp.
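
For what it's worth, a small driver like this makes it easy to generate split references at several candidate lengths and compare the resulting tophat runs. It assumes the script above is saved as splitfa.py and that your long contigs are in long_contigs.fa (both placeholders):
Code:
# Write a split FASTA for each candidate maximum length; each output can then
# be indexed and aligned separately for comparison.
import subprocess

for maxlen in (200, 300, 500, 800, 1000):
    outname = 'contigs_max%d.fa' % maxlen
    with open(outname, 'w') as out:
        subprocess.check_call(['python', 'splitfa.py', 'long_contigs.fa',
                               str(maxlen)], stdout=out)
    print('wrote ' + outname)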

I think that's enough for one post.
Old 06-05-2012, 07:08 AM   #4
chubbchubb
Junior Member
 
Location: Germany

Join Date: Jul 2009
Posts: 3

Thanks a lot for your reply!

Now I'm a bit confused. I've only got regular old RNA-Seq data: 50bp reads, mapping to the Drosophila genome, with refGene.gff created from the FlyBase file dmel-all-r5.45.gff (filtered to contain only CDS and exon features).

There should be no reads longer than 1024bp.

I wonder what I'm not getting? Everything works with tophat v1.4.1.
Old 06-05-2012, 07:27 AM   #5
miseiler
Junior Member
 
Location: New Brunswick, NJ

Join Date: May 2012
Posts: 4

It seems you have a different problem then, sorry. It may be worth noting that tophat2 doesn't even find alignments for me with the method I posted, and that I am in fact also using tophat 1.4.1.

If it's not huge, maybe you can try running it again with --keep-tmp and sending the temporary files to the developers as a bug report.
Old 06-12-2012, 04:27 AM   #6
pinki999
Member
 
Location: Germany

Join Date: Oct 2010
Posts: 37

Hi,

I am also getting the same error with TopHat version 2.0.3. Has anyone figured out the reason?

Pinki
Old 06-13-2012, 09:17 AM   #7
caddymob
Member
 
Location: USA

Join Date: Apr 2009
Posts: 36

Yes - I have the same [FAILED] error at tophat_reports. I tried on 2 different genomes (human and rat), with standard 100bp paired reads. I also tried the unannounced tophat 2.0.4 (http://tophat.cbcb.umd.edu/downloads/) and get the same error.

Is it just me or is tophat2 a giant waste of time?
Old 06-19-2012, 06:27 AM   #8
Enraico
Junior Member
 
Location: Venice

Join Date: Feb 2012
Posts: 5

I have the same failure with Tophat 2.0.3 and 2.0.4. It fails while running tophat_reports and gives me "terminate called after throwing an instance of 'std::logic_error'" followed by "terminate called recursively".
Running the last command alone fails with:

[samopen] no @SQ lines in the header.
Loading 1...done
Loading 2...done
Loading 3...done
Loading 4...done
Loading 5...done
Loading 6...done
Loading 7...done
Loading 8...done
Loading 9...done
Loading 10...done
Loading 11...done
Loading 12...done
Loading 13...done
Loading 14...done
Loading 15...done
Loading 16...done
Loading 17...done
Loading 18...done
Loading X...done
Loading MT...done
Loading ...done
Loaded 135171 GFF junctions from ./tophat_out/tmp/ssct9.juncs.
terminate called after throwing an instance of 'std::logic_error'
terminate called recursively
terminate called recursively
Aborted (core dumped)

My reads are paired-end, 90bp long, against the pig genome. Platform: Ubuntu Server.
Old 06-21-2012, 06:55 AM   #9
epi
Member
 
Location: USA

Join Date: Jan 2012
Posts: 38

Same problem here.
50 bp PE reads.

Can anyone comment if you have any insights or a way to bypass this so far?

Old 06-22-2012, 02:26 AM   #10
shibujohn
Junior Member
 
Location: London

Join Date: Oct 2009
Posts: 6

I am also getting the same error in Tophat 2.0.3 & 2.0.4

*****[sjohn@n002 logs]$ tail /home/sjohn/tophat_out/logs/reports.log*****

Loading chrUn_gl000245...done
Loading chrUn_gl000246...done
Loading chrUn_gl000247...done
Loading chrUn_gl000248...done
Loading chrUn_gl000249...done
Loading ...done
Loaded 216660 GFF junctions from ./tophat_out/tmp/refGene.juncs.
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct NULL not valid
terminate called recursively
terminate called recursively
terminate called recursively
********************

Any help?
Old 06-24-2012, 12:35 PM   #11
acervera
Junior Member
 
Location: Helsinki

Join Date: Feb 2012
Posts: 6

I am having the same problem with tophat 2.0.4

[2012-06-24 20:24:31] Reporting output tracks
[FAILED]
Error running /opt/tophat-2.0.4.Linux_x86_64/tophat_reports ...

what(): basic_string::_S_construct NULL not valid

Does anyone know how to fix this??
Old 06-24-2012, 06:17 PM   #12
drunk_coder
Junior Member
 
Location: china

Join Date: Dec 2011
Posts: 9

I also have the same problem with tophat 2.0.4. But when I used tophat 2.0.0 on the same library earlier, the problem didn't occur, so I tried tophat 2.0.0 again and found that it avoids this problem.
Old 06-29-2012, 03:05 AM   #13
shibujohn
Junior Member
 
Location: London

Join Date: Oct 2009
Posts: 6
Compile it from source

These errors go away when you compile it from source:
http://tophat.cbcb.umd.edu/downloads...t-2.0.4.tar.gz


Quote:
Originally Posted by shibujohn View Post
I am also getting the same error in Tophat 2.0.3 & 2.0.4 [...]
Old 06-29-2012, 10:58 AM   #14
catbus
Member
 
Location: San Francisco

Join Date: Feb 2011
Posts: 21
Same problem ("tophat_reports [FAILED]") with tophat 2.0.3.

* Same problem for me with Tophat 2.0.3.

* I am using 2x100bp paired-end sequences, aligning to mm9. This is on Ubuntu 10.04.

* I have only encountered this problem for a small SUBSET of the total number of aligned sequences.

* I'm going to try Tophat 2.0.4.

* I can't think of a good reason to expect that building from source would actually fix this, but I will probably do it anyway "just in case."

Error message:
[2012-06-29 04:09:46] Joining segment hits
[2012-06-29 05:25:22] Reporting output tracks
[FAILED]
Error running tophat_reports

The log has no more useful information than that, which is a bit disappointing.
Old 06-29-2012, 02:06 PM   #15
catbus
Member
 
Location: San Francisco

Join Date: Feb 2011
Posts: 21

Sorry, just thought I'd post that the Tophat 2.0.4 changelog indicates that the "reporting output tracks . . . [FAILED]" issue is SOLVED by the update to Tophat 2.0.4.

I'm re-running my sequences now. This was a known issue, at least!
Old 07-16-2012, 11:38 AM   #16
vinay052003
Member
 
Location: Atlanta, US

Join Date: Jan 2010
Posts: 59

Hi catbus,
Did you solve the "segmentation fault" problem with tophat_reports?
I tried running the updated version, both by compiling tophat 2.0.4 and by using the pre-built binaries... but I'm still getting the same error.
Old 07-16-2012, 02:58 PM   #17
catbus
Member
 
Location: San Francisco

Join Date: Feb 2011
Posts: 21

2.0.4 solved it for me! I used the pre-built binaries and did not encounter this problem again.
Old 07-17-2012, 01:41 AM   #18
Enraico
Junior Member
 
Location: Venice

Join Date: Feb 2012
Posts: 5

The official 2.0.4 release solved it for me, too. They also fixed the launch bash script for Ubuntu.
Old 12-13-2012, 04:47 AM   #19
lindenb
Senior Member
 
Location: France

Join Date: Apr 2010
Posts: 143
Tophat error when "Reporting output tracks"

(cross posted on biostar: http://www.biostars.org/p/58978/ )

I've compiled tophat-2.0.6 from source. It raises a segmentation fault for:


Code:
[lindenb@master tophat]$ /commun/data/packages/tophat/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir JETER/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations Homo_sapiens.GRCh37.69.gtf --gtf-juncs JETER/tmp/Homo_sapiens.juncs --no-closure-search --no-coverage-search --no-microexon-search --sam-header JETER/tmp/chr22_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/commun/data/packages/samtools-0.1.18/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 ./chr22.fa JETER/junctions.bed JETER/insertions.bed JETER/deletions.bed JETER/fusions.out JETER/tmp/accepted_hits JETER/tmp/left_kept_reads.m2g.bam JETER/tmp/left_kept_reads.bam JETER/tmp/right_kept_reads.m2g.bam JETER/tmp/right_kept_reads.bam
Code:
tophat_reports v2.0.6 (3670M)
---------------------------------------
[samopen] SAM header is present: 1 sequences.
	Loading 22...
done
	Loading ...done
Loaded 7655 GFF junctions from JETER/tmp/Homo_sapiens.juncs.
Loaded 8026 junctions
Segmentation fault (core dumped)
I'm learning tophat; I've just been playing with the following makefile:

Code:
include tools.mk
REF=./chr22
REF.bowtie=$(foreach S,1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2, $(addsuffix .$S,$(REF)) )
GTF=Homo_sapiens.GRCh37.69.gtf
export PATH:=$(PATH):${BOWTIE2.dir}:${samtools.dir}

JETER: s1_r1.fastq s1_r2.fastq ${REF.bowtie} $(REF).fa $(GTF)
	echo ${PATH}
	/commun/data/packages/tophat/bin/tophat2 -G $(GTF) -o $@ $(REF) s1_r1.fastq s1_r2.fastq
s1_r1.fastq:
	gunzip -c /commun/data/projects/20121206.RNASEQ/XX_1_7_1.fastq.gz | head -n 400000 > $@
s1_r2.fastq:
	gunzip -c /commun/data/projects/20121206.RNASEQ/XX_1_7_2.fastq.gz | head -n 400000 > $@
	
$(GTF):
	gunzip -c /commun/data/pubdb/ensembl/release-69/gtf/homo_sapiens/Homo_sapiens.GRCh37.69.gtf.gz| egrep '^22' > $@
	
$(REF).fa: 
	curl -o $(REF).fa.gz "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr22.fa.gz"
	gunzip  -c $(REF).fa.gz | sed 's/^>chr/>/' > $@
	rm $(REF).fa.gz


${REF.bowtie}:$(REF).fa
	${BOWTIE2.dir}/bowtie2-build  -f -c $(REF) $(REF)
	${BOWTIE2.dir}/bowtie2-inspect -s $(REF)
How should I fix this problem?
Old 12-13-2012, 05:16 AM   #20
lindenb
Senior Member
 
Location: France

Join Date: Apr 2010
Posts: 143

I've narrowed the error.

The exception is raised from

Code:
 (int)length(*ref_str)
in

Code:
   src/tophat_reports.cpp
in the line

Code:
   if (new_left >= 0 && new_bh.right() <= (int)length(*ref_str))
It seems that ref_str is NULL when length() is invoked.

Pierre