SEQanswers

10-16-2009, 12:41 PM   #1
sdriscoll
I like code
Location: San Diego, CA, USA
Join Date: Sep 2009
Posts: 438
Tophat 1.0.11 crashing

Has anybody had any trouble with TopHat 1.0.11 crashing while analyzing data that completes normally under TopHat 1.0.10? I have 8 FASTQ files where this happens.

This is from a 2.5GB FASTQ file with Solexa 1.3 qualities. The run looks like this:

$ tophat -o ./lane1-crash --GFF /media/HD_2/tophat_project/mouse.gff --solexa1.3-quals -p4 m_musculus s_1_sequence.txt

[Fri Oct 16 10:11:24 2009] Beginning TopHat run (v1.0.11)
-----------------------------------------------
[Fri Oct 16 10:11:24 2009] Preparing output location ./lane1-crash/
[Fri Oct 16 10:11:24 2009] Checking for Bowtie index files
[Fri Oct 16 10:11:24 2009] Checking for reference FASTA file
[Fri Oct 16 10:11:24 2009] Checking for Bowtie
Bowtie version: 0.11.2.0
[Fri Oct 16 10:11:24 2009] Checking reads
seed length: 43bp
format: fastq
quality scale: --solexa1.3-quals
[Fri Oct 16 10:15:19 2009] Reading known junctions from GFF file
[Fri Oct 16 10:16:57 2009] Mapping reads against m_musculus with Bowtie
[Fri Oct 16 10:40:34 2009] Joining segment hits
[Fri Oct 16 10:45:03 2009] Searching for junctions via segment mapping
[FAILED]
Error: segment-based junction search failed with err = -11

If I revert to 1.0.10, the same run completes normally and the output looks good.

10-16-2009, 03:16 PM   #2
Cole Trapnell
Senior Member
Location: Boston, MA
Join Date: Nov 2008
Posts: 212

Can you send me the logs?

10-16-2009, 04:06 PM   #3
sdriscoll
I like code
Location: San Diego, CA, USA
Join Date: Sep 2009
Posts: 438

sure, where should I send them?

10-16-2009, 04:07 PM   #4
Cole Trapnell
Senior Member
Location: Boston, MA
Join Date: Nov 2008
Posts: 212

You can just email them to me at cole@cs.umd.edu

10-16-2009, 04:12 PM   #5
sdriscoll
I like code
Location: San Diego, CA, USA
Join Date: Sep 2009
Posts: 438

Done. Thanks for taking a look.

10-16-2009, 09:23 PM   #6
sdriscoll
I like code
Location: San Diego, CA, USA
Join Date: Sep 2009
Posts: 438

As it turns out, this was my own error: I was using a bad Bowtie index. Thanks, Cole, for pointing out the issue. I'm going to load the index that I know works and re-run everything. That should do it.
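
For anyone who hits the same error, a quick sanity check on the index files might look like the sketch below. It assumes a Bowtie 1 index, which is stored as six files named after the index basename (`m_musculus` in the command above). The helper name `missing_index_files` is hypothetical, and the thread does not say what was actually wrong with the index, so this only catches missing or empty files, not a corrupted or mismatched index.

```python
from pathlib import Path

# File suffixes that make up a Bowtie 1 index for a given basename.
BOWTIE1_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_index_files(index_dir, basename):
    """Return the names of index files that are absent or empty.

    Hypothetical helper: it checks presence and size only and cannot
    tell whether the index was built from the intended reference.
    """
    missing = []
    for suffix in BOWTIE1_SUFFIXES:
        path = Path(index_dir) / (basename + suffix)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing
```

Calling `missing_index_files("/path/to/indexes", "m_musculus")` returns an empty list when all six files are present and non-empty; the directory shown here is a placeholder.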

10-23-2009, 09:07 AM   #7
valeu
Member
Location: Paris
Join Date: Sep 2008
Posts: 69

Dear Cole,

I tried to run TopHat on 454 data, where fragments can have different lengths (from ~20 to ~1000), and I did not get any result.

Should all reads be about the same length? If so, could you change TopHat so that it can be used on 454 data?

Thank you very much in advance,

Valentina

10-23-2009, 12:52 PM   #8
Cole Trapnell
Senior Member
Location: Boston, MA
Join Date: Nov 2008
Posts: 212

TopHat currently requires that reads be the same length, and is NOT designed for 454 reads. You may find after trimming that it works well, but I will not be making specific changes to TopHat to support 454 any time soon. Most of my development time is now spent on Cufflinks, which has a long list of planned features that I want to get to before I graduate.
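
The trimming mentioned above could be sketched roughly as follows: cut every read to one fixed length and drop the reads that are shorter. This assumes the 454 reads have already been converted to 4-line FASTQ records; the function names and the target length are illustrative choices, and nothing in this thread confirms how well trimmed 454 reads will actually map.

```python
def read_fastq(handle):
    """Yield (header, seq, qual) from a 4-line-per-record FASTQ stream.

    Assumes sequences are not wrapped across multiple lines.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_fastq(records, length):
    """Trim each read to `length` bases and drop reads that are shorter."""
    for header, seq, qual in records:
        if len(seq) >= length:
            yield header, seq[:length], qual[:length]


def write_fastq(records, handle):
    """Write records back out as 4-line FASTQ."""
    for header, seq, qual in records:
        handle.write(f"{header}\n{seq}\n+\n{qual}\n")
```

A typical use would open the input and output files and call `write_fastq(trim_fastq(read_fastq(fin), 50), fout)`, where 50 is just an example length.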

10-23-2009, 02:19 PM   #9
sdriscoll
I like code
Location: San Diego, CA, USA
Join Date: Sep 2009
Posts: 438

I've gotten good data out of TopHat for 23 of 24 lanes (>3GB FASTQ file per lane). For some reason the 24th lane (87bp reads, mouse) produces almost nothing. The output from prep_reads.log looks like this:

prep_reads v1.0.11
---------------------------
15796133 out of 15917477 reads have been filtered out

Is this most likely due to really poor qualities? What exactly is going on during prep_reads that filters out reads and is there a way to tweak that?

The most confusing part is that the people who run the sequencing machines check our output with Eland to make sure the data is good, and Eland aligned >60% of the reads from the same lane that produces almost nothing through TopHat. What could be wrong with this data? Any ideas?
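
To put the prep_reads.log line in perspective, the filtered fraction can be computed directly from it. This is a minimal sketch that assumes the summary line keeps the wording shown above; `filtered_fraction` is a hypothetical helper name.

```python
import re


def filtered_fraction(log_text):
    """Parse the prep_reads summary line and return (filtered, total, fraction)."""
    match = re.search(r"(\d+) out of (\d+) reads have been filtered out", log_text)
    if match is None:
        raise ValueError("no filter summary line found")
    filtered, total = int(match.group(1)), int(match.group(2))
    return filtered, total, filtered / total


filtered, total, frac = filtered_fraction(
    "15796133 out of 15917477 reads have been filtered out"
)
# Prints: 121344 reads kept, 99.24% filtered
print(f"{total - filtered} reads kept, {frac:.2%} filtered")
```

So only about 0.76% of the reads in this lane make it past the filter.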

10-24-2009, 04:26 AM   #10
valeu
Member
Location: Paris
Join Date: Sep 2008
Posts: 69

Thank you for the answer, Cole!

10-24-2009, 10:33 AM   #11
Cole Trapnell
Senior Member
Location: Boston, MA
Join Date: Nov 2008
Posts: 212

Quote:
Originally Posted by sdriscoll
I've gotten good data out of TopHat for 23 of 24 lanes (>3GB FASTQ file per lane). For some reason the 24th lane (87bp reads, mouse) produces almost nothing. The output from prep_reads.log looks like this:

prep_reads v1.0.11
---------------------------
15796133 out of 15917477 reads have been filtered out

Is this most likely due to really poor qualities? What exactly is going on during prep_reads that filters out reads and is there a way to tweak that?

The most confusing part is that the people who run the sequencing machines check our output with Eland to make sure the data is good, and Eland aligned >60% of the reads from the same lane that produces almost nothing through TopHat. What could be wrong with this data? Any ideas?

TopHat filters out two types of reads here: those with lots of N's and those that are nearly all the same character. Because TopHat sometimes chooses algorithms that require indexing the unmappable reads, keeping all the polyA reads, for example, would just bloat the unmapped read index and generate false positive splices between real exons and downstream low-complexity repeats.

How many N's do these reads have on average? Is there some systematic problem with that lane that you could trim away?
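
One way to answer both questions is to tally N's per read and per position, and to count reads dominated by a single base. The sketch below is not TopHat's prep_reads code: the thresholds `max_n_frac` and `max_dominant` are illustrative assumptions rather than TopHat's actual cutoffs, and the function names are made up for the example. It takes read sequences as plain strings, for instance the second line of every 4-line FASTQ record.

```python
from collections import Counter


def read_stats(seq):
    """Return (number of N's, fraction of the read taken by its most common base)."""
    counts = Counter(seq.upper())
    n_count = counts.get("N", 0)
    dominant = max(counts.values()) / len(seq) if seq else 0.0
    return n_count, dominant


def summarize(seqs, max_n_frac=0.1, max_dominant=0.9):
    """Count N-rich and single-base-dominated reads (thresholds are assumptions)."""
    total = n_sum = n_rich = low_complexity = 0
    for seq in seqs:
        if not seq:
            continue
        n_count, dominant = read_stats(seq)
        total += 1
        n_sum += n_count
        if n_count / len(seq) > max_n_frac:
            n_rich += 1
        elif dominant >= max_dominant:
            low_complexity += 1
    mean_n = n_sum / total if total else 0.0
    return {"reads": total, "mean_n": mean_n,
            "n_rich": n_rich, "low_complexity": low_complexity}


def n_by_position(seqs):
    """Count N's at each read position, to spot cycles that could be trimmed."""
    counts = []
    for seq in seqs:
        if len(seq) > len(counts):
            counts.extend([0] * (len(seq) - len(counts)))
        for i, base in enumerate(seq):
            if base in "Nn":
                counts[i] += 1
    return counts
```

If `n_by_position` shows the N's concentrated in a few cycles, trimming those positions before running TopHat may rescue the lane; if the N's are spread evenly, the problem is more likely the run itself.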

10-26-2009, 12:55 PM   #12
sdriscoll
I like code
Location: San Diego, CA, USA
Join Date: Sep 2009
Posts: 438

I'll take a look. We definitely want to get this set of reads working through TopHat and Cufflinks.