**Cole Trapnell** · 10-16-2009, 03:16 PM

Can you send me the logs?

**sdriscoll** · 10-16-2009, 04:06 PM

sure, where should I send them?

**Cole Trapnell** · 10-16-2009, 04:07 PM

You can just email them to me at [email protected]

**sdriscoll** · 10-16-2009, 04:12 PM

Done. Thanks for taking a look.

**sdriscoll** · 10-16-2009, 09:23 PM

Ah...as it turns out this was due to my own error. Looks like I was using a bad bowtie index. Thanks Cole for pointing out the issue. I'm going to load up the index that I have that works and re-run this stuff. That should do it.

**valeu** · 10-23-2009, 09:07 AM

Dear Cole,

I tried to run TopHat on 454 data, where fragments can have different lengths (from ~20 to ~1000), and I did not get any results.

Should all reads be about the same length? If so, could you change TopHat so that it can be used on 454 data?

Thank you very much in advance,

Valentina

**Cole Trapnell** · 10-23-2009, 12:52 PM

TopHat currently requires that reads be the same length, and is NOT designed for 454 reads. You may find after trimming that it works well, but I will not be making specific changes to TopHat to support 454 any time soon. Most of my development time is now spent on Cufflinks, which has a long list of planned features that I want to get to before I graduate.
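For readers who want to try the trimming route mentioned above, here is a minimal sketch of cutting reads to one uniform length before alignment. It is not part of TopHat: the function names `parse_fastq` and `trim_to_length` and the choice to drop reads shorter than the target are assumptions made for illustration, and the parser assumes plain four-line FASTQ records.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from four-line FASTQ records.

    Assumes well-formed input with no blank lines between records;
    a trailing partial record is silently ignored.
    """
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if len(chunk) < 4:
            return
        header, seq, _plus, qual = chunk
        yield header, seq, qual


def trim_to_length(records: Iterable[FastqRecord], length: int) -> Iterator[FastqRecord]:
    """Keep the first `length` bases of each read.

    Reads shorter than `length` are dropped (an assumption of this
    sketch), so every emitted read has exactly the same length.
    """
    for header, seq, qual in records:
        if len(seq) >= length:
            yield header, seq[:length], qual[:length]


if __name__ == "__main__":
    example = [
        "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
        "@read2\n", "ACG\n", "+\n", "III\n",
    ]
    for header, seq, qual in trim_to_length(parse_fastq(example), 5):
        print(f"{header}\n{seq}\n+\n{qual}")
```

The target length is a trade-off: a longer cut-off keeps more sequence per read but discards more of the short fragments typical of 454 data.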

**sdriscoll** · 10-23-2009, 02:19 PM

I've gotten good data out of Tophat for 23 out of 24 total lanes of data (>3GB FASTQ files each lane). For some reason that 24th lane (87bp reads, mouse) produces almost nil. The output from prep_reads.log looks like this:

prep_reads v1.0.11
---------------------------
15796133 out of 15917477 reads have been filtered out

Is this most likely due to really poor qualities? What exactly is going on during prep_reads that filters out reads and is there a way to tweak that?

The most confusing thing is that the people who run the sequencing machines run our outputs through Eland in order to make sure the data is good and it was able to align >60% of the reads from this same data that produces almost nothing through Tophat. I guess what I'm looking for is an idea of what the issue could be with this data. Any ideas?

**valeu** · 10-24-2009, 04:26 AM

Thank you for your answer, Cole!

**Cole Trapnell** · 10-24-2009, 10:33 AM

Originally posted by sdriscoll:

> I've gotten good data out of Tophat for 23 out of 24 total lanes of data (>3GB FASTQ files each lane). For some reason that 24th lane (87bp reads, mouse) produces almost nil. The output from prep_reads.log looks like this:
>
> prep_reads v1.0.11
> ---------------------------
> 15796133 out of 15917477 reads have been filtered out
>
> Is this most likely due to really poor qualities? What exactly is going on during prep_reads that filters out reads and is there a way to tweak that?
>
> The most confusing thing is that the people who run the sequencing machines run our outputs through Eland in order to make sure the data is good and it was able to align >60% of the reads from this same data that produces almost nothing through Tophat. I guess what I'm looking for is an idea of what the issue could be with this data. Any ideas?

TopHat filters out two types of reads here: those with lots of N's and those which are nearly all the same character. Since TopHat sometimes chooses algorithms that require indexing the unmappable reads, keeping around all the polyA reads, for example, will just bloat the unmapped read index and generate false positive splices between real exons and downstream low complexity repeats.

How many N's do these reads have on average? Is there some systematic problem with that lane that you could trim away?
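One way to answer that question for a problem lane is a quick tally like the sketch below. It only approximates the two filter categories described above and is not TopHat's actual `prep_reads` logic: the thresholds `max_n` and `max_fraction` and the function names are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def is_low_complexity(seq: str, max_fraction: float = 0.9) -> bool:
    """True if a single character makes up at least `max_fraction` of the read.

    The 0.9 default is an illustrative assumption, not TopHat's threshold.
    """
    if not seq:
        return True
    most_common_count = Counter(seq.upper()).most_common(1)[0][1]
    return most_common_count / len(seq) >= max_fraction


def summarize_reads(seqs: Iterable[str], max_n: int = 5,
                    max_fraction: float = 0.9) -> Dict[str, float]:
    """Count reads with many N's or low complexity and the mean N's per read.

    A read with more than `max_n` N's is counted only as "many_n", so the
    two categories never overlap.
    """
    total = many_n = low_complexity = n_sum = 0
    for seq in seqs:
        total += 1
        n = seq.upper().count("N")
        n_sum += n
        if n > max_n:
            many_n += 1
        elif is_low_complexity(seq, max_fraction):
            low_complexity += 1
    return {
        "reads": total,
        "mean_n": n_sum / total if total else 0.0,
        "many_n": many_n,
        "low_complexity": low_complexity,
    }


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "NNNNNNNNAC", "AAAAAAAAAA", "ACGTNACGTA"]
    print(summarize_reads(reads))
```

For a real lane, the sequences would be the second line of each four-line FASTQ record. If the N's turn out to cluster at particular read positions, that suggests a systematic problem that trimming could remove.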

**sdriscoll** · 10-26-2009, 12:55 PM

I'll take a look. Getting this set of reads through Tophat and Cufflinks is definitely something we want to get working.

Tophat 1.0.11 crashing
