SEQanswers

Old 10-20-2010, 10:05 AM   #1
shurjo
Senior Member
 
Location: Rockville, MD

Join Date: Jan 2009
Posts: 126
Default Mixed read lengths in TopHat input file

Hi everyone,

After quite a bit of searching the threads on this forum, I'm still not exactly sure how TopHat deals with an input fastq file that contains reads of different lengths (e.g. 51bp and 76bp). I assume that equal read lengths throughout the file would be ideal, but I have four 76bp lanes and one 51bp lane from the same library and would like to use all the data if possible. Any advice from the TopHat/Bowtie power users out there?
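A quick way to see which read lengths a FASTQ file actually contains is to tally the sequence lines. The sketch below assumes an uncompressed FASTQ file with exactly four lines per record; the function name `read_length_counts` and the two example reads are made up for illustration.

```python
from collections import Counter
from itertools import islice
import io


def read_length_counts(handle):
    """Count reads per length in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    # The sequence is the second line of every 4-line record.
    for seq in islice(handle, 1, None, 4):
        counts[len(seq.rstrip("\r\n"))] += 1
    return counts


# Two hypothetical reads: one of 8 bp and one of 5 bp.
example = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n")
counts = read_length_counts(example)
print(sorted(counts.items()))
```

For a real file, pass an open file handle instead of the `io.StringIO` object.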

Thanks in advance,

Shurjo
Old 10-20-2010, 11:23 AM   #2
john_mu
Member
 
Location: Stanford, CA

Join Date: May 2010
Posts: 88
Default

This is a shameless plug. But as far as I know TopHat does not yet support that.
SpliceMap is able to deal with such reads natively.
__________________
SpliceMap: De novo detection of splice junctions from RNA-seq
Download SpliceMap Comment here
Old 10-21-2010, 01:23 AM   #3
marcowanger
Senior Member
 
Location: Hong Kong

Join Date: Dec 2008
Posts: 350
Default

I am not sure, because the TopHat manual previously stated explicitly that "reads have to be in equal length". However, that sentence is now nowhere to be found on the website.

On the other hand, if that constraint has been removed, it would be a big improvement.

Yet there is no such announcement in the TopHat change log.

I am also confused.

Quote:
Originally Posted by john_mu View Post
This is a shameless plug. But as far as I know TopHat does not yet support that.
SpliceMap is able to deal with such reads natively.

Last edited by marcowanger; 10-21-2010 at 01:23 AM. Reason: typo
Old 10-21-2010, 01:27 AM   #4
marcowanger
Senior Member
 
Location: Hong Kong

Join Date: Dec 2008
Posts: 350
Question

Quote:
Originally Posted by john_mu View Post
This is a shameless plug. But as far as I know TopHat does not yet support that.
SpliceMap is able to deal with such reads natively.
The new tophat 1.1 showed

Quote:
min read length: xxbp, max read length: xxbp
during run.

So I think the reads do not need to be of equal length?
Old 10-21-2010, 01:34 PM   #5
john_mu
Member
 
Location: Stanford, CA

Join Date: May 2010
Posts: 88
Default

Quote:
Originally Posted by marcowanger View Post
The new tophat 1.1 showed "min read length: xxbp, max read length: xxbp" during run.

So I think the reads do not need to be of equal length?
Ah I see, sorry I had not played with the new version much yet. You are probably right.
Old 10-21-2010, 06:47 PM   #6
marcowanger
Senior Member
 
Location: Hong Kong

Join Date: Dec 2008
Posts: 350
Default

As you seem to be affiliated with SpliceMap, may I ask you one question?

I want to know the difference between the Cufflinks-compatible SAM file and the normal SAM file that SpliceMap produces.

What is the difference between these two files?

Quote:
Originally Posted by john_mu View Post
Ah I see, sorry I had not played with the new version much yet. You are probably right.
Old 10-22-2010, 05:10 AM   #7
ersenkavak
Junior Member
 
Location: stockholm

Join Date: Feb 2010
Posts: 2
Default

I ran TopHat 1.1.1 with read lengths varying from 20 to 100 bp.
As far as I can tell, it is working just fine. Even though I have not compared it systematically, it looks much more powerful than dividing the reads by length and running TopHat separately for each length. This is probably because having more reads improves splice mappability...

cheers
Old 10-22-2010, 11:25 AM   #8
shurjo
Senior Member
 
Location: Rockville, MD

Join Date: Jan 2009
Posts: 126
Default

Thanks for the input, everyone. I was at a talk by Steve Salzberg earlier this morning where he specifically mentioned that the latest TopHat can handle mixed read lengths in the input file, so I guess that answers my question.
Old 10-22-2010, 11:27 AM   #9
john_mu
Member
 
Location: Stanford, CA

Join Date: May 2010
Posts: 88
Default

Quote:
Originally Posted by marcowanger View Post
As you seem to be affiliated with SpliceMap, may I ask you one question?

I want to know the difference between the Cufflinks-compatible SAM file and the normal SAM file that SpliceMap produces.

What is the difference between these two files?
Hi marcowanger,

The Cufflinks-compatible file doesn't include the clipped part of the alignments. Since some alignments might not be able to find the other end of the split read, we still keep the partial alignment.

Also, if you are interested in trying SpliceMap I suggest you wait until after this weekend. There was a small bug I just found regarding counting the number of multiply mapped reads.

John Mu
Old 10-27-2010, 05:00 AM   #10
Steven Salzberg
Junior Member
 
Location: College Park, Maryland

Join Date: Aug 2009
Posts: 3
Default TopHat does support variable read lengths

The previous poster is correct; I announced this feature at a recent talk. TopHat now supports variable read lengths, meaning you can mix multiple Illumina (or SOLiD) runs that use different lengths and run TopHat just once on them. Make sure you get the latest release, version 1.1.2 (or newer).
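One simple way to combine lanes of different read lengths into a single input file is plain concatenation. The sketch below assumes uncompressed FASTQ files that each end with a newline; the function name `concatenate_fastq` and the file names are hypothetical, and the two tiny lanes are written to a temporary directory only so that the example runs on its own.

```python
import os
import shutil
import tempfile


def concatenate_fastq(inputs, output):
    """Append uncompressed FASTQ files into one output file, in order."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Build two tiny hypothetical lanes with different read lengths.
workdir = tempfile.mkdtemp()
lane_long = os.path.join(workdir, "lane_long.fastq")
lane_short = os.path.join(workdir, "lane_short.fastq")
with open(lane_long, "w") as fh:
    fh.write("@a\nACGTACGT\n+\nIIIIIIII\n")
with open(lane_short, "w") as fh:
    fh.write("@b\nACGTA\n+\nIIIII\n")

merged = os.path.join(workdir, "all_lanes.fastq")
concatenate_fastq([lane_long, lane_short], merged)

with open(merged) as fh:
    merged_lines = fh.read().splitlines()
print(len(merged_lines) // 4, "reads in merged file")
```

As far as I know, TopHat's usage line also accepts a comma-separated list of read files, so merging may not be necessary; check the manual for your version.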

ALSO: this release of TopHat adds support for strand-specific RNA-Seq alignment for reads produced by a number of strand-specific protocols. Please see the manual for details.
Old 10-27-2010, 05:02 AM   #11
Steven Salzberg
Junior Member
 
Location: College Park, Maryland

Join Date: Aug 2009
Posts: 3
Default

Quote:
Originally Posted by marcowanger View Post
The new tophat 1.1 showed "min read length: xxbp, max read length: xxbp" during run.

So I think the reads do not need to be of equal length?
This was always true - but TopHat handled all reads (of varying lengths) with the same algorithm. Now it dynamically adjusts the mapping strategy based on read length - longer reads are broken up into more pieces that are mapped separately.
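The idea of breaking longer reads into more pieces can be illustrated with a small sketch. This is not TopHat's actual code: the function `split_into_segments`, the 25 bp segment length, and the rule that the last piece absorbs the remainder are all assumptions chosen for illustration.

```python
def split_into_segments(read, segment_length=25):
    """Split a read into consecutive pieces; the last piece keeps the remainder.

    Illustrative only: reads shorter than segment_length stay in one piece.
    """
    n_segments = max(1, len(read) // segment_length)
    segments = [
        read[i * segment_length:(i + 1) * segment_length]
        for i in range(n_segments - 1)
    ]
    # The final segment runs to the end of the read.
    segments.append(read[(n_segments - 1) * segment_length:])
    return segments


for length in (20, 51, 76):
    pieces = split_into_segments("A" * length)
    print(length, [len(p) for p in pieces])
```

Under these assumptions a 51 bp read yields two pieces and a 76 bp read yields three, so the number of separately mapped pieces grows with read length.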
Old 03-14-2011, 06:14 PM   #12
jkozubek
Member
 
Location: Boston University

Join Date: Mar 2011
Posts: 18
Default

Does anyone know if Tophat is therefore ignoring reads less than the min read length? For instance, if it sets min read length at 20 bp and max read length at 26 bp, would it ignore mapping of a read that is 18 bp?
Old 03-14-2011, 07:25 PM   #13
jkozubek
Member
 
Location: Boston University

Join Date: Mar 2011
Posts: 18
Default

Never mind. I see from my output that it is mapping reads under the min read length.
Old 11-01-2012, 05:46 AM   #14
telos
Member
 
Location: London

Join Date: Jan 2010
Posts: 11
Default

If you have mixed read lengths (e.g. due to adaptor trimming), how do you then set --mate-inner-dist? (It would have been better to ask for the expected insert size rather than the inner distance.)
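To make the arithmetic concrete: if the inner distance is taken to be the fragment length (without adaptors) minus the lengths of both mates, then trimming makes it vary from pair to pair. The sketch below works under that assumption only; the 200 bp fragment length, the read-length pairs, and the function names are hypothetical, and averaging the per-pair values is just one possible way to pick a single number, not a recommendation from the TopHat authors.

```python
def inner_distance(fragment_length, read1_length, read2_length):
    """Distance between the inner ends of two mates on one fragment."""
    return fragment_length - read1_length - read2_length


def mean_inner_distance(fragment_length, read_length_pairs):
    """Average inner distance over (read1_length, read2_length) pairs."""
    distances = [
        inner_distance(fragment_length, r1, r2) for r1, r2 in read_length_pairs
    ]
    return sum(distances) / len(distances)


# Hypothetical trimmed pairs from a library with 200 bp fragments.
pairs = [(76, 76), (76, 60), (51, 51)]
per_pair = [inner_distance(200, r1, r2) for r1, r2 in pairs]
print(per_pair)
print(mean_inner_distance(200, pairs))
```

With these made-up numbers the per-pair inner distances are 48, 64 and 98 bp, which shows why a single inner-distance value fits mixed-length data less well than an expected fragment size would.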

Last edited by telos; 11-01-2012 at 07:15 AM.