SEQanswers

03-11-2013, 05:31 AM   #1
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29
40bp paired reads for novel transcript identification, which software?

I have some 40bp paired-end RNA-seq data, that is, 2 x 40bp reads from each RNA fragment.

It seems to me that TopHat prefers reads that are longer than 25bp. Is there software you would recommend that can use such short reads to identify novel transcripts?

Or should I just cut the reads into 3 pieces of 13bp each? It seems a little crazy to me, but considering that it allows at most 1 junction, the method might work?

Any suggestions, fellows?
03-11-2013, 05:42 AM   #2
biznatch
Senior Member
 
Location: Canada

Join Date: Nov 2010
Posts: 124

My understanding of how TopHat works is that if it can't align the entire read, it divides it into chunks (25bp by default) and tries to align those separately. You can change the size they get split into; maybe set it to 20bp so that each of your reads gets split in half.
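To make the segment idea concrete, here is a minimal Python sketch of fixed-size chunking. It is only an illustration: the function name `split_read` and the example read are made up, and TopHat's real handling of a leftover tail shorter than the segment length may differ from the naive slicing shown here. In TopHat itself the chunk size is set with the `--segment-length` option.

```python
def split_read(read, segment_length):
    """Naively cut a read into consecutive chunks of at most segment_length bases.

    Illustrative only: this is not TopHat's code, and TopHat may treat a short
    trailing chunk differently.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [read[i:i + segment_length] for i in range(0, len(read), segment_length)]


read = "ACGT" * 10  # a made-up 40bp read
for seg_len in (25, 20):
    print(seg_len, [len(chunk) for chunk in split_read(read, seg_len)])
```

Under this naive scheme a 40bp read does not divide evenly at the default of 25 (it gives 25 + 15), whereas 20 gives two equal halves.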
03-11-2013, 06:07 AM   #3
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264

For novel transcript detection, try a reference-based transcriptome assembler such as Cufflinks.
03-11-2013, 06:10 AM   #4
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29

Thanks, that is what I was thinking. It seems to me that TopHat works better with 3 pieces, which would be why they recommend a minimum read length of 75 = 25 x 3.

Now I am not sure whether I should cut the reads into 3 pieces or 2 pieces: 13 x 3 or 20 x 2.
03-11-2013, 06:15 AM   #5
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29

Thanks, I will probably go with Cufflinks later.

But that is the next step. Right now the main issue for me is the alignment.
The alignment needs to be optimized in the first place before Cufflinks can work. It seems to me that 40bp is too short for alignment software optimized for novel transcript identification.

Quote:
Originally Posted by NicoBxl View Post
For novel transcript detection, try a reference-based transcriptome assembler such as Cufflinks.
03-11-2013, 06:19 AM   #6
biznatch
Senior Member
 
Location: Canada

Join Date: Nov 2010
Posts: 124

I don't think TopHat necessarily works better with 3 segments per read; did you see that somewhere? But I know that if your sequences get too short, it gets harder to get specific alignments. I would try 2x20 (or try both and compare). There may be other options you can change to optimize for shorter reads as well.
03-11-2013, 06:42 AM   #7
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29

I am now more inclined to use 20 x 2. But I will probably try both; the compute facility is free for me anyway.

I can't find a source right now to support the point that 3 pieces are better, but I believe I saw it somewhere before.

Here is some not-so-strong support:

"TopHat generates its database of possible splice junctions from two sources of evidence. The first and strongest source of evidence for a splice junction is when two segments from the same read (for reads of at least 45bp) are mapped at a certain distance on the same genomic sequence or when an internal segment fails to map"

http://tophat.cbcb.umd.edu/manual.html#whis

It seems to me that if you have 3 pieces, you are more likely to have two pieces mapped to two adjacent exons. If you have only two pieces, both can map only when the splice junction falls right in the middle of the read, which is less likely to happen. But I couldn't find a formal statement in a paper or on the TopHat website to support this point.
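The counting argument can be checked with a small Python sketch. It rests on a toy assumption that is not taken from TopHat: a segment "maps" only if it lies entirely on one side of the junction, and a junction is supported when at least one whole segment sits on each side. The function names and the rule that the last segment absorbs the remainder are hypothetical choices made for the illustration.

```python
def segment_bounds(read_len, n_segments):
    """Half-open (start, end) intervals; the last segment absorbs any remainder."""
    base = read_len // n_segments
    return [
        (i * base, read_len if i == n_segments - 1 else (i + 1) * base)
        for i in range(n_segments)
    ]


def informative_junction_offsets(read_len, n_segments):
    """Offsets j (junction between base j-1 and base j) for which at least one
    whole segment lies on each side of the junction (toy model, not TopHat)."""
    segments = segment_bounds(read_len, n_segments)
    offsets = []
    for j in range(1, read_len):
        has_left = any(end <= j for _, end in segments)
        has_right = any(start >= j for start, _ in segments)
        if has_left and has_right:
            offsets.append(j)
    return offsets


for n in (2, 3):
    hits = informative_junction_offsets(40, n)
    print(n, "segments:", len(hits), "of", 39, "possible junction offsets")
```

Under this toy model a 40bp read split in 3 leaves 14 of 39 junction offsets with a whole segment on each side, against 1 of 39 for a split in 2. It says nothing about how uniquely a 13bp segment aligns, which is the concern raised earlier in the thread about very short sequences.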

Quote:
Originally Posted by biznatch View Post
I don't think TopHat necessarily works better with 3 segments per read; did you see that somewhere? But I know that if your sequences get too short, it gets harder to get specific alignments. I would try 2x20 (or try both and compare). There may be other options you can change to optimize for shorter reads as well.
03-11-2013, 07:12 AM   #8
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161

Quote:
Originally Posted by sqcrft View Post
I have some 40bp paired-end RNA-seq data, that is, 2 x 40bp reads from each RNA fragment.

It seems to me that TopHat prefers reads that are longer than 25bp. Is there software you would recommend that can use such short reads to identify novel transcripts?

Or should I just cut the reads into 3 pieces of 13bp each? It seems a little crazy to me, but considering that it allows at most 1 junction, the method might work?

Any suggestions, fellows?
You can try STAR with --seedSearchStartLmax 20 or smaller (this effectively defines the "segment" length). Even though you are looking for novel transcripts, I would highly recommend "mapping with annotations" as it reduces the misalignment rate.
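As a rough sketch of what this suggestion could look like in practice, the Python snippet below only assembles the two STAR command lines (index generation with annotations, then mapping) and prints them; it does not run STAR. All file and directory names are placeholders, the function names are hypothetical, and it assumes a STAR version that accepts `--sjdbGTFfile` at genome generation. The `--sjdbOverhang` value of 39 follows the usual read length minus 1 guidance for 40bp reads.

```python
import shlex


def build_star_index_command(genome_dir, genome_fasta, annotation_gtf, read_length=40):
    """Command line for building a STAR index that includes annotated junctions."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", genome_fasta,
        "--sjdbGTFfile", annotation_gtf,
        "--sjdbOverhang", str(read_length - 1),
    ]


def build_star_align_command(genome_dir, reads_1, reads_2, seed_search_start_lmax=20):
    """Command line for mapping paired reads with a shorter seed search start."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", reads_1, reads_2,
        "--seedSearchStartLmax", str(seed_search_start_lmax),
    ]


# Placeholder paths, for illustration only.
print(shlex.join(build_star_index_command("star_index", "genome.fa", "annotation.gtf")))
print(shlex.join(build_star_align_command("star_index", "reads_1.fastq", "reads_2.fastq")))
```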
03-11-2013, 07:53 AM   #9
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29

Thanks, that is what I am doing now.
I am using the Ensembl GTF as the reference annotation and splitting the reads into 2 pieces of 20bp.

The other options I am using are a maximum intron size of 5kb, to reduce false discoveries caused by the short 40bp reads, and "--coverage-search".
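The settings described here might translate into a TopHat call along the following lines. This is a sketch, not the poster's actual command: it assumes the 2 x 20bp split is done with `--segment-length` rather than by physically cutting the FASTQ files, and the index, GTF, read, and output names are placeholders. The Python snippet only builds and prints the command; it does not run TopHat.

```python
import shlex


def build_tophat_command(index_base, reads_1, reads_2, annotation_gtf,
                         segment_length=20, max_intron_length=5000,
                         output_dir="tophat_out"):
    """Assemble a TopHat command line for short paired-end reads (illustrative)."""
    return [
        "tophat",
        "-o", output_dir,
        "-G", annotation_gtf,                          # known annotation to guide mapping
        "--segment-length", str(segment_length),       # 40bp read -> two 20bp segments
        "--max-intron-length", str(max_intron_length), # cap intron size at 5kb
        "--coverage-search",
        index_base,
        reads_1,
        reads_2,
    ]


# Placeholder names, for illustration only.
cmd = build_tophat_command("genome_index", "reads_1.fastq", "reads_2.fastq", "ensembl.gtf")
print(shlex.join(cmd))
```

Running the same function with `segment_length=13` would give the matching command for the 3-piece comparison discussed earlier in the thread.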

Quote:
Originally Posted by alexdobin View Post
You can try STAR with --seedSearchStartLmax 20 or smaller (this effectively defines the "segment" length). Even though you are looking for novel transcripts, I would highly recommend "mapping with annotations" as it reduces the misalignment rate.
Tags
rnaseq, short paired end

All times are GMT -8.