SEQanswers

01-26-2014, 06:49 AM   #1
sugo
Junior Member
 
Location: Canada

Join Date: Nov 2013
Posts: 8
Tophat paired-end reads and minimum length

Hello,

I have a few questions about running my paired-end reads through Tophat (using Galaxy).

1) From what I've read on this forum, it sounds like paired end reads have to be properly mate-matched (that is, each pair must have a mate, and be in the same order, in the R1 and R2 files) in order for Tophat to map the mates properly. My question is, if I save any remaining unpaired mates after QC in a separate file, and run them through Tophat separately from the paired end reads, how can I then join the single-end and paired-end data together for analysis in Cufflinks?
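For question 1, the re-pairing step itself can be sketched in Python. This is a minimal, in-memory sketch under stated assumptions: plain 4-line FASTQ records (no wrapped sequence lines), and mate names that match once a "/1" or "/2" suffix or the Illumina comment after the first space is removed. The function names are hypothetical and not part of TopHat or Galaxy, and the sketch holds all R2 reads in memory, so full-size files would need a streaming approach.

```python
from typing import Dict, Iterator, List, Tuple

# One FASTQ record: (header, sequence, "+" line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from a list of lines (assumes no wrapped sequences)."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 2], lines[i + 3]


def mate_key(header: str) -> str:
    """Shared pair name: drop '@', any comment after whitespace, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs_and_orphans(r1_lines: List[str], r2_lines: List[str]):
    """Return (R1 pairs, R2 pairs, orphans); the two pair lists are in matching order."""
    r2_by_key: Dict[str, Record] = {mate_key(rec[0]): rec for rec in read_fastq(r2_lines)}
    pairs_r1: List[Record] = []
    pairs_r2: List[Record] = []
    orphans: List[Record] = []
    for rec in read_fastq(r1_lines):
        mate = r2_by_key.pop(mate_key(rec[0]), None)
        if mate is None:
            orphans.append(rec)          # R1 survived QC, R2 did not
        else:
            pairs_r1.append(rec)
            pairs_r2.append(mate)
    orphans.extend(r2_by_key.values())   # R2 survived QC, R1 did not
    return pairs_r1, pairs_r2, orphans


if __name__ == "__main__":
    r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "GGCC", "+", "IIII"]
    r2 = ["@read1/2", "TTGA", "+", "IIII", "@read3/2", "CCAA", "+", "IIII"]
    p1, p2, single = split_pairs_and_orphans(r1, r2)
    print(len(p1), len(p2), [r[0] for r in single])  # prints 1 1 ['@read2/1', '@read3/2']
```

The orphans could then be aligned in a separate single-end run. One option for combining the two runs before Cufflinks is to merge the resulting BAM files (for example with samtools merge), though whether that is worth doing depends on how many orphans there are.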

2) How do I determine the standard deviation of the distance between my mates? All I have to work with is a graph of the fragment size distribution, which ranges from about 200 to about 1000 bp (average ~300 bp). I want to make sure that Tophat can still map those larger fragments.
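For question 2, TopHat takes the expected inner distance between mates via -r/--mate-inner-dist (default 50) and its spread via --mate-std-dev (default 20). A rough starting value can be read off the size-distribution graph: the inner distance is the fragment size minus both read lengths (and minus the adapter length, if the trace includes adapters). A minimal sketch, assuming the graph can be approximated by a few (size, height) bins and that both mates have the same read length; the function name and toy numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def inner_distance_from_histogram(
    bins: Sequence[Tuple[float, float]],   # (fragment size in bp, count or relative height)
    read_length: int,
    adapter_length: int = 0,               # assumption: set > 0 only if the sizes include adapters
) -> Tuple[float, float]:
    """Weighted mean and (population) SD of the mate inner distance.

    inner distance = fragment size - adapter length - 2 * read length.
    Subtracting constants shifts the mean but does not change the spread,
    so the SD is simply the SD of the fragment sizes.
    """
    total = sum(count for _, count in bins)
    mean_frag = sum(size * count for size, count in bins) / total
    var = sum(count * (size - mean_frag) ** 2 for size, count in bins) / total
    return mean_frag - adapter_length - 2 * read_length, math.sqrt(var)


if __name__ == "__main__":
    # Toy histogram, not real data: 250/300/350 bp with heights 1/2/1, 100 bp reads
    mean_inner, sd = inner_distance_from_histogram([(250, 1), (300, 2), (350, 1)], read_length=100)
    print(round(mean_inner, 1), round(sd, 1))  # prints 100.0 35.4
```

A long right tail toward 1000 bp will widen the SD estimate; the values measured from actual alignments are usually a better guide once a first mapping exists.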

3) What is the shortest read length that is reasonable to try to map? I noticed that the default Tophat segment length on Galaxy is 25, so I'm wondering if this is a good cutoff for the minimum read length to keep after QC.

Any thoughts or suggestions are greatly appreciated.
01-26-2014, 10:14 AM   #2
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

1) Personally I wouldn’t bother with this. There might be reasons those reads didn’t map in the right orientation that would mean you’d rather just ignore them anyway. Are you hurting for coverage? Because you probably won’t really recover much this way anyway. If you didn’t have gross problems with library prep or sequencing, hopefully tophat should be aligning 80%+ of the reads correctly.

2) You can use "bamtools stats -insert -in aligned_reads.bam" to figure this out. But remember, tophat can handle pairs that have introns between them, so you really shouldn’t worry about 1000bp. Some pairs will have >100,000bp between them. For instance, in one of my RNA-seq data sets the median insert size is just 169bp, while the mean is 49Kbp. Obviously introns are pushing that mean way up.
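The median-versus-mean effect described above is easy to reproduce: a handful of intron-spanning pairs drags the mean up but barely moves the median. A minimal sketch, assuming insert sizes taken from the TLEN column (column 9) of the BAM, for example via samtools view; the function name and toy values are hypothetical, and the scaled median absolute deviation (MAD) is a swapped-in robust stand-in for the standard deviation, not necessarily what bamtools reports.

```python
import statistics
from typing import Sequence, Tuple


def insert_size_summary(tlen_values: Sequence[int]) -> Tuple[float, float, float]:
    """Return (mean, median, robust SD) of insert sizes.

    tlen_values: SAM TLEN values. The sign only encodes which mate is leftmost,
    and 0 means no template length was recorded, so zeros are dropped.
    """
    sizes = [abs(t) for t in tlen_values if t != 0]
    med = statistics.median(sizes)
    mad = statistics.median([abs(s - med) for s in sizes])
    # 1.4826 * MAD estimates the SD for normally distributed data and is barely
    # affected by the few pairs that span long introns.
    return statistics.mean(sizes), med, 1.4826 * mad


if __name__ == "__main__":
    # Toy values, not real data: five ordinary pairs plus one pair spanning a ~50 kb intron
    mean, median, robust_sd = insert_size_summary([160, -165, 169, -170, 175, 50000])
    print(round(mean), median, round(robust_sd, 2))  # prints 8473 169.5 7.41
```

One spliced pair out of six is enough to push the mean past 8 kb while the median and the MAD-based spread stay close to the unspliced fragments.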

3) You really shouldn’t go shorter than about 50bp if you want to detect splicing. Tophat first tries to align the whole read, then breaks it into pieces (by default, 4 segments of 25bp for a 100bp read). If you only have one segment’s worth (i.e. leaving it at 25bp with reads shorter than 50bp), splice mapping is basically worthless. You can set that 25bp to 20 or so and get down to a total read length of 40bp, but remember that the shorter the reads, the fewer unique mappings you’re going to have, so your alignments will get progressively worse as you go lower. The option is set with "--segment-length". Personally, I wouldn’t do much trimming, though. You can get rid of adapters, but in my experience quality trimming really doesn’t help when you’re aligning to a genome. These aligners are already quality-aware, so mismatches in poor-quality regions don’t hurt you much.
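The segment arithmetic above can be written out as a simplified model. This is only a sketch of the rule of thumb in the post (a read needs at least two segments of --segment-length bases to be useful for splice detection); it does not model how TopHat treats bases left over after the last whole segment, and the function names are hypothetical.

```python
def segment_count(read_length: int, segment_length: int = 25) -> int:
    """Whole segments a read yields under the simplified model in the post."""
    return read_length // segment_length


def can_detect_junctions(read_length: int, segment_length: int = 25) -> bool:
    """Per the post, split-segment junction search needs at least two segments."""
    return segment_count(read_length, segment_length) >= 2


def min_read_length_after_qc(segment_length: int = 25) -> int:
    """Shortest read that still yields two segments: 2 * --segment-length."""
    return 2 * segment_length


if __name__ == "__main__":
    print(segment_count(100))                                  # prints 4
    print(can_detect_junctions(45), can_detect_junctions(50))  # prints False True
    print(min_read_length_after_qc(20))                        # prints 40
```

Under this model, with the default segment length of 25, a post-QC minimum of 50 bp keeps every read eligible for the split-segment search; lowering --segment-length to 20 lowers that floor to 40 bp, at the cost of fewer unique alignments.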
02-12-2015, 03:52 PM   #3
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137

Quote:
Originally Posted by Wallysb01
1) Personally I wouldn’t bother with this. There might be reasons those reads didn’t map in the right orientation that would mean you’d rather just ignore them anyway. Are you hurting for coverage? Because you probably won’t really recover much this way anyway. If you didn’t have gross problems with library prep or sequencing, hopefully tophat should be aligning 80%+ of the reads correctly.
I think you misunderstood the OP. He meant having pairs of reads which have become mate-less after QC, not after mapping.

So one mate passes QC but the other does not, leaving you with a set of single-end reads that lost their partner for sequencing-quality reasons but could still align on their own.

What to do in those cases?

All times are GMT -8.