SEQanswers

02-24-2011, 07:35 AM   #1
DavidMatthewsBristol
Junior Member
 
Location: Bristol, UK

Join Date: Aug 2010
Posts: 7
Taking care of multihits in RNA-seq runs

Hi,
I've been using Galaxy to analyse RNA-seq data from mRNA isolated from HeLa cells. Like others, I have the problem of reads that are multihits. I have put up a workflow for analysis on Galaxy that involves the following steps:
1. Run the data through TopHat, allowing up to 40 mappings per read (the default).
2. Use a samtools feature to discard every mapping that is not mate paired and in a proper pair.
3. Count how many times each individual read appears in the SAM file and move all read pairs that do not map to a unique site into a separate "multihits" file.
4. Keep all the uniquely mapped, properly paired hits in a "unique hits" SAM file.
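
Steps 2 to 4 can be sketched in plain Python, working directly on SAM text lines. This is only an illustration under stated assumptions, not the actual Galaxy workflow: the function name `split_unique_and_multihits` is hypothetical, the proper-pair test uses SAM FLAG bit 0x2 (the post does not say which samtools feature was used), and a pair is treated as unique when each mate has exactly one proper-pair record.

```python
from collections import Counter

PROPER_PAIR = 0x2  # SAM FLAG bit: read mapped in a proper pair
FIRST_MATE = 0x40  # SAM FLAG bit: first read of the pair


def split_unique_and_multihits(sam_lines):
    """Return (header, unique, multihits) as lists of SAM lines.

    Hypothetical helper: mirrors steps 2-4 of the workflow, assuming
    one SAM record per mate per reported alignment.
    """
    header, proper = [], []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):  # header lines are passed through
            header.append(line)
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & PROPER_PAIR:  # step 2: keep proper pairs only
            mate = 1 if flag & FIRST_MATE else 2
            proper.append((fields[0], mate, line))

    # Step 3: count records per (read name, mate).
    counts = Counter((name, mate) for name, mate, _ in proper)

    unique, multihits = [], []
    for name, mate, line in proper:
        # Step 4: a pair is unique only if both mates occur exactly once;
        # everything else (including incomplete pairs) goes to multihits.
        is_unique = counts[(name, 1)] == 1 and counts[(name, 2)] == 1
        (unique if is_unique else multihits).append(line)
    return header, unique, multihits
```

Counting per mate rather than per read name avoids relying on "name appears exactly twice", which would misclassify a read whose two records both belong to the same mate.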

This approach generates more unique hits than asking TopHat to throw out reads that do not map uniquely (this may have changed with the latest TopHat release; I haven't checked yet). I think (and the TopHat developers may correct me on this) this is because TopHat may be removing reads where one end is not uniquely mapped but the other is (and which therefore only make sense with one of the mates).
However, whatever TopHat does (now or in the future), this approach does have the advantage of telling you where the multihit problem is and how big it is. My dataset has, for example, 18 million unique properly paired reads, 1.3 million that map to two places, a few hundred thousand that map to three places, and so on down the line.
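
A breakdown like the one above (so many reads mapping to one place, two places, three places) could be tallied with a short Python sketch such as the following. It is an assumption-laden illustration: `multiplicity_histogram` is a hypothetical name, the input is assumed to be SAM lines already filtered to proper pairs, and each mapped site of a pair is assumed to contribute exactly one first-mate record (FLAG bit 0x40).

```python
from collections import Counter

FIRST_MATE = 0x40  # SAM FLAG bit: first read of the pair


def multiplicity_histogram(sam_lines):
    """Map 'number of sites a pair aligns to' -> 'number of read pairs'.

    Hypothetical helper: counts first-mate records only, so each
    reported alignment of a pair is counted once.
    """
    sites_per_read = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        if int(fields[1]) & FIRST_MATE:
            sites_per_read[fields[0]] += 1
    # Invert: how many read pairs have 1 site, 2 sites, 3 sites, ...
    return Counter(sites_per_read.values())
```

The result can be printed in order with `for n, pairs in sorted(hist.items()): print(n, pairs)`, where `hist` is the returned counter.
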
One problem with multihits is that we may be overestimating the expression of some genes by including multihits or, conversely, underestimating it by excluding them. This "Bristol" workflow at least lets us know whether a gene is prone to multihits.

I think this approach is useful, but I may have missed something or be behind the curve! Either way, I thought it might be a useful workflow to start a discussion about what to do with multihit reads.

Cheers
David
10-16-2012, 05:20 AM   #2
mathew
Member
 
Location: australia

Join Date: Jan 2011
Posts: 81
New TopHat

I am just curious whether the new version of TopHat has taken this issue into consideration. Does anyone have any experience with this?
10-16-2012, 09:06 PM   #3
HSV-1
Member
 
Location: asia

Join Date: Jul 2012
Posts: 38

This has been worrying me for a long time. Any experience would help.
Thanks.

Tags
rna seq multi hits galaxy

All times are GMT -8.