SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Mapping using Tophat2 in different species lischoco RNA Sequencing 3 05-12-2014 09:15 AM
How to do unique mapping in bowtie2 metheuse Epigenetics 9 11-06-2013 06:10 AM
How to do unique mapping in tophat2 for paired end? metheuse RNA Sequencing 4 08-31-2013 09:41 PM
BWA unique mapping, mutireads christophpale Bioinformatics 10 08-10-2011 03:56 PM
unique mapping for chip-seq? frozenlyse Epigenetics 2 12-11-2008 03:05 AM

Reply
 
Thread Tools
Old 06-30-2014, 09:42 AM   #1
sazz
Member
 
Location: Istanbul, Turkey

Join Date: Oct 2012
Posts: 28
Default Unique mapping with Tophat2

Hello everyone,

I have seen posts about filtering unique hits after default run with Tophat2 but some say I should get the reads with MAPQ=255, some say I should get the reads with NH:i:1. Can you briefly describe how to collect the unique hits after tophat2 run, I am a little confused :/
sazz is offline   Reply With Quote
Old 06-30-2014, 12:43 PM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I think the most recent versions use a MAPQ of 50 for unique alignments, but in either case a threshold of 5 will filter out the multimappers (the current possible MAPQs are 0, 1, 2, 3, and 50, if I remember correctly). At least htseq-count will automatically filter by NH tag if it's there, though I always specified a MAPQ threshold anyway.
dpryan is offline   Reply With Quote
Old 07-01-2014, 06:32 AM   #3
sazz
Member
 
Location: Istanbul, Turkey

Join Date: Oct 2012
Posts: 28
Default

Thanks to devon, I assume I could get the unique hits from a default run (-g 20).

But to confirm if I get the unique hits, I performed several runs. So, if I substract
"-g 1" number of reads from "-g 2" number of reads, I should find the number of multi-
hitting reads, right? In my case it is: 20,064,410 - 19,499,641 = 564,749 reads seems to
be multi mapping. If I substract 564,749 from "-g 1" number of reads, I should find the
uniquely mapping reads: 19,499,461 - 564,749 = 18,934,712? But when I typed
"samtools view -q 50 -o output_file accepted_hits.bam" code to get the number of unique
hits from a default run (-g 20), it is: 19,170,015 :/ so it's not equal, am I thinking
wrong?

Also what I have realized is that if I try to filter unique hits from a "-g 2", the number is: 19,136,250 but if I try to filter unique hits from a "-g 20" (default), it is: 19,170,015 , I expected that they would be also equal in number. What am I missing?
sazz is offline   Reply With Quote
Old 07-01-2014, 06:41 AM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

I'm not sure what's going on in your case, but there are certainly problems with the mapq encoding in tophat. Specifically it seems to calculate the mapq value after doing the -g filtering instead of before, so if you specify -g 1 then all reads (even ones with lots of possible mapping positions) will get a mapq of 50.

If you're using -g > 1 then you could also compare the mapq values to the NH:i:x values since these should correctly report whether the hit is unique or not.
simonandrews is offline   Reply With Quote
Old 07-01-2014, 06:45 AM   #5
sazz
Member
 
Location: Istanbul, Turkey

Join Date: Oct 2012
Posts: 28
Default

yes, you are right. If you do a "-g 1" run, all the reads have MAPQ of 50, so it defines uniqueness depending on the output file.
sazz is offline   Reply With Quote
Old 07-01-2014, 06:47 AM   #6
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by sazz View Post
yes, you are right. If you do a "-g 1" run, all the reads have MAPQ of 50, so it defines uniqueness depending on the output file.
Indeed, but it really shouldn't do that as it's completely misleading. The NH:i;x tags are specifically designed to show the number of redundant hits within the current file, but mapq should be a measure of mapping quality which is independent of how many hits you reported.
simonandrews is offline   Reply With Quote
Old 07-01-2014, 08:08 AM   #7
sazz
Member
 
Location: Istanbul, Turkey

Join Date: Oct 2012
Posts: 28
Default

I tried to find the number of unique mappings with "NH:i:1" tag but the result is exactly same as the number I find with MAPQ:50.

"-g 2": unique hits both with NH:i:1 and MAPQ 50 is 19,136,250
"-g 20": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015
"-g 1000": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015

Still I don't understand why there is difference between "-g 2" and "-g 20"?
sazz is offline   Reply With Quote
Old 07-01-2014, 08:47 AM   #8
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by sazz View Post
I tried to find the number of unique mappings with "NH:i:1" tag but the result is exactly same as the number I find with MAPQ:50.

"-g 2": unique hits both with NH:i:1 and MAPQ 50 is 19,136,250
"-g 20": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015
"-g 1000": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015

Still I don't understand why there is difference between "-g 2" and "-g 20"?
I'm not sure. In the controlled tests I did to get to the bottom of this we got back what we expected, with the -g filter being applied correctly. We also tested the --report-secondary-alignments options which also seemed to work OK. The only problem we hit was with the mapq values.

As another check you could look at the 0x100 position in the flag value. This should be false for only one read in each set (the primary alignment) so finding the number of reads for which this is false should also give you a measure of different hits.
simonandrews is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:56 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO