![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Mapping using Tophat2 in different species | lischoco | RNA Sequencing | 3 | 05-12-2014 09:15 AM |
How to do unique mapping in bowtie2 | metheuse | Epigenetics | 9 | 11-06-2013 06:10 AM |
How to do unique mapping in tophat2 for paired end? | metheuse | RNA Sequencing | 4 | 08-31-2013 09:41 PM |
BWA unique mapping, mutireads | christophpale | Bioinformatics | 10 | 08-10-2011 03:56 PM |
unique mapping for chip-seq? | frozenlyse | Epigenetics | 2 | 12-11-2008 03:05 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Istanbul, Turkey Join Date: Oct 2012
Posts: 28
|
![]()
Hello everyone,
I have seen posts about filtering unique hits after default run with Tophat2 but some say I should get the reads with MAPQ=255, some say I should get the reads with NH:i:1. Can you briefly describe how to collect the unique hits after tophat2 run, I am a little confused :/ |
![]() |
![]() |
![]() |
#2 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
I think the most recent versions use a MAPQ of 50 for unique alignments, but in either case a threshold of 5 will filter out the multimappers (the current possible MAPQs are 0, 1, 2, 3, and 50, if I remember correctly). At least htseq-count will automatically filter by NH tag if it's there, though I always specified a MAPQ threshold anyway.
|
![]() |
![]() |
![]() |
#3 |
Member
Location: Istanbul, Turkey Join Date: Oct 2012
Posts: 28
|
![]()
Thanks to devon, I assume I could get the unique hits from a default run (-g 20).
But to confirm if I get the unique hits, I performed several runs. So, if I substract "-g 1" number of reads from "-g 2" number of reads, I should find the number of multi- hitting reads, right? In my case it is: 20,064,410 - 19,499,641 = 564,749 reads seems to be multi mapping. If I substract 564,749 from "-g 1" number of reads, I should find the uniquely mapping reads: 19,499,461 - 564,749 = 18,934,712? But when I typed "samtools view -q 50 -o output_file accepted_hits.bam" code to get the number of unique hits from a default run (-g 20), it is: 19,170,015 :/ so it's not equal, am I thinking wrong? Also what I have realized is that if I try to filter unique hits from a "-g 2", the number is: 19,136,250 but if I try to filter unique hits from a "-g 20" (default), it is: 19,170,015 , I expected that they would be also equal in number. What am I missing? |
![]() |
![]() |
![]() |
#4 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]()
I'm not sure what's going on in your case, but there are certainly problems with the mapq encoding in tophat. Specifically it seems to calculate the mapq value after doing the -g filtering instead of before, so if you specify -g 1 then all reads (even ones with lots of possible mapping positions) will get a mapq of 50.
If you're using -g > 1 then you could also compare the mapq values to the NH:i:x values since these should correctly report whether the hit is unique or not. |
![]() |
![]() |
![]() |
#5 |
Member
Location: Istanbul, Turkey Join Date: Oct 2012
Posts: 28
|
![]()
yes, you are right. If you do a "-g 1" run, all the reads have MAPQ of 50, so it defines uniqueness depending on the output file.
|
![]() |
![]() |
![]() |
#6 |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]()
Indeed, but it really shouldn't do that as it's completely misleading. The NH:i;x tags are specifically designed to show the number of redundant hits within the current file, but mapq should be a measure of mapping quality which is independent of how many hits you reported.
|
![]() |
![]() |
![]() |
#7 |
Member
Location: Istanbul, Turkey Join Date: Oct 2012
Posts: 28
|
![]()
I tried to find the number of unique mappings with "NH:i:1" tag but the result is exactly same as the number I find with MAPQ:50.
"-g 2": unique hits both with NH:i:1 and MAPQ 50 is 19,136,250 "-g 20": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015 "-g 1000": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015 Still I don't understand why there is difference between "-g 2" and "-g 20"? |
![]() |
![]() |
![]() |
#8 | |
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
|
![]() Quote:
As another check you could look at the 0x100 position in the flag value. This should be false for only one read in each set (the primary alignment) so finding the number of reads for which this is false should also give you a measure of different hits. |
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|