SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
how to report only 1 hit read by bowtie2? zhouzaiwei Bioinformatics 2 09-10-2013 10:39 PM
SAM alignment result: match without reference report finfin General 0 07-23-2012 08:07 AM
tagcleaner.pl 26% of my sequences match the tag NNNNNNNNNNNN john1923 Bioinformatics 0 07-29-2011 07:42 AM

Reply
 
Thread Tools
Old 12-12-2013, 11:38 PM   #1
momo
Junior Member
 
Location: china

Join Date: Jul 2011
Posts: 1
Question HELP!! XS tag to filter unique reads Do NOT match bowtie2 report

I find the XS tag filtering the uniquely mapped reads number is different from the bowtie2 report. I first extract the flag number of 147/99 + 163/83 and it matches the reported 'aligned concordantly exactly 1 time' + 'aligned concordantly >1 times'. But when I extract the pairs with both reads tagged XS and find the number is a bit larger than the reported 'aligned concordantly >1 times'. I wonder what reads bowtie2 reported as unique...

Here is the output of bowtie2:
28115453 reads; of these:
28115453 (100.00%) were paired; of these:
9177458 (32.64%) aligned concordantly 0 times
15185809 (54.01%) aligned concordantly exactly 1 time
3752186 (13.35%) aligned concordantly >1 times
----
9177458 pairs aligned concordantly 0 times; of these:
2358270 (25.70%) aligned discordantly 1 time
----
6819188 pairs aligned 0 times concordantly or discordantly; of these:
13638376 mates make up the pairs; of these:
9978644 (73.17%) aligned 0 times
1757628 (12.89%) aligned exactly 1 time
1902104 (13.95%) aligned >1 times
82.25% overall alignment rate

and the numbers I get:
flag:
147/99 9471127
163/83 9466868

Pairs with XS tag
0 read get XS tag:13757424
1 read get XS tag:1358209
2 reads get XS tag:3822362
momo is offline   Reply With Quote
Old 02-10-2014, 08:56 AM   #2
KatrinSameith
Member
 
Location: Dresden, Germany

Join Date: Jan 2014
Posts: 10
Default Mismatch between bowtie2 report and manual XS-flag count

We are having a similar problem. Our bowtie2 report does not match the manual count of XS flags:

192904649 reads; of these:
192904649 (100.00%) were paired; of these:
30256829 (15.68%) aligned concordantly 0 times
85757625 (44.46%) aligned concordantly exactly 1 time
76890195 (39.86%) aligned concordantly >1 times
----
30256829 pairs aligned concordantly 0 times; of these:
8088262 (26.73%) aligned discordantly 1 time
----
22168567 pairs aligned 0 times concordantly or discordantly; of these:
44337134 mates make up the pairs; of these: (correct, counted flag 'YT:Z:UP')
26074256 (58.81%) aligned 0 times (correct, counted flag 'YT:Z:UP' && not 'AS:i')
4769246 (10.76%) aligned exactly 1 time (??? we get: 10728515, counted flag 'YT:Z:UP' && 'AS:i' && not 'XS:i')
13493632 (30.43%) aligned >1 times (??? we get: 7534363, counted flag 'YT:Z:UP' && 'XS:i')
93.24% overall alignment rate

Any ideas of where it goes wrong??

We did a quick check and re-run bowtie2 with the 'k 2' option, i.e. bowtie2 reports 2 alignments wherever 2 valid alignments are found. And indeed, we find reads with the 'YT:Z:UP' && 'AS:i' && not 'XS:i' flags, where a second alignment is reported nevertheless. Does this suggest that the XS flag is not always properly set?

I'd be thankful for any advise!
Juliana and Katrin
KatrinSameith is offline   Reply With Quote
Old 02-11-2014, 01:58 AM   #3
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

The presence of an XS tag doesn't mean it aligns more than once. It'll align more than one time if AS==XS (though I'd have to double check the bowtie2 source code to ensure that that's how it's actually calculated).
dpryan is offline   Reply With Quote
Old 02-11-2014, 05:05 AM   #4
KatrinSameith
Member
 
Location: Dresden, Germany

Join Date: Jan 2014
Posts: 10
Default

Hi dpryan,

Thanks for your reply. What does the XS flag stand for in your opinion?

As I understood the bowtie2 manual, the XS flag is set, when a second valid alignment for the same read is found. AS==XS would then be the case, if the second valid alignment (XS) has a score as high as the first (reported as "best", AS) valid alignment. Please correct me, if I am wrong.

Best,
Katrin
KatrinSameith is offline   Reply With Quote
Old 02-11-2014, 05:33 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

That's correct, but one needs to remember that the existence of a secondary alignment doesn't mean that the read doesn't map uniquely. If that were not the case, then one could decrease the threshold for declaring an alignment valid low enough that there would be no possible unique alignments. This rather goes against what we generally mean when we conceive of reads mapping uniquely vs. multiply.

If AS==XS then everyone agrees that the read is multimapping. If that's not the case, then most people would agree that the read originated from the first rather than the second position, but simply temper that belief by whatever the computed MAPQ turns out to be (N.B., the bowtie2 algorithm is weird). A quick perusal of the bowtie2 code (btw, apparently there's a new version since yesterday) suggests that these values are computed in AlnSinkWrap::finishRead in the aln_sink.cpp file. Have a look there if you really want to know exactly how the reported numbers were aligned (I've never looked through that function since I rely on those numbers for anything important).
dpryan is offline   Reply With Quote
Old 02-11-2014, 06:11 AM   #6
KatrinSameith
Member
 
Location: Dresden, Germany

Join Date: Jan 2014
Posts: 10
Default

Hi dpryan,

Thanks again, also for clarifying terminology. I always assumed the bowtie2 summary stated whether one or more valid alignments are found irrespective of the scores. For single-end reads, this is actually the case. The count of XS flags in the bowtie2 output is equal to the ">1 times" in the summary. So for single-end data, I disregarded all (multiple) alignments with XS==AS by manual filtering.

Now I was trying to find a way for such a (uniqueness) filter for paired-end data and got stuck on the XS flag counts. But even if you are right, my counts of XS should result in a higher number than ">1 times" in the summary (because I count the mere presence of XS, i.e. both AS>XS and AS==XS). My count is smaller though. I am puzzled!? So either XS is not always set properly, or the bowtie2 summary is wrong, or -of course- I still don't get it.

Thanks for your help and thoughts,
Katrin
KatrinSameith is offline   Reply With Quote
Old 02-11-2014, 06:28 AM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

True, someone would need to go through the code to see how those numbers are derived.
dpryan is offline   Reply With Quote
Old 02-11-2014, 06:36 AM   #8
KatrinSameith
Member
 
Location: Dresden, Germany

Join Date: Jan 2014
Posts: 10
Default

Yes, probably. It would be nice to know what is going on. I suppose filtering alignments for uniqueness is an issue of general interest. And since this option is gone in bowtie2 ('-m' in bowtie version 1), filtering based on XS flags would be an easy alternative.

Do you have experience with paired-end DNA-sequence data? If so, do you apply any filter, or do you keep all reads?
KatrinSameith is offline   Reply With Quote
Old 02-11-2014, 06:41 AM   #9
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I usually just filter by MAPQ score (5 or 10 is a good enough threshold to ensure that an alignment is actually unique).
dpryan is offline   Reply With Quote
Old 02-11-2014, 07:00 AM   #10
KatrinSameith
Member
 
Location: Dresden, Germany

Join Date: Jan 2014
Posts: 10
Default

Could you point me to any site that explains how MAPQ is calculated in bowtie2, or is it very complicated? So far I was hesitant to filter based on MAPQ because I only understood it on a superficial level from what I read in the manual (Q = -10*log10(p), where p is the probability that the read does not originate from the aligned location, i.e. for Q=5, p=.32 ... but don't know what p is based on ... can be a not-so-good alignment, can be the presence of another valid alignment, probably altogether).

Thanks.
KatrinSameith is offline   Reply With Quote
Old 02-11-2014, 07:04 AM   #11
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I posted the algorithm previously here. There's only a vague relationship between bowtie2's MAPQ scores and the idealized MAPQ scores specified in the SAM spec.
dpryan is offline   Reply With Quote
Old 02-11-2014, 08:32 AM   #12
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I should note that I just corrected the code in that link I posted (the output of 2 or 6 was wrong before).
dpryan is offline   Reply With Quote
Old 02-11-2014, 08:38 AM   #13
KatrinSameith
Member
 
Location: Dresden, Germany

Join Date: Jan 2014
Posts: 10
Default

Thanks for the notice. Did you extract this code from bowtie2?
KatrinSameith is offline   Reply With Quote
Old 02-11-2014, 09:43 AM   #14
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

It's very close to the code in bowtie2. This is from bison (a BS-seq aligner I wrote that uses bowtie2 as the underlying aligner), which is written in C, rather than bowtie2, which is written in C++, but aside from the language/interface difference most of the code will be identical (there aren't that many ways to code that algorithm, frankly).

Edit: I believe the bowtie2 function is in unique.h, if you're curious (there are 3 versions there, #2 seems to be the one used, at least up to version 2.1.0).

Last edited by dpryan; 02-11-2014 at 09:45 AM.
dpryan is offline   Reply With Quote
Old 02-18-2014, 04:58 AM   #15
KatrinSameith
Member
 
Location: Dresden, Germany

Join Date: Jan 2014
Posts: 10
Default

Just to let you know. The new bowtie2 version (2.2.0) is fine! I re-run my analysis -- and the XS counts give the same results as the bowtie2 summary. I should say that the summary is more or less the same, implying that the XS tags were not always properly set before.

Best & thanks again,
Katrin
KatrinSameith is offline   Reply With Quote
Old 02-18-2014, 05:09 AM   #16
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Good to know, thanks!
dpryan is offline   Reply With Quote
Old 02-11-2015, 06:19 AM   #17
am@i
Member
 
Location: lucknow

Join Date: Dec 2013
Posts: 13
Default

Hello,
I find the XS tag filtering the aligned discordantly 1 time reads.

Thanks
Amrita
am@i is offline   Reply With Quote
Old 03-03-2015, 07:33 AM   #18
JPOeyen
Junior Member
 
Location: Bonn, Germany

Join Date: Mar 2015
Posts: 3
Default

Hi all,
I am having similar problems filtering my bowtie2 output, although not as severe as the originally reported numbers. I mapped reads back to a denovo assembled transcriptome with the following settings

Quote:
-all --end-to-end --score-min L,-0.1,-0.1 --no-discordant --no-mixed
and got the following bowtie2 std output

Quote:
44223325 reads; of these:
44223325 (100.00%) were paired; of these:
7691237 (17.39%) aligned concordantly 0 times
25521175 (57.71%) aligned concordantly exactly 1 time
11010913 (24.90%) aligned concordantly >1 times
82.61% overall alignment rate
I was able to confirm the number of reads which did map concordantly exactly 0 time by searching for the 'YT:Z:UP' tag. However, when I look for reads which are mapped concordantly exactly 1 time I get a number (25,531,318) which is about 100,000 higher than the number reported by bowtie2. I counted all lines in the .sam file where the 'XS:i' and 'YT:Z:UP' tags are not present. The number does not change if I include the presence of the 'AS:i' tag.

I am aware that the --all setting leaves the MAPQ without much meaning, so using this to filter out uniquely mapped reads is not possible. I could redo the analysis to avoid this problem, but would rather continue working with the same dataset to keep things consistent There are no mentions of this setting causing any other problems though.

Is there something I am missing, or did my settings do something unexpected?

Best,
Jan Philip
JPOeyen is offline   Reply With Quote
Old 03-03-2015, 07:38 AM   #19
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

The presence of an XS auxiliary tag doesn't mean that an alignment isn't unique (n.b., "unique" isn't really a useful term, there's a reason for MAPQ scores). bowtie2 should only count an alignment as not unique if the XS and AS scores are the same.

Note that you're typically best off simply filtering by MAPQ score.
dpryan is offline   Reply With Quote
Old 03-03-2015, 08:38 AM   #20
JPOeyen
Junior Member
 
Location: Bonn, Germany

Join Date: Mar 2015
Posts: 3
Default

Thank you for the swift reply. I however do not fully understand how uniqueness is defined, if not by the fact that a given read only maps to one location (given the set score thresholds etc). I understand that if the first location a read maps to is significantly better than the alternate locations (by MAPQ score feks) it could probably also be considered unique in some respect.

However, if it is true that uniquely mapped reads could also have a XS tag, I would have expected the number of reads without it to be significantly lower, not higher, than the number reported by bowtie. So I am still pretty puzzled by my results.

I will try to have a look at your posts regarding the MAPQ scores and seriously consider redoing my analyses.
JPOeyen is offline   Reply With Quote
Reply

Tags
bowtie2, unique reads, xs tag

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 12:16 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO