Seqanswers Leaderboard Ad

**mturchin20** · 05-21-2013, 01:39 PM

So I've looked into this a little further and thought I'd just put up what I'm currently thinking in case anyone has something else to contribute.

It looks like while the bwa samse/sampe produces X0/X1 tags, the bwa mem/bwa bwasw do not produce such tags. Instead (and now just specifically referring to bwa mem), it looks like the XP tag is used to identify both 1) long reads that have split alignments and 2) reads that have multiple alignments overall. In the former situation (of a long read being split), both reads will have a XP tag. Using the -M flag will cause the shorter read of the two to be flagged as "the alignment is not primary" in the bitwise FLAG field (column 2). Not including the -M flag will leave both reads as appearing 'primary' (which I'm guessing is what causes the issues with Picard/GATK that the documentation mentions?). In the latter situation (of a read having multiple hits overall), you can see the other sequence hits using the -a flag. In these situations, the 'primary' read does not contain a XP tag but the sub-primary reads do.

Given this situation, I still think it isn't straight forward to determine whether a read is 'uniquely mapped' or not. I've read about using the MAPQ value as a way to determine this, e.g. a MAPQ of 0 means the read is not uniquely mapped, but I feel like I see examples where the MAPQ is >0 and there are multiple alignments for that read. For example:

HXBHIOF01CJXWW 0 10 69647562 22 23M1D5M2I39M * 0 0 AACCTCAGGTGATCCGTCCGCCTGGCCTGCCCCAAAGTGCTGGGCTTACAGGCATGTGCCACCTTGCCC EE???:5332=CCC==<=>B@@><;5557444244;??CCC;;;;AED???EFECED>==?;:766854 NM:i:5 AS:i:46 XS:i:36
HXBHIOF01CJXWW 256 X 94956752 0 23M1D5M2I33M6S * 0 0 * * NM:i:5 AS:i:36 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
HXBHIOF01CJXWW 272 19 16193041 0 39M2I5M1D23M * 0 0 * * NM:i:7 AS:i:36 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
HXBHIOF01CJXWW 256 3 169612909 0 23M1D5M2I39M * 0 0 * * NM:i:8 AS:i:35 XP:Z:10,+69647562,23M1D5M2I39M,22,5;
HXBHIOF01CJXWW 256 12 122144914 0 22M1D6M2I26M13S * 0 0 * * NM:i:4 AS:i:34 XP:Z:10,+69647562,23M1D5M2I39M,22,5;

The first read has a MAPQ of 22 even though it maps to other regions in the genome (all with MAPQ of 0, admittedly). Additionally this is not a long read that has been split (excluding the -a flag only produces the first read).

So I guess my current thinking is that I should still use a MAPQ threshold (10 or 30?), but that this may not produce a full set of 'uniquely mapping' reads?

**rpique** · 07-05-2013, 05:24 PM

Can you use the difference between the AS tag and the XS tag?
I can't see the XS tag in your output, it may come with the newest version.

**Kennels** · 08-08-2013, 09:11 PM

The "SA" tag is now used (v0.7.5) to identify reads which have chimeric alignment:
https://github.com/lh3/bwa/blob/master/NEWS

**tnp** · 04-16-2014, 11:57 AM

bam tags AS and XS

A comment to how uniquely mapped reads can be extracted. As far as I understand a read is uniquely mapped when AS and XS tags differ - pls correct me if im wrong. Actually I have seen XS values being bigger than AS values which I dont really understand - Cant find a thorough description of what XS orhter than its the score for a suboptimal alignment.

ex with XS bigger than AS:
HWUSI-EAS664L:24:64FGCAAXX:4:11:3337:6807 99 gi|407955691|dbj|AP012495.1| 1359455 0 101M = 1359704 351 CCAATGGCCGCTGCCAGTGGCGTTTTGTCGTGTCTTTCCGGGTTGGACTCAAGTTGATAGTTACCGGA
TAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTC IIHGDIIIIGIIDIIIIEHIIIIGIIGHIIIIHHHIIIIIHIEIIIHIIIHGHIIIFGHIFIHHHIICIGHHHIIIBCGHGEGHGGDGHBEGGIGEFEAD? NM:i:8 MD:Z:3G4T13A1A0A10A16A0C46 AS:i:62 XS:i:68

So is this an example of a uniquely mapped read ?

**trotos** · 07-15-2015, 06:54 AM

Originally posted by Kennels View Post

The "SA" tag is now used (v0.7.5) to identify reads which have chimeric alignment:
https://github.com/lh3/bwa/blob/master/NEWS

hello, I dig up this post because of your comment on the SA: tag
could you please elaborate, the provided link is broken.
the main question is:
are the reads with the SA: code still unique? what about the ones that also incorporate the XA: code?
Here are some examples

Code:

BIOMICS-HISEQHI:522:HCYM5ADXX:2:1102:3525:68438 256     chr2    161581931       0       32M17H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        CAC<E?E<E????DDF@DBEB<D?9??D99D<        NM:i:0  MD:Z:32 AS:i:32 XS:i:20 SA:Z:chr12,87311999,-,32M17S,0,0;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1212:1519:12286 256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        .):639=>?>?8??<)7;7)=8=8)0>8>?.6        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:10843:5627 256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        B?@AB@AABBBBABBB<ABBB?BAB?BB?BBB        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:14837:88360        256     chr2    161581931       1       32M19H  *       0       0       TATCCATTTCATCTGGATTTTCTAGTTTATTT        B?BBBBBBBB=ABBBBBBB?A?ABB0=A?BBB        NM:i:0  MD:Z:32 AS:i:32 XS:i:27 SA:Z:chr12,87311997,-,34M17S,1,1;       XA:Z:chr12,+46991828,32M19S,1;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:6904:71888 0       chr11   22585338        0       18S33M  *       0       0       TTTTCCCATTCTGTAGATTGTCTGTTTACTCTGATTATAGTTTATTTTGCT     IGGIIIIIIIIGA@GGGHFIIIIIFFHEIHIHFGIHIIIIFEIGHEGHCHF     NM:i:0  MD:Z:33 AS:i:33 XS:i:33 SA:Z:chr17,59076122,+,33M18S,0,0;       XA:Z:chr3,+175713108,51M,4;chr2,-116989725,16S35M,1;chr19,+21698327,7S29M15S,0;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1212:1519:12286 16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     ??>?9@?=.:().))?>.)6.?>8>0)8=8=)7;7)<??8?>?>=936:).     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;
       XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:10843:5627 16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     BBBBBBABBABABABBBBBBBB?BB?BAB?BBBA<BBBABBBBAA@BA@?B     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;
       XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:14837:88360        16      chr12   87311997        1       34M17S  *       0       0       TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA     BBBBBBABB>A<B>B>A?7BBB?A=0BBA?A?BBBBBBBA=BBBBBBBB?B     NM:i:1  MD:Z:0A33       AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0;       XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1102:3525:68438 16      chr12   87311999        0       32M17S  *       0       0       TATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA       AA==5.DB8BB8=/??*<D99D??9?D<BEBD@FDD????E<E?E<CAC       NM:i:0  MD:Z:32 AS:i:32 XS:i:30 SA:Z:chr2,161581931,+,32M17S,0,0;       XA:Z:chr9,-104599381,49M,4;chr3,+170653467,49M,4;chr12,+46991828,49M,5;
BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:6904:71888 256     chr17   59076122        0       33M18H  *       0       0       TTTTCCCATTCTGTAGATTGTCTGTTTACTCTG       IGGIIIIIIIIGA@GGGHFIIIIIFFHEIHIHF       NM:i:0  MD:Z:33 AS:i:33 XS:i:32 SA:Z:chr11,22585338,+,18S33M,0,0;       XA:Z:chr4,-151801895,19S32M,0;

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

BWA-MEM and X0/X1/etc flags

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News