SEQanswers


KamilSJaron 07-18-2016 08:55 AM

bowtie2 bit 2 of the flag
 
Hello,

I am wondering what this means: "The second bit (2 in decimal, 0x2 in hexadecimal) is set if the read is part of a pair that aligned in a paired-end fashion."

I have mate-pair reads mapped with the following entries in the SAM file:

Code:

H8:C19M1ACXX:5:1101:6777:2322  97      Sc_hox  1694486 42      101M    =      1691781 -2806  AGCCGGGCTGGAGCTTACCTGGCTGACAGGAACTTCTCTGGCTAGCATATACGATTTTCGGTGCACCGTGTAGATCTGCTTGGAGATTACTAGTAAGTGTC  CCCFFFFFHHHHHJJJJJJHGHJJJIIIIJIIJIIJJJJIIJJJIJJJJJIIJGIIIJGHHHFFFEE>;A?CCDDDDCCDDDB:CCDCCACCCCBCB?ACD  AS:i:-10        XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:93G0G6    YS:i:0  YT:Z:DP
H8:C19M1ACXX:5:1101:6777:2322  145    Sc_hox  1691781 42      101M    =      1694486 2806    TACTGCTGAAGGCAATATTAGTTCCCCTCACAATGCGAAACTAATGTTATAGATTATATTGAAAACTCATTGGTACTGTAAACTATATCATAAACATAGTA  CDDDDCDEEEEEEEFFFFFFHHHIIIJJJIHJJIJIIJJHJJIJJIIJIHHIJJJIJIIJJJIGHFJIIJJJJJGJJJJJJJJJJJJJHHHHHFFFFFCCC  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:101        YS:i:-10        YT:Z:DP

The bitflags of those reads in binary format are (from left to right: 1 2 4 8 16 32 64 128):

1 0 0 0 0 1 1 0
1 0 0 0 1 0 0 1
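
The same decoding can be done programmatically. The following is a minimal sketch, assuming the bit meanings from the SAM specification; the table `SAM_FLAG_BITS`, the short bit names and the helper `decode_flag` are hypothetical names chosen for illustration.

```python
# Bit values follow the SAM specification; the short names are illustrative.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# The two flags from the records above.
for flag in (97, 145):
    print(flag, decode_flag(flag))
# 97 ['paired', 'mate_reverse', 'first_in_pair']
# 145 ['paired', 'reverse', 'second_in_pair']
```

Neither list contains `proper_pair`, which is the missing bit 2.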

It seems that both reads are well aligned (full length), they are about 3 kb from each other, and they are on opposite strands...

The only thing I cannot understand is why bit 2 is not set, because everything else looks fine to me...

Cheers

dpryan 07-18-2016 10:52 AM

Bit 2 denotes aligning as a "proper pair according to the aligner". Firstly, the alignments point away from each other. Secondly a 3kb fragment size is about 6-10x wider than expected. Are these mate-pair alignments? If so, you'll need to tell the aligner that.
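
Both checks can be reproduced from the FLAG, POS, PNEXT and TLEN fields of a record. The following is a rough sketch, not the aligner's actual logic: the helper names are hypothetical, only leftmost positions are compared (so overlapping pairs are approximated), and the limit of 500 is assumed from bowtie2's documented default for `-X`.

```python
def pair_orientation(flag, pos, mate_pos):
    """Classify a pair from one record's FLAG, POS and PNEXT.

    "inward":  the forward read lies upstream, the reads face each other.
    "outward": the forward read lies downstream, the reads face away.
    "same_strand": both mates align to the same strand.
    """
    reverse = bool(flag & 0x10)
    mate_reverse = bool(flag & 0x20)
    if reverse == mate_reverse:
        return "same_strand"
    forward_pos = mate_pos if reverse else pos
    reverse_pos = pos if reverse else mate_pos
    return "inward" if forward_pos <= reverse_pos else "outward"


def exceeds_fragment_limit(tlen, max_fragment=500):
    """True if |TLEN| is larger than the assumed maximum fragment length."""
    return abs(tlen) > max_fragment


# First record from the question: FLAG 97, POS 1694486, PNEXT 1691781, TLEN -2806.
print(pair_orientation(97, 1694486, 1691781))  # outward
print(exceeds_fragment_limit(-2806))           # True
```

Under these assumptions, either condition on its own is enough to keep the pair from being reported as proper with the default paired-end settings.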

KamilSJaron 07-18-2016 11:06 AM

Quote:

Originally Posted by dpryan (Post 196886)
Bit 2 denotes aligning as a "proper pair according to the aligner". Firstly, the alignments point away from each other. Secondly a 3kb fragment size is about 6-10x wider than expected. Are these mate-pair alignments? If so, you'll need to tell the aligner that.

Cheers Ryan!

Yes, these are mate pairs (that is why I expected this insert size). Is it really important for the mapper to know that the library is a mate-pair library? It is already post-processed, so the only difference from a paired-end library is the insert size...

I had not realized before that they do not point towards each other. If this is the reason, how should I interpret this observation? About half of the pairs lack flag 2 (I still have to check whether all of them point away from each other).

dpryan 07-18-2016 11:11 AM

For mate-pairs it's expected that alignments point away from each other and have long insert sizes. Just make sure not to filter these out during a downstream analysis step.
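
Telling the aligner about a mate-pair library, as suggested earlier in the thread, might look like the sketch below. It relies on two bowtie2 options: `--rf` for the expected mate orientation and `-X` for the maximum fragment length. The index name, file names and the limit of 5000 are hypothetical placeholders, and whether `--rf` is the right orientation depends on how the reads were pre-processed.

```python
import shlex


def bowtie2_mate_pair_command(index, reads_1, reads_2, out_sam, max_fragment):
    """Build a bowtie2 argument list for outward-facing, long-insert pairs."""
    return [
        "bowtie2",
        "--rf",                    # mates are expected to point away from each other
        "-X", str(max_fragment),   # maximum fragment length for a valid pair
        "-x", index,               # basename of the bowtie2 index
        "-1", reads_1,             # file with the first mates
        "-2", reads_2,             # file with the second mates
        "-S", out_sam,             # SAM output file
    ]


# All names and the limit below are placeholders, not values from this thread.
cmd = bowtie2_mate_pair_command("my_index", "mp_R1.fastq", "mp_R2.fastq", "mp.sam", 5000)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

With such settings, bit 2 would be set for the outward-facing long-insert pairs rather than for the short inward-facing ones.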

KamilSJaron 07-18-2016 03:52 PM

You are right. Twice.

Mate pairs should indeed point away from each other, but this is not the problem. And here is why you were right twice: bowtie2 assumed that I have paired-end reads, so it treated as correct the fraction of reads that lie close to each other and point towards each other.

So when I plotted a histogram of insert sizes, I found that all correct reads (with flag 2) have a very small insert size, and those that have a small insert size but still lack flag 2 point away from each other (mate-pair orientation): https://plus.google.com/u/0/10778698...81109861311357

So my ultimate explanation is that I just have poorly preprocessed mate pairs, because about half of them are in fact just paired-end reads (i.e. they probably just had the adapter trimmed from one of the outer sides).
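
A tally like the one described above can be sketched as follows. This is an illustration under stated assumptions, not the script actually used: `tally_pairs` and the `short_cutoff` of 1000 are hypothetical, one record per pair is counted (the first mate), and the demo input contains the two records from the question with SEQ and QUAL replaced by `*`, plus one made-up short pair.

```python
from collections import Counter


def tally_pairs(sam_lines, short_cutoff=1000):
    """Count pairs by (proper-pair bit, orientation, insert-size class)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos = int(fields[1]), int(fields[3])
        pnext, tlen = int(fields[7]), int(fields[8])
        # Keep first mates only; skip unmapped, secondary and supplementary
        # records, and pairs whose mates are on different references.
        if not flag & 0x40 or flag & 0x90C or fields[6] != "=":
            continue
        reverse, mate_reverse = bool(flag & 0x10), bool(flag & 0x20)
        if reverse == mate_reverse:
            orientation = "same_strand"
        else:
            forward_pos = pnext if reverse else pos
            reverse_pos = pos if reverse else pnext
            orientation = "inward" if forward_pos <= reverse_pos else "outward"
        size = "short" if abs(tlen) < short_cutoff else "long"
        counts[(bool(flag & 0x2), orientation, size)] += 1
    return counts


demo = [
    "\t".join(f) for f in [
        # The two records from the question (SEQ and QUAL replaced by "*").
        ["H8:C19M1ACXX:5:1101:6777:2322", "97", "Sc_hox", "1694486", "42",
         "101M", "=", "1691781", "-2806", "*", "*"],
        ["H8:C19M1ACXX:5:1101:6777:2322", "145", "Sc_hox", "1691781", "42",
         "101M", "=", "1694486", "2806", "*", "*"],
        # Made-up short inward pair, for illustration only.
        ["hypothetical_pair", "99", "Sc_hox", "1000", "42",
         "101M", "=", "1200", "301", "*", "*"],
    ]
]
counts = tally_pairs(demo)
for key, n in sorted(counts.items()):
    print(key, n)
```

On a real file, a large `(True, "inward", "short")` count next to a large `(False, "outward", "long")` count would match the mixture of paired-end-like and genuine mate-pair fragments described here.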

Brian Bushnell 07-18-2016 05:57 PM

What kind of reads are these? Nextera LMP libraries, for example, are expected to produce a substantial fraction of short inserts, and those reads need to be specially processed first. If processed correctly, the long- and short-inserts will not end up in the same file.

KamilSJaron 07-19-2016 12:44 AM

Brian, yes, I guess so, given the fraction of short insert sizes with convergent orientation. The reads were trimmed (and QC looked good), so I expected that the fraction of short inserts had already been filtered out; apparently it had not.

Mystery solved. Thanks again, Ryan.


All times are GMT -8.
