I'm trying to extract the set of reads from a BAM file where neither the read nor its mate aligned. My first attempt used the require argument (-f) with flag 12, which combines:
4 (0x4): read unmapped
8 (0x8): mate unmapped
When I ran it, the records returned looked aligned, even though their flags say they are not (each flag does include both read unmapped and mate unmapped). Here are a few example lines returned by:
samtools view -f12 <bam>
ILLUMINA-1898B0:32:638L8AAXX:4:45:2010:18660 77 TRISPI_Contig6402 551 25 80M = 551 0 ACCGTAGGCCGCTACCGTAACCATGAACGCAGGTACAGATGCTCAGGAGTCCGGGAGTGACCAGACGAATTTCTAACAGA HHGHHHHHDHHGHHHHHHHHBHHHG<GEGGGGD<GGGEGDGDGGGFHHH@HDDHHBG3DBDDBDB@D>3>BBBB:CEBDE RG:Z:ALL_3_MERGED.1 XT:A:U NM:i:4 SM:i:25 AM:i:0 X0:i:1 X1:i:0 XM:i:4 XO:i:0 XG:i:0 MD:Z:15A24A16C7G14
ILLUMINA-1898B0:32:638L8AAXX:3:89:8485:16433 93 TRISPI_Contig2320 1477 0 80M = 1477 0 TCCGTCAATGTCGACGATGATATACAGTTCCGCTGCGCAGAACCCGCCCGGATACCAAGTTCCACGACTGGACATGCATC CEDD<FFBEF<HIGHHDBFHEFFBIFHFBHIIIHBGGGGGIAIIGIIIIIHIIFGIIGIIHIIIIIIIIIIIIIIIIIII RG:Z:ALL_3_MERGED.1 XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:33 XM:i:2 XO:i:0 XG:i:0 MD:Z:23C24A31
ILLUMINA-1898B0:32:638L8AAXX:4:30:1852:14147 157 TRISPI_Contig2320 1477 0 80M = 1477 0 TCCGTCAATGTCGACGATGATATACAGTTCCGCTGCGCAGAACCCGCCCGGATACCAAGTTCCACGACTGGACATGCATC B3DB@====8>>DBDEE<EE8ECBE@@@CD@I@IHFG@GG8CDAGGDGGGDGED>IIGGIGGG<GDDGDGII?GIIIGEB RG:Z:ALL_3_MERGED.1 XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:33 XM:i:2 XO:i:0 XG:i:0 MD:Z:23C24A31
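To double-check what the flags on these three records actually say, here is a small Python sketch that decodes a FLAG value into bit names. The bit meanings follow the FLAG table in the SAM format specification; the names `SAM_FLAG_BITS` and `decode_flag` are my own, purely for illustration.

```python
from typing import List

# Bit meanings for the SAM FLAG field, per the SAM format specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # The three flags from the example records above.
    for flag in (77, 93, 157):
        print(flag, decode_flag(flag))
```

Decoded this way, 77, 93 and 157 all include both "read unmapped" and "mate unmapped", which matches what -f12 is supposed to select.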
In every case a reference name is listed (rather than the '*' I expect for a read with no hit), along with a position and a CIGAR string (80M) that also suggest some kind of mapping took place. Although the mapping quality is 0 in many of the output lines, in some cases it is fairly high (such as the 25 in the first example above). That said, I didn't think mapping quality would affect how samtools parses the bit flag.
Going by the flags in the output, samtools view does appear to have filtered the data correctly. But every record it returned seems at odds with its own flag: it clearly looks like an alignment even though the flag says it is not.
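To get a sense of how widespread this is, records like the ones above can be counted with a short Python sketch that reads `samtools view` text output and reports records whose 0x4 (read unmapped) bit is set but which still carry a reference name or CIGAR string. The script name and the function `looks_aligned_but_unmapped` are hypothetical; the sketch assumes standard tab-separated SAM fields (FLAG in column 2, RNAME in column 3, CIGAR in column 6).

```python
import sys

READ_UNMAPPED = 0x4


def looks_aligned_but_unmapped(sam_line: str) -> bool:
    """True if the record has the 'read unmapped' bit set but still has a
    reference name or CIGAR string, i.e. it looks like an alignment."""
    if not sam_line.strip() or sam_line.startswith("@"):
        return False  # skip blank lines and header lines
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, cigar = int(fields[1]), fields[2], fields[5]
    return bool(flag & READ_UNMAPPED) and (rname != "*" or cigar != "*")


if __name__ == "__main__":
    # Hypothetical usage: samtools view -f12 <bam> | python check_unmapped.py
    total = odd = 0
    for line in sys.stdin:
        if not line.strip() or line.startswith("@"):
            continue
        total += 1
        odd += looks_aligned_but_unmapped(line)
    print(f"{odd} of {total} records flagged unmapped still have RNAME/CIGAR set")
```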
The BAM file I'm working with was produced in 2012 by the bwa aligner using aln + sampe. Are current bit flag conventions incompatible with what older BAM files contain? Can anyone explain what's going on?
Finally, a related question: when I give samtools view a flag value covering multiple states with -f (require) or -F (filter), such as read unmapped & mate unmapped above, does samtools combine the bits using AND or OR? I expected the require argument to use AND and the filter argument to use OR, but I haven't seen exactly how it works documented anywhere.
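To make the two behaviours I'm expecting concrete, here is how I'd model them in Python. The function names are mine and purely illustrative; whether samtools view actually implements -f and -F this way is exactly what I'm asking.

```python
def passes_require(flag: int, required: int) -> bool:
    """Expected -f behaviour (AND): keep the record only if EVERY bit in
    `required` is set in its flag."""
    return (flag & required) == required


def passes_filter(flag: int, filtered: int) -> bool:
    """Expected -F behaviour (OR): drop the record if ANY bit in `filtered`
    is set in its flag."""
    return (flag & filtered) == 0


if __name__ == "__main__":
    # 77 has both 0x4 and 0x8; 73 (paired, mate unmapped, first) has only 0x8.
    for flag in (77, 73, 65):
        print(flag, "-f12 keeps:", passes_require(flag, 12),
              "| -F12 keeps:", passes_filter(flag, 12))
```

Under this model, a read with only one of the two bits (e.g. flag 73) would be dropped by both -f12 and -F12.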