Hi all, I have searched seqanswers and can't find anything directly addressing this, so hopefully someone can help me out a little bit.
I am working at getting coverage information using mpileup and the `-BQ0 -d10000000 -f ref.fa' switches mentioned in the samtools manual (http://samtools.sourceforge.net/mpileup.shtml). I first mapped a set of reads to a reference sequence using bwa, and now have a bam file containing the result of this mapping.
Using the r963 version of samtools, I run the command:
samtools mpileup --BQ0 -d10000000 -f in.fa mapping.bam > test.pileup
It appears to work, however the resulting file does not contain the consensus base, for example:
chromosome coord ref coverage read_bases base_qualities
gmx:581|cmr:1005 1 N 1 ^]G h
gmx:581|cmr:1005 2 N 15 G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G c^B`BbBaBXTBB[X
gmx:581|cmr:1005 3 N 15 CCCCCCCCCCCCCCC hLBaBRBRB]OBBWQ
gmx:581|cmr:1005 4 N 20 CCCTCCCCCCCCCCC^>C^]C^>C^]C^>C h^B`B^BaB`ZBB`XBfBh_
gmx:581|cmr:1005 5 N 24 GGGGGGGGGGGGGGGGGGGG^>G^]G^>G^]G fXB`BWBYBZQBBZSBdBh^BhBh
gmx:581|cmr:1005 6 N 24 GGGGGGGGGGGGGGGGGGGGGGGG h^B`B`BaBaWBB`\BhBg^BhBh
gmx:581|cmr:1005 7 N 24 TGTAGGGTTGTGGGGGTGTTTTTT hKB_BKBBBKKBBQJBhBeZBhBh
gmx:581|cmr:1005 8 N 26 TTTTTTTTTTTTTTTTTTTTTTTT^]T^>T h^BaB`BBB\WBB^ZBhBh^BhBhfB
gmx:581|cmr:1005 9 N 27 GGGGGGGGGGGGGGGGGGGGGGG-1NGGG^]G hBB]BPBBBQQBBGTBgBhaBgBghB`
gmx:581|cmr:1005 10 N 28 CCCCCCCCGGCCCCCCCCCCTC*CCCG^>C hBB_BIBBBHHBBGFBaBh[BhBhhBNh
gmx:581|cmr:1005 11 N 29 TTAT-2NNTTTTTTTTTTTTTTTTTTTTTTTT^]T hBBRBZBBBXXBBMVBhBhcBhBhhBThg
gmx:581|cmr:1005 12 N 30 TTT*TTTTTTTTTTTTTTTTTTTTTTTTT^]T hBB_B_BBB_ZBB^ZBhBefBhBhhBThhh
gmx:581|cmr:1005 13 N 32 GTT*GGGTGGGGGGTTGGGGGGGGGGGGGG^]G^]G hBB_BBBBBSJBBIJBhB\[BhBhhBUhhhfh
gmx:581|cmr:1005 14 N 33 AATAAAAAAAAAAAAAATAATATAAGAAAAAA^]A hBB_BBBBBKMBBQSBaBhWBhBhhBXhhhhhh
gmx:581|cmr:1005 15 N 34 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT^>T hBB]BBBBBZXBBVVBfBa[BhBhfBVhhhghhc
However, if I use the now deprecated "pileup" and samtools r862:
samtools pileup pileup -B -Q10 -d10000000 -c -f in.fa mapping.bam > pileup_out.pileup
I get consensus calls, as well as the rest of the information I am looking for, as well as a few extra lines (apparently for inserts?):
chromosome coord ref consensus consensus_qual SNP_quality max_mapping_qual coverage read_bases base_qualities
gmx:581|cmr:1005 1 G G 30 0 60 1 ^]. h
gmx:581|cmr:1005 2 G G 72 0 32 15 .^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>. c^B`BbBaBXTBB[X
gmx:581|cmr:1005 3 C C 72 0 32 15 ............... hLBaBRBRB]OBBWQ
gmx:581|cmr:1005 4 C C 58 0 35 20 ...T...........^>.^].^>.^].^>. h^B`B^BaB`ZBB`XBfBh_
gmx:581|cmr:1005 5 G G 99 0 38 24 ....................^>.^].^>.^]. fXB`BWBYBZQBBZSBdBh^BhBh
gmx:581|cmr:1005 6 G G 99 0 38 24 ........................ h^B`B`BaBaWBB`\BhBg^BhBh
gmx:581|cmr:1005 7 T K 77 77 38 24 .G.AGGG..G.GGGGG.G...... hKB_BKBBBKKBBQJBhBeZBhBh
gmx:581|cmr:1005 8 T T 105 0 38 26 ........................^].^>. h^BaB`BBB\WBB^ZBhBh^BhBhfB
gmx:581|cmr:1005 9 G G 108 0 39 27 .......................-1C...^]. hBB]BPBBBQQBBGTBgBhaBgBghB`
gmx:581|cmr:1005 9 * */* 69 0 39 27 * -C 26 1 0 0 0
gmx:581|cmr:1005 10 C C 12 0 39 28 ........GG..........T.*...G^>. hBB_BIBBBHHBBGFBaBh[BhBhhBNh
gmx:581|cmr:1005 11 T T 85 0 40 29 ..A.-2TG........................^]. hBBRBZBBBXXBBMVBhBhcBhBhhBThg
gmx:581|cmr:1005 11 * */* 49 0 40 29 * -TG 28 1 0 0 0
gmx:581|cmr:1005 12 T T 114 0 41 30 ...*.........................^]. hBB_B_BBB_ZBB^ZBhBefBhBhhBThhh
gmx:581|cmr:1005 13 G G 3 0 43 32 .TT*...T......TT..............^].^]. hBB_BBBBBSJBBIJBhB\[BhBhhBUhhhfh
I haven't been able to figure out how to get the most recent version of samtools to report the reference and consensus bases, I assume that I'm missing some silly little thing, but has anyone gotten this to work in mpileup? If so, what is the trick?
thanks,
SamH
I am working at getting coverage information using mpileup and the `-BQ0 -d10000000 -f ref.fa' switches mentioned in the samtools manual (http://samtools.sourceforge.net/mpileup.shtml). I first mapped a set of reads to a reference sequence using bwa, and now have a bam file containing the result of this mapping.
Using the r963 version of samtools, I run the command:
samtools mpileup --BQ0 -d10000000 -f in.fa mapping.bam > test.pileup
It appears to work, however the resulting file does not contain the consensus base, for example:
chromosome coord ref coverage read_bases base_qualities
gmx:581|cmr:1005 1 N 1 ^]G h
gmx:581|cmr:1005 2 N 15 G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G c^B`BbBaBXTBB[X
gmx:581|cmr:1005 3 N 15 CCCCCCCCCCCCCCC hLBaBRBRB]OBBWQ
gmx:581|cmr:1005 4 N 20 CCCTCCCCCCCCCCC^>C^]C^>C^]C^>C h^B`B^BaB`ZBB`XBfBh_
gmx:581|cmr:1005 5 N 24 GGGGGGGGGGGGGGGGGGGG^>G^]G^>G^]G fXB`BWBYBZQBBZSBdBh^BhBh
gmx:581|cmr:1005 6 N 24 GGGGGGGGGGGGGGGGGGGGGGGG h^B`B`BaBaWBB`\BhBg^BhBh
gmx:581|cmr:1005 7 N 24 TGTAGGGTTGTGGGGGTGTTTTTT hKB_BKBBBKKBBQJBhBeZBhBh
gmx:581|cmr:1005 8 N 26 TTTTTTTTTTTTTTTTTTTTTTTT^]T^>T h^BaB`BBB\WBB^ZBhBh^BhBhfB
gmx:581|cmr:1005 9 N 27 GGGGGGGGGGGGGGGGGGGGGGG-1NGGG^]G hBB]BPBBBQQBBGTBgBhaBgBghB`
gmx:581|cmr:1005 10 N 28 CCCCCCCCGGCCCCCCCCCCTC*CCCG^>C hBB_BIBBBHHBBGFBaBh[BhBhhBNh
gmx:581|cmr:1005 11 N 29 TTAT-2NNTTTTTTTTTTTTTTTTTTTTTTTT^]T hBBRBZBBBXXBBMVBhBhcBhBhhBThg
gmx:581|cmr:1005 12 N 30 TTT*TTTTTTTTTTTTTTTTTTTTTTTTT^]T hBB_B_BBB_ZBB^ZBhBefBhBhhBThhh
gmx:581|cmr:1005 13 N 32 GTT*GGGTGGGGGGTTGGGGGGGGGGGGGG^]G^]G hBB_BBBBBSJBBIJBhB\[BhBhhBUhhhfh
gmx:581|cmr:1005 14 N 33 AATAAAAAAAAAAAAAATAATATAAGAAAAAA^]A hBB_BBBBBKMBBQSBaBhWBhBhhBXhhhhhh
gmx:581|cmr:1005 15 N 34 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT^>T hBB]BBBBBZXBBVVBfBa[BhBhfBVhhhghhc
However, if I use the now deprecated "pileup" and samtools r862:
samtools pileup pileup -B -Q10 -d10000000 -c -f in.fa mapping.bam > pileup_out.pileup
I get consensus calls, as well as the rest of the information I am looking for, as well as a few extra lines (apparently for inserts?):
chromosome coord ref consensus consensus_qual SNP_quality max_mapping_qual coverage read_bases base_qualities
gmx:581|cmr:1005 1 G G 30 0 60 1 ^]. h
gmx:581|cmr:1005 2 G G 72 0 32 15 .^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>. c^B`BbBaBXTBB[X
gmx:581|cmr:1005 3 C C 72 0 32 15 ............... hLBaBRBRB]OBBWQ
gmx:581|cmr:1005 4 C C 58 0 35 20 ...T...........^>.^].^>.^].^>. h^B`B^BaB`ZBB`XBfBh_
gmx:581|cmr:1005 5 G G 99 0 38 24 ....................^>.^].^>.^]. fXB`BWBYBZQBBZSBdBh^BhBh
gmx:581|cmr:1005 6 G G 99 0 38 24 ........................ h^B`B`BaBaWBB`\BhBg^BhBh
gmx:581|cmr:1005 7 T K 77 77 38 24 .G.AGGG..G.GGGGG.G...... hKB_BKBBBKKBBQJBhBeZBhBh
gmx:581|cmr:1005 8 T T 105 0 38 26 ........................^].^>. h^BaB`BBB\WBB^ZBhBh^BhBhfB
gmx:581|cmr:1005 9 G G 108 0 39 27 .......................-1C...^]. hBB]BPBBBQQBBGTBgBhaBgBghB`
gmx:581|cmr:1005 9 * */* 69 0 39 27 * -C 26 1 0 0 0
gmx:581|cmr:1005 10 C C 12 0 39 28 ........GG..........T.*...G^>. hBB_BIBBBHHBBGFBaBh[BhBhhBNh
gmx:581|cmr:1005 11 T T 85 0 40 29 ..A.-2TG........................^]. hBBRBZBBBXXBBMVBhBhcBhBhhBThg
gmx:581|cmr:1005 11 * */* 49 0 40 29 * -TG 28 1 0 0 0
gmx:581|cmr:1005 12 T T 114 0 41 30 ...*.........................^]. hBB_B_BBB_ZBB^ZBhBefBhBhhBThhh
gmx:581|cmr:1005 13 G G 3 0 43 32 .TT*...T......TT..............^].^]. hBB_BBBBBSJBBIJBhB\[BhBhhBUhhhfh
I haven't been able to figure out how to get the most recent version of samtools to report the reference and consensus bases, I assume that I'm missing some silly little thing, but has anyone gotten this to work in mpileup? If so, what is the trick?
thanks,
SamH
Comment