SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
what's difference between 'pileup' and 'mpileup' at samtools? dkrtndhkd Bioinformatics 2 02-01-2012 08:24 AM
samtools pileup output, base quality does not match original ones wwmm933 Bioinformatics 3 11-30-2011 09:52 PM
samtools pileup base quality swapna Genomic Resequencing 0 02-23-2011 03:36 AM
samtools pileup vs. mpileup doc2r Bioinformatics 0 01-17-2011 02:52 PM
samtools pileup coverage guavajuice Bioinformatics 4 05-27-2010 02:30 PM

Reply
 
Thread Tools
Old 05-04-2011, 10:07 AM   #1
SamH
Member
 
Location: Moscow, ID

Join Date: Sep 2010
Posts: 16
Default samtools pileup vs mpileup for computing per-base coverage

Hi all, I have searched seqanswers and can't find anything directly addressing this, so hopefully someone can help me out a little bit.

I am working at getting coverage information using mpileup and the `-BQ0 -d10000000 -f ref.fa' switches mentioned in the samtools manual (http://samtools.sourceforge.net/mpileup.shtml). I first mapped a set of reads to a reference sequence using bwa, and now have a bam file containing the result of this mapping.

Using the r963 version of samtools, I run the command:
samtools mpileup --BQ0 -d10000000 -f in.fa mapping.bam > test.pileup
It appears to work, however the resulting file does not contain the consensus base, for example:
chromosome coord ref coverage read_bases base_qualities
gmx:581|cmr:1005 1 N 1 ^]G h
gmx:581|cmr:1005 2 N 15 G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G^>G c^B`BbBaBXTBB[X
gmx:581|cmr:1005 3 N 15 CCCCCCCCCCCCCCC hLBaBRBRB]OBBWQ
gmx:581|cmr:1005 4 N 20 CCCTCCCCCCCCCCC^>C^]C^>C^]C^>C h^B`B^BaB`ZBB`XBfBh_
gmx:581|cmr:1005 5 N 24 GGGGGGGGGGGGGGGGGGGG^>G^]G^>G^]G fXB`BWBYBZQBBZSBdBh^BhBh
gmx:581|cmr:1005 6 N 24 GGGGGGGGGGGGGGGGGGGGGGGG h^B`B`BaBaWBB`\BhBg^BhBh
gmx:581|cmr:1005 7 N 24 TGTAGGGTTGTGGGGGTGTTTTTT hKB_BKBBBKKBBQJBhBeZBhBh
gmx:581|cmr:1005 8 N 26 TTTTTTTTTTTTTTTTTTTTTTTT^]T^>T h^BaB`BBB\WBB^ZBhBh^BhBhfB
gmx:581|cmr:1005 9 N 27 GGGGGGGGGGGGGGGGGGGGGGG-1NGGG^]G hBB]BPBBBQQBBGTBgBhaBgBghB`
gmx:581|cmr:1005 10 N 28 CCCCCCCCGGCCCCCCCCCCTC*CCCG^>C hBB_BIBBBHHBBGFBaBh[BhBhhBNh
gmx:581|cmr:1005 11 N 29 TTAT-2NNTTTTTTTTTTTTTTTTTTTTTTTT^]T hBBRBZBBBXXBBMVBhBhcBhBhhBThg
gmx:581|cmr:1005 12 N 30 TTT*TTTTTTTTTTTTTTTTTTTTTTTTT^]T hBB_B_BBB_ZBB^ZBhBefBhBhhBThhh
gmx:581|cmr:1005 13 N 32 GTT*GGGTGGGGGGTTGGGGGGGGGGGGGG^]G^]G hBB_BBBBBSJBBIJBhB\[BhBhhBUhhhfh
gmx:581|cmr:1005 14 N 33 AATAAAAAAAAAAAAAATAATATAAGAAAAAA^]A hBB_BBBBBKMBBQSBaBhWBhBhhBXhhhhhh
gmx:581|cmr:1005 15 N 34 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT^>T hBB]BBBBBZXBBVVBfBa[BhBhfBVhhhghhc


However, if I use the now deprecated "pileup" and samtools r862:
samtools pileup pileup -B -Q10 -d10000000 -c -f in.fa mapping.bam > pileup_out.pileup
I get consensus calls, as well as the rest of the information I am looking for, as well as a few extra lines (apparently for inserts?):
chromosome coord ref consensus consensus_qual SNP_quality max_mapping_qual coverage read_bases base_qualities
gmx:581|cmr:1005 1 G G 30 0 60 1 ^]. h
gmx:581|cmr:1005 2 G G 72 0 32 15 .^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>.^>. c^B`BbBaBXTBB[X
gmx:581|cmr:1005 3 C C 72 0 32 15 ............... hLBaBRBRB]OBBWQ
gmx:581|cmr:1005 4 C C 58 0 35 20 ...T...........^>.^].^>.^].^>. h^B`B^BaB`ZBB`XBfBh_
gmx:581|cmr:1005 5 G G 99 0 38 24 ....................^>.^].^>.^]. fXB`BWBYBZQBBZSBdBh^BhBh
gmx:581|cmr:1005 6 G G 99 0 38 24 ........................ h^B`B`BaBaWBB`\BhBg^BhBh
gmx:581|cmr:1005 7 T K 77 77 38 24 .G.AGGG..G.GGGGG.G...... hKB_BKBBBKKBBQJBhBeZBhBh
gmx:581|cmr:1005 8 T T 105 0 38 26 ........................^].^>. h^BaB`BBB\WBB^ZBhBh^BhBhfB
gmx:581|cmr:1005 9 G G 108 0 39 27 .......................-1C...^]. hBB]BPBBBQQBBGTBgBhaBgBghB`
gmx:581|cmr:1005 9 * */* 69 0 39 27 * -C 26 1 0 0 0
gmx:581|cmr:1005 10 C C 12 0 39 28 ........GG..........T.*...G^>. hBB_BIBBBHHBBGFBaBh[BhBhhBNh
gmx:581|cmr:1005 11 T T 85 0 40 29 ..A.-2TG........................^]. hBBRBZBBBXXBBMVBhBhcBhBhhBThg
gmx:581|cmr:1005 11 * */* 49 0 40 29 * -TG 28 1 0 0 0
gmx:581|cmr:1005 12 T T 114 0 41 30 ...*.........................^]. hBB_B_BBB_ZBB^ZBhBefBhBhhBThhh
gmx:581|cmr:1005 13 G G 3 0 43 32 .TT*...T......TT..............^].^]. hBB_BBBBBSJBBIJBhB\[BhBhhBUhhhfh

I haven't been able to figure out how to get the most recent version of samtools to report the reference and consensus bases, I assume that I'm missing some silly little thing, but has anyone gotten this to work in mpileup? If so, what is the trick?

thanks,

SamH
SamH is offline   Reply With Quote
Old 05-04-2011, 10:42 AM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

mpileup doesn't do consensus calls.

mpileup will show you reference bases. The all-N thing usually means that there's some disconnect between the names of the reference sequences in your reference fasta, and the names in the .bam file, and the name of the reference index. I'm kind of surprised that it worked in pileup and not mpileup. Maybe mpileup doesn't like the | symbol, but pileup tolerates it?
swbarnes2 is offline   Reply With Quote
Old 05-05-2011, 11:01 AM   #3
SamH
Member
 
Location: Moscow, ID

Join Date: Sep 2010
Posts: 16
Default

Quote:
Originally Posted by swbarnes2 View Post
mpileup doesn't do consensus calls.

mpileup will show you reference bases. The all-N thing usually means that there's some disconnect between the names of the reference sequences in your reference fasta, and the names in the .bam file, and the name of the reference index. I'm kind of surprised that it worked in pileup and not mpileup. Maybe mpileup doesn't like the | symbol, but pileup tolerates it?
Hi, thanks for the tip! I re-ran everything after substituting a more simple name into the reference fasta file, and it worked as expected.

SamH
SamH is offline   Reply With Quote
Old 05-05-2011, 11:53 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Sounds like a bug needs to be reported to the sam-tools mailing list...
maubp is offline   Reply With Quote
Old 05-06-2011, 03:29 PM   #5
SamH
Member
 
Location: Moscow, ID

Join Date: Sep 2010
Posts: 16
Default

Quote:
Originally Posted by maubp View Post
Sounds like a bug needs to be reported to the sam-tools mailing list...
Thanks for the suggestions!

After digging around a little more , I've figure out that bwa truncates the fasta reference ID at the first space, samtools mpileup doesn't handle this well and introduces "N"s for all reference positions, samtools pileup has different behavior and produces the proper reference base.

Of course, "pileup" isn't included in the latest version of samtools.

I posted this to the samtools-help mailing list, hopefully it gets seen by the right people.

SamH
SamH is offline   Reply With Quote
Old 08-10-2011, 02:14 PM   #6
adansonia
Junior Member
 
Location: california

Join Date: Jul 2011
Posts: 2
Default mpileup problem

I have had the same problem using a .bam file I obtained after mapping reads also with bwa. The option pileup works but with mpileup I don't get the reference bases, only N in all the positions. Exactly the same case than the one reported by samH.
adansonia is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:17 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO