Seqanswers Leaderboard Ad

**swbarnes2** · 01-09-2012, 09:30 AM

'bcftools view -bvcg' will only show you the variant locations. So you won't get sites where all the samples in the .bam are homozygous reference. 'bcftools view -bcg' should do the trick.

**jflowers** · 01-09-2012, 09:54 AM

bcftools view -bvcg will give a variants-only vcf. However, whether this is a good way of getting genotypes is a bit unclear, but may not be appropriate (depending on your application).

In this post (http://biostar.stackexchange.com/que...quality-scores), lh3 suggests that mpileup / bcftools is not really meant for calling genotypes, but instead making inferences about the data when genotypes are unknown. In fact, lh3 suggests that calling genotypes is a "flaw" of bcftools (see also the original paper, Bioinformatics. 27(21) 2987-2993).

This seems like a subtle, but important point.

**ashkot** · 01-10-2012, 12:10 PM

i think this is a really good point. what i am after is to identify the genotype at all genomic locations but it seems that mplieup is really meat to find variations.

also, i deleted the -v switch and i still cannot get certain genomic locations to show up in my VCF file. should i assume the absence of that location to be homozygous normal at that location?

**jflowers** · 01-10-2012, 01:31 PM

I have some of the same questions as you ashkot when it comes to getting a consensus genome, [especially in light of the "flaw" concerning genotypes inferred by bcftools mentioned by lh3 I mentioned in my previous post on this thread]. So, I'm not really sure I can be of much help.

The topic of generating a consensus has also been discussed here:

New consensus from VCF - SEQanswers

http://seqanswers.com/forums/showthread.php?t=9818

Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

I would defer to comments made in that thread since lh3 developed the methods in question. But, what's not clear to me is whether the "flaws" with using bcftools to get genotypes in the vcf (mentioned previously in this thread) lead to bad genotypes being propagated throughout the consensus. Or does the method of calling genotypes in the consensus using vcfutils.pl vcf2fq differ from the method used by bcftools to call genotypes in the vcf. Can someone more knowledgable chime in here?

Anyhow, generating a consensus is described here:

Multisample SNP Calling

http://samtools.sourceforge.net/mpileup.shtml

As far as missing sites in your vcf file, have you looked at the sites that are missing using samtools tview to see if there are reads spanning those sites?

**aggp11** · 01-11-2012, 07:02 AM

Originally posted by ashkot View Post

Hi there,

First, indexing the bam file using samtool index
Second, sorting the bam file using samtools sort eg2.bam eg2.sorted
Third, generating a bcf file using samtools mpileup -uf reference_genome_file eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf

Hi Ashwin,

I don't know if this would matter, but I have always sorted the bam file first and then created it's index.

The thing is, based on the steps that you have mentioned, your index file would be based on your original bam file, and since you are using the sorted bam file in your mpileup, this might be causing some issues.

Again this is just a thought, but you might want to try it once.

Thanks,
Praful

**swbarnes2** · 01-11-2012, 09:42 AM

Originally posted by aggp11 View Post

Hi Ashwin,
The thing is, based on the steps that you have mentioned, your index file would be based on your original bam file, and since you are using the sorted bam file in your mpileup, this might be causing some issues.

Again this is just a thought, but you might want to try it once.

Thanks,
Praful

You are right, but mpileup doesn't look at the index, so that's not the issue. The issue is that mpileup really isn't designed to do what the poster wants to do with it, so it's working correctly, but not in the off-label way that the poster hoped it would.

I'm not sure what tool would be more appropriate. What I think will work is making a .bed file of your genotype loci, and using intersectBed with that and your .bam. That'll filter out all the reads that don't cover your genotype loci, which I think will make the vcf smaller, and easier to manage. Then make the all-points vcf by leaving out the -v, then use intersectBed again with that .bed against the vcf. That should leave only the mpileup entries that hit your genotype loci, and since it's all points, the 0/0 entries shoud be there.

**aggp11** · 01-11-2012, 10:10 AM

Originally posted by swbarnes2 View Post

You are right, but mpileup doesn't look at the index, so that's not the issue. The issue is that mpileup really isn't designed to do what the poster wants to do with it, so it's working correctly, but not in the off-label way that the poster hoped it would.

I'm not sure what tool would be more appropriate. What I think will work is making a .bed file of your genotype loci, and using intersectBed with that and your .bam. That'll filter out all the reads that don't cover your genotype loci, which I think will make the vcf smaller, and easier to manage. Then make the all-points vcf by leaving out the -v, then use intersectBed again with that .bed against the vcf. That should leave only the mpileup entries that hit your genotype loci, and since it's all points, the 0/0 entries shoud be there.

Thanks for the information on mpileup. And I like your idea of using intersectBed here.

**swbarnes2** · 01-11-2012, 11:06 AM

Oh, and I'll add if you get rid of the -v, and a locus doesn't show up, I'm pretty srue that's an indicator that you have no reads there, or at least no reads of any quality.

**ashkot** · 01-12-2012, 05:22 PM

hi there, thanks for you input, i am going to give intersetBed a try and see what i get. I'll keep everybody informed about the outcome.

thanks again.

ashwin

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 27 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 24 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

SAMTools mpileup issue

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News