Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
mpileup multiple samples - individual DP4/AF1 values catriona Bioinformatics 3 07-22-2014 03:49 PM
Cufflinks (Cuffdiff) v0.9.3 multiple samples keebs42 Bioinformatics 4 11-06-2013 02:20 AM
SNP base calling for multiple samples shuang Bioinformatics 2 09-07-2011 02:06 PM
how to analyze multiple samples using tophat nunu_ping RNA Sequencing 5 03-06-2011 05:57 AM
mpileup on haploid samples agc Bioinformatics 1 11-24-2010 04:14 AM

Thread Tools
Old 06-16-2011, 01:22 PM   #1
Junior Member
Location: San Francisco

Join Date: Jun 2011
Posts: 6
Unhappy Samtools mpileup on multiple samples

I have multiple samples from which I would like to call SNP. I've been trying either to call them separately or all at once

samtools mpileup -uf hg19.fa sample1.bam | bcftools view -bvcg - > out.bcf
samtools mpileup -uf hg19.fa sample1.bam sample2.bam | bcftools view -bvcg - > out.bcf

In the first case, I can get coverage (DP) at each position where a SNP is encountered, but only for the sample that has the SNP. That is, I would like to get the coverage at that position in all samples, but I could not.

In the second case, for SNP I know are present in both samples, I found only 1 value for DP, which I assume is the combined coverage of both samples (see below).
Is there any way I can get DP and DP4 for all samples even if the SNP is only in one?

chr1 15211 . T G 51 . DP=16;AF1=1;CI95=0.8333,1;DP4=3,0,10,3;MQ=30;FQ=-37.5;PV4=1,1.3e-05,1,1 GT:PL:GQ 1/1:46,9,0:21 1/1:23,9,0:21

Second question, I've been having a hard time trying to annotate the SNP calls in the vcf file (produced by bcftools view) with its rsID. I tried vcftools

>bgzip UCSC_SNP_v132
>zcat input.vcf.gz | vcftools_0.1.5/bin/fill-rsIDs -r UCSC_SNP_v132.gz | tabix-0.2.5/bgzip -c > out

but kept getting error below which suggests to me that I'm not using the right format for the input dbSNP. I used the BED format downloaded from UCSC. What am I doing wrong here?

tabix ./UCSC_SNP_v132.gz chr1 2>&1 |: No such file or directory at vcftools_0.1.5/bin/fill-rsIDs line 20
main::error('tabix ./UCSC_SNP_v132.gz chr1 2>&1 |: No such file or directory') called at vcftools_0.1.5/bin/fill-rsIDs line 93
main::fill_rsids('HASH(0x169602a0)', './UCSC_SNP_v132.gz') called at vcftools_0.1.5/bin/fill-rsIDs line 11

cristae8 is offline   Reply With Quote
Old 06-17-2011, 02:10 AM   #2
Location: Singapore

Join Date: Oct 2009
Posts: 44

Regarding your first question: The -D option keeps per sample read depth with mpileup.
DerSeb is offline   Reply With Quote
Old 06-17-2011, 04:22 PM   #3
Junior Member
Location: San Francisco

Join Date: Jun 2011
Posts: 6

Thanks, DerSeb!

Regarding my 2nd question, I realized that error was because tabix (called inside fill-rsIDs) was not on my path.
cristae8 is offline   Reply With Quote
Old 11-21-2011, 11:24 AM   #4
Location: USA

Join Date: Jun 2011
Posts: 23
Default how to annotate VCF files created by bcftools view?

I have 1 big vcf files which compares 12 samples by using samtools mpileup. And I would like annotate them? my reference genome is Hg19. How can annotate VCf files ?

Could you write commands and set up and where to download vcftools?

Thanks in advance.

Are you using only vcftools? What about ANNOVAR?
aslihan is offline   Reply With Quote
Old 11-21-2011, 02:24 PM   #5
Senior Member
Location: San Diego

Join Date: May 2008
Posts: 912

No, here's no way to run multiple sample .bams together and get the DP4 of each sample. I Which is a pity, since that would be useful. What you get instead are other stats, which can give you an idea of how good the SNP is in each sample.
swbarnes2 is offline   Reply With Quote
Old 12-28-2011, 12:23 PM   #6
Compendia Bio
Location: Ann Arbor

Join Date: Oct 2010
Posts: 35

For annotating the VCF file you can use the Ensembl Variant Effect Predictor. It is very helpful and can also produce SIFT, PolyPhen and Condel consensus.

I would recommend using the Perl script with a pre-built cache.

Here is the command that I use to annotate on a Mac:
perl --species homo_sapiens -i SNP.vcf --format vcf -o SNP_effects.txt --sift b --polyphen b --condel b --gene --hgnc --canonical --regulatory --coding_only --no_intergenic --cache --compress gzcat --check_existing
Variant Effect Predictor can also be used to check the SNPs against the 1000 Genomes and HapMap projects but I was not able to get it running. It was always throwing an error.


Last edited by DineshCyanam; 12-28-2011 at 12:26 PM.
DineshCyanam is offline   Reply With Quote
Old 01-27-2012, 10:58 AM   #7
Location: USA

Join Date: Jun 2011
Posts: 23

Thank you. Sorry for my late. I will try them.
aslihan is offline   Reply With Quote

mpileup, vcftools

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 11:20 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO