Seqanswers Leaderboard Ad

**Jose Blanca** · 09-07-2010, 11:00 PM

We had the same problem in a plant transcriptome, so we created a tool to do just that. It takes the individual bam files, it merges them and then it looks for the SNP with respect to the reference or to between the reads. You could do something like that.
Alternatively, you could give a spin to our tool, it might be useful to you, it is named ngs_backbone.

**KNS** · 09-08-2010, 12:36 AM

Dear Jose,
this is brilliant. I will gove it a shot - no need to reinvent the wheel. Big thanks for your quick reply!

**malcook** · 09-08-2010, 06:49 AM

consolidating pileup across multiple samples

Here are steps for using samtools to creating pileup output for three samples holding the sample allele at each site

Code:

# assuming for each of three read groups (samples) named s1,s2,s3 that you have one file of reads aligned to genome in $RefseqFasta

# use samtools to produce your sorted bam files {s1,s2,s3}.s.bam
...

# build the pileups:
samtools pileup -vcf $RefseqFasta s1.s.bam > s1.pileup.tab
samtools pileup -vcf $RefseqFasta s2.s.bam > s2.pileup.tab
samtools pileup -vcf $RefseqFasta s3.s.bam > s3.pileup.tab

# filter them using samtools.pl (or varfilter.py) (or awk)
samtools.pl varFilter $varFilterOptions s1.pileup.tab > s1.pileup.filtered.tab
samtools.pl varFilter $varFilterOptions s2.pileup.tab > s2.pileup.filtered.tab
samtools.pl varFilter $varFilterOptions s3.pileup.tab > s3.pileup.filtered.tab

# produce consolidated "list of sites" (c.f. [url]http://samtools.sourceforge.net/samtools.shtml[/url]) holding at least one filtered pileup
# note: the s{1,2,3} relies upon your shell being bash.
cut -f 1-2  s{1,2,3}.pileup.filtered.tab | sort  --key=1,1 --key=2,2n | uniq > s1-3.sites.tab

# extract the pileup for each sample at each site
samtools pileup -l s1-3.sites.tab -c -f $RefseqFasta s1.s.bam > s1.pileup.atsites.tab
samtools pileup -l s1-3.sites.tab -c -f $RefseqFasta s2.s.bam > s2.pileup.atsites.tab
samtools pileup -l s1-3.sites.tab -c -f $RefseqFasta s3.s.bam > s3.pileup.atsites.tab

# Use whatever tools you have to compare s{1,2,3}.pileup.atsites.tab
# They all are in the same order and have the same values in the 
# first three columns (chromosome, position, reference sequence).
# Put them in a database and write join commands?  
# Use the unix `paste` command to see them side-by-side?
# Load them into "R" and compare them there?
# Use ensembl'e snp_effect_predictor to predict consequences?
# ...

I typically craft the above logic using Makefiles....

Cheers,

Malcolm

**lh3** · 09-08-2010, 10:43 AM

Use mpileup

**sdavis** · 10-29-2010, 04:21 AM

Originally posted by lh3 View Post

Use mpileup

Using mpileup does a fantastic job of simplifying the workflow. However, the per-sample information (strand bias, sequence depth, etc.) are largely lost. Is there a way to maintain that sample-level information (DP, SB, and potentially others) so that more interesting filtering can be done on individual samples? In the end, we are typically interested in having very high quality variant calls per sample and want to guard against errors at that level and not so much at the level of the population.

**lh3** · 10-29-2010, 04:26 AM

For now, call candidates first and then run mpileup on the candidates to get the pileup format (i.e., without the "-g" option). From pileup, you will know DP. As to strand bias, you may perform an exact test based on the pileup string, like what bcftools is doing.

Or when you get the candidate lists, collect information individually by running mpileup on each individual.

EDIT: in future, pileup lines may be folded into VCF, but this will not be soon.

**lh3** · 10-29-2010, 08:18 AM

I have just committed a new revision by which you can compute per-sample DP and SP (phred-scaled strand bias P-value), via the -D and -S options of mpileup. They are not the default behavior. In addition, -S may have performance overhead as Fisher's exact test really takes time.

EDIT: for now, I think DP and SP are the most informative ones.

**Jose Blanca** · 10-31-2010, 01:04 AM

We keep, sample and library information as well as a mean quality per allele. That's why we use our own snp caller and not pileup. If pileup kept that information we would have use it.

**lh3** · 10-31-2010, 06:12 AM

mpileup keeps per sample information. this is what it is designed for. the quality of each base is there, so you can also compute a mean. on the other hand, I am not sure how much library information may help. I guess we mostly use one library per sample now.

nonetheless, for deep resequencing of a couple of individuals, the model for snp calling is not important. any reasonable snp callers will give sensible results. no need to be bound to samtools. just you should apply the baq thing before snp calling.

**aslihan** · 12-02-2011, 09:40 AM

Hi Ih3

Could you write the command for it? Where I put -D and -S?

samtools mpileup -EuDSf *.fa s1.sorted.bam s2.sorted.bam
bcftools view -bvcg
bcftools/vcfutils.pl varFilter -D100

Thanks

I have just committed a new revision by which you can compute per-sample DP and SP (phred-scaled strand bias P-value), via the -D and -S options of mpileup. They are not the default behavior. In addition, -S may have performance overhead as Fisher's exact test really takes time.

EDIT: for now, I think DP and SP are the most informative ones.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 18 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 22 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 17 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 49 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Compile multi-individual SNP tables from bowtie/samtools mappings

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News