SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Compile samtools on Windows so it doesn't need Cygwin imilne Bioinformatics 3 04-05-2016 05:31 AM
Multi-read problem with bowtie and cufflinks fpauler Bioinformatics 3 02-15-2012 11:46 AM
samtools mpileup and bcftools command lines to compare individual differences? aslihan Bioinformatics 1 11-17-2011 02:20 PM
Large-Scale, Multi-Sample SNP Analysis Video DNASTAR Vendor Forum 0 10-26-2011 07:11 AM
Removing multi-matched sequences in Bowtie gwilson Bioinformatics 8 01-11-2010 11:58 PM

Reply
 
Thread Tools
Old 09-07-2010, 06:07 AM   #1
KNS
Member
 
Location: CH

Join Date: Aug 2010
Posts: 17
Default Compile multi-individual SNP tables from bowtie/samtools mappings

I am sure someone must have solved this before, and I wonder if there is not an obviously easy way to do this:

I have GAIIx-RAD data of bowtie-reference-mapped individuals. The resulting individual SNP data (produced with samtools in the form of pileups) is easily merged in a single table, but one major problem remains:

The samtools pileup only reports on SNPs which are different wrt the reference sequence, but not on those SNPs which happen to be identical to the reference (i.e. those which show differences in other individuals).

Is there any easy way of discriminating the absence of data from DNA-sequence identity of any given SNP with the reference sequence?

Any pointers most welcome!
KNS is offline   Reply With Quote
Old 09-07-2010, 11:00 PM   #2
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

We had the same problem in a plant transcriptome, so we created a tool to do just that. It takes the individual bam files, it merges them and then it looks for the SNP with respect to the reference or to between the reads. You could do something like that.
Alternatively, you could give a spin to our tool, it might be useful to you, it is named ngs_backbone.
Jose Blanca is offline   Reply With Quote
Old 09-08-2010, 12:36 AM   #3
KNS
Member
 
Location: CH

Join Date: Aug 2010
Posts: 17
Default

Dear Jose,
this is brilliant. I will gove it a shot - no need to reinvent the wheel. Big thanks for your quick reply!
KNS is offline   Reply With Quote
Old 09-08-2010, 06:49 AM   #4
malcook
Member
 
Location: 66206

Join Date: Sep 2009
Posts: 23
Default consolidating pileup across multiple samples

Here are steps for using samtools to creating pileup output for three samples holding the sample allele at each site
Code:
# assuming for each of three read groups (samples) named s1,s2,s3 that you have one file of reads aligned to genome in $RefseqFasta

# use samtools to produce your sorted bam files {s1,s2,s3}.s.bam
...

# build the pileups:
samtools pileup -vcf $RefseqFasta s1.s.bam > s1.pileup.tab
samtools pileup -vcf $RefseqFasta s2.s.bam > s2.pileup.tab
samtools pileup -vcf $RefseqFasta s3.s.bam > s3.pileup.tab

# filter them using samtools.pl (or varfilter.py) (or awk)
samtools.pl varFilter $varFilterOptions s1.pileup.tab > s1.pileup.filtered.tab
samtools.pl varFilter $varFilterOptions s2.pileup.tab > s2.pileup.filtered.tab
samtools.pl varFilter $varFilterOptions s3.pileup.tab > s3.pileup.filtered.tab

# produce consolidated "list of sites" (c.f. http://samtools.sourceforge.net/samtools.shtml) holding at least one filtered pileup
# note: the s{1,2,3} relies upon your shell being bash.
cut -f 1-2  s{1,2,3}.pileup.filtered.tab | sort  --key=1,1 --key=2,2n | uniq > s1-3.sites.tab

# extract the pileup for each sample at each site
samtools pileup -l s1-3.sites.tab -c -f $RefseqFasta s1.s.bam > s1.pileup.atsites.tab
samtools pileup -l s1-3.sites.tab -c -f $RefseqFasta s2.s.bam > s2.pileup.atsites.tab
samtools pileup -l s1-3.sites.tab -c -f $RefseqFasta s3.s.bam > s3.pileup.atsites.tab

# Use whatever tools you have to compare s{1,2,3}.pileup.atsites.tab
# They all are in the same order and have the same values in the 
# first three columns (chromosome, position, reference sequence).
# Put them in a database and write join commands?  
# Use the unix `paste` command to see them side-by-side?
# Load them into "R" and compare them there?
# Use ensembl'e snp_effect_predictor to predict consequences?
# ...
I typically craft the above logic using Makefiles....

Cheers,

Malcolm
malcook is offline   Reply With Quote
Old 09-08-2010, 10:43 AM   #5
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Use mpileup
lh3 is offline   Reply With Quote
Old 10-29-2010, 04:21 AM   #6
sdavis
Member
 
Location: Maryland

Join Date: Jan 2010
Posts: 14
Default

Quote:
Originally Posted by lh3 View Post
Use mpileup
Using mpileup does a fantastic job of simplifying the workflow. However, the per-sample information (strand bias, sequence depth, etc.) are largely lost. Is there a way to maintain that sample-level information (DP, SB, and potentially others) so that more interesting filtering can be done on individual samples? In the end, we are typically interested in having very high quality variant calls per sample and want to guard against errors at that level and not so much at the level of the population.
sdavis is offline   Reply With Quote
Old 10-29-2010, 04:26 AM   #7
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

For now, call candidates first and then run mpileup on the candidates to get the pileup format (i.e., without the "-g" option). From pileup, you will know DP. As to strand bias, you may perform an exact test based on the pileup string, like what bcftools is doing.

Or when you get the candidate lists, collect information individually by running mpileup on each individual.

EDIT: in future, pileup lines may be folded into VCF, but this will not be soon.

Last edited by lh3; 10-29-2010 at 04:35 AM.
lh3 is offline   Reply With Quote
Old 10-29-2010, 08:18 AM   #8
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

I have just committed a new revision by which you can compute per-sample DP and SP (phred-scaled strand bias P-value), via the -D and -S options of mpileup. They are not the default behavior. In addition, -S may have performance overhead as Fisher's exact test really takes time.

EDIT: for now, I think DP and SP are the most informative ones.

Last edited by lh3; 10-29-2010 at 08:21 AM.
lh3 is offline   Reply With Quote
Old 10-31-2010, 01:04 AM   #9
Jose Blanca
Member
 
Location: Valencia, Spain

Join Date: Aug 2009
Posts: 70
Default

We keep, sample and library information as well as a mean quality per allele. That's why we use our own snp caller and not pileup. If pileup kept that information we would have use it.
Jose Blanca is offline   Reply With Quote
Old 10-31-2010, 06:12 AM   #10
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

mpileup keeps per sample information. this is what it is designed for. the quality of each base is there, so you can also compute a mean. on the other hand, I am not sure how much library information may help. I guess we mostly use one library per sample now.

nonetheless, for deep resequencing of a couple of individuals, the model for snp calling is not important. any reasonable snp callers will give sensible results. no need to be bound to samtools. just you should apply the baq thing before snp calling.
lh3 is offline   Reply With Quote
Old 12-02-2011, 08:40 AM   #11
aslihan
Member
 
Location: USA

Join Date: Jun 2011
Posts: 23
Default

Hi Ih3

Could you write the command for it? Where I put -D and -S?

samtools mpileup -EuDSf *.fa s1.sorted.bam s2.sorted.bam
bcftools view -bvcg
bcftools/vcfutils.pl varFilter -D100

Thanks


I have just committed a new revision by which you can compute per-sample DP and SP (phred-scaled strand bias P-value), via the -D and -S options of mpileup. They are not the default behavior. In addition, -S may have performance overhead as Fisher's exact test really takes time.

EDIT: for now, I think DP and SP are the most informative ones.
aslihan is offline   Reply With Quote
Reply

Tags
bowtie, reference mapping, results table, samtools

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 11:18 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO