SEQanswers

Old 06-15-2011, 09:51 AM   #1
Hkins552
Member
 
Location: Baltimore MD

Join Date: Jun 2011
Posts: 18
Samtools pipeline produces many indels

Hi All,
I'm new to the community and just beginning to understand many of the programs used for SNP selection. In my current exome seq project, I am looking for homozygous SNPs in a consanguineous pedigree with multiple affected siblings. My pipeline is as follows:

Previously aligned input.bam files were obtained from the sequencing facility.

$ samtools sort input.bam input.sorted.bam

$ samtools rmdup -s input.sorted.bam input.sorted.rmdup.bam

$ samtools index input.sorted.rmdup.bam

$ samtools faidx HG19.fa

$ samtools mpileup -uf HG19.fa input.sorted.rmdup.bam > variants.raw

$ bcftools view -bvcg variants.raw > variants.raw.bcf

$ bcftools view variants.raw.bcf | vcfutils.pl varFilter -d 3 -D 1000 -G 20 > variants.flt.vcf

After this, I discard common variants using 1000 Genomes and/or dbSNP.
I then grep for variants that are homozygous and shared among the affected individuals, and that are heterozygous in the unaffected parents.
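
The homozygous filtering step could be sketched with awk rather than a plain grep, so that only the genotype field is matched. This is a minimal sketch, assuming a single-sample VCF in which GT is the first FORMAT field and the sample sits in column 10; the function name `homozygous_alt` is hypothetical.

```shell
# Print header lines plus records whose sample genotype is homozygous
# non-reference (1/1 or 1|1).
# Assumption: single-sample VCF, GT is the first colon-separated value
# of the sample column (column 10).
homozygous_alt() {
    awk -F'\t' '
        /^#/ { print; next }            # keep header lines unchanged
        {
            split($10, f, ":")          # f[1] holds the GT value
            if (f[1] == "1/1" || f[1] == "1|1") print
        }
    ' "$1"
}
```

For example, `homozygous_alt variants.flt.vcf > variants.hom.vcf` would keep only the homozygous calls. Sharing among siblings and heterozygosity in the parents would still need a separate comparison across the per-individual files.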

This protocol seems to be generating a large number of INDELs (>50%) compared to SNPs. Is this unusual? Should I have more stringent filters in place?
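
To put a number on the indel fraction, the records in the filtered VCF can be tallied by type. This is a rough sketch, assuming indel records carry an `INDEL` flag in the INFO column (column 8), as samtools/bcftools output does; the function name `count_variant_types` is hypothetical.

```shell
# Count SNP and indel records in a VCF from the samtools/bcftools pipeline.
# Assumption: indels are marked with an INDEL flag in INFO (column 8);
# VCFs from other callers may not set this flag.
count_variant_types() {
    awk -F'\t' '
        /^#/ { next }                               # skip header lines
        $8 ~ /(^|;)INDEL(;|$)/ { indel++; next }    # INFO carries the INDEL flag
        { snp++ }                                   # everything else counts as a SNP
        END {
            total = snp + indel
            frac = total ? 100 * indel / total : 0
            printf "SNPs=%d INDELs=%d indel_fraction=%.1f%%\n", snp, indel, frac
        }
    ' "$1"
}
```

Running `count_variant_types variants.flt.vcf` before and after each filtering step shows where the indel share changes.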

Also, once I have obtained a final list of variants, what programs are recommended currently for functional analysis?
Old 06-15-2011, 10:13 AM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Try

samtools mpileup -Buf HG19.fa input.sorted.rmdup.bam > variants.raw

In my experience, the BAQ calculations eat SNPs. -B will turn those calculations off.

If you compare the two pileups with and without the BAQ, what you may see is that the BAQ calculations are drastically dropping the quality scores of real SNPs, causing the SNP caller to ignore them, due to low quality. I believe this is an attempt to reduce false positives due to indels.
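
One indirect way to see this effect is to run the rest of the pipeline on both pileups and list the sites that are called only when BAQ is off. This is a sketch that compares final call sets rather than per-base qualities in the pileups themselves; the function name `sites_only_in_first` and the file names in the usage note are hypothetical.

```shell
# List CHROM:POS (plus QUAL) for records present in the first VCF
# but absent from the second.
# Intended use (assumption): first = calls made with -B (BAQ off),
# second = calls made with BAQ on.
sites_only_in_first() {
    awk -F'\t' -v other="$2" '
        BEGIN {
            # Load CHROM:POS keys from the second VCF.
            while ((getline line < other) > 0) {
                if (line ~ /^#/) continue
                split(line, f, "\t")
                seen[f[1] ":" f[2]] = 1
            }
            close(other)
        }
        /^#/ { next }                                        # skip header lines
        !(($1 ":" $2) in seen) { print $1 ":" $2 "\t" $6 }   # site key and QUAL
    ' "$1"
}
```

For example, `sites_only_in_first variants.noBAQ.vcf variants.BAQ.vcf` would print the positions that disappear once BAQ is applied, which can then be inspected in the two pileups.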
Old 06-17-2011, 04:30 AM   #3
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

Finding indels is much harder, and no one so far really understands the problem. Samtools calls many false indels, and the existing databases lack many true indels.

As to BAQ, if you really care about sensitivity, disable it. For general purposes, use it. BAQ trades a couple percent sensitivity for hugely improved specificity.

Tags
analysis, filtering, functional, indels, samtools
