SEQanswers

12-05-2011, 12:23 AM #1
gavin.oliver

6-99bp indels with BWA/GATK

Hi,

I am using BWA and GATK to detect mutations in BRCA1. The BRCA1 sequences have been Sanger-validated and contain known mutations. So far the accuracy is fair: I detect 99% of SNPs and over 90% of indels. Most of the false negatives are indels longer than 5 bp, ranging from 6 to 99 bp. Can anyone recommend command-line parameters or values that would let the aligner pick up some of the larger indels?

Thanks in advance.
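Since most of the misses fall in the 6-99 bp range, it can help to tally the called indels by size class and compare those counts against the Sanger-validated set after each parameter change. A minimal awk sketch, assuming a VCF with one ALT allele per record (the function name indel_size_bins and the size cut-offs are illustrative, not part of any tool):

```shell
# Hypothetical helper: count indel records in a VCF by size class.
# Assumes one ALT allele per record; multi-allelic and symbolic alleles are not handled.
indel_size_bins() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      d = length($5) - length($4)        # ALT length minus REF length
      if (d < 0) d = -d                  # absolute size of the indel
      if (d == 0) next                   # same length: SNP/MNP, not an indel
      if (d <= 5) small++
      else if (d <= 99) mid++
      else large++
    }
    END { printf "1-5bp\t%d\n6-99bp\t%d\n>=100bp\t%d\n", small, mid, large }
  ' "$@"
}
```

Usage: indel_size_bins sample.vcf, or pipe a VCF in on standard input.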

12-05-2011, 02:48 AM #2
gavin.oliver

I am now detecting all indels up to 29 bp in length. I achieved this by increasing the maximum number of permitted gap extensions with bwa aln -e 50.

I will continue to experiment in order to get the larger indels.
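To compare settings systematically, a small dry-run helper can print one bwa aln command per -e value (bwa aln's usage text describes -e as the maximum number of gap extensions, with the default of -1 disabling long gaps). The function name sweep_gap_extensions and the .eN.sai naming scheme are hypothetical:

```shell
# Hypothetical helper: print (not run) one bwa aln command per -e value
# so that the resulting call sets can be compared side by side.
sweep_gap_extensions() {
  local ref=$1 fastq=$2
  shift 2
  local e
  for e in "$@"; do
    # -e: maximum number of gap extensions (the thread settled on 50)
    printf 'bwa aln -e %s -f %s.e%s.sai %s %s\n' \
      "$e" "${fastq%.fastq}" "$e" "$ref" "$fastq"
  done
}
```

For example, sweep_gap_extensions chr17hg19 sample.fastq 10 50 100 | sh would run all three alignments; each .sai file then goes through the rest of the pipeline separately.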

12-05-2011, 05:30 AM #3
adaptivegenome

Do you perform a base recalibration step with GATK before calling indels?

12-05-2011, 05:40 AM #4
gavin.oliver

Quote:
Originally Posted by genericforms
Do you perform a base recalibration step with GATK before calling indels?
Indeed I do.
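For context, the core quantity base recalibration estimates is an empirical Phred quality per bin of bases (for example, per reported quality score), based on how often bases in that bin mismatch the reference at sites not listed in the known-sites file. A simplified sketch of only that formula, Q = -10 log10(mismatches / observations); the real CountCovariates/TableRecalibration tools add details such as smoothing and multiple covariates that are left out here, and empirical_q is a hypothetical name:

```shell
# Hypothetical helper: empirical Phred quality for one bin of bases,
# given its mismatch count ($1) and number of observed bases ($2).
empirical_q() {
  awk -v mm="$1" -v obs="$2" 'BEGIN {
    if (mm + 0 == 0) { print "undefined (no mismatches observed)"; exit }
    rate = mm / obs                               # observed error rate in this bin
    printf "%.0f\n", -10 * log(rate) / log(10)    # convert to the Phred scale
  }'
}
```

For example, 1 mismatch in 1000 observed bases corresponds to Q30, and 10 in 1000 to Q20.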

12-05-2011, 07:10 AM #5
oiiio

I have been trying to call indels with GATK UnifiedGenotyper from BWA-mapped BAMs for some time now, but with no success.

Did you have to use anything outside of the default parameters with UnifiedGenotyper or CountCovariates/TableRecalibration? Others with this problem have found that the cause could be sequencing error rates in the sample that were too high.

If you don't mind, could you post a couple of command lines from your pipeline? I'm particularly interested in your UnifiedGenotyper and base recalibration commands. It would be an immense help.
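One rough way to check whether a sample's error rate is unusually high is to look at the NM (edit distance) tags BWA writes for mapped reads. A sketch assuming SAM input, with a hypothetical function name; NM also counts true variants and indel bases, and the denominator is total read length rather than strictly aligned bases, so treat the result only as a coarse sanity check:

```shell
# Hypothetical helper: sum of NM tags divided by total read length
# over mapped reads in a SAM file (or SAM on standard input).
nm_error_rate() {
  awk -F'\t' '
    /^@/ { next }                        # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }        # skip unmapped reads (FLAG bit 0x4)
    {
      for (i = 12; i <= NF; i++)         # optional tags start at column 12
        if ($i ~ /^NM:i:/) { nm += substr($i, 6); break }
      bases += length($10)               # read length taken from SEQ
    }
    END { if (bases > 0) printf "%.4f\n", nm / bases; else print "NA" }
  ' "$@"
}
```

Usage: nm_error_rate sample.sam prints a single fraction (edit-distance bases per read base).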

12-05-2011, 07:23 AM #6
gavin.oliver

Quote:
Originally Posted by oiiio
I have been trying to call indels with GATK UnifiedGenotyper from BWA-mapped BAMs for some time now, but with no success.

Did you have to use anything outside of the default parameters with UnifiedGenotyper or CountCovariates/TableRecalibration? Others with this problem have found that the cause could be sequencing error rates in the sample that were too high.

If you don't mind, could you post a couple of command lines from your pipeline? I'm particularly interested in your UnifiedGenotyper and base recalibration commands. It would be an immense help.
I am pretty sure my commands are very standard. Nonetheless, you are welcome to have a look!

for file in *fastq; do bwa aln -e 50 -f ${file%%.fastq}.sai chr17hg19 ${file}; done

for file in *sai; do bwa samse chr17hg19 ${file} ${file%%.sai}.fastq > ${file%%.sai}.sam; done

for file in *.sam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/SortSam.jar I=${file} O=${file%%.sam}_sorted.bam SO=coordinate; done

for file in *_sorted.bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/MarkDuplicates.jar I=${file} O=${file%%.bam}_ndup.bam M=${file%%.bam}.metrics TMP_DIR=./tmp REMOVE_DUPLICATES=TRUE VALIDATION_STRINGENCY=LENIENT; done

for file in *ndup.bam; do java -jar /home/goliver/ngs_software/picard-tools-1.53/AddOrReplaceReadGroups.jar I=${file} O=${file%%.bam}_rg.bam SO=coordinate ID=1 LB=Z PL=illumina PU=Z SM=Z; done

for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/BuildBamIndex.jar I=${file} O=${file}.bai; done

for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ../ref_chr17.hg19.fa -o ${file%%.bam}.intervals -I ${file}; done

for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -I ${file} -R ../ref_chr17.hg19.fa -T IndelRealigner -o ${file%%.bam}_2.bam -targetIntervals ${file%%.bam}.intervals --known ../GATK/dbsnp_132.b37.vcf; done

for file in *_2.bam; do java -Xmx20g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -R ../ref_chr17.hg19.fa -knownSites ../GATK/dbsnp_132.b37.vcf -I ${file} -T CountCovariates -cov QualityScoreCovariate -cov DinucCovariate -cov ReadGroupCovariate -cov CycleCovariate -recalFile ${file%%.bam}.recal.csv --default_read_group 1 --default_platform illumina -nt 4; done

for file in *_2.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -l INFO -R ../ref_chr17.hg19.fa -T TableRecalibration -I ${file} -o ${file%%.bam}.final.bam -recalFile ${file%%.bam}.recal.csv --default_read_group 1 --default_platform illumina; done

for file in *final.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -I ${file} -R ../ref_chr17.hg19.fa -o ${file%%.bam}.vcf; done
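Because every step derives its output name from its input with bash suffix stripping (${file%%.bam} removes a trailing .bam), the file names compound as the pipeline proceeds. A hypothetical helper that traces the names produced for one sample, assuming the .sam files from the samse step are what SortSam reads:

```shell
# Hypothetical helper: list, in order, the files the loops above
# create for one input FASTQ (assumes SortSam reads the samse .sam output).
pipeline_names() {
  local base=${1%%.fastq}                     # sample.fastq -> sample
  local sorted=${base}_sorted.bam             # SortSam
  local ndup=${sorted%%.bam}_ndup.bam         # MarkDuplicates
  local rg=${ndup%%.bam}_rg.bam               # AddOrReplaceReadGroups
  local realigned=${rg%%.bam}_2.bam           # IndelRealigner
  local final=${realigned%%.bam}.final.bam    # TableRecalibration
  printf '%s\n' "${base}.sai" "${base}.sam" "$sorted" "$ndup" "$rg" \
    "${rg%%.bam}.intervals" "$realigned" "${realigned%%.bam}.recal.csv" \
    "$final" "${final%%.bam}.vcf"
}
```

For example, pipeline_names sample.fastq ends with sample_sorted_ndup_rg_2.final.vcf, which is also why the later loops can select their inputs with globs such as *_2.bam and *final.bam.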

12-05-2011, 08:00 PM #7
Jon_Keats

Do you have any paired-end data, as opposed to the single-end data your methods suggest? Indel alignment should be better with paired-end reads than with single-end reads.
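If paired-end reads were available, only the alignment steps would change: each mate file is aligned separately with bwa aln, and bwa sampe replaces bwa samse. A dry-run sketch, assuming mates named <sample>_1.fastq and <sample>_2.fastq (a hypothetical naming convention) and keeping the -e 50 setting from earlier in the thread:

```shell
# Hypothetical helper: print the paired-end alignment commands for one sample.
paired_commands() {
  local ref=$1 sample=$2
  local r1=${sample}_1.fastq r2=${sample}_2.fastq
  printf 'bwa aln -e 50 -f %s %s %s\n' "${r1%.fastq}.sai" "$ref" "$r1"
  printf 'bwa aln -e 50 -f %s %s %s\n' "${r2%.fastq}.sai" "$ref" "$r2"
  # sampe takes both .sai files, then both FASTQ files, in the same mate order
  printf 'bwa sampe %s %s %s %s %s > %s.sam\n' \
    "$ref" "${r1%.fastq}.sai" "${r2%.fastq}.sai" "$r1" "$r2" "$sample"
}
```

The resulting .sam file would then enter the same sorting, duplicate-marking, realignment and recalibration steps as before.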

12-05-2011, 11:52 PM #8
gavin.oliver

Quote:
Originally Posted by Jon_Keats
Do you have any paired-end data, as opposed to the single-end data your methods suggest? Indel alignment should be better with paired-end reads than with single-end reads.
This particular dataset is all single-end. I am pretty certain the larger indels can still be detected, though...

All times are GMT -8.