Thread | Thread Starter | Forum | Replies | Last Post |
read group: GATK or BWA option? | m_elena_bioinfo | Bioinformatics | 9 | 12-09-2012 10:53 AM |
How to deal with no calls from GATK Unifiedgenotyper for indels | audqf | Bioinformatics | 2 | 02-01-2012 03:53 PM |
BWA:getting hits with given number of mismatches and indels | Chandana | Bioinformatics | 0 | 01-11-2012 11:03 AM |
Errors from Picard and GATK on a BWA paired-end BAM | oiiio | Bioinformatics | 2 | 12-07-2011 08:50 PM |
GATK: VCF file for Local realignment around indels | jorge | Bioinformatics | 2 | 10-11-2011 12:15 AM |
#1 |
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
Hi,
I am using BWA and GATK to detect mutations in BRCA1. The BRCA1 sequences have been Sanger validated and contain known mutations. I am achieving a fair degree of accuracy so far, successfully detecting 99% of SNPs and over 90% of indels. The majority of false negatives are indels over 5 bp in size, ranging from 6 to 99 bp in length. Can anyone recommend command-line parameters or values that would get the aligner to pick up some of the larger indels? Thanks in advance.
#2 |
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
I am now getting all indels up to 29 bp in length. I achieved this by increasing the maximum number of permitted gap extensions with bwa aln -e 50.
I will continue to experiment in order to get the larger indels.
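For anyone wanting to try other values, here is a minimal dry-run sketch that only prints the bwa aln command for a given -e value rather than running it. The helper name build_aln_cmd and the file name sample1.fastq are hypothetical; chr17hg19 is the index prefix used in this thread. Whether a larger -e value recovers the remaining indels is untested here.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the bwa aln command for one FASTQ file instead of running it.
# build_aln_cmd is a hypothetical helper, not part of bwa.
# Usage: build_aln_cmd <fastq> <max_gap_extensions> [reference_prefix]
build_aln_cmd() {
    local fastq="$1"
    local max_gap_ext="$2"
    local ref="${3:-chr17hg19}"   # index prefix used in this thread (default)
    # -e sets the maximum number of gap extensions; -f names the output .sai file
    printf 'bwa aln -e %s -f %s %s %s\n' \
        "$max_gap_ext" "${fastq%.fastq}.sai" "$ref" "$fastq"
}

# Print the commands for two candidate -e values on the same input
for e in 50 100; do
    build_aln_cmd sample1.fastq "$e"
done
```

Printing the command first makes it easy to check the output file name and reference prefix before committing to a long alignment run.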
#3 |
Super Moderator
Location: US Join Date: Nov 2009
Posts: 437
Do you perform a base recalibration step with GATK before calling indels?
#4 |
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
|
#5 |
Senior Member
Location: USA Join Date: Jan 2011
Posts: 105
I have been trying to call indels with GATK UnifiedGenotyper from BWA-mapped BAMs for some time now, but with no success.
Did you have to use anything outside of the default parameters with UnifiedGenotyper or CountCovariates/TableRecalibration? Others with this problem have found that the sequencing error rate in their samples was too high. If you don't mind, could you post a couple of command lines from your pipeline? I'm particularly interested in your UnifiedGenotyper and base recalibration commands. It would be an immense help.
#6 | |
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
Quote:
    # Align reads, allowing up to 50 gap extensions
    for file in *fastq; do bwa aln -e 50 -f ${file%%.fastq}.sai chr17hg19 ${file}; done
    # Convert single-end alignments to SAM
    for file in *sai; do bwa samse chr17hg19 ${file} ${file%%.sai}.fastq > ${file%%.sai}.sam; done
    # Sort by coordinate (SortSam reads the SAM and writes a BAM)
    for file in *sam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/SortSam.jar I=${file} O=${file%%.sam}_sorted.bam SO=coordinate; done
    # Remove duplicates
    for file in *_sorted.bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/MarkDuplicates.jar I=${file} O=${file%%.bam}_ndup.bam M=metric TMP_DIR=./tmp REMOVE_DUPLICATES=TRUE VALIDATION_STRINGENCY=LENIENT; done
    # Add read groups
    for file in *ndup.bam; do java -jar /home/goliver/ngs_software/picard-tools-1.53/AddOrReplaceReadGroups.jar I=${file} O=${file%%.bam}_rg.bam SO=coordinate ID=1 LB=Z PL=illumina PU=Z SM=Z; done
    # Index the BAMs
    for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/BuildBamIndex.jar I=${file} O=${file}.bai; done
    # Local realignment around indels
    for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ../ref_chr17.hg19.fa -o ${file%%.bam}.intervals -I ${file}; done
    for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -I ${file} -R ../ref_chr17.hg19.fa -T IndelRealigner -o ${file%%.bam}_2.bam -targetIntervals ${file%%.bam}.intervals --known ../GATK/dbsnp_132.b37.vcf; done
    # Base quality recalibration
    for file in *_2.bam; do java -Xmx20g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -R ../ref_chr17.hg19.fa -knownSites ../GATK/dbsnp_132.b37.vcf -I ${file} -T CountCovariates -cov QualityScoreCovariate -cov DinucCovariate -cov ReadGroupCovariate -cov CycleCovariate -recalFile ${file%%.bam}.recal.csv --default_read_group 1 --default_platform illumina -nt 4; done
    for file in *_2.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -l INFO -R ../ref_chr17.hg19.fa -T TableRecalibration -I ${file} -o ${file%%.bam}.final.bam -recalFile ${file%%.bam}.recal.csv --default_read_group 1 --default_platform illumina; done
    # Call SNPs and indels
    for file in *final.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -I ${file} -R ../ref_chr17.hg19.fa -o ${file%%.bam}.vcf; done
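Each stage above derives its output name from its input with a ${file%%.ext} suffix substitution, so the names get long quickly. As a rough sketch of how a single input name would propagate, the snippet below mirrors those expansions without running any tool. The helper trace_names and the input sample1.fastq are hypothetical, and it assumes SortSam takes the .sam produced by bwa samse directly.

```shell
#!/usr/bin/env bash
# Trace how one input name changes through the pipeline's suffix substitutions.
# No aligner or GATK tool is run; this only mirrors the ${file%%.ext} expansions.
# trace_names is a hypothetical helper for illustration.
trace_names() {
    local fastq="$1"
    local sai="${fastq%%.fastq}.sai"              # bwa aln output
    local sam="${sai%%.sai}.sam"                  # bwa samse output
    local sorted="${sam%%.sam}_sorted.bam"        # assumes SortSam reads the .sam
    local ndup="${sorted%%.bam}_ndup.bam"         # MarkDuplicates output
    local rg="${ndup%%.bam}_rg.bam"               # AddOrReplaceReadGroups output
    local realigned="${rg%%.bam}_2.bam"           # IndelRealigner output
    local final="${realigned%%.bam}.final.bam"    # TableRecalibration output
    local vcf="${final%%.bam}.vcf"                # UnifiedGenotyper output
    printf '%s\n' "$sai" "$sam" "$sorted" "$ndup" "$rg" "$realigned" "$final" "$vcf"
}

trace_names sample1.fastq
```

Checking the expected names this way helps confirm that each loop's glob (for example *_2.bam or *final.bam) will actually match the files written by the previous step.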
#7 |
Senior Member
Location: Phoenix, AZ Join Date: Mar 2010
Posts: 279
Do you have any paired-end data, as opposed to the single-end data your methods suggest? Indel alignment should be better with paired-end reads than with single-end reads.
#8 |
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
This particular dataset is all single end. I am pretty certain the larger indels can still be detected though...