#1 |
|
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
|
Hi,
I am using BWA and GATK to detect mutations in BRCA1. The BRCA1 sequences have been Sanger validated and contain known mutations. I am achieving a fair degree of accuracy so far, successfully detecting 99% of SNPs and over 90% of indels. The majority of false negatives are indels over 5 bp in size, ranging from 6 to 99 bp in length. Can anyone recommend which command-line parameters/values could be used to get the aligner to pick up some of the larger indels? Thanks in advance.
|
|
|
|
|
#2 |
|
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
|
I am now getting all Indels up to 29bp in length. I achieved this by increasing the maximum number of permitted gap extensions with bwa aln -e 50.
I will continue to experiment in order to get the larger indels.
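For anyone wanting to repeat this kind of experiment, here is a minimal dry-run sketch of sweeping the `-e` (maximum gap extensions) setting of `bwa aln`. The helper name `aln_cmd`, the file name `sample.fastq`, and the values 75 and 100 are illustrative assumptions, not settings reported in this thread; only `-e 50` and the `chr17hg19` index prefix come from the posts. The script only prints commands, so nothing is aligned until the output is reviewed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a bwa aln command for one -e value.
# $1 = maximum number of gap extensions, $2 = index prefix, $3 = FASTQ file
aln_cmd() {
  local max_gap_ext="$1" ref="$2" fastq="$3"
  # Encode the -e value in the output name so runs do not overwrite each other
  printf 'bwa aln -e %s -f %s %s %s\n' \
    "$max_gap_ext" "${fastq%.fastq}.e${max_gap_ext}.sai" "$ref" "$fastq"
}

# Sweep a few candidate values (75 and 100 are arbitrary illustrations)
for e in 50 75 100; do
  aln_cmd "$e" chr17hg19 sample.fastq
done
```

Once the printed commands look right, the output can be piped to `bash` to run them. Keeping one `.sai` file per `-e` value makes it easier to compare which indels each setting recovers.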
|
|
|
|
|
#3 |
|
Super Moderator
Location: US Join Date: Nov 2009
Posts: 437
|
Do you perform a base recalibration step with GATK before calling indels?
|
|
|
|
|
|
#4 |
|
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
|
|
|
|
|
|
|
#5 |
|
Member
Location: USA Join Date: Jan 2011
Posts: 95
|
I have been trying to call indels with GATK UnifiedGenotyper from BWA-mapped BAMs for some time now, but with no success.
Did you have to use anything outside of the default parameters with UnifiedGenotyper or CountCovariates/TableRecalibration? Others with this problem have found that the sequencing error rates in their samples were too high. If you don't mind, could you post a couple of command lines from your pipeline? I'm particularly interested in your UnifiedGenotyper and base recalibration commands. It would be an immense help.
|
|
|
|
|
#6 | |
|
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
|
Quote:
for file in *fastq; do bwa aln -e 50 -f ${file%%.fastq}.sai chr17hg19 ${file}; done
for file in *sai; do bwa samse chr17hg19 ${file} ${file%%.sai}.fastq > ${file%%.sai}.sam; done
for file in *bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/SortSam.jar I=${file} O=${file%%.bam}_sorted.bam SO=coordinate; done
for file in *_sorted.bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/MarkDuplicates.jar I=${file} O=${file%%.bam}_ndup.bam M=metric TMP_DIR=./tmp REMOVE_DUPLICATES=TRUE VALIDATION_STRINGENCY=LENIENT; done
for file in *ndup.bam; do java -jar /home/goliver/ngs_software/picard-tools-1.53/AddOrReplaceReadGroups.jar I=${file} O=${file%%.bam}_rg.bam SO=coordinate ID=1 LB=Z PL=illumina PU=Z SM=Z; done
for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/BuildBamIndex.jar I=${file} O=${file}.bai; done
for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ../ref_chr17.hg19.fa -o ${file%%.bam}.intervals -I ${file}; done
for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -I ${file} -R ../ref_chr17.hg19.fa -T IndelRealigner -o ${file%%.bam}_2.bam -targetIntervals ${file%%.bam}.intervals --known ../GATK/dbsnp_132.b37.vcf; done
for file in *_2.bam; do java -Xmx20g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -R ../ref_chr17.hg19.fa -knownSites ../GATK/dbsnp_132.b37.vcf -I ${file} -T CountCovariates -cov QualityScoreCovariate -cov DinucCovariate -cov ReadGroupCovariate -cov CycleCovariate -recalFile ${file%%.bam}.recal.csv --default_read_group 1 --default_platform illumina -nt 4; done
for file in *_2.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -l INFO -R ../ref_chr17.hg19.fa -T TableRecalibration -I ${file} -o ${file%%.bam}.final.bam -recalFile ${file%%.bam}.recal.csv --default_read_group 1 --default_platform illumina; done
for file in *final.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -I ${file} -R ../ref_chr17.hg19.fa -o ${file%%.bam}.vcf; done
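One thing to watch when reusing the quoted commands: `bwa samse` writes `.sam` files, while the next loop iterates over `*bam`, so a SAM-to-BAM conversion step appears to be missing from the post. The sketch below shows one way that gap might be filled, assuming `samtools` is installed; the thread does not say which tool was actually used, and `sam_to_bam_cmd` is a hypothetical helper that only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a samtools command that converts
# one SAM file to BAM. This step is an assumption, not part of the quoted post.
sam_to_bam_cmd() {
  local sam="$1"
  # -b: write BAM; -S: input is SAM (needed by older samtools, ignored by newer)
  printf 'samtools view -bS -o %s %s\n' "${sam%.sam}.bam" "$sam"
}

# Dry run over any .sam files in the current directory
for file in *.sam; do
  [ -e "$file" ] || continue   # skip when the glob matches nothing
  sam_to_bam_cmd "$file"
done
```

Piping the output to `bash` runs the conversions. Picard's SortSam also accepts SAM input, so pointing the sorting loop at the `.sam` files may be an alternative.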
|
|
|
|
|
|
#7 |
|
Senior Member
Location: Phoenix, AZ Join Date: Mar 2010
Posts: 251
|
Do you have any paired-end data, as opposed to the single-end data your methods suggest? Indel alignment should be better with paired-end reads than with single-end reads.
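For reference, a paired-end run with the same aligner would align each mate separately with `bwa aln` and then combine them with `bwa sampe` instead of `bwa samse`. The following is a dry-run sketch only: the helper `paired_end_cmds`, the sample name `sampleA`, and the `_1.fastq`/`_2.fastq` naming are assumptions for illustration, while `-e 50` and the `chr17hg19` prefix are carried over from the earlier posts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the bwa commands for one paired-end
# sample whose mates are named <sample>_1.fastq and <sample>_2.fastq.
# $1 = index prefix, $2 = sample name
paired_end_cmds() {
  local ref="$1" sample="$2"
  # Align each mate on its own
  printf 'bwa aln -e 50 -f %s_1.sai %s %s_1.fastq\n' "$sample" "$ref" "$sample"
  printf 'bwa aln -e 50 -f %s_2.sai %s %s_2.fastq\n' "$sample" "$ref" "$sample"
  # Pair the two alignments into a single SAM file
  printf 'bwa sampe %s %s_1.sai %s_2.sai %s_1.fastq %s_2.fastq > %s.sam\n' \
    "$ref" "$sample" "$sample" "$sample" "$sample" "$sample"
}

paired_end_cmds chr17hg19 sampleA
```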
|
|
|
|
|
|
#8 |
|
Senior Member
Location: uk Join Date: Jan 2010
Posts: 110
|
This particular dataset is all single-end. I am pretty certain the larger indels can still be detected, though...