Unconfigured Ad

**gavin.oliver** · 12-05-2011, 03:48 AM

I am now getting all Indels up to 29bp in length. I achieved this by increasing the maximum number of permitted gap extensions with bwa aln -e 50.

I will continue to experiment in order to get the larger indels.

**adaptivegenome** · 12-05-2011, 06:30 AM

Do you perform a base recalibration step with GATK before calling indels?

**gavin.oliver** · 12-05-2011, 06:40 AM

Originally posted by genericforms View Post

Do you perform a base recalibration step with GATK before calling indels?

Indeed I do.

**oiiio** · 12-05-2011, 08:10 AM

I have been trying to call indels with GATK UnifiedGenotyper from BWA-mapped BAMs for some time now, but with no success.

Did you have to use anything outside of the default parameters with UnifiedGenotyper or COuntCovariates/TableRecalibration? Others with this problem have found that it could be sequencing error rates in the sample were too high.

If you dont mind, could you post a couple command lines from your pipeline? I'm particularly interested in your UnifiedGenotyper and base recalibration commands. It would be an immense help.

**gavin.oliver** · 12-05-2011, 08:23 AM

Originally posted by oiiio View Post

I have been trying to call indels with GATK UnifiedGenotyper from BWA-mapped BAMs for some time now, but with no success.

Did you have to use anything outside of the default parameters with UnifiedGenotyper or COuntCovariates/TableRecalibration? Others with this problem have found that it could be sequencing error rates in the sample were too high.

If you dont mind, could you post a couple command lines from your pipeline? I'm particularly interested in your UnifiedGenotyper and base recalibration commands. It would be an immense help.

I am pretty sure my commands are very standard. Nonetheless, you are welcome to have a look!

for file in *fastq; do bwa aln -e 50 -f ${file%%.fastq}.sai chr17hg19 ${file}; done

for file in *sai; do bwa samse chr17hg19 ${file} ${file%%.sai}.fastq > ${file%%.sai}.sam; done

for file in *bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/SortSam.jar I=${file} O=${file%%.bam}_sorted.bam SO=coordinate; done

for file in *_sorted.bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/MarkDuplicates.jar I=${file} O=${file%%.bam}_ndup.bam M=metric TMP_DIR=./tmp REMOVE_DUPLICATES=TRUE VALIDATION_STRINGENCY=LENIENT; done

for file in *ndup.bam; do java -jar /home/goliver/ngs_software/picard-tools-1.53/AddOrReplaceReadGroups.jar I=${file} O=${file%%.bam}_rg.bam SO=coordinate ID=1 LB=Z PL=illumina PU=Z SM=Z; done

for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/picard-tools-1.53/BuildBamIndex.jar I=${file} O=${file}.bai; done

for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ../ref_chr17.hg19.fa -o ${file%%.bam}.intervals -I ${file}; done

for file in *rg.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -I ${file} -R ../ref_chr17.hg19.fa -T IndelRealigner -o ${file%%.bam}_2.bam -targetIntervals ${file%%.bam}.intervals --known ../GATK/dbsnp_132.b37.vcf; done

for file in *_2.bam; do java -Xmx20g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -R ../ref_chr17.hg19.fa -knownSites ../GATK/dbsnp_132.b37.vcf -I ${file} -T CountCovariates -cov QualityScoreCovariate -cov DinucCovariate -cov ReadGroupCovariate -cov CycleCovariate -recalFile ${file%%.bam}.recal.csv --default_read_group 1 --default_platform illumina -nt 4; done

for file in *_2.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -l INFO -R ../ref_chr17.hg19.fa -T TableRecalibration -I ${file} -o ${file%%.bam}.final.bam -recalFile ${file%%.bam}.recal.csv --default_read_group 1 --default_platform illumina; done

for file in *final.bam; do java -Xmx3g -jar /home/goliver/ngs_software/GenomeAnalysisTK-1.2-24-g6478681/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -I ${file} -R ../ref_chr17.hg19.fa -o ${file%%.bam}.vcf; done

**Jon_Keats** · 12-05-2011, 09:00 PM

Do you have any paired-end data as opposed to single-ended as you methods suggest? The indel alignment should be better with paired-ends than single ends

**gavin.oliver** · 12-06-2011, 12:52 AM

Originally posted by Jon_Keats View Post

Do you have any paired-end data as opposed to single-ended as you methods suggest? The indel alignment should be better with paired-ends than single ends

This particular dataset is all single end. I am pretty certain the larger indels can still be detected though...

Topics	Statistics	Last Post
Large-Scale Protein Screen Uncovers Hidden Regulators of Alternative Polyadenylation by SEQadmin2 Started by SEQadmin2, 06-26-2026, 11:10 AM	0 responses 16 views 0 reactions	Last Post by SEQadmin2 06-26-2026, 11:10 AM
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 49 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 108 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 125 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM

Unconfigured Ad

6-99bp indels with BWA/GATK

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News