#1
Member
Location: Houston, Texas Join Date: Jul 2011
Posts: 44
I'm trying to get variant calling to work using a small set of simulated reads of the phi X genome. The problem I'm seeing is that indels are assigned a quality of zero and their VCF entries are incorrect.
I used wgsim from samtools to generate simulated paired-end FASTQ files at varying depths of coverage, then used bwa to align the simulated reads to the phi X reference. I followed the GATK recommendations to realign around possible indels and then recalibrate, and then I ran the UnifiedGenotyper with the -glm INDEL option along with --output_mode EMIT_ALL_SITES. The analysis info shows something like this:
The corresponding VCF file shows the following (I added a space after the "GT:" to keep the forum from turning it into a smiley):
What can I do to 1) make the quality nonzero, and 2) get the VCF file to correctly report the alternative allele?
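One quick way to check what the caller actually emitted is to separate reference-only sites (ALT is ".") from genuine indel calls (an ALT allele whose length differs from REF). The sketch below assumes a standard tab-separated VCF with plain sequence alleles; `summarize_vcf_records` and `SiteSummary` are hypothetical names written for illustration, not part of GATK or samtools.

```python
import sys
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SiteSummary:
    chrom: str
    pos: int
    ref: str
    alts: List[str]
    qual: Optional[float]  # None when QUAL is "."
    is_variant: bool       # False for reference-only sites (ALT == ".")
    is_indel: bool         # True if any ALT allele differs in length from REF


def summarize_vcf_records(lines: Iterable[str]) -> List[SiteSummary]:
    """Summarize the data lines of a VCF, skipping header and blank lines."""
    summaries = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual = fields[:6]
        alts = [] if alt == "." else alt.split(",")
        summaries.append(SiteSummary(
            chrom=chrom,
            pos=int(pos),
            ref=ref,
            alts=alts,
            qual=None if qual == "." else float(qual),
            is_variant=bool(alts),
            # Plain length comparison; symbolic alleles such as <DEL> are not handled specially.
            is_indel=any(len(a) != len(ref) for a in alts),
        ))
    return summaries


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python vcf_summary.py calls.vcf
    with open(sys.argv[1]) as handle:
        sites = summarize_vcf_records(handle)
    ref_only = [s for s in sites if not s.is_variant]
    indels = [s for s in sites if s.is_indel]
    print(f"{len(sites)} sites, {len(ref_only)} reference-only, {len(indels)} indel calls")
    for s in indels:
        print(s.chrom, s.pos, s.ref, ",".join(s.alts), s.qual)
```

If the expected indel positions come back with ALT "." and QUAL 0, that would be consistent with the caller emitting them as reference-only sites, which EMIT_ALL_SITES does for every position it looks at.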
#2
Senior Member
Location: Austria Join Date: Apr 2009
Posts: 181
#3
Member
Location: Houston, Texas Join Date: Jul 2011
Posts: 44
I tried that (lowering the standard confidence threshold all the way to zero, in fact), and without EMIT_ALL_SITES there is no output. It's as if the indels are being treated as non-variant sites.
#4
Senior Member
Location: USA Join Date: Jan 2011
Posts: 105
Hi guys,
I have this exact problem. I have tried -mmq 0 -mbq 0 -stand_emit_conf 0 -stand_call_conf 0 without seeing any change, and I welcome any verification of my (non-)findings with these parameters. I also tried using real base qualities from real data, again without any change. I'm still actively working on this and would like to hear any solutions or suggestions people have.
Last edited by oiiio; 11-18-2011 at 03:33 PM.
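For anyone trying to reproduce this, the UnifiedGenotyper invocation discussed in this thread can be assembled as an argument list, which makes it easy to rerun with the thresholds zeroed out or left at their defaults. This is a sketch: the jar, reference, BAM and VCF names are placeholders, the flags are the GATK 1.x-era UnifiedGenotyper options quoted in this thread, and `unified_genotyper_indel_cmd` is a hypothetical helper.

```python
import shlex
from typing import List


def unified_genotyper_indel_cmd(gatk_jar: str, reference: str, bam: str,
                                out_vcf: str, zero_thresholds: bool = True) -> List[str]:
    """Build a UnifiedGenotyper indel-calling command as an argv list."""
    cmd = [
        "java", "-jar", gatk_jar,
        "-T", "UnifiedGenotyper",
        "-R", reference,
        "-I", bam,
        "-glm", "INDEL",                     # indel genotype-likelihood model
        "--output_mode", "EMIT_ALL_SITES",   # emit every site, not just variant calls
        "-o", out_vcf,
    ]
    if zero_thresholds:
        # The thresholds tried in this thread, all set to zero.
        cmd += ["-mmq", "0", "-mbq", "0",
                "-stand_emit_conf", "0", "-stand_call_conf", "0"]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute your own.
    print(shlex.join(unified_genotyper_indel_cmd(
        "GenomeAnalysisTK.jar", "phix.fa", "sim.recal.bam", "sim.indels.vcf")))
```

Keeping the command as a list (rather than one string) avoids shell-quoting problems if it is later passed to `subprocess.run`.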
#5
Member
Location: Houston, Texas Join Date: Jul 2011
Posts: 44
I've done some experimenting and discovered that the problem comes from the amount of sequencing error in the input data.
The data I'm using is simulated with wgsim (distributed with samtools), whose default base error rate is 0.02. At that rate, UnifiedGenotyper makes no indel calls, even with the call and emit confidences and the minimum base quality set to zero. SNP calls work just fine, though. UnifiedGenotyper is happy to make indel calls when the error rate is set to zero, and it works about half the time with an error rate of 0.01. I haven't found any parameters that let the user specify sensitivity to sequencing errors. Does anybody else have ideas?
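To get a feel for how noisy the simulated reads are at these settings, the arithmetic is simple if per-base errors are assumed independent (as a uniform base error rate implies): a read of length L has on average e·L errors and is error-free with probability (1 − e)^L. The sketch below assumes 70 bp reads, which is wgsim's default read length; the thread doesn't say what length was actually used. It only illustrates the scale of the noise and does not show what UnifiedGenotyper does internally.

```python
def expected_errors_per_read(error_rate: float, read_len: int) -> float:
    """Mean number of erroneous bases in a read of length read_len."""
    return error_rate * read_len


def prob_error_free_read(error_rate: float, read_len: int) -> float:
    """Probability that a read has no errors, assuming independent per-base errors."""
    return (1.0 - error_rate) ** read_len


if __name__ == "__main__":
    read_len = 70  # assumed read length (wgsim's default); not stated in the thread
    for e in (0.0, 0.005, 0.01, 0.02):
        print(f"error rate {e:<5}: {expected_errors_per_read(e, read_len):.2f} errors/read, "
              f"P(error-free read) = {prob_error_free_read(e, read_len):.3f}")
```

Under these assumptions, a 0.02 error rate gives about 1.4 errors per read and only about 24% error-free reads, compared with roughly 49% at 0.01 and 70% at 0.005.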
#6
Senior Member
Location: USA Join Date: Jan 2011
Posts: 105
Nice find, Alex.
When you say it works half the time, do you mean that it calls about half the indels in the sample, or that about half of the samples you run UnifiedGenotyper on get any calls at all?
#7
Member
Location: Houston, Texas Join Date: Jul 2011
Posts: 44
I've done around a dozen trials, and it seems to be all or nothing. Either all candidate sites are called, or none of them are.
Edit: an error rate of 0.005 seems OK, too.
Last edited by Alex Renwick; 12-02-2011 at 12:28 PM.
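A sweep like the one described (a dozen or so trials across error rates) can be scripted so each trial is reproducible. The sketch below only generates the wgsim commands for the error rates mentioned in this thread, with a fixed seed per trial; the read-pair count, file names and the `wgsim_cmd`/`sweep` helpers are arbitrary or hypothetical choices, and the downstream bwa, realignment, recalibration and UnifiedGenotyper steps are left out.

```python
import itertools
import shlex
from typing import List

# Error rates reported in this thread, each repeated with a few seeds.
ERROR_RATES = (0.0, 0.005, 0.01, 0.02)
SEEDS = (1, 2, 3)


def wgsim_cmd(reference: str, error_rate: float, seed: int,
              n_pairs: int = 10000, read_len: int = 70,
              prefix: str = "sim") -> List[str]:
    """Build one wgsim command; output FASTQ names encode the rate and seed."""
    tag = f"{prefix}_e{error_rate}_s{seed}"
    return [
        "wgsim",
        "-e", str(error_rate),  # base error rate (wgsim default is 0.02)
        "-N", str(n_pairs),     # number of read pairs (arbitrary here)
        "-1", str(read_len),    # length of read 1
        "-2", str(read_len),    # length of read 2
        "-S", str(seed),        # random seed, so each trial is reproducible
        reference,
        f"{tag}_1.fq",
        f"{tag}_2.fq",
    ]


def sweep(reference: str) -> List[List[str]]:
    """One command per (error rate, seed) pair: 4 rates x 3 seeds = 12 trials."""
    return [wgsim_cmd(reference, e, s)
            for e, s in itertools.product(ERROR_RATES, SEEDS)]


if __name__ == "__main__":
    for cmd in sweep("phix.fa"):
        print(shlex.join(cmd))
```

Encoding the error rate and seed in the file names makes it straightforward to tabulate, per rate, how many trials produced indel calls at all, which is the all-or-nothing pattern reported above.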
Tags: gatk, indel, unifiedgenotyper