Hello everybody,
I have started working with a MiSeq and I am sequencing custom amplicons with the 500-cycle kit (250 nucleotides each for R1 and R2). I have not managed to detect all the indels, especially long ones (longer than about 18-40 nucleotides). My library contains 48 amplicons of about 300 nucleotides on average, each with high coverage (around 10,000 reads). Because the MiSeq Reporter results are not accurate, I am using bwa to generate the .bam files, then samtools to sort and index them, and finally GATK to call variants. I have tried both -glm BOTH and -glm INDEL, but neither worked.
When I load the .bam in IGV, I can see the insertion or deletion at the expected position, and all the SNPs are shown correctly. However, the .vcf produced by GATK contains many false positives and false negatives, and none of the long indels.
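One way to confirm what IGV shows is to scan the CIGAR strings of the aligned reads and count how many reads carry a long insertion or deletion at each position. The following is only a minimal sketch: it assumes SAM-format text lines (for example, the output of `samtools view`), and the function names and the 18-nucleotide threshold are illustrative choices, not part of any tool mentioned above.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def long_indels(sam_line, min_len=18):
    """Return (ref_name, ref_pos, op, length) for each I/D op >= min_len.

    ref_pos is the 1-based reference coordinate of the next reference
    base at the point where the operation starts.
    """
    fields = sam_line.rstrip("\n").split("\t")
    ref_name, pos, cigar = fields[2], int(fields[3]), fields[5]
    hits = []
    if cigar == "*":  # unmapped read or no alignment available
        return hits
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "ID" and length >= min_len:
            hits.append((ref_name, ref_pos, op, length))
        if op in "MDN=X":  # only these operations consume reference bases
            ref_pos += length
    return hits

def count_long_indels(lines, min_len=18):
    """Count reads supporting each long indel, skipping SAM header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue
        counts.update(long_indels(line, min_len))
    return counts
```

Passing an iterable of SAM lines (for instance `sys.stdin` when piping from `samtools view`) to `count_long_indels` gives a per-position read count. If the long indels appear here with many supporting reads but are missing from the .vcf, the alignment step has captured them and the variant-calling step is where they are lost.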
I have also tried Pindel, but it did not detect the long indels either, perhaps because the R1 and R2 reads overlap.
Could anybody tell me how I can improve my variant detection, especially for the indels?
Thanks in advance!
Lourdes