Some of our whole-genome libraries end up with small insert sizes (e.g. ~150 bp) for 2x100 bp sequencing on an Illumina HiSeq. I'm concerned about the effect this will have on variant calling.
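To put a number on the overlap, here is a rough sketch of the arithmetic. It assumes "insert size" means the full fragment length including both reads, and that no adapter or soft-clipped bases are involved; the function name `mate_overlap` is only for illustration.

```python
def mate_overlap(read_len: int, fragment_len: int) -> int:
    """Number of fragment bases sequenced by both mates of a pair.

    Assumes the two mates start at opposite ends of the fragment and
    that fragment_len is the full fragment (outer) length.
    """
    # A mate cannot read past the end of the fragment.
    covered = min(read_len, fragment_len)
    # Bases covered by both mates; zero when the mates do not meet.
    return max(0, 2 * covered - fragment_len)


overlap = mate_overlap(read_len=100, fragment_len=150)
print(overlap)        # 50 bases seen by both mates
print(overlap / 150)  # fraction of the fragment observed twice
```

So for a 150 bp fragment with 2x100 bp reads, the middle 50 bp (about a third of the fragment) would be observed twice from the same molecule.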
Do you know how samtools and/or GATK deal with paired-end reads that overlap? I believe that samtools assumes the reads are independent. If so, a PCR error in the middle of an insert would be observed twice, once in each of the overlapping mates of the read pair, and counted as if two independent reads supported it. With low-coverage sequencing data this could lead to a significant number of false variant calls.
Is there a good way to deal with this?
Many thanks for your suggestions.