I just happened upon a couple of question threads in various places that warn against using paired and unpaired reads together for SNP calling.
It appears that paired and unpaired reads are on two different playing fields when it comes to mapping quality, which makes sense, and this makes a difference when inferring SNPs downstream. I've tried looking for more information about this and potential ways to overcome it, but I'm not really finding anything.
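For anyone who wants to check how large the mapping quality gap is in their own data, here is a minimal sketch. It assumes tab-separated SAM text (for example, what `samtools view` prints) and relies only on the standard SAM columns: FLAG in column 2 and MAPQ in column 5. The function name `mapq_by_pairing` and the three group labels are my own, not part of any tool.

```python
from collections import defaultdict
from statistics import mean


def mapq_by_pairing(sam_lines):
    """Summarize MAPQ per pairing class, based on the SAM FLAG field.

    Classes (labels are hypothetical, chosen for this sketch):
      proper_pair  - FLAG has 0x1 (paired) and 0x2 (properly paired)
      paired_other - FLAG has 0x1 but not 0x2 (e.g., mate unmapped)
      unpaired     - FLAG lacks 0x1 (single-end or pairs broken upstream)
    """
    groups = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        # 0x4 marks an unmapped read; its MAPQ is not informative.
        if flag & 0x4:
            continue
        if flag & 0x1:
            key = "proper_pair" if flag & 0x2 else "paired_other"
        else:
            key = "unpaired"
        groups[key].append(mapq)
    return {k: {"n": len(v), "mean_mapq": mean(v)} for k, v in groups.items()}
```

Something like `samtools view aln.bam | python script.py`, with the script passing `sys.stdin` to the function, would be one way to feed it. This only describes the difference between the groups; it does not fix anything for the variant caller.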
I thought I would pose the question to this forum and see what more knowledgeable people can explain. Namely, a more detailed explanation of this problem would be nice for those of us who don't know a ton about what is going on under the hood of the various variant calling pipelines (e.g., samtools and GATK). Is it possible to overcome this issue and take advantage of both paired and unpaired reads (i.e., from single-end sequencing or from pairs broken upstream)? Are there tools that can account for this?
Any thoughts are appreciated. I have some datasets where using broken pairs would be great, rather than just ignoring them.
Cheers,
Daren