Dear all,
I would like to discuss whether it is truly meaningful to merge overlapping paired-end (PE) reads from Illumina exome or whole-genome sequencing into a single (SE) read. Your first impulse is probably "yes, of course!", and I think that is what people usually do when they have overlapping PE data, but I'd like to invite you to rethink this concept with me.
In particular, I want to raise the question of whether it is valid to count overlapping PE reads twice in the overlapping region. This depends on whether you consider the two reads to be independent. They come from the same amplification, i.e. the same cluster, but the sequencing of the two PE reads is independent. A contra argument is that we sequenced the same fragment twice, and if there is something in the DNA fragment that "triggers" a sequencing error in one read, then it is likely to occur in the other read as well, leading us to count a sequencing error twice. On the other hand, if there is a true SNP in the fragment, we will correctly sequence it twice and have twice the read support for this SNP. That second argument, however, is only valid if you consider the two reads to be independent of each other, and I am not sure whether this is correct.
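To put a rough number on the independence question, here is a minimal sketch of how often both mates would show the same wrong base if their sequencing errors really were independent. The function names are made up for illustration, and the assumption that each of the three wrong bases is equally likely is mine, not something established above.

```python
def phred_to_prob(q):
    """Convert a Phred quality score into an error probability."""
    return 10 ** (-q / 10)


def p_same_error_independent(q1, q2):
    """Probability that both mates report the same wrong base.

    Assumes (hypothetically) that the two reads err independently and
    that each of the three possible wrong bases is equally likely.
    """
    p1 = phred_to_prob(q1)
    p2 = phred_to_prob(q2)
    # Read 1 errs (p1); read 2 errs (p2) AND picks the same wrong base (1/3).
    return p1 * p2 / 3


if __name__ == "__main__":
    print(p_same_error_independent(30, 30))
```

For two Q30 base calls this is on the order of 3e-7. If concordant mismatches in the overlap show up much more often than that in real data, the errors are probably not independent (for example, they may come from the fragment itself), and counting both reads would count the same artifact twice.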
Below I summarize the arguments pro and contra merging again. I'd appreciate your thoughts on the matter.
PRO merging
- merging an overlapping PE read pair gives us one longer SE read, and longer reads are generally better (e.g. for alignment)
- if the DNA fragment "triggers" a sequencing error, both reads will have it. If we merge, there will only be one read with the error.
- merging gives us higher base confidence in the overlapping region.
- it is not valid to count reads coming from the same DNA fragment twice (not sure if this is correct).
CONTRA merging
- if the two paired reads are independent, merging will result in an artificial reduction of coverage, i.e. we throw away data.
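To make the "higher base confidence" point concrete, here is a minimal sketch of one simple way a merged base call could be derived from the two overlapping calls. It is a common heuristic, not the algorithm of any specific merging tool: agreeing bases get the sum of their Phred scores (which treats the two calls as independent evidence), and disagreeing bases keep the higher-quality call with the difference as the new score. The function name and the cap of 41 are assumptions for illustration.

```python
def merge_base(b1, q1, b2, q2, max_q=41):
    """Merge two overlapping base calls into one (base, quality) pair.

    Hypothetical heuristic, assuming the two calls are independent:
    - same base: qualities add up, capped at max_q
    - different bases: keep the higher-quality base, quality = difference
    """
    if b1 == b2:
        return b1, min(q1 + q2, max_q)
    if q1 >= q2:
        return b1, q1 - q2
    return b2, q2 - q1


if __name__ == "__main__":
    print(merge_base("A", 20, "A", 15))  # agreeing calls
    print(merge_base("A", 30, "C", 10))  # conflicting calls
```

Note that the summed quality is only justified if the two reads are independent, which is exactly the open question here. If they are not, the merged quality overstates the confidence in the same way that counting both reads overstates the coverage.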
Finally, for my specific exome sequencing projects, these thoughts led me to the question of whether I should try to avoid overlapping PE reads altogether, i.e. change my study design. I'd be happy if you contributed your thoughts on this matter as well. Thank you very much.
http://seqanswers.com/forums/showthread.php?t=61370