04-13-2011, 08:27 AM #1 chariko Member   Location: Spain Join Date: Jun 2010 Posts: 56

Determine paired end overlapping

I have a paired-end Illumina exome data set in which the mates might overlap at their ends. My fragment size is 105 bp. I aligned my samples with bwa and generated my BAM files with samtools. My question is: is there any way to determine how many read pairs overlap? How can I determine the distance between the paired ends (in case they didn't overlap), or by how much they overlapped? If this percentage is high, I am thinking about reanalyzing my data, generating fragments of 75 or 50 bp. Do you think that's correct? What percentage would be a reasonable cut-off to consider it high?

Thanks

Last edited by chariko; 04-13-2011 at 08:29 AM.
04-28-2011, 06:59 AM #2 volks Member   Location: hd.de Join Date: Jun 2010 Posts: 81

In the SAM/BAM file, the TLEN column (column 9) tells you the template length. If its absolute value is smaller than 2*105 bp, you have overlapping ends (assuming there are no indels). Note that TLEN is signed: it is positive for the leftmost mate and negative for the rightmost one. Try this to see the template length distribution of the first million reads:

samtools view PEalignment.bam | head -n 1000000 | cut -f 9 | sort -n | uniq -c

Why would you want to generate fragments?
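To make the TLEN rule above concrete, here is a minimal Python sketch that turns one TLEN value into an overlap or a gap. The function name `mate_overlap` is hypothetical, and the sketch assumes both mates have the same length and align without indels or soft clipping, so treat it as an illustration rather than a definitive method.

```python
def mate_overlap(tlen, read_len):
    """Return (overlap_bp, gap_bp) for one read pair, or None if TLEN is unset.

    tlen     : signed TLEN value from SAM column 9
    read_len : length of each mate (assumed identical for both mates)

    Assumes no indels and no soft clipping in either alignment.
    """
    if tlen == 0:
        # TLEN is 0 when the template length is unavailable,
        # e.g. the mate is unmapped or maps to another reference.
        return None

    template = abs(tlen)  # the sign only encodes which mate is leftmost
    excess = 2 * read_len - template

    if excess > 0:
        # Mates overlap; the overlap cannot exceed the template itself.
        return (min(excess, template), 0)
    # Mates do not overlap; the remainder is the unsequenced inner distance.
    return (0, -excess)


# With 105 bp reads, a 150 bp template means the mates share 60 bp,
# while a 300 bp template leaves a 90 bp gap between them.
print(mate_overlap(150, 105))
print(mate_overlap(-300, 105))
```

The TLEN values could come from the `cut -f 9` step of the pipeline above. Because each pair appears twice in the BAM (once per mate, with opposite signs), counting only positive values avoids counting pairs twice.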
chariko
Thanks for your answer; with the TLEN column I was able to work it out.

Regarding your question: the problem with having too much overlap is that I lose the advantages of a paired-end experiment, for example the detection of structural variants in the genome between the mates. So if I generated 75-base fragments, I would have less overlap. I know it's better to work with longer reads, but I thought this could be a solution. I also think there is software that can extract this information even when the pairs overlap, but I don't yet know which tools do. Do you have any idea?

Last edited by chariko; 04-28-2011 at 11:57 PM.
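The effect of shortening the reads to 75 or 50 bases can be estimated from the template lengths alone. The sketch below is a rough illustration: the helper `fraction_overlapping` and the template lengths are hypothetical and not taken from this data set. It assumes reads are shortened from their 3' ends, so the template length (measured between the outer 5' ends) stays the same.

```python
def fraction_overlapping(template_lengths, read_len):
    """Fraction of pairs whose mates overlap at a given read length.

    template_lengths : iterable of TLEN values (sign is ignored, 0 is skipped)
    read_len         : length of each mate after any trimming
    """
    usable = [abs(t) for t in template_lengths if t != 0]
    if not usable:
        return 0.0
    # Mates overlap when the template is shorter than the two reads combined.
    overlapping = sum(1 for t in usable if t < 2 * read_len)
    return overlapping / len(usable)


# Hypothetical template lengths, for illustration only.
example_templates = [120, 160, 180, 220, 300]

for length in (105, 75, 50):
    print(length, fraction_overlapping(example_templates, length))
```

In this made-up example, shorter reads reduce the overlapping fraction, but at the cost of discarding sequenced bases, which is the trade-off discussed in the post above.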

Tags: exome, overlap, paired end