
04-13-2011, 08:27 AM   #1
chariko
Member

Location: Spain

Join Date: Jun 2010
Posts: 56

Determine paired end overlapping

I have a paired-end Illumina exome data set in which the reads might overlap at the ends. My fragment size is 105 bp. I aligned my samples with bwa and generated my BAM files with samtools.

My question is: is there any way to determine how many read pairs overlap? How can I determine the distance between the paired ends (in case they didn't overlap), or how much they overlapped?

If this percentage is high, I am thinking about reanalyzing my data by generating fragments of 75 or 50 bp. Do you think that's correct? What percentage could serve as the cut-off to consider it high?

Thanks

Last edited by chariko; 04-13-2011 at 08:29 AM.
04-28-2011, 06:59 AM   #2
volks
Member

Location: hd.de

Join Date: Jun 2010
Posts: 81

In the SAM/BAM file, the TLEN column (column 9) tells you the template length. The value is signed (positive for the leftmost mate, negative for the rightmost), so look at its absolute value: if that is smaller than 2*105 bp, you have overlapping ends (provided there are no indels).

Try this to see the template length distribution of the first million reads:

samtools view PEalignment.bam | head -n 1000000 | cut -f 9 | sort -n | uniq -c

Why would you want to generate fragments?
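The TLEN rule from this reply can be turned into a small calculation that also answers the "how far apart or how much overlap" part of the original question. The following is only a minimal sketch: the function names `pair_overlap` and `summarize_overlap` are made up for illustration, and it assumes both mates are aligned over their full 105 bp with no indels or clipping, which real data will not always satisfy.

```python
def pair_overlap(tlen, read_len=105):
    """Overlap in bases between two mates of equal length.

    Positive: number of bases covered by both mates.
    Zero or negative: the mates do not overlap, and the absolute
    value is the gap between them.
    Assumes full-length, ungapped, unclipped alignments (not checked here).
    """
    return 2 * read_len - abs(tlen)


def summarize_overlap(tlens, read_len=105):
    """Count overlapping pairs in a list of TLEN values (SAM column 9)."""
    # Each pair appears twice (+TLEN and -TLEN), so keep the positive copy only.
    # TLEN == 0 means the template length is unavailable, so it is skipped too.
    sizes = [t for t in tlens if t > 0]
    overlapping = sum(1 for t in sizes if pair_overlap(t, read_len) > 0)
    fraction = overlapping / len(sizes) if sizes else 0.0
    return {"pairs": len(sizes), "overlapping": overlapping, "fraction": fraction}


# Made-up TLEN values, only to show the call; real values would come
# from column 9 of the samtools view output.
example = summarize_overlap([150, -150, 300, -300, 0, 210])
```

With 105 bp reads, a template length of 150 gives an overlap of 60 bases, while a template length of 300 gives -90, meaning a 90 bp gap between the mates.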
04-28-2011, 11:52 PM   #3
chariko
Member

Location: Spain

Join Date: Jun 2010
Posts: 56

Thanks for your answer; with the TLEN column I could manage it.

Regarding your question: the problem with too much overlap is that I would lose the advantages of a paired-end experiment, for example the detection of structural variants in the genome between the pairs. So if I generated 75-base fragments, I would have less overlap. I know it's better to work with longer reads, but I thought this could be a solution. I also think there is software that can get this information even if the pairs overlap, but I don't yet know which. Do you have any idea?

Last edited by chariko; 04-28-2011 at 11:57 PM.
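To judge whether shortening the reads to 75 or 50 bases would actually reduce the overlap, one rough approach is to reuse the fragment sizes already observed and ask what fraction of pairs would still overlap at each read length. This is a sketch only: `overlap_fraction_by_read_length` is a hypothetical helper, and it assumes the reads are trimmed from their 3' ends so that the fragment sizes themselves do not change.

```python
def overlap_fraction_by_read_length(fragment_sizes, read_lengths=(105, 75, 50)):
    """Fraction of pairs whose mates would overlap at each read length.

    fragment_sizes: positive template lengths (e.g. absolute TLEN values,
    one per pair). A pair overlaps when the fragment is shorter than two
    read lengths. Indels and clipping are ignored (an assumption).
    """
    result = {}
    for read_len in read_lengths:
        overlapping = sum(1 for size in fragment_sizes if size < 2 * read_len)
        result[read_len] = overlapping / len(fragment_sizes) if fragment_sizes else 0.0
    return result


# Made-up fragment sizes, only to show the call.
fractions = overlap_fraction_by_read_length([150, 180, 120, 220])
```

Shorter reads lower the overlapping fraction only by discarding sequenced bases; the distance spanned by each pair stays the same.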

 Tags exome, overlap, paired end