I am getting some unexpected results when I try to estimate the insert size of my paired-end and mate-pair data. I am using CLC Genomics WB to do a de novo assembly of my data. Once I had my contigs, I mapped my paired-end and mate-pair reads back to them separately to get what I thought would be a good measurement of the insert size. However, with both data sets I got a bimodal distribution of insert sizes. The majority of my reads are centered on roughly the expected values (paired end = 300 bp, mate pair = 3.5 kb), but a good fraction are much smaller: around 100 bp for the paired-end library and 1 kb for the mate-pair library. Has anybody seen something like this before? I am wondering if I might have just done something wrong.
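In case it helps anyone check this outside of CLC, here is a minimal sketch of how the insert size distribution could be tallied independently. It assumes the mappings have been exported to a plain-text SAM file (the file name `mapped.sam` in the usage note is hypothetical), and the function names, bin width, and peak threshold are my own choices for illustration, not anything CLC does internally. It reads the template length from SAM column 9 (TLEN), counts each pair once, bins the values, and reports bins that look like local peaks.

```python
from collections import Counter

def insert_sizes(sam_lines):
    """Yield the template length (SAM column 9, TLEN) once per mapped pair."""
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:               # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        # Keep paired reads (0x1) where both read and mate are mapped,
        # and drop secondary (0x100) and supplementary (0x800) alignments.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        # Both mates must be on the same contig (RNEXT is "=" or repeats RNAME).
        if fields[6] not in ("=", fields[2]):
            continue
        # TLEN is positive for the leftmost read of a pair and negative for
        # the other, so keeping positive values counts each pair once.
        if tlen > 0:
            yield tlen

def histogram(sizes, bin_width):
    """Count insert sizes per bin; keys are the lower edge of each bin."""
    return Counter((s // bin_width) * bin_width for s in sizes)

def peaks(hist, bin_width, min_fraction=0.05):
    """Return bins that are local maxima and hold at least min_fraction of pairs."""
    total = sum(hist.values())
    found = []
    for start, count in sorted(hist.items()):
        left = hist.get(start - bin_width, 0)
        right = hist.get(start + bin_width, 0)
        if count >= min_fraction * total and count > left and count >= right:
            found.append(start)
    return found
```

Usage would be something like the following, with a bin width suited to the library (say 25 for paired end, 250 for mate pair). A noisy histogram may show spurious local maxima, so a wider bin or a higher `min_fraction` may be needed.

```python
with open("mapped.sam") as handle:   # hypothetical export of the read mapping
    hist = histogram(insert_sizes(handle), bin_width=25)
print(peaks(hist, bin_width=25))
```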
Thanks