11-28-2011, 09:58 AM   #1
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 503
how to filter unaligned duplicate reads

I've been given a data set (PE-100 reads from both standard and mate-pair libraries) for de novo assembly that's likely to contain a significant fraction of duplicates, based on the number of PCR cycles used to amplify the libraries. I'm aware of tools that filter duplicates based on alignment, but I'd like to do the same for the unaligned reads before attempting assembly (by identifying reads that have identical sequences at both the 5' and 3' ends). Any recommendations?
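
To make the idea concrete, here is a minimal Python sketch of the kind of alignment-free filter described above. It is only an illustration under stated assumptions, not a recommendation of a specific tool: it assumes two FASTQ files with mates in the same order and exactly four lines per record, it treats a pair as a duplicate only when both mate sequences match an earlier pair exactly, and the names `read_fastq`, `dedup_pairs`, and `key_len` are hypothetical.

```python
def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def dedup_pairs(r1_handle, r2_handle, key_len=None):
    """Yield the first read pair seen for each distinct pair of mate sequences.

    key_len (hypothetical parameter): if set, compare only the first key_len
    bases of each mate; if None, compare the full sequences.
    """
    seen = set()
    for rec1, rec2 in zip(read_fastq(r1_handle), read_fastq(r2_handle)):
        # The key combines both mates, so a pair is dropped only when
        # BOTH ends match a pair that was already kept.
        key = (rec1[1][:key_len], rec2[1][:key_len])
        if key not in seen:
            seen.add(key)
            yield rec1, rec2
```

Called as `dedup_pairs(open("reads_1.fastq"), open("reads_2.fastq"))` (file names are placeholders), it yields the pairs to keep. Two caveats with this approach: a single sequencing error lets a duplicate slip through unless `key_len` restricts the comparison to the higher-quality start of each read, and the set of seen keys is held in memory, which may be a problem for large data sets.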

Thanks,
Harold

Last edited by HESmith; 11-28-2011 at 11:06 AM. Reason: typos