09-13-2011, 08:15 PM   #3

There are some papers on this topic on Google Scholar, and other posts on SEQanswers have discussed it (use the search function).

The short answer is that you can't tell for sure whether a duplicate read is artificial or real. It depends on a number of factors, such as the sequencing technology used, the expected coverage, and the read length. Some approaches rely on simplifying assumptions to identify artificial duplicates (e.g., metagenomic reads that start with the same bases are assumed to be duplicates).
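A minimal Python sketch of that prefix-based assumption, for illustration only: reads are grouped by their first `prefix_len` bases, and all but the longest read in each group are flagged as putative artificial duplicates. The function name `flag_prefix_duplicates`, the default `prefix_len=20`, and the keep-the-longest rule are hypothetical choices, not taken from any particular tool. Real pipelines may also tolerate mismatches in the prefix, and identical starts can still be genuine at high coverage.

```python
from collections import defaultdict


def flag_prefix_duplicates(reads, prefix_len=20):
    """Return the IDs of reads flagged as putative artificial duplicates.

    `reads` is an iterable of (read_id, sequence) pairs. Reads are grouped by
    their first `prefix_len` bases (a hypothetical cutoff; the right value
    depends on technology and read length). Within each group the longest
    read is kept and the rest are flagged.
    """
    groups = defaultdict(list)
    for read_id, seq in reads:
        if len(seq) < prefix_len:
            continue  # too short to compare on this prefix; never flagged
        groups[seq[:prefix_len]].append((read_id, seq))

    duplicates = set()
    for members in groups.values():
        # Keep the longest read as the representative (first one seen wins ties).
        keep_id = max(members, key=lambda m: len(m[1]))[0]
        duplicates.update(rid for rid, _ in members if rid != keep_id)
    return duplicates


# Usage: r1 and r2 share the first 8 bases; r2 is longer, so r1 is flagged.
reads = [("r1", "ACGTACGTAAAA"), ("r2", "ACGTACGTAAAATT"), ("r3", "TTTTACGTAAAA")]
print(flag_prefix_duplicates(reads, prefix_len=8))  # → {'r1'}
```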

I see a similar number of duplicates in 454/Roche data regardless of the type of sample sequenced (metagenome, metatranscriptome, ...).
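To compare duplicate levels across sample types, one rough per-sample summary is the fraction of reads whose start is shared with another read. This sketch assumes exact matching on a hypothetical fixed prefix length and counts k - 1 duplicates for every group of k reads sharing a prefix; the function name and sample labels are illustrative only.

```python
from collections import Counter


def prefix_duplicate_fraction(seqs, prefix_len=20):
    """Fraction of reads that duplicate another read's first `prefix_len` bases.

    Reads shorter than `prefix_len` are ignored. A group of k reads sharing
    a prefix contributes k - 1 duplicates (one read is treated as the original).
    """
    prefixes = [s[:prefix_len] for s in seqs if len(s) >= prefix_len]
    if not prefixes:
        return 0.0
    counts = Counter(prefixes)
    n_dup = sum(c - 1 for c in counts.values())
    return n_dup / len(prefixes)


# Usage: compare samples of different types (toy sequences, prefix of 4 bases).
samples = {
    "metagenome": ["ACGTAA", "ACGTAT", "GGGGCC", "ACGTCC"],
    "metatranscriptome": ["TTTTAA", "CCCCGG", "TTTTGA", "AAAACC"],
}
for name, seqs in samples.items():
    print(name, prefix_duplicate_fraction(seqs, prefix_len=4))
```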

Maybe you can give some more details about your data.
robs