10-18-2016, 06:15 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079

Quote:
Originally Posted by horvathdp
Is there a script, program (something in BBMAP?) or common way to remove duplicates based on the sequence identifier (as opposed to a kmer- or sequence-based method, since I want to retain all unique fragments at this point)?
You originally asked about removing duplicates "based on sequence identifiers", but it sounds like you are just looking to de-duplicate the actual fastq reads.

dedupe.sh from BBMap is what you need. Depending on the size of your sequence file, be ready to allocate an adequate amount of RAM to the process.
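
If you do end up wanting to filter strictly on the sequence identifier, as in the original question, a few lines of Python would do it. This is only a rough sketch and not part of BBMap: it assumes plain 4-line fastq records, treats everything after "@" up to the first whitespace as the identifier, and keeps the first record seen for each identifier. The function names (read_fastq, dedupe_by_id, dedupe_file) are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

# One fastq record: (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line fastq records; an incomplete trailing record is dropped."""
    stripped = (line.rstrip("\r\n") for line in lines)
    # zip over the same iterator four times groups the lines in fours.
    for header, seq, sep, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, sep, qual


def dedupe_by_id(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only the first record for each identifier.

    Assumption: the identifier is the header text after '@' up to the
    first whitespace, so any trailing comment field is ignored.
    """
    seen = set()
    for rec in records:
        fields = rec[0][1:].split()
        read_id = fields[0] if fields else ""
        if read_id in seen:
            continue
        seen.add(read_id)
        yield rec


def dedupe_file(in_path: str, out_path: str) -> int:
    """Write identifier-unique records to out_path; return how many were kept."""
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for rec in dedupe_by_id(read_fastq(fin)):
            fout.write("\n".join(rec) + "\n")
            kept += 1
    return kept
```

Calling dedupe_file("reads.fq", "reads.unique_ids.fq") (placeholder file names) would write the filtered reads. The set of identifiers is held in memory, so RAM use grows with the number of unique reads here as well.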