Old 06-18-2015, 08:06 AM   #2
GenoMax

bbduk.sh from the BBMap package can do this. If that sequence is at the start (left end) of the reads, then:

Code:
$ bbduk.sh -Xmx1g in=reads.fq outm=matched.fq outu=unmatched.fq restrictleft=19 k=19 literal=ATTTTTTTTAGAAAAAAAA
In this case, all reads starting with "ATTTTTTTTAGAAAAAAAA" will end up in "matched.fq" and all other reads will end up in "unmatched.fq". Specifically, the command means "look for 19-mers in the leftmost 19 bp of the read". Because the literal is itself 19 bp long, that requires an exact prefix match, though you can relax that if you want (for example, by allowing mismatches with "hdist=1").
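
If you want to sanity-check the binning without BBDuk, the exact-prefix rule described above can be sketched in plain shell and awk. This is only a rough illustration, not how BBDuk works internally: the function name split_by_prefix and the output file names are made up for this example, and it assumes a simple FASTQ with exactly four lines per record. It only tests the forward-strand prefix.

```shell
# Hypothetical helper (not part of BBMap): split a FASTQ by exact read prefix.
# Assumes exactly 4 lines per record (header, sequence, '+', qualities).
# Usage: split_by_prefix reads.fq PREFIX matched_out.fq unmatched_out.fq
split_by_prefix() {
  fastq=$1; prefix=$2; matched=$3; unmatched=$4
  # Create/empty both outputs so they exist even if one bin gets no reads.
  : > "$matched"
  : > "$unmatched"
  awk -v p="$prefix" -v m="$matched" -v u="$unmatched" '
    # Line 1 of a record: hold the header until the sequence has been seen.
    NR % 4 == 1 { header = $0; next }
    # Line 2: pick the bin; index() == 1 means the read starts with the prefix.
    NR % 4 == 2 { dest = (index($0, p) == 1) ? m : u; print header > dest }
    # Lines 2-4 (sequence, "+", qualities) go to the chosen bin.
    { print > dest }
  ' "$fastq"
}
```

For example, "split_by_prefix reads.fq ATTTTTTTTAGAAAAAAAA matched_check.fq unmatched_check.fq" should put the same forward-strand prefix reads into matched_check.fq that the bbduk.sh command above is expected to put into matched.fq, so the two can be compared on a small test file.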

So you could bin all the reads with your known sequence, then look at the remaining reads to see what they have in common. You can do the same thing with the tail of the read using "restrictright" instead, though you can't use both restrictions at the same time.