View Single Post
Old 09-10-2015, 12:21 PM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

If the distances are important, I have another program that's slightly different, but may be useful. Usage:

Code:
msa.sh in=16S.fa out=mapped1.sam literal=CCTACGGGNGGCWGCAG

msa.sh in=16S.fa out=mapped2.sam literal=GGATTAGATACCCBDGTAGTC

cutprimers.sh in=16S.fa out=cut.fa sam1=mapped1.sam sam2=mapped2.sam
The sam files indicate the best alignment of the adapter sequence to every input sequence. This may not be optimal in your case since it allows indels (I designed it for PacBio reads) but in practice it really won't matter. The sam files may themselves be useful to you; and the output of cutprimers is the subsequences between the primers. msa.sh does not understand ambiguity codes (I think it converts them to N) so you may get better results if you manually list all permutations of the sequence as a comma-delimited list. I'll probably build in automatic permutation sometime in the future.

Also, it's probably best to only run this on the sequences after first filtering with BBDuk, because msa.sh is very aggressive and will output the most closely-matching site for each primer even if it's not a very good match (containing indels and substitutions, for example). Again, that's because I designed it to operate on PacBio 16S reads (to pull out the V4 region).
Brian Bushnell is offline   Reply With Quote