SEQanswers (
-   Bioinformatics (
-   -   Bowtie2 - align two PCR primers help (

danova 09-10-2015 07:13 AM

Bowtie2 - align two PCR primers help
Not sure its the correct strategy by im trying to achieve the following:

1- I have two primers (see below) that map to 16s amplicons and they generate an insert of 1200 to 1400 nt long.

The two primers are described below.

When i try to use bowtie2 to align the two primers to a reference database of different 16S sequences they are supposed to match i only get 600 hits (expected 400000 hits), meaning this primer set will align perfectly to 400000 different sequences of the database.

bowtie2 -f -t -a --no-dovetail --no-contain --no-overlap --no-mixed --fr --minins 800 --maxins 1500 -x 16S_database -1 forward_primer -2 reverse_primer

I have checked the orientation of the primers , -fr is the correct orientation.

It looks too me that bowtie2 handles degenerated primers differently then what i expected. Does anybody know how to improve the alignement of the degenerated primers (-N 1 option does not really improve). ANy idea ???

Brian Bushnell 09-10-2015 09:34 AM

This sounds like a good place to use BBDuk, depending on what exactly you're trying to do. in=16S.fasta out=matching.fasta literal=CCTACGGGNGGCWGCAG k=17 mm=f copyundefined=t

That will give you all sequences that contain the literal, after first making copies of the literal to represent every fully-specified (ACGT) version of the degenerate bases. You can then run it a second time with the other primer and a different value of k (which should equal the primer length).

danova 09-10-2015 11:50 AM

Hi Brian,
At the end I want to see if my degenrated primers (or the sum of the oligos as you suggest) match the same 16s. That will tell me the coverage of my primers in my 16s database.

Indeed that looks like a valid option. As you suggest i can align both separately and then combine primers that match the same 16s , check the distance between the primers to satisfy the 800 to 1500 nt in between. Then i guess extract the aligned sequences (start -end positions of the alignement).

Ill give it a try,

Brian Bushnell 09-10-2015 12:21 PM

If the distances are important, I have another program that's slightly different, but may be useful. Usage:

Code: in=16S.fa out=mapped1.sam literal=CCTACGGGNGGCWGCAG in=16S.fa out=mapped2.sam literal=GGATTAGATACCCBDGTAGTC in=16S.fa out=cut.fa sam1=mapped1.sam sam2=mapped2.sam

The sam files indicate the best alignment of the adapter sequence to every input sequence. This may not be optimal in your case since it allows indels (I designed it for PacBio reads) but in practice it really won't matter. The sam files may themselves be useful to you; and the output of cutprimers is the subsequences between the primers. does not understand ambiguity codes (I think it converts them to N) so you may get better results if you manually list all permutations of the sequence as a comma-delimited list. I'll probably build in automatic permutation sometime in the future.

Also, it's probably best to only run this on the sequences after first filtering with BBDuk, because is very aggressive and will output the most closely-matching site for each primer even if it's not a very good match (containing indels and substitutions, for example). Again, that's because I designed it to operate on PacBio 16S reads (to pull out the V4 region).

danova 09-10-2015 12:30 PM

HI Brian,
Im also working with Pacbio so definetly it looks pretty much close to what im trying to achieve. Im currently also evaluating pacbio primers and having the sam output is really helpfull. I need to look in detail to this and will post back.

danova 09-11-2015 11:41 AM

That was really great and worked as expected, however i need more control over the alignements. How can you count the number of mismatches (limit to 3 mismatches, with no indels)

Have tried also bbduk but get only small number of aligned sequences (419). I expect at least 60% of my database to match this primer.
Executing jgi.BBDukF [in=ALL16s_GOLD_HOMD.fa, out=matching.fasta, literal=CCTACGGGNGGCWGCAG, k=17, mm=f, copyundefined=t]

BBDuk version 35.43
Memory: max=29020m, free=28565m, used=455m

Added 8 kmers; time: 0.018 seconds.
Memory: max=29020m, free=28263m, used=757m

Input is being processed as unpaired
Started output streams: 0.012 seconds.
Processing time: 0.239 seconds.

Input: 12551 reads 19118812 bases.
Contaminants: 12132 reads (96.66%) 18494334 bases (96.73%)
Result: 419 reads (3.34%) 624478 bases (3.27%)

Time: 0.275 seconds.
Reads Processed: 12551 45.59k reads/sec
Bases Processed: 19118k 69.45m bases/sec

All times are GMT -8. The time now is 09:50 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.