Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bowtie2 - align two PCR primers help

    Hi,
    Not sure it´s the correct strategy by i´m trying to achieve the following:

    1- I have two primers (see below) that map to 16s amplicons and they generate an insert of 1200 to 1400 nt long.

    The two primers are described below.
    >forward_primer
    CCTACGGGNGGCWGCAG
    >reverse_primer
    GGATTAGATACCCBDGTAGTC

    When i try to use bowtie2 to align the two primers to a reference database of different 16S sequences they are supposed to match i only get 600 hits (expected 400000 hits), meaning this primer set will align perfectly to 400000 different sequences of the database.

    bowtie2 -f -t -a --no-dovetail --no-contain --no-overlap --no-mixed --fr --minins 800 --maxins 1500 -x 16S_database -1 forward_primer -2 reverse_primer


    I have checked the orientation of the primers , -fr is the correct orientation.

    It looks too me that bowtie2 handles degenerated primers differently then what i expected. Does anybody know how to improve the alignement of the degenerated primers (-N 1 option does not really improve). ANy idea ???

  • #2
    This sounds like a good place to use BBDuk, depending on what exactly you're trying to do.

    bbduk.sh in=16S.fasta out=matching.fasta literal=CCTACGGGNGGCWGCAG k=17 mm=f copyundefined=t

    That will give you all sequences that contain the literal, after first making copies of the literal to represent every fully-specified (ACGT) version of the degenerate bases. You can then run it a second time with the other primer and a different value of k (which should equal the primer length).

    Comment


    • #3
      Hi Brian,
      At the end I want to see if my degenrated primers (or the sum of the oligos as you suggest) match the same 16s. That will tell me the coverage of my primers in my 16s database.

      Indeed that looks like a valid option. As you suggest i can align both separately and then combine primers that match the same 16s , check the distance between the primers to satisfy the 800 to 1500 nt in between. Then i guess extract the aligned sequences (start -end positions of the alignement).

      I´ll give it a try,
      thanks,
      david
      Last edited by danova; 09-10-2015, 11:54 AM.

      Comment


      • #4
        If the distances are important, I have another program that's slightly different, but may be useful. Usage:

        Code:
        msa.sh in=16S.fa out=mapped1.sam literal=CCTACGGGNGGCWGCAG
        
        msa.sh in=16S.fa out=mapped2.sam literal=GGATTAGATACCCBDGTAGTC
        
        cutprimers.sh in=16S.fa out=cut.fa sam1=mapped1.sam sam2=mapped2.sam
        The sam files indicate the best alignment of the adapter sequence to every input sequence. This may not be optimal in your case since it allows indels (I designed it for PacBio reads) but in practice it really won't matter. The sam files may themselves be useful to you; and the output of cutprimers is the subsequences between the primers. msa.sh does not understand ambiguity codes (I think it converts them to N) so you may get better results if you manually list all permutations of the sequence as a comma-delimited list. I'll probably build in automatic permutation sometime in the future.

        Also, it's probably best to only run this on the sequences after first filtering with BBDuk, because msa.sh is very aggressive and will output the most closely-matching site for each primer even if it's not a very good match (containing indels and substitutions, for example). Again, that's because I designed it to operate on PacBio 16S reads (to pull out the V4 region).

        Comment


        • #5
          HI Brian,
          I´m also working with Pacbio so definetly it looks pretty much close to what i´m trying to achieve. I´m currently also evaluating pacbio primers and having the sam output is really helpfull. I need to look in detail to this and will post back.
          david

          Comment


          • #6
            Brian,
            That was really great and worked as expected, however i need more control over the alignements. How can you count the number of mismatches (limit to 3 mismatches, with no indels)

            Have tried also bbduk but get only small number of aligned sequences (419). I expect at least 60% of my database to match this primer.
            Executing jgi.BBDukF [in=ALL16s_GOLD_HOMD.fa, out=matching.fasta, literal=CCTACGGGNGGCWGCAG, k=17, mm=f, copyundefined=t]

            BBDuk version 35.43
            Initial:
            Memory: max=29020m, free=28565m, used=455m

            Added 8 kmers; time: 0.018 seconds.
            Memory: max=29020m, free=28263m, used=757m

            Input is being processed as unpaired
            Started output streams: 0.012 seconds.
            Processing time: 0.239 seconds.

            Input: 12551 reads 19118812 bases.
            Contaminants: 12132 reads (96.66%) 18494334 bases (96.73%)
            Result: 419 reads (3.34%) 624478 bases (3.27%)

            Time: 0.275 seconds.
            Reads Processed: 12551 45.59k reads/sec
            Bases Processed: 19118k 69.45m bases/sec
            Last edited by danova; 09-11-2015, 12:43 PM.

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Strategies for Sequencing Challenging Samples
              by seqadmin


              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
              03-22-2024, 06:39 AM
            • seqadmin
              Techniques and Challenges in Conservation Genomics
              by seqadmin



              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

              Avian Conservation
              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
              03-08-2024, 10:41 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, Yesterday, 06:37 PM
            0 responses
            10 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, Yesterday, 06:07 PM
            0 responses
            9 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-22-2024, 10:03 AM
            0 responses
            50 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-21-2024, 07:32 AM
            0 responses
            67 views
            0 likes
            Last Post seqadmin  
            Working...
            X