  • Trouble: getting reads with Reverse primer

    Hi SeqA team,

    I've hit another road block.
    Data: 16S V3-V4 region, demultiplexed, Illumina paired-end, with 5% PhiX.

    I have the forward and reverse primer sequences:
    F: CCTACGGGDGGCWGCA
    R: GGACTACHVGGGTMTCTAATC

    When I grep R1 for the forward primer (trying every combination of its degenerate bases), I get between 1k and 12k reads.

    However, when I search R2 for the reverse primer (again with every degenerate combination), I get fewer than 10 sequences.
    I tried anchoring the pattern at the start (^) and at the end ($) of the read; nothing returns reads for the reverse primer.
    I also tried the reverse complement of the reverse primer, with no results.
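grep treats IUPAC letters such as H, V and M literally, so a degenerate primer has to be expanded into character classes before it can match real reads. A minimal Python sketch of that expansion, assuming an uncompressed four-line FASTQ file; the function names and the file name in the usage line are hypothetical:

```python
import re

# IUPAC degenerate nucleotide codes expanded to regex character classes.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer, anchored=False):
    """Compile a degenerate primer into a regex, optionally anchored at the read start."""
    body = "".join(IUPAC_CLASS[base] for base in primer.upper())
    return re.compile(("^" if anchored else "") + body)


def fastq_sequences(path):
    """Yield the sequence line (second line) of each 4-line FASTQ record."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


def count_matches(seqs, primer, anchored=False):
    """Count sequences in which the degenerate primer pattern occurs."""
    pattern = primer_regex(primer, anchored)
    return sum(1 for seq in seqs if pattern.search(seq))
```

Usage: `count_matches(fastq_sequences("sample_R2.fastq"), "GGACTACHVGGGTMTCTAATC", anchored=True)` would count R2 reads that begin with the reverse primer.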

    1)
    Wouldn't the R2 sequences contain the reverse primer?

    2)
    I went ahead and assembled the reads with Mothur, supplying both the forward and reverse primer sequences. I didn't get any output.
    When I removed the reverse primer information, I got an output of decent file size.


    The 16S sequencing was done according to the protocol.
    Bioinformaticscally calm

  • #2
    Have you tried to find the reverse complement of R (i.e. GATTAGAKACCCBDGTAGTCC) with either reads?
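Reverse-complementing a degenerate primer means complementing the IUPAC codes as well (H pairs with D, V with B, M with K). A short sketch with a hypothetical helper name:

```python
# Complement of every IUPAC nucleotide code, including the degenerate ones.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Complement each base (degenerate codes included), then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("GGACTACHVGGGTMTCTAATC"))
```

Run on the reverse primer, this prints GATTAGAKACCCBDGTAGTCC, the same sequence as in the question above.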



    • #3
      Hi genomax,
      Thanks for your reply.

      I tried the degenerate combinations of the reverse complement of the reverse primer (GGACTACHVGGGTMTCTAATC).
      No output: 0 reads.

      I double-checked with the one you shared. No output: 0 reads.

      I double-checked with the sequencing center; they said they used R as the reverse primer, and no processing was done apart from index removal during demultiplexing.

      Weird.


      • #4
        By the way, BBDuk processes degenerate bases (with the copyundefined flag), and looks for both the forward and reverse-complement... you might find that to be a more robust alternative to grep.

        bbduk.sh in=reads.fq out=matching.fq literal=CCTACGGGDGGCWGCA k=16 mm=f copyundefined=t

        Getting reads that contain both the F and R sequences would require two sequential passes.
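For readers without BBTools, the two sequential passes can be sketched in plain Python. This is not BBDuk's k-mer algorithm, only regex matching of each primer and its reverse complement; it assumes single sequences (for example merged read pairs), and all names are hypothetical:

```python
import re

# IUPAC codes as regex classes, and the complement table for degenerate bases.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def degenerate_pattern(primer):
    return re.compile("".join(IUPAC_CLASS[base] for base in primer.upper()))


def filter_containing(seqs, primer):
    """One pass: keep sequences containing the primer or its reverse complement."""
    forward = degenerate_pattern(primer)
    reverse = degenerate_pattern(reverse_complement(primer))
    return [s for s in seqs if forward.search(s) or reverse.search(s)]


def filter_both(seqs, f_primer, r_primer):
    """Two sequential passes: filter on F first, then on R among the survivors."""
    return filter_containing(filter_containing(seqs, f_primer), r_primer)
```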



        • #5
          @Brian @bio_informatics could also use "kmercountexact.sh" to see what oligo is overrepresented? In case the sequence of R is incorrect?
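The idea behind counting k-mers to spot an overrepresented oligo can be sketched as below. This is a toy in-memory counter, not kmercountexact.sh; it does not merge a k-mer with its reverse complement, and the function names are hypothetical:

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every length-k substring across all sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def top_kmers(seqs, k, n=10):
    """Return the n most frequent k-mers as (kmer, count) pairs."""
    return count_kmers(seqs, k).most_common(n)
```

With k set to the primer length (21 for R), a correct primer would be expected to show up near the top of the list if most reads carry it.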



          • #6
            Hi Brian,
            Thanks for adding information about bbduk. I shall try with that, too.

            But the question still haunts me: why are there no reads with the reverse primer in either orientation, forward or reverse complement?


            • #7
              Hi genomax,

              Thanks for the idea.
              I ran FastQC on the reverse reads and looked at the overrepresented sequences.

              Among the many sequences, I have:
              Overrepresented:
              GACTACTGGGGTATCTAATCCTGTTTGATCCCCACGCTTTCGCACATCAG

              Rev Primer:
              GGACTACHVGGGTMTCTAATC

              Clearly this doesn't match along the length of the reverse primer (even setting the degenerate bases aside).

              Originally posted by GenoMax
              @Brian @bio_informatics could also use "kmercountexact.sh" to see what oligo is overrepresented? In case the sequence of R is incorrect?
              Last edited by bio_informatics; 06-30-2015, 11:18 AM.
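An overrepresented-sequence check can be approximated by counting identical read prefixes. This is a rough sketch in the spirit of FastQC's module, not its exact algorithm; the 50 bp prefix length (chosen to match the length of the sequence above) and the 0.1% cutoff are assumptions, and the function name is hypothetical:

```python
from collections import Counter


def overrepresented(seqs, prefix_len=50, min_fraction=0.001):
    """Return (prefix, count, fraction) for read prefixes above min_fraction, most common first."""
    prefixes = Counter(seq[:prefix_len] for seq in seqs)
    total = sum(prefixes.values())
    if total == 0:
        return []
    return [
        (prefix, count, count / total)
        for prefix, count in prefixes.most_common()
        if count / total > min_fraction
    ]
```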


              • #8
                Originally posted by GenoMax
                @Brian @bio_informatics could also use "kmercountexact.sh" to see what oligo is overrepresented? In case the sequence of R is incorrect?
                That's true, but many parts of the V3-V4 region are highly conserved, and with amplicon sequencing in general it might be hard to get any useful signal unless you restrict the search to the end of the read: trim everything except the first 21bp before the analysis, then reverse-complement the reads and repeat to get the other end.

                You should actually be able to just do this visually: the primer sequence in question should be the first or last 21bp. What do you see there?
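The procedure above (keep only the first 21bp and tally them, then reverse-complement the reads and repeat for the other end) can be sketched as follows; the helper names are hypothetical and the 21bp default simply matches the reverse primer length:

```python
from collections import Counter

# Complement table covering the IUPAC codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def end_tallies(seqs, length=21, n=5):
    """Most common leading `length`-mers, and the same for the opposite read end
    (taken from the start of each reverse-complemented read)."""
    usable = [s for s in seqs if len(s) >= length]
    starts = Counter(s[:length] for s in usable)
    ends = Counter(reverse_complement(s)[:length] for s in usable)
    return starts.most_common(n), ends.most_common(n)
```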



                • #9
                  Originally posted by Brian Bushnell
                  You should actually be able to just do this visually - the primer sequence in question should be the first or last 21bp. What do you see there?
                  Thanks for your reply.
                  I pulled out a few sequences and trimmed them to 21 bp. Below are 3 of them after trimming:

                  GACTACTCGGGTCTCTAATCC
                  GACTACTTGGGTATCTAATCC
                  GACTACAAGGGTCTCTAATCC

                  GGACTACHVGGGTMTCTAATC
                  My reverse primer is a terrible match to them.
                  Last edited by bio_informatics; 06-30-2015, 12:04 PM.


                  • #10
                    Hopefully the right region has been amplified in that dataset.



                    • #11
                      Originally posted by GenoMax
                      Hopefully the right region has been amplified in that dataset.
                      Amen! I hope the same.
                      The primer is clearly different from the ones I'm able to see.


                      • #12
                        Someone must have used a wrong primer. Not much you can do but report to experimental folks.



                        • #13
                          Actually, that's a perfect match:

                          Code:
                           GACTACAAGGGTCTCTAATCC
                          GGACTACHVGGGTMTCTAATC
                          They only differ at the degenerate symbols, and the degenerate symbols are:

                          H: A or C or T
                          V: A or C or G
                          M: A or C

                          ...which include the bases in question. For some reason it's offset by one base, though. But that's probably not an issue with the primer, just with where reading starts.
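The degenerate-aware comparison, including the one-base offset, can be checked programmatically. A minimal sketch, assuming a small fixed search over how many bases to drop from the primer's 5' end; the names and the max_shift default are hypothetical:

```python
# Bases allowed at each IUPAC position.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degenerate_mismatches(read, primer):
    """Count aligned positions where the read base is not allowed by the primer's code.
    zip() stops at the shorter sequence, so only the overlap is compared."""
    return sum(1 for r, p in zip(read, primer) if r not in IUPAC_SETS[p])


def best_offset(read, primer, max_shift=3):
    """Drop 0..max_shift bases from the primer's 5' end and return (shift, mismatches)
    for the shift with the fewest mismatches (ties go to the smaller shift)."""
    scored = [(degenerate_mismatches(read, primer[shift:]), shift) for shift in range(max_shift + 1)]
    mismatches, shift = min(scored)
    return shift, mismatches
```

Applied to the three trimmed reads quoted earlier in the thread, each scores best with the first primer base dropped (shift 1), with zero or one degenerate-aware mismatches.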



                          • #14
                            Originally posted by Brian Bushnell
                            Actually, that's a perfect match: [...]
                            Thanks. This is what I have used for all the data I received. All the files have a trimmed primer.