Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to extract subregions from Miseq 16srRNA data?

    We did 2X300 pair end sequence with Miseq to cover V1-V3 regions of 16srRNA. We use primer 27F (AGAGTTTGATCATGGCTCAG) for V1, and position 534-518 R (ATTACCGCGGCTGCTGG) for V3 regions.

    Now, if I want to check V1 region only, I extract the reads have 27F primer:

    grep "AGAGTTTGATCATGGCTCAG" Read1
    grep "AGAGTTTGATCATGGCTCAG" Read2

    Then keep the first 92bp for both read1/read2 as V1 region.

    My question, do I need to consider reverse complimentary seq for 27F primer for both pair end reads? Am I right to extract from both paired read with same primer? Why my left reads (with V1 primer) from above have lots of V3 primer (over 30%)?

    Is there recommend way to extract subregion for this pair end sequence, i.e V1 or V3 only with known design primer?

    Thanks
    Last edited by LijunZ; 03-16-2017, 06:19 AM.

  • #2
    I suggest that you first merge the reads. Then, you can find and cut the primer-flanked regions using the BBMap package, like this:

    Code:
    msa.sh in=merged.fq out=locations1.sam query=AGAGTTTGATCATGGCTCAG
    
    msa.sh in=merged.fq out=locations2.sam query=???????????
    
    cutprimers.sh in=merged.fq out=V1 sam1=locations1.sam sam2=locations2.sam
    For the second query, rather than ????????, you need to use the sequence flanking the opposite side of the V1 region, whatever that is.

    Comment


    • #3
      The end of v3 is ~515. V1-V3 should be using pairs like 27f & 519R. 27F, 173F are both forward primers and won't give you any product? I'm guessing you were given these sequences from someone else? I'd go back and clarify what primers they actually used.
      Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.

      Comment


      • #4
        Thanks Brian:
        I have tried to merge pair reads, actually, we did two experiences to cover V1-3 and V3-5 regions for the same samples with over 1million reads pairs. However we didn't get enough joined reads, especially for V3-5 regions. Bioanalysor show the peak for V35 around 600-800bp, which have lots of off targets. we can not merge these pair end reads.

        I want to extract subregions from these pair reads to compare with the merged result. also want to extract V3 region from both experiences V1-3 and V3-5 to check whether the results consistent for the same sample.

        Do you think it is doable?

        Thanks again.

        Comment


        • #5
          Thanks thermophile:

          You are right, we are using 27F for V1, and position 534-518 (ATTACCGCGGCTGCTGG) for V3. I put typo as 173F.

          if I check V1 primer, I can get around 47% reads:

          grep "AGAGTTTGATCATGGCTCAG" Read1 > V1_R1_read --> 47% reads
          grep "AGAGTTTGATCATGGCTCAG" Read2 > V1_R2_read --> 43% reads

          if I check V3: I got 25% reads:
          grep "ATTACCGCGGCTGCTGG" Read1 >V3_R1_Read --> 26% reads
          grep "ATTACCGCGGCTGCTGG" Read2 >V3_R2_Read --> 28% reads

          However, in V1 output reads (V1_R1/2_read), ~25% reads have V3 primer (or V3 primer reverse complementary)

          and in V3 output reads (V3_R1/2_read), ~40% reads have V1 primer (or V1 primer reverse complementary)

          For over 1million pair reads originally designed, we can only have 700k reads for V1 if we cat V1_R1_read and V1_R2_read and remove V3 primer reads; and 300k reads for v3 if we cat V3_R1_Read and V3_R2_Read and remove V1 primer reads.

          Is it reasonable?

          Thanks

          Comment


          • #6
            Something bad may have happened to the run if only 25% have the reverse primer (that should be near the beginning of R2). It's the middle of the region that you are likely going to miss when sequencing such long amplicons. So if you can't find the primer, something is wrong. If you are losing lots of sequences in the merging-that's expected because your amplicons are too long for miseq.

            If you want to just look at V3, don't merge the reads because you will have essentially no overlap. Instead pull R2 from the v1-3 set and R1 from v3-5, do in silico pcr to trim out just v3.


            ETA: AAAHHHH actually, your issues may be that you are using grep for degenerate primers. There are many positions in those primers that have variable bases. Look for tools for "in silico pcr" because they are build to deal with degeneracies
            Last edited by thermophile; 03-16-2017, 07:01 AM.
            Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.

            Comment


            • #7
              BBDuk, by the way, deals with degenerate bases and automatically reverse-complements as well. You can do something like this with a degenerate sequence to filter / see how many match (they will be called "contaminants"), optionally including some number of substitutions with the "hdist" flag:

              bbduk.sh in=reads.fq outm=matched.fq k=16 mm=f literal=ACGTNNAKCCWTTACA copyundefined

              Comment


              • #8
                Thanks Brian and thermophile:

                Actually I use cutadapt to grep designed primer, and allowed 2 mismatch over primer for degenerated bases. then remove reverse complementary primer in the same reads.

                For the low outcome of V3 read, could it come from PCR design? We used M13 (CAGGGTTTTCCCAGTCACGAC) + designed primer(ATTACCGCGGCTGCTGG in V13; CCTACGGGAGGCAGCAG in V35) in PCR to guide V3 region. Both experience in V13 and V35 have lower reads outcome of V3 only region compared with V1 and V5 only.

                BTW, is it normal if we extract only V3 from V13 and V35 from same sample, but the otu table quite different from the same sample? Here, we use different primer for V3 region in both experience. (in V35 we use 341F to guide V3, and in V13, we use 534R to guide V3 region)

                Thanks for your reply. Much appreciate it.
                Last edited by LijunZ; 03-16-2017, 09:49 AM.

                Comment


                • #9
                  Why were you using M13? you were cloning these first?

                  are you giving cutadapt the degenerate primers and allowing for 2 mismatches? or giving it one of the possible non-degenerate primers and allowing for 2 mismatches? The first should work, the second should result in losing many sequences because there are degeneracies and mismatch potential which equals more than 2 differences
                  Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Strategies for Sequencing Challenging Samples
                    by seqadmin


                    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                    03-22-2024, 06:39 AM
                  • seqadmin
                    Techniques and Challenges in Conservation Genomics
                    by seqadmin



                    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                    Avian Conservation
                    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                    03-08-2024, 10:41 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, Yesterday, 06:37 PM
                  0 responses
                  10 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, Yesterday, 06:07 PM
                  0 responses
                  10 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-22-2024, 10:03 AM
                  0 responses
                  51 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-21-2024, 07:32 AM
                  0 responses
                  67 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X