SEQanswers

Go Back   SEQanswers > Applications Forums > Metagenomics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Unclassified 16SrRNA using green gene taxonomy Uzobiogene General 2 06-09-2016 07:15 AM
extract data from fasta-files with perl?? anna_ Bioinformatics 20 02-17-2016 07:29 AM
extract data from txt files sunnycqcn Bioinformatics 2 09-14-2015 07:55 AM
Extract reads according to selected indexes (MiSeq)) nazen Bioinformatics 2 06-12-2012 11:25 PM
PubMed: 16SrRNA bacterial tag-encoded amplicons pyrosequencing for rapid deciphering Newsbot! Literature Watch 0 05-05-2009 05:00 AM

Reply
 
Thread Tools
Old 03-15-2017, 08:18 AM   #1
LijunZ
Junior Member
 
Location: PA, USA

Join Date: Feb 2017
Posts: 5
Default How to extract subregions from Miseq 16srRNA data?

We did 2X300 pair end sequence with Miseq to cover V1-V3 regions of 16srRNA. We use primer 27F (AGAGTTTGATCATGGCTCAG) for V1, and position 534-518 R (ATTACCGCGGCTGCTGG) for V3 regions.

Now, if I want to check V1 region only, I extract the reads have 27F primer:

grep "AGAGTTTGATCATGGCTCAG" Read1
grep "AGAGTTTGATCATGGCTCAG" Read2

Then keep the first 92bp for both read1/read2 as V1 region.

My question, do I need to consider reverse complimentary seq for 27F primer for both pair end reads? Am I right to extract from both paired read with same primer? Why my left reads (with V1 primer) from above have lots of V3 primer (over 30%)?

Is there recommend way to extract subregion for this pair end sequence, i.e V1 or V3 only with known design primer?

Thanks

Last edited by LijunZ; 03-16-2017 at 06:19 AM.
LijunZ is offline   Reply With Quote
Old 03-15-2017, 01:56 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,473
Default

I suggest that you first merge the reads. Then, you can find and cut the primer-flanked regions using the BBMap package, like this:

Code:
msa.sh in=merged.fq out=locations1.sam query=AGAGTTTGATCATGGCTCAG

msa.sh in=merged.fq out=locations2.sam query=???????????

cutprimers.sh in=merged.fq out=V1 sam1=locations1.sam sam2=locations2.sam
For the second query, rather than ????????, you need to use the sequence flanking the opposite side of the V1 region, whatever that is.
Brian Bushnell is offline   Reply With Quote
Old 03-16-2017, 05:56 AM   #3
thermophile
Senior Member
 
Location: CT

Join Date: Apr 2015
Posts: 170
Default

The end of v3 is ~515. V1-V3 should be using pairs like 27f & 519R. 27F, 173F are both forward primers and won't give you any product? I'm guessing you were given these sequences from someone else? I'd go back and clarify what primers they actually used.
__________________
Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.
thermophile is offline   Reply With Quote
Old 03-16-2017, 05:58 AM   #4
LijunZ
Junior Member
 
Location: PA, USA

Join Date: Feb 2017
Posts: 5
Default

Thanks Brian:
I have tried to merge pair reads, actually, we did two experiences to cover V1-3 and V3-5 regions for the same samples with over 1million reads pairs. However we didn't get enough joined reads, especially for V3-5 regions. Bioanalysor show the peak for V35 around 600-800bp, which have lots of off targets. we can not merge these pair end reads.

I want to extract subregions from these pair reads to compare with the merged result. also want to extract V3 region from both experiences V1-3 and V3-5 to check whether the results consistent for the same sample.

Do you think it is doable?

Thanks again.
LijunZ is offline   Reply With Quote
Old 03-16-2017, 06:18 AM   #5
LijunZ
Junior Member
 
Location: PA, USA

Join Date: Feb 2017
Posts: 5
Default

Thanks thermophile:

You are right, we are using 27F for V1, and position 534-518 (ATTACCGCGGCTGCTGG) for V3. I put typo as 173F.

if I check V1 primer, I can get around 47% reads:

grep "AGAGTTTGATCATGGCTCAG" Read1 > V1_R1_read --> 47% reads
grep "AGAGTTTGATCATGGCTCAG" Read2 > V1_R2_read --> 43% reads

if I check V3: I got 25% reads:
grep "ATTACCGCGGCTGCTGG" Read1 >V3_R1_Read --> 26% reads
grep "ATTACCGCGGCTGCTGG" Read2 >V3_R2_Read --> 28% reads

However, in V1 output reads (V1_R1/2_read), ~25% reads have V3 primer (or V3 primer reverse complementary)

and in V3 output reads (V3_R1/2_read), ~40% reads have V1 primer (or V1 primer reverse complementary)

For over 1million pair reads originally designed, we can only have 700k reads for V1 if we cat V1_R1_read and V1_R2_read and remove V3 primer reads; and 300k reads for v3 if we cat V3_R1_Read and V3_R2_Read and remove V1 primer reads.

Is it reasonable?

Thanks
LijunZ is offline   Reply With Quote
Old 03-16-2017, 06:58 AM   #6
thermophile
Senior Member
 
Location: CT

Join Date: Apr 2015
Posts: 170
Default

Something bad may have happened to the run if only 25% have the reverse primer (that should be near the beginning of R2). It's the middle of the region that you are likely going to miss when sequencing such long amplicons. So if you can't find the primer, something is wrong. If you are losing lots of sequences in the merging-that's expected because your amplicons are too long for miseq.

If you want to just look at V3, don't merge the reads because you will have essentially no overlap. Instead pull R2 from the v1-3 set and R1 from v3-5, do in silico pcr to trim out just v3.


ETA: AAAHHHH actually, your issues may be that you are using grep for degenerate primers. There are many positions in those primers that have variable bases. Look for tools for "in silico pcr" because they are build to deal with degeneracies
__________________
Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.

Last edited by thermophile; 03-16-2017 at 07:01 AM.
thermophile is offline   Reply With Quote
Old 03-16-2017, 09:03 AM   #7
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,473
Default

BBDuk, by the way, deals with degenerate bases and automatically reverse-complements as well. You can do something like this with a degenerate sequence to filter / see how many match (they will be called "contaminants"), optionally including some number of substitutions with the "hdist" flag:

bbduk.sh in=reads.fq outm=matched.fq k=16 mm=f literal=ACGTNNAKCCWTTACA copyundefined
Brian Bushnell is offline   Reply With Quote
Old 03-16-2017, 09:47 AM   #8
LijunZ
Junior Member
 
Location: PA, USA

Join Date: Feb 2017
Posts: 5
Default

Thanks Brian and thermophile:

Actually I use cutadapt to grep designed primer, and allowed 2 mismatch over primer for degenerated bases. then remove reverse complementary primer in the same reads.

For the low outcome of V3 read, could it come from PCR design? We used M13 (CAGGGTTTTCCCAGTCACGAC) + designed primer(ATTACCGCGGCTGCTGG in V13; CCTACGGGAGGCAGCAG in V35) in PCR to guide V3 region. Both experience in V13 and V35 have lower reads outcome of V3 only region compared with V1 and V5 only.

BTW, is it normal if we extract only V3 from V13 and V35 from same sample, but the otu table quite different from the same sample? Here, we use different primer for V3 region in both experience. (in V35 we use 341F to guide V3, and in V13, we use 534R to guide V3 region)

Thanks for your reply. Much appreciate it.

Last edited by LijunZ; 03-16-2017 at 09:49 AM.
LijunZ is offline   Reply With Quote
Old 03-16-2017, 10:30 AM   #9
thermophile
Senior Member
 
Location: CT

Join Date: Apr 2015
Posts: 170
Default

Why were you using M13? you were cloning these first?

are you giving cutadapt the degenerate primers and allowing for 2 mismatches? or giving it one of the possible non-degenerate primers and allowing for 2 mismatches? The first should work, the second should result in losing many sequences because there are degeneracies and mismatch potential which equals more than 2 differences
__________________
Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.
thermophile is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 03:21 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO