SEQanswers

05-11-2015, 07:22 AM   #1
MiSeqUserLUX
Junior Member
Location: Luxembourg
Join Date: Nov 2014
Posts: 6

Illumina MiSeq 16S run quality

Hi SEQ-users,

We are currently following the Illumina Demonstrated Protocol for 16S sequencing on the MiSeq (24-96 samples) for stool and saliva samples. We are seeing inconsistent run metrics from one run to the next (e.g. %Q30 ranging from 64% to 85%, cluster density from 421 to 1328 K/mm²).

I was just wondering for those who use the same protocol,

1. What do your 16S MiSeq runs look like (in terms of %Q30, cluster density, % aligned, etc.)?
Have you set specific run metrics for run acceptance?
(I have attached the run summary of our most recent run; let me know your thoughts.)

2. What is the minimum number of sequences you process per sample?
Apart from the Illumina document, do you know of any publication that recommends a certain number of reads per sample?

3. We are using USEARCH for quality filtering before assembly. However, very few of our R2 reads pass the filter. Would you recommend other quality filtering tools?

Any answers or suggestions would be greatly appreciated.

Thank you in advance.
Attached image: May 2015 16S Run.png

05-11-2015, 07:37 AM   #2
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,300

Are you spiking in phiX and if so at what concentration? Are these same libraries being run repeatedly or different sample pools? What method are you using for estimation of concentration? Do you expect the reads to overlap and are you using any software to do the read merge before you quality trim?

05-11-2015, 08:30 AM   #3
MiSeqUserLUX
Junior Member
Location: Luxembourg
Join Date: Nov 2014
Posts: 6

Quote:
Originally Posted by GenoMax
Are you spiking in phiX and if so at what concentration? Are these same libraries being run repeatedly or different sample pools? What method are you using for estimation of concentration? Do you expect the reads to overlap and are you using any software to do the read merge before you quality trim?
Yes. For the attached run summary, I spiked in 10% of 20 pM PhiX and loaded 3.5 pM library. We use KAPA qPCR to quantify the libraries before normalization and pooling. We have run the same samples twice, and some of them three times. We sequenced the V3-V4 region (an amplicon of around 466 bp) with 2x300 bp PE reads (MiSeq v3 kit). We expect the reads to overlap by at least 40 bp.

I am not really a bioinformatics expert, but our bioinformatician set up our pipeline as follows:
1. Validate the DNA sequences from the MiSeq with USEARCH by quality filtering R1 and R2 separately. Reads that pass the quality filtering go on to the next step...
2. Bacterial classification (cleaning, clustering, taxonomic assignment, and building of the abundance matrix) using UPARSE.

R2 reads rarely pass the quality filtering; we usually get only about 4000 reads per sample that pass. Is this too low?

Are we doing it differently from all the others? Is there a better way?

05-11-2015, 09:24 AM   #4
fanli
Senior Member
Location: California
Join Date: Jul 2014
Posts: 192

Our 16S runs using the V4 region generally have Q30 of ~85-95% and a cluster density of ~1000K/mm2 with no PhiX spike-in. I'll check on how much we load.

No specific run metrics for acceptance; we use the standard Illumina "does it pass spec" criteria. For 16S, you really don't need many reads per sample, as you will rarefy later in the analysis anyway. We aim for 100k reads per sample just to make sure most or all samples will have enough to be included, but the saturation curves generally plateau very quickly (even as low as 4-6k reads). Look at the HMP papers.

We use the fastq-join utility to join reads (it's a quality score aware joiner, so low Q score pairs will be discarded). Is it possible that you're setting the quality filter too strict?
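On the rarefying point above: rarefying means randomly subsampling every sample to the same number of reads before comparing diversity, and a rarefaction (saturation) curve tracks how many OTUs are still observed at each depth. Below is a minimal Python sketch, assuming each sample's OTU table is a plain dict of read counts; the OTU names and counts are invented for illustration, and pipelines such as QIIME and mothur ship their own rarefaction tools.

```python
import random
from collections import Counter


def rarefy(otu_counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Randomly subsample one sample's reads, without replacement, to `depth` reads."""
    total = sum(otu_counts.values())
    if total < depth:
        # Samples below the chosen depth are usually dropped from the analysis.
        raise ValueError(f"sample has {total} reads, fewer than depth {depth}")
    # One list entry per read, labelled with the OTU it was assigned to.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    return dict(Counter(picked))


def rarefaction_curve(otu_counts: dict[str, int], depths: list[int], seed: int = 0):
    """Observed OTU richness (number of OTUs seen) after rarefying to each depth."""
    return [(d, len(rarefy(otu_counts, d, seed))) for d in depths]


# Hypothetical sample: two dominant OTUs plus 100 rare ones (10,000 reads in total).
sample = {"OTU_1": 60_000, "OTU_2": 30_000}
sample.update({f"OTU_rare_{i}": 100 for i in range(100)})

for depth, richness in rarefaction_curve(sample, [1_000, 4_000, 10_000, 50_000]):
    print(depth, richness)
```

With a tail of rare OTUs, richness climbs quickly at low depth and then levels off, which is the plateau described above; where it levels off depends entirely on the community.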
Attached image: Screenshot from 2015-05-11 10:11:21.png

05-11-2015, 09:56 AM   #5
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,300

Quote:
Originally Posted by MiSeqUserLUX
R2 rarely pass the quality filtering as we usually get only about 4000 reads per sample that pass.. Is this too low?
That is odd. Have you run FastQC on these? Can you post Q-score plots for read 1 and 2? Do you know what Q-score cut-off your informatics people are using?

05-11-2015, 10:09 AM   #6
microgirl123
Senior Member
Location: New England
Join Date: Jun 2012
Posts: 188

Quote:
We expect the reads to overlap at least 40 bp.
This might be part of your problem. I'm not a bioinformatics person, but I usually see a much larger overlap recommended (except by Illumina). The ends of Read 1 and (especially) Read 2 are much lower in quality than the starts. If you only have a small overlap, you are trying to stick together two low-quality sections of your reads, which causes problems.

Quote:
I spiked-in 10% of 20pM PhiX and loaded 3.5 pM library
I'm confused by this also. I spike 10% of 12.5 pM PhiX into a 9.5 pM library and see runs similar to the one you linked (cluster density ~900K, 10% phiX aligned). You're loading a lot more phiX and a lot less library, and only seeing 15% align.
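The arithmetic behind this comparison, as a small Python sketch. It assumes (the posts do not spell this out) that "10% of X pM PhiX" means PhiX diluted to X pM and mixed at 10% of the final volume with library diluted to the stated concentration. It gives the PhiX share by molecule count; the "% aligned" metric counts clusters, and PhiX and library may not cluster with equal efficiency, so treat this as a rough comparison only.

```python
def phix_molar_fraction(phix_pm: float, library_pm: float, phix_vol_frac: float) -> float:
    """Fraction of loaded molecules that are PhiX after mixing by volume.

    Assumes PhiX was diluted to `phix_pm` and the library to `library_pm`
    before mixing, and that PhiX makes up `phix_vol_frac` of the final volume.
    """
    phix = phix_vol_frac * phix_pm                  # PhiX contribution, pM
    library = (1.0 - phix_vol_frac) * library_pm    # library contribution, pM
    return phix / (phix + library)


op = phix_molar_fraction(20.0, 3.5, 0.10)   # 10% of 20 pM PhiX + 3.5 pM library
mg = phix_molar_fraction(12.5, 9.5, 0.10)   # 10% of 12.5 pM PhiX + 9.5 pM library
print(f"{op:.1%} vs {mg:.1%}")              # prints "38.8% vs 12.8%"
```

Under that reading, the first mix is also lower in total concentration (5.15 pM vs 9.8 pM).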

Last edited by microgirl123; 05-11-2015 at 10:12 AM.

05-11-2015, 11:55 PM   #7
MiSeqUserLUX
Junior Member
Location: Luxembourg
Join Date: Nov 2014
Posts: 6

Quote:
If you only have a small overlap, you are trying to stick together two bad quality sections of your reads, which causes problems.
We perform 2x300 bp PE reads for a 466 bp amplicon (we have an overlap of around 140 bp before the quality filtering). After quality filtering, we expect the reads to overlap by at least 40 bp. Then we will perform merging and classification. In this case, is 40 bp overlap after quality filtering still too small? Or is it enough?
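The overlap figures follow from simple arithmetic: two reads of lengths a and b from an amplicon of length L overlap by a + b - L bases. A Python sketch, assuming the amplicon is exactly 466 bp (real V3-V4 amplicon lengths vary between taxa) and that both reads are trimmed to the same length:

```python
def overlap_bp(r1_len: int, r2_len: int, amplicon_len: int) -> int:
    """Bases covered by both reads; a negative value means a gap between them."""
    return r1_len + r2_len - amplicon_len


def min_read_len_for_overlap(amplicon_len: int, min_overlap: int) -> int:
    """Shortest equal trimmed length for both reads that still gives min_overlap."""
    # Solve 2 * L - amplicon_len >= min_overlap for integer L (ceiling division).
    return -(-(amplicon_len + min_overlap) // 2)


print(overlap_bp(300, 300, 466))             # untrimmed 2x300 reads: 134
print(overlap_bp(250, 250, 466))             # both reads trimmed to 250 bp: 34
print(min_read_len_for_overlap(466, 40))     # 253
```

This gives 134 bp for untrimmed 2x300 reads (the "around 140 bp" above). Under these assumptions, two 250 bp reads would overlap by only 34 bp; each read needs at least 253 bp for a 40 bp overlap.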

Quote:
Do you know what Q-score cut-off your informatics people are using?
Quote:
Is it possible that you're setting the quality filter too strict?
The read selection criteria that our bioinformatician has set are as follows:
Expected errors over the whole read sequence < 1
Q score of every base in the read > 3
Length > 250 bp (to have an overlap > 40 bp after quality filtering)

Is this too strict or just right? As per our bioinformatician, the Qscore and Expected error values are the ones recommended by Uparse developers and in the Uparse publication.
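For reference, UPARSE-style expected-error filtering sums the per-base error probabilities implied by the Phred scores, P = 10^(-Q/10). A minimal Python sketch of the three criteria listed above, assuming Phred+33 quality strings; it illustrates the calculation and is not USEARCH's implementation, and the quality profiles in the demo are made up:

```python
def phred33_to_q(qual: str) -> list[int]:
    """Decode a Phred+33 (Illumina 1.8+) FASTQ quality string into Q scores."""
    return [ord(c) - 33 for c in qual]


def expected_errors(qs: list[int]) -> float:
    """Expected number of errors in a read: the sum of P = 10^(-Q/10) per base."""
    return sum(10 ** (-q / 10) for q in qs)


def passes_filter(seq: str, qual: str, max_ee: float = 1.0,
                  min_q: int = 3, min_len: int = 250) -> bool:
    """The three criteria from the post: EE < 1, every base Q > 3, length > 250."""
    qs = phred33_to_q(qual)
    return len(seq) > min_len and min(qs) > min_q and expected_errors(qs) < max_ee


def q_string(*runs: tuple[int, int]) -> str:
    """Build a Phred+33 string from (Q, count) runs, e.g. q_string((30, 200))."""
    return "".join(chr(q + 33) * n for q, n in runs)


seq = "A" * 300                              # placeholder bases
good = q_string((30, 300))                   # Q30 at every cycle
tail = q_string((30, 200), (15, 100))        # last 100 cycles drop to Q15

print(round(expected_errors(phred33_to_q(good)), 2), passes_filter(seq, good))  # 0.3 True
print(round(expected_errors(phred33_to_q(tail)), 2), passes_filter(seq, tail))  # 3.36 False
print(10 ** (-3 / 10), 10 ** (-30 / 10))     # per-base error at Q3 (~0.5) vs Q30 (0.001)
```

A Q3 base has roughly a 50% chance of being wrong and a Q30 base about 0.1%, so a minimum of Q3 removes almost nothing; in this toy example it is the EE < 1 cutoff that rejects a 300-cycle read whose last cycles degrade, which is the situation described above for R2.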

An example of our quality filtering result is attached.

I have also just run FastQC on one of the samples. The results are attached.

Any thoughts?
Thank you in advance.

05-12-2015, 03:28 AM   #8
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,300

I think you meant to say Q30 (not 3) since your data does not seem to have any reads below Q5.

If that is indeed Q30 (and above), then it seems to be a very stringent filter. Since the reads are expected to overlap, perhaps the merging should be done first, with the Q-score used as a criterion to keep the base with the higher quality (where the merge is not perfect). Look into BBMerge (http://seqanswers.com/forums/showthread.php?t=43906) or FLASH (http://ccb.jhu.edu/software/FLASH/) as options.

05-12-2015, 11:29 PM   #9
MiSeqUserLUX
Junior Member
Location: Luxembourg
Join Date: Nov 2014
Posts: 6

Thank you GenoMax and thank you all for your replies.
I had a close look at our pipeline, and indeed there was something that needs to be fixed (the UPARSE workflow recommends merging paired reads before read quality filtering). So you're right, merging needs to be done first.

For some reason, our bioinformatician set up the pipeline this way:

STEP 1. Quality filtering of R1 and R2 reads separately (this is where a lot of our reads are discarded, and the bioinformatician tells me that the MiSeq data are not usable).

IF the sequences pass STEP 1,
then step 2 is performed...

STEP 2. Back to scratch >> merging of paired reads, read quality filtering.... assembly....

I believe starting from step 2 would be sufficient.
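A minimal Python sketch of the merge-first order: overlap R1 with the reverse complement of R2, keep the higher-Q base at each overlapping position (as GenoMax suggests), then compute expected errors on the merged read. This is a simplified illustration, not the algorithm used by BBMerge, FLASH, or USEARCH; the longest-acceptable-overlap search, the 10% mismatch cutoff, and the synthetic reads and quality profiles are all assumptions.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_errors(qs: list[int]) -> float:
    """Sum of per-base error probabilities, P = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in qs)


def merge_pair(r1: str, q1: list[int], r2: str, q2: list[int],
               min_overlap: int = 20, max_mismatch_frac: float = 0.1):
    """Merge R1 with reverse-complemented R2; return (seq, quals) or None.

    Tries overlaps from longest to shortest and takes the first one whose
    mismatch rate is acceptable. Assumes the amplicon is longer than either
    read (no read-through into the adapter).
    """
    r2rc, q2rc = revcomp(r2), q2[::-1]
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        start = len(r1) - ov                       # overlap begins here in R1
        mism = sum(a != b for a, b in zip(r1[start:], r2rc[:ov]))
        if mism / ov <= max_mismatch_frac:
            break
    else:
        return None                                # no acceptable overlap found
    seq, quals = list(r1[:start]), list(q1[:start])
    for i in range(ov):
        # Keep whichever of the two base calls has the higher Q score.
        if q1[start + i] >= q2rc[i]:
            seq.append(r1[start + i])
            quals.append(q1[start + i])
        else:
            seq.append(r2rc[i])
            quals.append(q2rc[i])
    seq.extend(r2rc[ov:])
    quals.extend(q2rc[ov:])
    return "".join(seq), quals


rng = random.Random(1)
amplicon = "".join(rng.choice("ACGT") for _ in range(466))  # synthetic 466 bp target
r1 = amplicon[:300]
r2 = revcomp(amplicon[-300:])
q1 = [35] * 250 + [20] * 50      # made-up R1 quality profile
q2 = [30] * 200 + [12] * 100     # made-up R2 profile with a poor 3' end

merged, mq = merge_pair(r1, q1, r2, q2)
print(len(merged), merged == amplicon)                           # 466 True
print(round(expected_errors(q2), 2), round(expected_errors(mq), 2))  # 6.51 0.44
```

In this toy case R2 alone has about 6.5 expected errors and would be discarded by an EE < 1 filter, while the merged read has about 0.44, because R1's high-quality bases cover the part of the amplicon where R2 is poor.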

07-24-2015, 07:43 PM   #10
GA-J
Member
Location: USA
Join Date: Jul 2015
Posts: 25

Hello Fanli, I like the results of your 16S V4 MiSeq run. How much did you load, and what is the size of your library?

Thanks.

07-28-2015, 08:18 AM   #11
fanli
Senior Member
Location: California
Join Date: Jul 2014
Posts: 192

We load 8.0 pM library using the 515F/806R primers detailed here:
http://www.earthmicrobiome.org/emp-s...protocols/16s/

Edit: 1.8pM was for NextSeq runs

Last edited by fanli; 07-30-2015 at 06:44 AM.

08-03-2015, 07:05 AM   #12
GA-J
Member
Location: USA
Join Date: Jul 2015
Posts: 25

Fanli, thank you for the information. Two more questions: did you use a MiSeq v2 kit for this 16S V4 run? And why no PhiX spike-in (how did you decide on no PhiX; is it mentioned in a protocol, or did you test it)? I want to change my protocol, but I would like to understand why. Thank you very much.

08-03-2015, 07:09 AM   #13
fanli
Senior Member
Location: California
Join Date: Jul 2014
Posts: 192

Yes, these numbers are for v2 kits. We've found that there's little difference with a small PhiX spike on our particular MiSeq, but I don't really see the harm in doing something in the 5% range. You generally aren't going to be constrained for sequencing depth with 16S anyways.

08-03-2015, 09:09 AM   #14
GA-J
Member
Location: USA
Join Date: Jul 2015
Posts: 25

Thank you, Fanli.

09-04-2015, 04:53 AM   #15
RickC7
Member
Location: Baton Rouge, Louisiana
Join Date: Feb 2010
Posts: 22

Not meaning to hijack the thread, but can anyone explain why we see such low 1st-cycle intensities with 16S libraries? I see this in both V3-V4 and V4-only libraries; perhaps it is due to low diversity? If I look at run summaries from targeted resequencing or PhiX, the 1st-cycle intensities are comparable and normally in the 300-400 range, but 16S runs are usually <50. Thanks for any insight.

09-04-2015, 09:07 AM   #16
fanli
Senior Member
Location: California
Join Date: Jul 2014
Posts: 192

I think low diversity would only cause jaggedness in the intensity profile, but maybe you should check w/ tech support.
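As a rough illustration of what "low diversity" means for a 16S amplicon pool (this does not by itself explain the intensity values discussed here), the Python sketch below tabulates the base composition at each cycle. It assumes every amplicon read starts with the same 10 bases; that prefix is made up and stands in for a shared primer region. In a diverse library each base sits near 25% at every cycle, while the amplicon reads put 100% of clusters on one base for the first cycles.

```python
import random
from collections import Counter


def per_cycle_composition(reads: list[str]) -> list[dict[str, float]]:
    """Fraction of reads calling A, C, G and T at each cycle (read position)."""
    n_cycles = min(len(r) for r in reads)
    table = []
    for i in range(n_cycles):
        counts = Counter(r[i] for r in reads)
        table.append({b: counts[b] / len(reads) for b in "ACGT"})
    return table


def random_bases(rng: random.Random, n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


rng = random.Random(0)
prefix = "GTACGTTAGC"                                             # hypothetical shared start
amplicon_reads = [prefix + random_bases(rng, 10) for _ in range(1000)]
diverse_reads = [random_bases(rng, 20) for _ in range(1000)]      # shotgun/PhiX-like

for name, reads in [("amplicon", amplicon_reads), ("diverse", diverse_reads)]:
    # Largest single-base fraction at each of the first 5 cycles.
    top = [max(c.values()) for c in per_cycle_composition(reads)[:5]]
    print(name, [round(t, 2) for t in top])
```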

Our 16S runs have 1st cycle intensities ~150.
Attached image: Screenshot from 2015-09-04 10:03:42.png

09-04-2015, 09:22 AM   #17
RickC7
Member
Location: Baton Rouge, Louisiana
Join Date: Feb 2010
Posts: 22

strange

hi fanli,

hmmm, the run summary you posted on 5-11-2015 shows read 1 intensity at 17 and read 4 intensity at 53. The other run summary, from the OP, also shows low 1st-cycle intensity. This also fits with what I see with 16S libraries on our MiSeq. So: 3 different machines, 3 different places, 3 similarly low 1st-cycle intensities...
Tech support is telling me this is the reason I am having issues with run completion. My last 3 runs terminated at random points: run 1 at cycle 385, run 2 at cycle 60, run 3 at cycle 587. Tech support has been helpful in replacing kits, but the MiSeq is still out of commission. I have arranged for the libraries to be sequenced on another MiSeq; QC all checks out, so I have little to no concern about the libraries. One comment that came out was "your 1st cycle intensities are very low..." I get stopped runs and funky Z-stage errors; the Z-stage was replaced, but the same issue persists.

09-04-2015, 09:38 AM   #18
fanli
Senior Member
Location: California
Join Date: Jul 2014
Posts: 192

My bad - that last screenshot is Called Int. Yeah, you're right - the Cycle 1 intensities for my last run are 30 and 39 for read 1 and read 3, respectively.

We haven't had any issues with runs terminating randomly for 16S libraries though. Although now that I think about it, we did have one bacterial WGS run that died on cycle 515 or so. Something about a .NET framework error and tech support said they hadn't seen it before.

11-18-2016, 03:26 AM   #19
BioGenomics
Member
Location: Belgium
Join Date: Apr 2009
Posts: 24

Hi all,

what is the minimum Q-value you would suggest for trimming/clipping 16S reads or merged amplicons?

thanks

11-18-2016, 07:19 AM   #20
thermophile
Senior Member
Location: CT
Join Date: Apr 2015
Posts: 179

I don't trim based on Q-score; I use the Q-scores to merge reads. I use mothur, which implements pandaseq for its read merging.
__________________
Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.

All times are GMT -8.