SEQanswers

Sequencing Technologies/Companies > Illumina/Solexa



11-04-2019, 08:30 PM   #1
samd
Member
 
Location: California

Join Date: Nov 2019
Posts: 10
Low data output from questionable sequencing facility

Hi all,

I have been getting some questionable runs from a sequencing facility and was wondering whether there are any clues as to whether it is their fault or a problem with my own library prep.

I am sequencing fish gut microbiome libraries on a MiSeq (2x300), and the first run I sent off (to a different facility) was great: the lowest sample had 30k reads, and the average was about 100k reads per sample.

Then we switched facilities, and since then I have never achieved anything close to my first run. These are all very similar libraries made with nearly identical protocols, except that we now use MagBind bead cleanups instead of AMPure (a very similar product). Now all my runs average about 30k reads per sample, many samples get fewer than 5k, and my latest MiSeq run generated only 7 million reads in total. I've attached a picture of the QC reports for the good run and an example of a current bad one.

Is it really that easy for a facility to mess up so many MiSeq runs? They have been loading at 12-13 pM with 20-25% PhiX. My last library quantified at 26 nM by Qubit.
Another piece of information: the core recently cut its prices in half. I don't want to give specifics, but it is less than $800 per run, which is VERY cheap IMO. I'm not sure whether this could influence what goes on at a facility.
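For reference, the loading numbers above come down to C1·V1 = C2·V2, applied once to dilute the denatured pool to the loading concentration and once to split that volume between library and PhiX. The sketch below is a minimal illustration, not the facility's protocol: the 20 pM denatured stock, the 600 µL final volume and the 4 nM intermediate are assumptions (common in MiSeq protocols, but check your facility's), and PhiX is assumed to be diluted to the same concentration as the library so that its volume fraction equals its molar fraction.

```python
def dilution_volume_ul(stock_pM: float, target_pM: float, final_ul: float) -> float:
    """Volume of stock (uL) that gives target_pM in final_ul, from C1*V1 = C2*V2."""
    if not 0 < target_pM <= stock_pM:
        raise ValueError("target must be positive and no higher than the stock")
    return target_pM * final_ul / stock_pM


def loading_mix_ul(stock_pM: float, target_pM: float, final_ul: float,
                   phix_fraction: float) -> dict:
    """Split the loading volume between library, PhiX and dilution buffer.

    Assumes library and PhiX stocks are at the same concentration (stock_pM),
    so the PhiX volume fraction equals its molar fraction in the loaded pool.
    """
    if not 0 <= phix_fraction < 1:
        raise ValueError("phix_fraction must be in [0, 1)")
    stock_ul = dilution_volume_ul(stock_pM, target_pM, final_ul)
    phix_ul = stock_ul * phix_fraction
    return {
        "library_ul": stock_ul - phix_ul,
        "phix_ul": phix_ul,
        "buffer_ul": final_ul - stock_ul,
    }


# Hypothetical: bring a 26 nM (26,000 pM) library to a 4 nM intermediate in 26 uL.
print(dilution_volume_ul(26_000.0, 4_000.0, 26.0))  # 4.0 uL of library

# Hypothetical: 12 pM final with 20% PhiX, from 20 pM denatured stocks, 600 uL total.
print(loading_mix_ul(stock_pM=20.0, target_pM=12.0, final_ul=600.0, phix_fraction=0.20))
```

One consequence worth keeping in mind: with 20-25% PhiX, roughly a fifth to a quarter of the clusters are expected to carry no sample index, so they end up among the undetermined reads.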

From the reports %PF is around 93-95% and the Q30% is around 84-89% for these runs.

Let me know if I should provide any other information.
Any feedback is much appreciated!
Thanks,
Sam
Attached image: Run comparison.png (71.8 KB)

Last edited by samd; 11-06-2019 at 06:46 PM.
11-05-2019, 02:55 AM   #2
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,230

More information on the following would be useful:
1- Which 16S V region(s) the libraries target, and the overall prep workflow
2- Cluster density
3- How many libraries are multiplexed
4- Read output
11-05-2019, 09:02 AM   #3
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 422

Are you using custom sequencing primers?

Illumina does not calibrate the heating and cooling elements exactly the same way on every MiSeq, so some risky custom sequencing primer designs work better on some MiSeqs than on others.
11-05-2019, 10:16 AM   #4
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 511

$800 is very, very low. Even twice that would be on the low end at many service providers for a 2x300 v3 MiSeq run.

Have you compared Qubit numbers to qPCR to see if there is a mismatch in those approaches to quantifying your library?
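To make that comparison, the Qubit mass concentration has to be converted to molarity using the mean fragment length (for example from a Bioanalyzer or TapeStation trace) and the usual ~660 g/mol per base pair of dsDNA. Library-quantification qPCR kits prime off the P5/P7 adapter sequences, so they count only molecules that can actually form clusters; a qPCR value well below the Qubit-derived value points to non-functional fragments. A minimal sketch; the concentrations and the 500 bp length in the test are hypothetical, not values from this thread.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of dsDNA


def qubit_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    1 ng/uL = 1e-3 g/L; dividing by the molecular weight (660 * length)
    gives mol/L, and multiplying by 1e9 gives nM, hence the 1e6 factor.
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and length > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def functional_fraction(qpcr_nM: float, qubit_nM: float) -> float:
    """Share of Qubit-measured molecules that adapter-primed qPCR detects."""
    if qubit_nM <= 0:
        raise ValueError("qubit_nM must be > 0")
    return qpcr_nM / qubit_nM
```

For a PCR amplicon library with a tight size distribution, the two estimates would usually be expected to agree fairly well; a broad or lumpy size trace makes the single "mean length" in the conversion less trustworthy.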
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
11-06-2019, 10:15 AM   #5
samd
Member
 
Location: California

Join Date: Nov 2019
Posts: 10

Hi all,
I appreciate the responses.

@nucacidhunter
1. The primers are the EMP 515-806 primers (V4 region).
2. Cluster density was about 477 K/mm², if I remember correctly.
3. Just 1 library. I usually ran 160 samples per run; this time I reduced it to about 100 samples and still got crummy results.
4. Read output was actually 25M, with 23M PF, which I guess is actually great. But 40% were undetermined, and when I import the FASTQ files into QIIME2 I only get about 7 million reads (this is before DADA2, so still the entire files).
11-06-2019, 10:22 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,992

Quote:
2. Cluster density was about 477 if I remember correctly
That does not jibe with the read numbers. Are you referring to 23M total reads (R1+R2) or to passing clusters?

That cluster density is not high enough to get to 20+M clusters.
11-06-2019, 10:22 AM   #7
samd
Member
 
Location: California

Join Date: Nov 2019
Posts: 10

@luc

I used the Illumina unique dual indexes, so that should not be an issue, right?

@SNPsaurus

I have not done that. I know qPCR is much more accurate; we just don't have the capability in my lab. However, I could use another lab's qPCR machine; I would just have to learn the protocol. Would this be likely to give much better runs?
Sam
11-06-2019, 10:29 AM   #8
samd
Member
 
Location: California

Join Date: Nov 2019
Posts: 10

Hi GenoMax,

Under the Indexing QC tab it says Total Reads: 25,058,484 and PF Reads: 23,419,084, and the density is 477. I believe these would be total reads (R1 + R2)?
However, under Lane Metrics I see 11M reads PF, with a density of 472. Phasing/prephasing is also shown as 0.128 / 0.215%. Let me know if there is anything else I can look for that might help clarify.
Thanks

Last edited by samd; 11-06-2019 at 11:07 AM.
11-06-2019, 11:14 AM   #9
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 511

Quote:
Originally Posted by samd
@SNPsaurus

I have not done that. I know that is much more accurate we just don't exactly have the capabilities in my lab. However, I can use another lab's qPCR machine I would just have to learn the protocol and all. Would this likely help get much better runs?
Sam
After seeing the updates, it probably wouldn't help much. Some library preps can have a high percentage of non-functional DNA fragments, but a PCR amplicon should be pretty reliable. And if you are getting 25 million raw reads and then just 11 million, perhaps the issue is somewhere in the demultiplexing? Have you looked at the index sequences in the undetermined FASTQ file to see whether they contain Ns or are missing from your index list?
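That check can be scripted. The sketch below tallies the index field of every record in the undetermined FASTQ and buckets the counts into: expected, containing N, matching only after reverse-complementing i5, or absent from the sample sheet. It assumes bcl2fastq2-style headers whose comment ends with the observed index read(s) as `i7+i5`; some pipelines write a sample number in that field instead, so look at a few headers first. The orientation bucket is there because whether the i5 read matches the sample-sheet sequence or its reverse complement depends on the instrument and workflow. The `expected` set (built from your sample sheet) and the helper names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def index_from_header(header: str) -> str:
    """Return the index field from a FASTQ header.

    Assumes a bcl2fastq2-style header whose comment ends with the observed
    index read(s), e.g. '@M00123:45:FC:1:1101:1:1 1:N:0:ACGTACGT+TTAGGCAT'.
    """
    parts = header.strip().split(maxsplit=1)
    if len(parts) < 2:
        return ""  # no comment field, so no index recorded
    return parts[1].split(":")[-1]


def tally_indexes(lines: Iterable[str]) -> Counter:
    """Count index strings from the header line of every 4-line FASTQ record."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:
            counts[index_from_header(line)] += 1
    return counts


def classify(index: str, expected: set) -> str:
    """Bucket one observed index (or i7+i5 pair) against the sample sheet."""
    if "N" in index:
        return "contains_N"
    if index in expected:
        return "expected"
    i7, _, i5 = index.partition("+")
    if i5 and f"{i7}+{revcomp(i5)}" in expected:
        return "i5_reverse_complement_matches"
    return "not_in_sample_sheet"


def summarize(counts: Counter, expected: set) -> Counter:
    """Total read counts per category."""
    summary: Counter = Counter()
    for index, n in counts.items():
        summary[classify(index, expected)] += n
    return summary


def tally_file(path: str) -> Counter:
    """Tally indexes in a gzipped FASTQ such as Undetermined_S0_L001_R1_001.fastq.gz."""
    with gzip.open(path, "rt") as handle:
        return tally_indexes(handle)
```

Running `tally_file` on the Undetermined R1 file and printing `counts.most_common(20)` is usually enough. A handful of very abundant index pairs that are missing from the sample sheet often suggests a sample-sheet or orientation problem. A diffuse spread of low-count or N-containing indexes looks more like PhiX or poor index-read quality.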
11-06-2019, 01:58 PM   #10
itstrieu
Member
 
Location: Atlanta

Join Date: Nov 2018
Posts: 16

Cluster density is kind of low for a v3 kit, even for low-diversity libraries. We usually sequence the V3-V4 region on a v3 600-cycle kit and get around 50M PE reads at ~1000 K/mm² with a loading concentration of around 17 pM.
11-06-2019, 02:27 PM   #11
samd
Member
 
Location: California

Join Date: Nov 2019
Posts: 10

Hi SNPsaurus,

OK, so after discussing with them: the 11M reads simply refers to the forward reads, and the 23M PF refers to forward plus reverse. I also see a 40% undetermined-reads metric, which is a bit high compared to 25% on my last run, so I wonder whether this is an indexing issue rather than a sequencing issue.
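Using the numbers reported so far (23,419,084 PF reads counted as R1 + R2, 40% undetermined, 20-25% PhiX, about 100 samples), here is a back-of-the-envelope sketch of where the reads go. It assumes that the PhiX share of PF clusters equals the PhiX fraction loaded and that every PhiX read lands in Undetermined because the PhiX control carries no sample index. Both are approximations.

```python
def read_budget(pf_reads_r1_r2: int, undetermined_frac: float,
                phix_frac: float, n_samples: int) -> dict:
    """Rough accounting of where passing-filter read pairs end up.

    Assumes PhiX clusters make up phix_frac of PF clusters and are all
    reported as undetermined, since the PhiX control has no sample index.
    """
    pf_pairs = pf_reads_r1_r2 / 2  # one PE cluster gives R1 + R2
    assigned_pairs = pf_pairs * (1 - undetermined_frac)
    undetermined_pairs = pf_pairs * undetermined_frac
    phix_pairs = pf_pairs * phix_frac
    return {
        "pf_pairs": pf_pairs,
        "assigned_pairs": assigned_pairs,
        "mean_pairs_per_sample": assigned_pairs / n_samples,
        "phix_pairs": phix_pairs,
        "other_undetermined_pairs": max(0.0, undetermined_pairs - phix_pairs),
    }


for phix in (0.20, 0.25):
    b = read_budget(23_419_084, undetermined_frac=0.40, phix_frac=phix, n_samples=100)
    print(f"PhiX {phix:.0%}: {b['assigned_pairs']:,.0f} assigned pairs, "
          f"{b['mean_pairs_per_sample']:,.0f} per sample, "
          f"{b['other_undetermined_pairs']:,.0f} non-PhiX undetermined pairs")
```

Under these assumptions, about 7.0 million read pairs (roughly 70k per sample) are assigned to samples. That is in line with the ~7 million reads seen after importing into QIIME2, so the import itself does not appear to be losing data. PhiX alone could account for roughly half to about 60% of the undetermined reads.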

@itstrieu: Damn, I wish I could get those numbers. Would you recommend raising my loading concentration to 17 pM instead of 13? Or does that depend on other factors?
11-06-2019, 02:30 PM   #12
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,230

Quote:
Originally Posted by samd
Hi all,
appreciate the responses

@nucacidhunter
1. Primers are the EMP 515-806 V4 region
2. Cluster density was about 477 if I remember correctly
3. Just 1 library. Usually I was doing 160 samples per run and this time I reduced the run to about 100 samples and still got crummy results
4. Read output was actually 25M and then 23M PF which I guess is great actually but then 40% were undetermined and then when I import the fastq files into QIIME2 I only get about 7 million reads (this is before dada2, so still the entire files).
The run stats seem within spec for this library type, although the facility seems to be on the cautious side. To increase output safely, the following can be done:
1- Increase cluster density to around 800 K/mm²
2- Tune PhiX to 20% if, after mapping the undetermined reads, most of them turn out to originate from PhiX. Undetermined reads could also be reads whose index was not assigned because of mismatches in a custom index primer or in the PCR primers themselves.
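A rough way to see what the suggested density increase might buy is to scale this run's clusters linearly with density. This is a simplification: %PF usually drops as density rises, so the `pf_rate_scale` knob is provided to model that, and the default of 1.0 is optimistic. The inputs are the BaseSpace numbers quoted earlier in the thread (25,058,484 total and 23,419,084 PF reads, counted as R1 + R2, at ~477 K/mm²); the function names are hypothetical.

```python
def implied_imaged_area_mm2(clusters: float, density_k_per_mm2: float) -> float:
    """Area (mm^2) implied by a cluster count at a given density (K/mm^2)."""
    return clusters / (density_k_per_mm2 * 1_000)


def project_pf_clusters(pf_clusters_now: float, density_now: float,
                        density_target: float, pf_rate_scale: float = 1.0) -> float:
    """Linearly scale PF clusters to a new density.

    pf_rate_scale < 1 models a lower %PF at the higher density; the default
    assumes %PF stays constant, which is optimistic.
    """
    return pf_clusters_now * (density_target / density_now) * pf_rate_scale


raw_now = 25_058_484 / 2  # total reads (R1 + R2) -> clusters
pf_now = 23_419_084 / 2   # PF reads (R1 + R2) -> PF clusters
print(f"implied area: {implied_imaged_area_mm2(raw_now, 477):.1f} mm^2")
for target in (800, 900):
    print(target, f"{project_pf_clusters(pf_now, 477, target):,.0f} PF clusters")
```

If %PF held, 800-900 K/mm² would give very roughly 19.6-22.1 million PF read pairs instead of ~11.7 million. As a sanity check, itstrieu's ~25M clusters at ~1000 K/mm² implies about 25 mm², close to the ~26 mm² implied by this run, so the figures in the thread are at least mutually consistent.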
11-06-2019, 02:36 PM   #13
itstrieu
Member
 
Location: Atlanta

Join Date: Nov 2018
Posts: 16

Quote:
Originally Posted by samd
Hi SNPsaurus,

Ok so after discussing with them the 11M reads simply referred to the forward reads. The 23M PF refers to the forward and reverse. I see a 40% undetermined reads metric. Which is a bit high compared to 25% on my last run so I wonder if this is an indexing issue vs a sequencing issue?

@itstrieu: Damn I wish I could get those numbers. Would you recommend upping my concentration to 17pM instead of 13? Or is that dependent on other factors.
I would say it depends on the library and on the metrics you are aiming for. For V3-V4 sequencing, we use spacer primers to add diversity to the run.

For V4 sequencing, we use the EMP 515F (Parada)–806R (Apprill) primers with a v2 500-cycle kit, and we usually get about 35M PE reads that pass filter. For this library and kit, we load at around 8.75 pM, but we keep a log and use a floating average to determine the loading concentration for the next run.

It might also depend on the MiSeq: we have two MiSeqs, and one performs slightly better. I usually vary the concentration in increments of 0.25-0.50 pM to be careful not to overcluster.
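A running log with a floating average, moved in 0.25-0.50 pM steps, could look something like the sketch below. This is an illustrative guess at the idea, not itstrieu's actual procedure. The linear density-per-pM model, the five-run window and the logged values are assumptions, and keeping a separate log per instrument would reflect the observation that two MiSeqs can behave differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class Run:
    loading_pM: float      # final loading concentration
    density_k_mm2: float   # reported cluster density (K/mm^2)


def next_loading_pM(log: Sequence[Run], target_density: float, window: int = 5,
                    max_step: float = 0.5, increment: float = 0.25) -> float:
    """Suggest the next loading concentration from a rolling average.

    Assumes density scales roughly linearly with loading concentration, so
    density / pM over recent runs estimates the yield per pM. The change from
    the last run is capped at max_step and rounded to the nearest increment.
    """
    recent = log[-window:]
    if not recent:
        raise ValueError("log at least one run first")
    yield_per_pM = mean(r.density_k_mm2 / r.loading_pM for r in recent)
    ideal = target_density / yield_per_pM
    last = recent[-1].loading_pM
    step = max(-max_step, min(max_step, ideal - last))
    return round((last + step) / increment) * increment


# Hypothetical log for one instrument.
history = [Run(12.0, 480.0), Run(12.5, 500.0), Run(13.0, 520.0)]
print(next_loading_pM(history, target_density=800.0))  # 13.5
```

Capping the step means that even a large gap between current and target density is closed over several runs, which matches the goal of not overclustering.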

Last edited by itstrieu; 11-06-2019 at 02:42 PM.
11-06-2019, 02:43 PM   #14
samd
Member
 
Location: California

Join Date: Nov 2019
Posts: 10

Hi nucacidhunter,

Sorry, do you mean manually changing the cluster density to 800 K/mm²? Is this something I could ask the facility to do?
And I have been using Illumina UD indexes, so I am guessing there shouldn't be any issues on that end.
Thanks,
Sam
11-06-2019, 02:54 PM   #15
samd
Member
 
Location: California

Join Date: Nov 2019
Posts: 10

Hi SNPsaurus,

Interesting. I have heard of spacer primers, but I think I am past the point in my PhD where I can redo everything. It would be nice, though.

This might be a dumb question, but I thought a MiSeq puts out 25M reads, and I am seeing you and others say 35M or 50M. How is this possible?
11-06-2019, 03:00 PM   #16
itstrieu
Member
 
Location: Atlanta

Join Date: Nov 2018
Posts: 16

You can increase cluster density by increasing the loading concentration.

The numbers in my post are paired-end reads (R1 + R2), i.e. double what you would count for single-end reads from the same clusters. Here is a link showing how many reads you should expect from the various MiSeq kits, although it depends on cluster density: https://www.illumina.com/systems/seq...fications.html

Edit: Sometimes you hear the words clusters and reads used interchangeably. I believe a v3 kit can generate around 25M unique clusters, and each cluster gives two reads in paired-end mode, so it would output 50M reads.

Last edited by itstrieu; 11-06-2019 at 03:06 PM.
11-06-2019, 04:21 PM   #17
samd
Member
 
Location: California

Join Date: Nov 2019
Posts: 10

OK, so I am guessing the "25M total reads" in BaseSpace actually means 50M, since I did PE. Thanks for the suggestion; I will look into that.

One thing I just remembered is that the QC results were quite different between the first "good run" and the subsequent "bad runs". The good run has a nice skinny peak, while the bad runs have lumpy peaks, which I guess could be attributed to non-specific primer binding? I've attached them to the post in case anyone is interested or has any insight.
Again, thanks for all the feedback!
Sam
11-06-2019, 04:47 PM   #18
itstrieu
Member
 
Location: Atlanta

Join Date: Nov 2018
Posts: 16

Quote:
Originally Posted by samd
Ok so I am guessing the "25M total reads" on Basespace actually means 50M since I did PE. Thanks for the suggestion I will look into that.

One thing I just remembered is the QC results were quite different from the first "good run" and the subsequent "bad runs". The good run has a nice skinny peak and the bad runs have lumpy peaks which I guess would be attributed to non-specific binding of primers? I've attached to the post in case anyone is interested or has any insight into that.
Again, thanks for all the feedback!
Sam
If Total Reads is 25M under the Indexing QC tab in BaseSpace, that is actually the total number of PE reads. Under the Metrics tab, Reads PF will be half of that. I would ask whether they could rerun the library at a higher concentration, targeting a cluster density of around 900 K/mm² for more reads.
11-06-2019, 06:45 PM   #19
samd
Member
 
Location: California

Join Date: Nov 2019
Posts: 10

Hi itstrieu,

I see. Well, I guess I am getting very low output then. I will run that suggestion by them. It is just strange, because my first run at Berkeley, which I consider "good", was done at 12 pM, and I even tried raising it to 13 pM here at UCLA and ended up getting fewer reads.

Tags
core, ilumina, low coverage, miseq

All times are GMT -8.