SEQanswers





Old 05-07-2015, 12:41 AM   #1
DNATECH
Member
 
Location: Davis, CA

Join Date: Mar 2015
Posts: 29
Default First HiSeq 3000 data

Hello,

in case you are interested, we have posted some results from our first HiSeq 3000 runs, as well as some information and considerations on the changes introduced by the new HiSeq 3000 and HiSeq 4000 sequencer generation. The yields on the first two runs were higher than expected, at about 340 million reads per lane. The quality looks good.
There is also a link to the complete data from a PhiX lane (including the clusters that did not "pass filter").

http://dnatech.genomecenter.ucdavis....data-download/

Btw, the cluster images do not give away the patterned character of the flowcells. Please see the attachment.

Best,
Lutz
Attached Images
File Type: png HS3Kclusterpicsbw.png (94.5 KB, 141 views)

Last edited by DNATECH; 05-07-2015 at 12:48 AM.
Old 05-07-2015, 03:53 PM   #2
DNATECH
Member
 
Location: Davis, CA

Join Date: Mar 2015
Posts: 29
Default

Some Q30 plots.
Attached Images
File Type: png HS3KQ30plotPE100.png (47.5 KB, 77 views)
File Type: png HS3KQ30 lotPE150.png (51.0 KB, 74 views)
Old 05-07-2015, 05:11 PM   #3
idedios
Member
 
Location: Irvine, CA

Join Date: Mar 2014
Posts: 18
Default

Sweet, my company just got one and we'll do our training run a couple of weeks from now. The instrument was set up and configured nearly a month ago, so it's bothersome to see it idle for so long.
Old 05-07-2015, 05:59 PM   #4
DNATECH
Member
 
Location: Davis, CA

Join Date: Mar 2015
Posts: 29
Default

We have actually been waiting for flow cells for several weeks; I guess Illumina is surprised that anybody wants to use the new sequencers?
Old 05-08-2015, 01:18 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,780
Default

@DNATECH: Based on this (and your other post) it sounds like you need "near perfect libraries" to get good data from patterned flowcells. This could be a problem for core facilities, where "variable" quality libraries come in from customers.

It would be interesting to hear about your experiences as real world customer libraries start flowing through.
Old 05-08-2015, 08:02 AM   #6
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,287
Default

Quote:
Originally Posted by DNATECH View Post
Hello,

in case you are interested, we have posted some results from our first HiSeq 3000 runs, as well as some information and considerations on the changes introduced by the new HiSeq 3000 and HiSeq 4000 sequencer generation. The yields on the first two runs were higher than expected, at about 340 million reads per lane.
You mean 340 million pass-filter clusters per lane?

--
Phillip
Old 05-08-2015, 08:34 AM   #7
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,287
Default

What final concentration was the phiX library that you clustered? I mean after neutralization?

I mean is there no danger of overclustering anymore? That was what I was hoping for when I heard about the patterned flowcells...

--
Phillip
Old 05-08-2015, 08:52 AM   #8
DNATECH
Member
 
Location: Davis, CA

Join Date: Mar 2015
Posts: 29
Default

Yes, an average of 340 million clusters passing filter per lane.

Quote:
Originally Posted by pmiguel View Post
You mean 340 million pass-filter clusters per lane?

--
Phillip
Old 05-08-2015, 09:44 AM   #9
DNATECH
Member
 
Location: Davis, CA

Join Date: Mar 2015
Posts: 29
Default

Hi Miguel,

the input was 5ul of PhiX at 2 nM. So far we have used 2 nM concentrations for all our libraries/lanes. Illumina recommends up to 3 nM.
From what our FAS told us, I got the impression under-loading could be more detrimental than over-loading.

Quote:
Originally Posted by pmiguel View Post
What final concentration was the phiX library that you clustered? I mean after neutralization?

I mean is there no danger of overclustering anymore? That was what I was hoping for when I heard about the patterned flowcells...

--
Phillip
Old 05-08-2015, 10:37 AM   #10
DNATECH
Member
 
Location: Davis, CA

Join Date: Mar 2015
Posts: 29
Default

Hi GenoMax,

perhaps we are just being careful at the moment - since Illumina seems to be very careful and there is very little information so far. The customer samples (n=11) have been looking great so far except one; this sample had a larger low-complexity component to it (which we were not aware of). For this sample, the Q30 rate dropped from 95% to 70% after the first 60 to 70 bases of low-complexity sequence.

Quote:
Originally Posted by GenoMax View Post
@DNATECH: Based on this (and your other post) it sounds like you need "near perfect libraries" to get good data from patterned flowcells. This could be a problem for core facilities, where "variable" quality libraries come in from customers.

It would be interesting to hear about your experiences as real world customer libraries start flowing through.
Old 05-08-2015, 11:07 AM   #11
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,287
Default

Quote:
Originally Posted by DNATECH View Post
Hi Miguel,

the input was 5ul of PhiX at 2 nM. So far we have used 2 nM concentrations for all our libraries/lanes. Illumina recommends up to 3 nM.
From what our FAS told us, I got the impression under-loading could be more detrimental than over-loading.
Wow, 2000 pM? I think the highest we ever went on the HiSeq2500 was 23 pM.

--
Phillip
Old 05-08-2015, 11:33 AM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,780
Default

To get 2-2.5x more clusters (compared to a 2500) load 100x more? DNA binding in nanowells must not be very efficient.

Last edited by GenoMax; 05-08-2015 at 11:35 AM.
Old 05-08-2015, 12:11 PM   #13
DNATECH
Member
 
Location: Davis, CA

Join Date: Mar 2015
Posts: 29
Default

Hi Pmiguel,

the basic procedure looks like:
- 5 ul of library (2 nM to 3 nM including PhiX)
- add 5 ul 0.1 N NaOH
- add 5 ul Tris (200mM)
- add 35 ul Enzyme Master Mix
- load all 50 ul onto cBot
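
As a quick check on the concentrations discussed in this thread, the volumes listed above can be turned into a final clustering concentration. This is only a sketch: the function name is made up for illustration, and it assumes the 2 nM to 3 nM figure refers to the 5 ul of library before the other reagents are added.

```python
def final_concentration_pm(library_nm, library_ul, other_volumes_ul):
    """Concentration (pM) after mixing the library with the other reagents."""
    total_ul = library_ul + sum(other_volumes_ul)
    return library_nm * 1000.0 * library_ul / total_ul

# Volumes from the list above: NaOH, Tris, Enzyme Master Mix (all in ul).
reagents_ul = [5, 5, 35]

low_pm = final_concentration_pm(2.0, 5, reagents_ul)
high_pm = final_concentration_pm(3.0, 5, reagents_ul)
print(f"{low_pm:.0f} pM to {high_pm:.0f} pM in {5 + sum(reagents_ul)} ul")
```

Under that assumption, the 10x dilution into 50 ul gives a loading concentration in the hundreds of pM rather than thousands.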

Quote:
Originally Posted by pmiguel View Post
Wow, 2000 pM? I think the highest we ever went on the HiSeq2500 was 23 pM.

--
Phillip
Old 05-08-2015, 01:16 PM   #14
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

I finally finished downloading these, and I'll take a look at the quality from mapping. But before I do that, I always trim adapters... but I was never sure what kind of adapters PhiX reads had. They don't exactly match any adapters in my list, so I'll call them "PhiX adapters". Here they are, for reference:

>Read1_adapter
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAA
>Read2_adapter
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA

Also, at least for the first 4 million reads, 29.36% failed the chastity filter.
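
For anyone who wants to reproduce that kind of percentage, here is a rough sketch of how the failed-filter fraction could be counted. It assumes plain four-line FASTQ records whose headers follow the CASAVA 1.8+ layout, where the second whitespace-separated field looks like `1:Y:0:ACGT` and `Y` means the read was filtered (did not pass). The function name and the toy records are invented for illustration; they are not taken from the posted files.

```python
import io

def failed_filter_fraction(fastq_lines):
    """Fraction of reads whose header carries the 'filtered' flag (Y)."""
    total = 0
    failed = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:          # only header lines (every 4th line) matter
            continue
        fields = line.strip().split()
        if len(fields) < 2:
            continue            # header without the comment field: skip it
        flag = fields[1].split(":")[1]   # read:is_filtered:control:index
        total += 1
        if flag == "Y":
            failed += 1
    return failed / total if total else 0.0

# Toy example with invented records: one passing read, one failing read.
toy = io.StringIO(
    "@M1:1:FC:1:1101:1000:2000 1:N:0:ACGT\nACGT\n+\nIIII\n"
    "@M1:1:FC:1:1101:1001:2001 1:Y:0:ACGT\nNNNN\n+\n####\n"
)
print(failed_filter_fraction(toy))
```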
Old 05-08-2015, 03:20 PM   #15
DNATECH
Member
 
Location: Davis, CA

Join Date: Mar 2015
Posts: 29
Default

Hi Brian,

Thanks for looking at the data. The files that I uploaded have 482,680,800 reads. The sequencer generates a "read" for every single nanowell, whether it is loaded or not. Thus, a figure of 30% or more "failing" reads is expected. The SAV viewer indicates a total of 482.68 million nanowells. According to Illumina, 60% to 70% of clusters passing filter is considered very good, because the figure is calculated with respect to the total number of nanowells. I intentionally uploaded files that include all non-passing reads (though the majority of the "not passing filter" data are likely just empty nanowells).
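
To make the arithmetic explicit, here is a small sketch relating the numbers in this thread. Note the assumption: it divides the roughly 340 million average pass-filter clusters per lane by the 482,680,800 nanowells reported for this one lane, so the result is only approximate.

```python
nanowells_per_lane = 482_680_800      # reads in the uploaded files (one per nanowell)
pf_clusters_per_lane = 340_000_000    # approximate average quoted earlier in the thread

pf_percent = 100.0 * pf_clusters_per_lane / nanowells_per_lane
print(f"{pf_percent:.1f}% of nanowells pass filter")

# Brian's figure from the first 4 million reads, for comparison.
failed_percent_first_4m = 29.36
print(f"{100.0 - failed_percent_first_4m:.2f}% passing in the first 4 million reads")
```

Both numbers come out at roughly 70%, which sits at the upper end of the 60% to 70% range mentioned above.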

Lutz


Quote:
Originally Posted by Brian Bushnell View Post
I finally finished downloading these, and I'll take a look at the quality from mapping. But before I do that, I always trim adapters... but I was never sure what kind of adapters PhiX reads had. They don't exactly match any adapters in my list, so I'll call them "PhiX adapters". Here they are, for reference:

>Read1_adapter
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAA
>Read2_adapter
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA

Also, at least for the first 4 million reads, 29.36% failed the chastity filter.

Last edited by DNATECH; 05-08-2015 at 03:38 PM.
Old 05-08-2015, 04:08 PM   #16
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Thanks for the clarification, and thanks for sharing your data!

I did some mapping of the first 16 million reads, and generated the following graphs:



The "Other" category refers to soft-clipped bases, which is very high in this case because PhiX is small, so many of the reads went off the end. (Considering these reads have been adapter-trimmed, I have no idea what is being sequenced past the ends of the PhiX genome; it might be interesting to investigate.) Overall, the average error rate is below 1% but above 0.1% across the read. Read 2 has a higher-than-expected insertion rate in the first half of the read. Oddly, R2 has some Ns only in the first half, and R1 has some Ns only in the second half. Unlike other platforms, the error rate for R2 seems fairly flat across the read.
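
For readers unfamiliar with soft clipping: in SAM output, read bases that are not part of the alignment are marked with the `S` operation in the CIGAR string. A minimal sketch of counting them follows. The function name and the example CIGAR strings are invented for illustration, and how the graphs in this post tallied them is not stated in the thread.

```python
import re

# One CIGAR element: a length followed by a single operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Number of read bases marked as soft-clipped (S) in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")

# Hypothetical 150 bp reads: clipped at one end, at both ends, and not at all.
print(soft_clipped_bases("120M30S"))
print(soft_clipped_bases("5S140M5S"))
print(soft_clipped_bases("150M"))
```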


This is a different way of looking at the same data.


The quality accuracy graph indicates that again the Q-scores are binned, and like NextSeq V1, they are highly inflated. Over 70% of the bases were assigned Q41, but the average observed quality for Q41 bases was actually Q31.
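
To put those two numbers in perspective, a Phred score Q corresponds to an error probability of 10^(-Q/10). The short sketch below (the helper name is made up) compares the assigned and observed values.

```python
def phred_to_error_probability(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)

assigned_q, observed_q = 41, 31   # values quoted in the paragraph above
p_assigned = phred_to_error_probability(assigned_q)
p_observed = phred_to_error_probability(observed_q)

print(f"Q{assigned_q}: about 1 error in {1 / p_assigned:,.0f} bases")
print(f"Q{observed_q}: about 1 error in {1 / p_observed:,.0f} bases")
print(f"observed error rate is {p_observed / p_assigned:.0f}x the assigned one")
```

A 10-point gap in Phred score is a tenfold difference in error rate.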


The insert size distribution is fairly interesting for a couple reasons. It looks like the platform can probably handle inserts over 450bp fairly well; there were some short inserts, but they did not overwhelmingly out-compete the long ones. But the flat distribution of the short-insert tail is odd.

Lastly, it's worth noting that around 83% of the reads mapped to the reference with no mismatches or indels.
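
As a rough consistency check between this 83% figure and the per-base error rates above: if errors were independent with a constant per-base rate e, a read of length L would be error-free with probability (1 - e)^L. That independence is a simplifying assumption, and the read lengths below are assumptions too, since the thread does not say how many cycles this lane was run for.

```python
def per_base_error_rate(perfect_fraction, read_length):
    """Constant per-base error rate e solving (1 - e) ** read_length == perfect_fraction."""
    return 1.0 - perfect_fraction ** (1.0 / read_length)

for read_length in (100, 150):       # hypothetical read lengths
    e = per_base_error_rate(0.83, read_length)
    print(f"L={read_length}: e is about {100 * e:.3f}%")
```

For both assumed lengths the implied rate falls between 0.1% and 1%, in line with the range described above.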

For comparison, I've attached the mhist of a 2x150bp HS2500 run (not on PhiX), below. To me the HS2500 looks better, but not drastically better, in terms of error rates.

Attached Images
File Type: png qhist.png (22.9 KB, 429 views)
File Type: png qahist.png (17.5 KB, 414 views)
File Type: png mhist.png (43.9 KB, 411 views)
File Type: png ihist.png (16.2 KB, 418 views)
File Type: png hs2500_mhist.png (50.1 KB, 413 views)

Last edited by Brian Bushnell; 05-08-2015 at 07:02 PM.
Old 05-09-2015, 09:05 AM   #17
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,287
Default

Quote:
Originally Posted by DNATECH View Post
Hi Pmiguel,

the basic procedure looks like:
- 5 ul of library (2 nM to 3 nM including PhiX)
- add 5 ul 0.1 N NaOH
- add 5 ul Tris (200mM)
- add 35 ul Enzyme Master Mix
- load all 50 ul onto cBot
Ah, that's very interesting. They were finally forced to kick that ridiculous 50X dilution/neutralization step to the curb.

So you cluster at 200-300 pM. About 10-15x what we use on our HiSeq2500.

--
Phillip
Old 05-11-2015, 03:15 PM   #18
DNATECH
Member
 
Location: Davis, CA

Join Date: Mar 2015
Posts: 29
Default

Thanks a lot for the detailed analysis Brian.
Lutz
Old 05-12-2015, 06:56 AM   #19
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,287
Default

Quote:
Originally Posted by Brian Bushnell View Post

The insert size distribution is fairly interesting for a couple reasons. It looks like the platform can probably handle inserts over 450bp fairly well; there were some short inserts, but they did not overwhelmingly out-compete the long ones. But the flat distribution of the short-insert tail is odd.
Regarding the size distribution of the library vs. that of the amplicons that actually cluster, I created a thread some years ago about a somewhat extreme sample clustered on the MiSeq:

http://seqanswers.com/forums/showthread.php?t=20839

In the 4th post of that thread, I converted the mass-based, log-linear plot from the Agilent Bioanalyzer chip to a linear, molecule-based plot. This way it can be directly compared to the insert sizes found by mapping the read pairs back to the genome from which they came.
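
The conversion described here can be sketched roughly as follows. A Bioanalyzer trace reports signal proportional to DNA mass, and a fragment's mass is proportional to its length, so the relative number of molecules in each size bin is the mass divided by the length. The function name and the example trace are made up for illustration; they are not the data from the linked thread.

```python
def mass_to_molecule_fractions(sizes_bp, mass_signal):
    """Convert mass-based signal per size bin into relative molecule counts."""
    molecules = [m / s for s, m in zip(sizes_bp, mass_signal)]
    total = sum(molecules)
    return [x / total for x in molecules]

# Invented trace: equal mass at 200 bp, 400 bp and 800 bp.
sizes = [200, 400, 800]
mass = [1.0, 1.0, 1.0]
for size, frac in zip(sizes, mass_to_molecule_fractions(sizes, mass)):
    print(f"{size} bp: {frac:.3f} of molecules")
```

Even with equal mass in every bin, the shortest fragments make up most of the molecules, which is why a mass-based trace understates how many short fragments are available to cluster.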

The result showed that the shorter amplicons must have clustered preferentially. Really preferentially.

To me this has always suggested there must be some sort of competition for clustering that favors shorter amplicons.

At the much higher clustering concentrations used for the 3000/4000, this process may be exacerbated.

--
Phillip
Old 05-12-2015, 11:58 AM   #20
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by pmiguel View Post
In the 4th post of that thread, I converted the mass-based, log-linear plot from the Agilent Bioanalyzer chip to a linear, molecule-based plot. This way it can be directly compared to the insert sizes found by mapping the read pairs back to the genome from which they came.

The result showed that the shorter amplicons must have clustered preferentially. Really preferentially.

To me this has always suggested there must be some sort of competition for clustering that favors shorter amplicons.
Impressive; I was under the impression that inserts much over 800bp simply would not bridge-amplify. Maybe we should try that approach! Anyway, rather than shorter molecules vastly out-competing longer molecules at all lengths, that could be more of a case where the rates are fairly similar up to a point (1kbp?), after which longer molecules start failing to form clusters at all (even if there were no short molecules present). I'm just guessing, though.

Tags
hiseq 3000, hiseq 4000
