SEQanswers

Old 10-06-2014, 04:10 PM   #1
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668
NextSeq Data

We recently acquired a NextSeq machine and are not very impressed with the data. I've uploaded a spreadsheet containing some of the statistics here:

https://drive.google.com/file/d/0B3l...ew?usp=sharing

The first tab is a HiSeq2000 2x150bp run. The insert size was below target, so I adapter-trimmed the reads before analyzing the data (no other preprocessing was run). The HS2000 is not really spec'd to 2x150, so as you might imagine, the quality suffers toward the end. Regardless, it's pretty good. Looking at the mapping stats, 99.55% of the reads mapped, and overall 79.85% of the reads were error-free.

The next two tabs contain a couple of lanes of NextSeq bacterial sequence. Lane 1 generally seems to be the best, with quality dropping to a minimum at lane 4. But even for lane 1, only 96.47% of the reads mapped and 49.3% were perfect matches; by lane 4, 95.91% mapped and 38.91% were perfect. So the rate of reads with errors roughly tripled (from about 20% to 51-61%) going from HS2000 (which does not support 2x150bp runs) to NextSeq (which supposedly does). As you can see on the "Average Quality by Position" and "Error Rate vs Read Position" graphs, the comparison would be brutal - an order of magnitude or more - if you consider 2x100bp reads. Also, if you look at the "Quality Score Accuracy" graph, the HS2000 quality scores are fairly accurate and typically underestimate quality, while the NextSeq ones are inaccurate, overestimate quality by about 10 dB, and are quantized, so you can't easily quality-trim the NextSeq data to improve it.
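
For anyone who wants to check the arithmetic behind "roughly tripled" and the "10 dB" figure, here is a minimal Python sketch. It uses only the percentages quoted above, and it assumes the "perfect" and "error-free" percentages are fractions of all reads. The function names are made up for illustration.

```python
def error_read_fraction(perfect_pct):
    """Percent of reads with at least one error, given the percent of perfect reads."""
    return 100.0 - perfect_pct


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10.0)


# Percentages of error-free reads quoted in the post.
hiseq = error_read_fraction(79.85)
lane1 = error_read_fraction(49.3)
lane4 = error_read_fraction(38.91)

# How many times more reads contain errors on NextSeq than on HiSeq.
ratio_lane1 = lane1 / hiseq
ratio_lane4 = lane4 / hiseq
print(f"lane 1: {ratio_lane1:.2f}x, lane 4: {ratio_lane4:.2f}x")

# A score that overestimates quality by 10 understates the error rate tenfold,
# e.g. a base reported as Q30 that really behaves like Q20.
overestimate_factor = phred_to_error_prob(20) / phred_to_error_prob(30)
print(f"true error rate is about {overestimate_factor:.0f}x the reported one")
```

The tenfold gap is why trimming on the reported scores does not help much here: bases that look trustworthy by their score are not.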

The "Library Uniqueness" graph, generated by sampling a kmer from each read and hashing it to see whether it has been seen before, is also very odd for NextSeq: it is wavy. The graph should decrease monotonically, and any increase indicates a sudden burst of errors. So maybe the period (~625000 reads) corresponds to an image frame, and the clusters around the edges of the frame are blurry, as one might expect from low-quality or miscalibrated optics.
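
To make the uniqueness idea concrete, here is a rough Python sketch of the general technique. It is not the actual bbcountunique.sh implementation. It assumes the sampled kmer is simply the first k bases of each read and uses an in-memory set for the hashing; the function name and the window parameter are illustrative only.

```python
def uniqueness_curve(reads, k=25, window=1000):
    """Return (reads_seen, percent_unique) points, one per window of reads.

    A read counts as unique if its first k-mer has not been seen before.
    """
    seen = set()
    points = []
    unique_in_window = 0
    counted_in_window = 0
    for i, seq in enumerate(reads, start=1):
        if len(seq) >= k:
            kmer = seq[:k]
            counted_in_window += 1
            if kmer not in seen:
                seen.add(kmer)
                unique_in_window += 1
        if i % window == 0:
            if counted_in_window:
                pct = 100.0 * unique_in_window / counted_in_window
            else:
                pct = 0.0
            points.append((i, pct))
            unique_in_window = 0
            counted_in_window = 0
    return points


# Tiny demo: the second pair of reads starts with k-mers already seen.
demo = ["AAAAC", "CCCCA", "AAAAG", "CCCCT"]
curve = uniqueness_curve(demo, k=4, window=2)
print(curve)
```

A kmer containing a sequencing error almost always looks novel, so a batch of error-rich reads pushes the curve up instead of down. That is why a rising segment points to an error burst rather than to new library content.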

The Base Frequency vs Position graph is also interesting - NextSeq has a clear A/T ratio bias that is not present in HS data. The 3bp-wavelength sawtooth pattern probably has something to do with codon structure.
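
A per-position base composition table like the one behind that graph can be computed along these lines. This is a hedged sketch, not what bbmap.sh does internally for bhist, and the helper names are hypothetical. It takes plain read sequences as Python strings.

```python
from collections import Counter


def base_frequency_by_position(reads):
    """Return one dict per read position mapping A/C/G/T/N to its fraction."""
    counts = []
    for seq in reads:
        for pos, base in enumerate(seq.upper()):
            if pos == len(counts):
                counts.append(Counter())
            counts[pos][base] += 1
    table = []
    for counter in counts:
        total = sum(counter.values())
        table.append({b: counter[b] / total for b in "ACGTN"})
    return table


def at_ratio(freqs):
    """A/T ratio at one position, or None if no T was observed there."""
    return freqs["A"] / freqs["T"] if freqs["T"] else None


reads = ["ACGT", "ACGA", "TCG"]
table = base_frequency_by_position(reads)
for pos, freqs in enumerate(table):
    print(pos, freqs, at_ratio(freqs))
```

In unbiased data from a double-stranded library the A/T ratio should sit near 1 at every position, so a consistent offset from 1 is the kind of bias described above.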

Does anyone else have data they'd like to share on NextSeq machines?

P.S. Command lines I used:

Code:
bbcountunique.sh in=reads.fq.gz reads=100000000 out=uniqueness.txt

bbduk.sh in=reads.fq.gz reads=4000000 ktrim=r k=25 hdist=1 mink=12 tbo tpe ref=nextera.fa,truseq.fa out=ktrimmed.fq.gz ow

bbmap.sh in=ktrimmed.fq.gz reads=4000000 mhist=mhist.txt ihist=ihist.txt bhist=bhist.txt idhist=idhist.txt ehist=ehist.txt qhist=qhist.txt idbins=200 qahist=qahist.txt aqhist=aqhist.txt indelhist=indelhist.txt gchist=gchist.txt

bbmerge.sh in=ktrimmed.fq.gz reads=4000000 ihist=ihist_merge.txt
Old 10-07-2014, 02:49 PM   #2
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 1,035

Thanks, Brian, for posting your analysis results. I wonder whether the HiSeq reads are also from a bacterial DNA library, prepared using the same protocol as the NextSeq ones.
Old 10-07-2014, 02:54 PM   #3
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668

The HiSeq reads are bacterial, but from a collection of 26 different isolates mixed together to form a synthetic metagenomic community. I don't know much about the preparation protocols, but certainly the insert sizes differ substantially, so at least size selection was probably different; maybe shearing too.
Old 10-09-2014, 06:29 AM   #4
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 391

Interesting, thanks very much for the detailed analysis and your thoughts. So the data looks a little worse than HiSeq, I agree, but they're at an early stage with the NextSeq chemistry. Far more serious would be the use of low quality optics, which would be understandable at that price point.

Any thoughts or observations on de novo assembly or SNP calling? I believe I saw a post on SEQanswers saying SNP calling works fine on the NextSeq at the expense of a few more indel errors (compared to HiSeq data).

We are interested in a direct comparison against the Ion Proton. These details indicate that the indel error rate here is a lot lower than what I've heard comes off the Proton. This is very important for getting good de novo assemblies, of course.

Thanks again.
Old 07-22-2015, 08:39 AM   #5
rocksd
Member
 
Location: Houston, TX

Join Date: Jul 2010
Posts: 14

Hi Brian,

We are looking into purchasing a NextSeq, but we have a concern about the quality of the reads it generates. Have you had a better experience with the NextSeq since then?

Your input is highly appreciated.

James
Old 07-22-2015, 09:19 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668

V2 chemistry has substantially higher quality than V1; it's basically fine. However, it still has some issues with the barcode-reading cycles, which have caused problems with multiplexed runs; we've had some runs in which certain barcodes are misread ~95% of the time, and thus get demultiplexed into the unknown bin. Last I heard, Illumina was aware of this issue and working on it; not sure what the current status is.
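
To illustrate how misread barcodes end up in the unknown bin, here is a generic Python sketch of mismatch-tolerant demultiplexing. It is not Illumina's software. The sample names, the barcode sequences, and the one-mismatch threshold are all assumptions chosen for the example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(index_read, barcodes, max_mismatches=1):
    """Return the sample whose barcode uniquely matches, else "unknown".

    barcodes maps a sample name to its expected barcode sequence.
    """
    hits = [
        name
        for name, bc in barcodes.items()
        if len(bc) == len(index_read) and hamming(index_read, bc) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "unknown"


# Hypothetical sample sheet with arbitrary example barcodes.
barcodes = {"sample_A": "ATCACG", "sample_B": "CGATGT"}

print(assign_barcode("ATCACG", barcodes))  # exact match
print(assign_barcode("ATCACC", barcodes))  # one mismatch, still assigned
print(assign_barcode("GGGGGG", barcodes))  # too many mismatches
```

With a scheme like this, a barcode that picks up two or more errors in most reads would almost never be assigned, which would be consistent with the ~95% loss described above.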
Old 07-22-2015, 01:45 PM   #7
rocksd
Member
 
Location: Houston, TX

Join Date: Jul 2010
Posts: 14

Brian,

Thanks for your reply. Were the misread barcodes from Illumina, or were they custom ones prepared by you or your end user?

Thanks

James
Old 07-22-2015, 01:48 PM   #8
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,668

I think they were Illumina TruSeq, but it's possible they were custom. They worked fine on HiSeq and MiSeq, though, and on NextSeq with V1 chemistry.

Tags
illumina, nextseq, nextseq 500
