SEQanswers

Old 12-11-2014, 12:05 PM   #21
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

The graphs I posted in this thread are from one NextSeq, but I have generated similar graphs from multiple libraries run on 3 independent NextSeq machines at 3 different facilities (one being Illumina), and they all look about the same.
Old 12-16-2014, 06:20 AM   #22
Innovelty
Member
 
Location: Storrs, CT

Join Date: Sep 2012
Posts: 13
Default

Damn, thanks Brian. I woke up this morning thinking that maybe I should try a NextSeq run instead of HiSeq 2000 for this chapter of my dissertation. It seemed like I might be able to get a slightly better assembly for the money, given the longer PE reads available. I don't so much think so, now.
Old 12-16-2014, 06:33 AM   #23
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,800
Default

You can (get the long reads).

Provided you have access to the right HiSeq 2500. One can now do 2 x 250 PE runs.
Old 12-16-2014, 07:43 AM   #24
TonyBrooks
Senior Member
 
Location: London

Join Date: Jun 2009
Posts: 298
Default

Data quality is definitely inferior to both the MiSeq and HiSeq.
It's quick, though, and perhaps better suited to counting applications such as RNA-Seq and ChIP-Seq than to variant calling.
The question is whether the error is systematic or random. Random error can be somewhat compensated for by a decent sequence depth.
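
To put a rough number on the depth argument, here is a minimal sketch. It assumes errors are independent between reads and uses a hypothetical 2% per-base error rate; it asks how likely it is that more than half the reads covering a site are wrong. That independence assumption is exactly what breaks if the error is systematic, in which case extra depth would not help this way.

```python
from math import comb


def prob_at_least_k_errors(depth, error_rate, k):
    """P(X >= k) for X ~ Binomial(depth, error_rate)."""
    return sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k, depth + 1)
    )


def majority_error_prob(depth, error_rate):
    """Probability that strictly more than half of the reads at a site are erroneous."""
    k = depth // 2 + 1
    return prob_at_least_k_errors(depth, error_rate, k)


# Hypothetical 2% per-base error rate; independence between reads is assumed.
for depth in (5, 10, 20, 40):
    print(f"depth {depth:>2}: {majority_error_prob(depth, 0.02):.3e}")
```

This is a deliberately crude upper bound, since it treats any erroneous base as agreeing with every other erroneous base.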

I'll attempt to post the QC from the Illumina PhiX we sequenced during training.
Old 12-16-2014, 10:31 AM   #25
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by TonyBrooks View Post
The question is whether the error is systematic or random. Random error can be somewhat compensated for by a decent sequence depth.
Yep, I plan to plot the error rate across a genome and see if I can see some kind of pattern, but I have not had time to do that yet.

Quote:
I'll attempt to post the QC from the Illumina PhiX we sequenced during training.
That would be great!
Old 12-16-2014, 10:43 AM   #26
Innovelty
Member
 
Location: Storrs, CT

Join Date: Sep 2012
Posts: 13
Default

Quote:
Originally Posted by GenoMax View Post
You can (get the long reads).

Provided you have access to the right HiSeq 2500. One can now do 2 x 250 PE runs.


One can... provided one has access to the machine, and more than a tiny pilot grant to work with. Sadly, when one works on non-model insects for non-agricultural/biomedical purposes, and one is only a wee third-year, one might only have ~$2500 to spend on the run itself. (Not that one is complaining. One is really super pleased about that.)

Enough of my derailing, though -- really looking forward to updates from TonyBrooks, because my application is a de novo transcriptome project, and I'm primarily interested in gene expression. Thanks one and all.
Old 01-13-2015, 04:07 AM   #27
aeonsim
Member
 
Location: Belgium

Join Date: Jun 2011
Posts: 45
Default

So I've been getting some initial test data back from a NextSeq 500, and I'm really not happy with it compared to the HiSeq.

The data quality from the NextSeq is substantially worse than that from a HiSeq, with so many more errors that I'm not certain the data is usable for low- to medium-coverage (1-20x) whole-genome variant calling.

Attached are a number of PDFs showing the data compared to HiSeq data from the same facility (the same experienced technicians doing all the sequencing, library prep and everything). We resequenced our HiSeq libraries (PCR-free, 550 bp insert) to compare like with like, and you can clearly see the difference.

Two of the files show GATK's BQSR before-and-after plots for one of our typical HiSeq libraries (recalQC-randomHiseq.pdf) and a HiSeq library sequenced on the NextSeq (BQSR-NextSeq-Before-After.pdf). The difference is substantial, and while these are not the same library, the HiSeq one is representative of what we usually get.

The other two files show the same library with 4 lanes of NextSeq sequence vs. the same library sequenced on the HiSeq. You'll clearly be able to determine which comes from which machine (Nxt, Nxt, Nxt, Nxt, Hiseq).

Finally, here are some alignment stats from Picard tools for the same library sequenced twice on the NextSeq (two different runs) vs. the stats for the same library from the HiSeq, showing a 1-2 percentage point reduction in reads aligned and a roughly 66-89% increase in mismatch rate.

Seq         PCT_PF_READS_ALIGNED  PF_MISMATCH_RATE  PF_HQ_ERROR_RATE
NextSeq_R2  0.964464              0.022694          0.021512
NextSeq_R1  0.955108              0.025834          0.024588
HiSeq       0.973545              0.013678          0.013063
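
As a quick check on the table, the differences relative to the HiSeq run can be computed directly. This is a minimal sketch using only the three rows above; the dictionary keys are illustrative shorthand, not Picard field names.

```python
# Values copied from the Picard alignment summary rows above.
stats = {
    "NextSeq_R2": {"aligned": 0.964464, "mismatch": 0.022694, "hq_error": 0.021512},
    "NextSeq_R1": {"aligned": 0.955108, "mismatch": 0.025834, "hq_error": 0.024588},
    "HiSeq": {"aligned": 0.973545, "mismatch": 0.013678, "hq_error": 0.013063},
}


def compare_to_baseline(run, baseline="HiSeq"):
    """Return (drop in aligned reads in percentage points,
    relative increase in mismatch rate in percent) versus the baseline run."""
    r, b = stats[run], stats[baseline]
    aligned_drop_points = (b["aligned"] - r["aligned"]) * 100
    mismatch_increase_pct = (r["mismatch"] / b["mismatch"] - 1) * 100
    return aligned_drop_points, mismatch_increase_pct


for run in ("NextSeq_R2", "NextSeq_R1"):
    drop, increase = compare_to_baseline(run)
    print(f"{run}: aligned -{drop:.2f} points, mismatch rate +{increase:.0f}%")
```

The two NextSeq runs come out at roughly 0.9 and 1.8 points fewer reads aligned, and roughly 66% and 89% higher mismatch rates.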


Now, the data isn't entirely unusable for WGS: if you have enough coverage, you can still get variant calls out of it. However, they're likely to have a higher false-positive rate, and if you were looking for rare variants (especially de novo mutations) I would be very hesitant to use the data. For other uses this may be fine, but I only have experience with WGS and RNA-seq, so I'll leave that for others to decide.
Attached Files
File Type: pdf BQSR-NextSeq-Before-After.pdf (343.9 KB, 98 views)
File Type: pdf recalQC-randomHIseq.pdf (245.8 KB, 59 views)
File Type: pdf nextSeq-Hiseq-Comp-web.pdf (392.8 KB, 95 views)
File Type: pdf next-seq-hiseq-comp2.pdf (614.4 KB, 77 views)
Old 01-13-2015, 10:13 AM   #28
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by aeonsim View Post
The data quality from the NextSeq is substantially worse than that from a HiSeq, with so many more errors that I'm not certain the data is usable for low- to medium-coverage (1-20x) whole-genome variant calling.
I would certainly not want to use it for low-coverage variant calling!

Incidentally, though, it seems the NextSeq platform may have a silver lining. Though all of the standard data quality metrics are much worse than HiSeq in my testing, it appears to have a drastically lower cross-contamination rate (reads from one library assigned to a different library) for dual-index pooled libraries, to the point that we are considering using NextSeq over HiSeq for projects in which index cross-contamination is more important than error rate. We are still investigating why the rate is lower.
Old 01-13-2015, 10:24 AM   #29
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,297
Default

Quote:
Originally Posted by Brian Bushnell View Post
I would certainly not want to use it for low-coverage variant calling!

Incidentally, though, it seems the NextSeq platform may have a silver lining. Though all of the standard data quality metrics are much worse than HiSeq in my testing, it appears to have a drastically lower cross-contamination rate (reads from one library assigned to a different library) for dual-index pooled libraries, to the point that we are considering using NextSeq over HiSeq for projects in which index cross-contamination is more important than error rate. We are still investigating why the rate is lower.
One possible trivial reason could be whether mismatches between an index read and the index sequence are allowed. HiSeq and MiSeq allow 1 mismatch by default. But we demultiplex off-instrument and allow zero mismatches.
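
For anyone unfamiliar with what the mismatch setting changes, here is a minimal sketch of index assignment by Hamming distance. The sample names and index sequences are made up for illustration, and this is not how any particular demultiplexer is implemented.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_indexes, max_mismatches=0):
    """Return the sample whose index is within max_mismatches of the read,
    or None if there is no match or the match is ambiguous."""
    hits = [
        name
        for name, index in sample_indexes.items()
        if hamming(index_read, index) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet.
samples = {"libA": "ATCACG", "libB": "CGATGT"}

# An index read with one sequencing error relative to libA.
print(assign_sample("ATCACT", samples, max_mismatches=0))  # None (goes to the unknown bin)
print(assign_sample("ATCACT", samples, max_mismatches=1))  # libA
```

With zero mismatches allowed, a read with an index error is discarded rather than assigned, which lowers both yield and the chance of assigning it to the wrong library.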

--
Phillip
Old 01-13-2015, 10:31 AM   #30
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by pmiguel View Post
One possible trivial reason could be whether mismatches between an index read and the index sequence are allowed. HiSeq and MiSeq allow 1 mismatch by default. But we demultiplex off-instrument and allow zero mismatches.

--
Phillip
We are also allowing 0 mismatches in both cases (and typically end up with >20% of reads in the unknown bin, as a result). Right now our 2 leading candidate hypotheses are:

1) NextSeq has much lower cluster density;
2) NextSeq has a different order of {read1, read2, index1, index2, resynthesis} compared to HiSeq/MiSeq.
Old 01-21-2015, 11:39 PM   #31
ymc
Senior Member
 
Location: Hong Kong

Join Date: Mar 2010
Posts: 497
Default

http://www.illumina.com/systems/next...ncer/kits.html

I find a NextSeq v2 kit here. Is it something new?
Old 01-27-2015, 07:13 PM   #32
kentawan
Member
 
Location: Singapore

Join Date: Apr 2014
Posts: 14
Default

Quote:
Originally Posted by ymc View Post
http://www.illumina.com/systems/next...ncer/kits.html

I find a NextSeq v2 kit here. Is it something new?
I just gave my local distributor a call. He said that this kit will be ready for shipment in February 2015. Pricing will be the same as the v1 kits!

Finally some hope for NextSeq 500 users!
Old 03-03-2015, 05:28 PM   #33
Elsie
Member
 
Location: Australia

Join Date: Mar 2011
Posts: 85
Default

Hi Brian,

Thanks so much for this. I am trying to repeat your commands above, using interleaved files, and I get this error. Can you help?
Thanks.

bbmap.sh maxindel=200 in=trimmed.fq.gz mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt
java -Djava.library.path=/bbmap/jni/ -ea -Xmx43110m -cp /bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 maxindel=200 in=trimmed.fq.gz mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, maxindel=200, in=trimmed.fq.gz, mhist=mhist.txt, bhist=bhist.txt, qhist=qhist.txt, qahist=qahist.txt]

BBMap version 34.56
Set match histogram output to mhist.txt
Set base content histogram output to bhist.txt
Set quality histogram output to qhist.txt
Set quality accuracy histogram output to qahist.txt
Retaining first best site only for ambiguous mappings.
No output file.
Exception in thread "main" java.lang.RuntimeException: Can't find file ref/genome/1/summary.txt
at fileIO.ReadWrite.getRawInputStream(ReadWrite.java:815)
at fileIO.ReadWrite.getInputStream(ReadWrite.java:780)
at fileIO.TextFile.open(TextFile.java:277)
at fileIO.TextFile.<init>(TextFile.java:94)
at dna.Data.setGenome2(Data.java:839)
at dna.Data.setGenome(Data.java:785)
at align2.BBMap.loadIndex(BBMap.java:302)
at align2.BBMap.main(BBMap.java:32)
Old 03-03-2015, 05:35 PM   #34
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Hi Elsie,

You have to index the reference first. For example:

bbmap.sh ref=genome.fasta

Wait for that to finish, then map.
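
The two-step order can be sketched as a small helper that only plans the commands. Assumptions: the index location is inferred from the error message earlier in the thread (ref/genome/1/summary.txt for build=1), the function names are hypothetical, and the file names are placeholders. Nothing here runs BBMap itself.

```python
from pathlib import Path


def index_summary_path(workdir=".", build=1):
    """Path BBMap complained about when no index had been built
    (layout inferred from the error message, not from documentation)."""
    return Path(workdir) / "ref" / "genome" / str(build) / "summary.txt"


def bbmap_commands(reference, reads, index_present):
    """Return the bbmap.sh command lines to run, in order."""
    commands = []
    if not index_present:
        # Step 1: build the index from the reference.
        commands.append(["bbmap.sh", f"ref={reference}"])
    # Step 2: map the reads and write the histograms.
    commands.append([
        "bbmap.sh", "maxindel=200", f"in={reads}",
        "mhist=mhist.txt", "bhist=bhist.txt",
        "qhist=qhist.txt", "qahist=qahist.txt",
    ])
    return commands


present = index_summary_path(".").exists()
for cmd in bbmap_commands("genome.fasta", "trimmed.fq.gz", present):
    print(" ".join(cmd))
```

Each printed line could then be run in turn, for example with subprocess.run(cmd, check=True), waiting for the indexing step to finish before mapping.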

-Brian
Old 03-03-2015, 05:36 PM   #35
Elsie
Member
 
Location: Australia

Join Date: Mar 2011
Posts: 85
Default

Thanks Brian. Unfortunately I do not have a reference for this sequence, so I'm assuming there is no way around this?
Old 03-03-2015, 05:38 PM   #36
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

The only way around it is to assemble it first. If it's a bacterium and you have sufficient coverage, you can get a decent assembly in a few minutes with Velvet. BBMap will not work without an assembly, but it doesn't have to be a good one: a quick assembly with short contigs is fine for this purpose, as long as those contigs are several times longer than the read length.
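
One way to sanity-check a quick assembly against that rule of thumb is to ask what fraction of the assembled bases sit in contigs longer than some multiple of the read length. This is a minimal sketch; the factor of 3 is an assumed reading of "several times", and the toy contigs are made up.

```python
def contig_lengths(fasta_lines):
    """Return the length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def fraction_in_long_contigs(lengths, read_length, min_multiple=3):
    """Fraction of assembled bases in contigs >= min_multiple * read_length."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    threshold = min_multiple * read_length
    return sum(n for n in lengths if n >= threshold) / total


# Toy assembly: one 600 bp contig and one 200 bp contig, 150 bp reads.
toy = [">c1", "A" * 600, ">c2", "A" * 200]
print(fraction_in_long_contigs(contig_lengths(toy), read_length=150))  # 0.75
```

For a real assembly, the lines could come from an open file handle on the contigs FASTA.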
Old 03-03-2015, 05:39 PM   #37
Elsie
Member
 
Location: Australia

Join Date: Mar 2011
Posts: 85
Default

Thank you Brian, that is really helpful, and incredibly prompt. Thank you so much.
Old 03-03-2015, 06:00 PM   #38
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

You're welcome - let us know if you discover anything interesting!
Old 03-06-2015, 03:10 AM   #39
Elsie
Member
 
Location: Australia

Join Date: Mar 2011
Posts: 85
Default

Hi Brian,
Sorry, still having issues. I've now switched to some NextSeq data generated with mouse and human data. I'm getting Nas in my histograms, and I think there is something wrong with my index. info.txt gives me:
#Chromosome sizes
#Generated on Fri Mar 06 21:58:51 EST 2015
#Version 5
#chrom scaffolds contigs length defined undefined startPad stopPad
1 4 41 493337098 479857220 13479878 8000 8000
2 5 61 512852808 496275279 16577529 8000 8000
3 3 70 439025362 428441699 10583663 8000 8000
4 3 78 468391081 456374140 12016941 8000 8000
5 3 46 424587819 413803382 10784437 8000 8000
6 4 155 387396307 372786010 14610297 8000 8000
What happened to the other chromosomes? I must be doing something wrong, but I am just running the bbmap ref command as indicated previously.
Thanks.
Old 03-06-2015, 04:25 AM   #40
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,800
Default

Quote:
Originally Posted by Elsie View Post
Hi Brian,
Sorry, still having issues. I've now switched to some NextSeq data generated with mouse and human data. I'm getting Nas in my histograms, and I think there is something wrong with my index
What is Nas? (N's?)

The index should be OK. I think Brian is concatenating all chromosomes and then creating the index, so that file is not a literal equivalent of the human/mouse genome (the file I have looks similar to yours).
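
A quick tally of the posted info.txt rows is consistent with that. This is a minimal sketch; it assumes the "scaffolds" column counts the input sequences packed into each index entry, which is an interpretation rather than something confirmed in this thread.

```python
# Rows copied from the info.txt posted above.
INFO_TXT = """\
#chrom scaffolds contigs length defined undefined startPad stopPad
1 4 41 493337098 479857220 13479878 8000 8000
2 5 61 512852808 496275279 16577529 8000 8000
3 3 70 439025362 428441699 10583663 8000 8000
4 3 78 468391081 456374140 12016941 8000 8000
5 3 46 424587819 413803382 10784437 8000 8000
6 4 155 387396307 372786010 14610297 8000 8000
"""


def summarize(text):
    """Return (index entries, total scaffolds, total length) from info.txt text."""
    entries = scaffolds = length = 0
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        entries += 1
        scaffolds += int(fields[1])
        length += int(fields[3])
    return entries, scaffolds, length


print(summarize(INFO_TXT))  # (6, 22, 2725590475)
```

So the six index entries hold 22 scaffolds and about 2.7 Gbp between them, which suggests the input sequences were merged rather than lost.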

Are you getting an error when you do the mapping?
All times are GMT -8.