SEQanswers

11-03-2015, 01:47 AM #1
Alfonso-Rourich
Member
Location: Extremadura (Spain)
Join Date: Mar 2013
Posts: 11

Which sequencer for de novo sequencing?

Hi, good morning,

In my company we are planning to buy an Illumina sequencer to do de novo sequencing of a few animal genomes in a project. This is the first time we are doing this kind of experiment, so we have no experience with it and are unsure which sequencer we should buy.

I understand that we need a sequencer that generates large amounts of long reads at high coverage, such as the HiSeq 2500, but its cost may be too high for us.

I wonder whether we could do de novo assembly with a cheaper sequencer (such as the NextSeq 500). My question is: since the sequencer's total output is significantly lower, would it be possible to do de novo assembly by somehow "dividing" the genome and sequencing a specific part of it in each experiment, so that at the end we can assemble all the intermediate results?

Thank you very much

Best regards,

11-03-2015, 01:50 AM #2
dpryan
Devon Ryan
Location: Freiburg, Germany
Join Date: Jul 2011
Posts: 3,480

If you only need to do a few genomes then why not just pay a sequencing facility to do the sequencing for you? It seems like serious overkill to drop a few hundred thousand dollars on a machine that won't be fully used when you could just pay someone else a fraction of that to do all the work for you.

11-03-2015, 02:04 AM #3
Alfonso-Rourich
Member
Location: Extremadura (Spain)
Join Date: Mar 2013
Posts: 11

Thank you for your response. Of course, we could pay someone to do the sequencing, but our first option is to buy the sequencer because, besides the de novo assembly of those genomes, we want to perform RNA-Seq and resequencing (many more runs).

That's why we want to buy a sequencer, but since the HiSeq 2500 is expensive, we would like to know whether we can do de novo assembly of animal genomes with the NextSeq 500.

Thank you very much

11-03-2015, 02:41 AM #4
nucacidhunter
Jafar Jabbari
Location: Melbourne
Join Date: Jan 2013
Posts: 1,223

You can do de novo assembly, RNA-Seq and resequencing with the NextSeq and even with the MiSeq. The question is cost per base, which decreases as you move from MiSeq to NextSeq, HiSeq 2500, HiSeq 4000 and HiSeq X. It will probably be cheaper to outsource your sequencing than to buy an instrument that will cost you in maintenance, learning-curve wastage and other investments such as a qPCR machine, a microfluidic instrument, a person to operate it, etc.

11-03-2015, 03:08 AM #5
Alfonso-Rourich
Member
Location: Extremadura (Spain)
Join Date: Mar 2013
Posts: 11

OK, thank you very much. Just to be clear (sorry, my background isn't in genetics): I will propose buying a NextSeq 500 because we have to do many resequencing and RNA-Seq experiments. But would it be possible to do de novo assembly of, for example, a local pig breed with that sequencer? Will the reads give enough coverage to assemble the genome? (If not, we can pay someone to do the de novo sequencing.) How many sequencer runs would be needed to obtain that genome?

Thank you very much

11-03-2015, 03:20 AM #6
nucacidhunter
Jafar Jabbari
Location: Melbourne
Join Date: Jan 2013
Posts: 1,223

You can do a de novo pig genome with two NextSeq high output runs (around 80x coverage).
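The two-run estimate rests on simple arithmetic: mean coverage = total sequenced bases / genome size. A minimal sketch, assuming (these figures are not from the thread) a genome of roughly 2.5 Gb and on the order of 100 Gb of output per NextSeq 500 high output 2 x 150 bp run; check the current Illumina spec sheet and a genome size estimate for your breed before relying on either number. The function names are hypothetical.

```python
import math


def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: total bases sequenced divided by genome size."""
    return total_bases / genome_size


def runs_needed(target_coverage: float, genome_size: int, bases_per_run: int) -> int:
    """Smallest whole number of runs whose combined yield reaches the target depth."""
    return math.ceil(target_coverage * genome_size / bases_per_run)


# Illustrative assumptions only, not measured values:
GENOME_SIZE = 2_500_000_000        # ~2.5 Gb mammalian genome
BASES_PER_RUN = 100_000_000_000    # ~100 Gb per high output 2 x 150 bp run

if __name__ == "__main__":
    print(f"Two runs: ~{mean_coverage(2 * BASES_PER_RUN, GENOME_SIZE):.0f}x")
    print(f"Runs for 80x: {runs_needed(80, GENOME_SIZE, BASES_PER_RUN)}")
```

Usable coverage will be lower than this raw figure once adapters, low-quality bases and PCR duplicates are removed, so treat the result as an upper bound when planning.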

11-03-2015, 03:22 AM #7
dpryan
Devon Ryan
Location: Freiburg, Germany
Join Date: Jul 2011
Posts: 3,480

To simplify: yes, you can do de novo assembly from NextSeq reads. How many runs you will have to do is not something we can answer, since we don't know anything about your local pig and its genome. More likely than not you'll do a run or two, see how the assembly looks and then sequence more as appropriate (or outsource as needed). You'll also need to define internally what you mean by "obtaining a genome" (namely, what degree of "finished" the genome should have before being declared good enough).

11-03-2015, 03:26 AM #8
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,966

Sounds like you have not thought through the economics of purchasing a sequencer.

Buying a sequencer makes sense only if you know that you are going to be using it regularly for the next ~3 years (the life span of current technology). The cost of the sequencer is just one part of the equation. You will need to factor in some informatics support (either local or in BaseSpace) along with the cost of the consumables (not trivial) and an annual maintenance contract.

It is tempting to have a new piece of technology "in house", but it could turn out to be a white elephant if you can't make full use of its capacity.

As others have already suggested, estimate the number of samples/runs you plan to do over a year and then sit down and compare the costs of getting this done commercially (http://allseq.com and http://genohub.com to comparison shop) versus buying a sequencer (remember to include the costs mentioned above in addition to cost of the sequencer).
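That comparison can be set up as a simple break-even calculation. The sketch below is a hypothetical model: the cost categories come from this thread (instrument, ancillary equipment such as a qPCR machine and a microfluidic instrument, operator, informatics, consumables, maintenance contract, ~3-year useful life), but every number in the example is a made-up placeholder, and pricing outsourced work "per run" is a simplification, since providers usually quote per lane or per sample.

```python
import math
from dataclasses import dataclass


@dataclass
class OwnershipCosts:
    """Hypothetical one-off and recurring costs of running a sequencer in house."""
    instrument: float           # purchase price of the sequencer
    ancillary_equipment: float  # e.g. qPCR machine, microfluidic QC instrument
    annual_service: float       # maintenance contract, per year
    annual_informatics: float   # local compute/storage or BaseSpace, per year
    annual_staff: float         # share of an operator's salary, per year
    consumables_per_run: float  # reagents, flow cell, library prep, per run

    def fixed_total(self, years: int) -> float:
        """Costs that do not depend on how many runs are done."""
        yearly = self.annual_service + self.annual_informatics + self.annual_staff
        return self.instrument + self.ancillary_equipment + years * yearly


def in_house_total(c: OwnershipCosts, runs_per_year: float, years: int = 3) -> float:
    """Total cost of ownership over the instrument's useful life (~3 years by default)."""
    return c.fixed_total(years) + years * runs_per_year * c.consumables_per_run


def outsourced_total(price_per_run: float, runs_per_year: float, years: int = 3) -> float:
    """Cost of buying the same number of runs from a sequencing provider."""
    return years * runs_per_year * price_per_run


def break_even_runs_per_year(c: OwnershipCosts, price_per_run: float, years: int = 3) -> float:
    """Runs per year above which owning is cheaper than outsourcing (inf if never)."""
    saving_per_run = price_per_run - c.consumables_per_run
    if saving_per_run <= 0:
        return math.inf
    return c.fixed_total(years) / (years * saving_per_run)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute real quotes.
    costs = OwnershipCosts(instrument=250_000, ancillary_equipment=40_000,
                           annual_service=25_000, annual_informatics=10_000,
                           annual_staff=40_000, consumables_per_run=4_000)
    print(f"break-even: {break_even_runs_per_year(costs, price_per_run=6_000):.0f} runs/year")
```

If your realistic number of runs per year sits well below the break-even figure, outsourcing is the cheaper option under these assumptions.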

Note: We are not trying to discourage you but rather ensuring that you consider all facts before making a decision.

11-03-2015, 03:47 AM #9
Alfonso-Rourich
Member
Location: Extremadura (Spain)
Join Date: Mar 2013
Posts: 11

Thank you very much for your responses.

Of course, I know that it is not only about purchasing the sequencer and that there are many other costs I have to consider... but we want to buy it because our plan is to do RNA-Seq and resequencing of many samples and de novo assembly of a few. I don't know the exact number of runs yet, but I'll try to calculate how many we would need for buying the sequencer to be worthwhile.

The reason I opened this thread is that I didn't know whether I could do de novo sequencing of a pig with a sequencer like the NextSeq 500, because apparently everyone uses the HiSeq. Obviously the HiSeq costs more, and if I can do de novo assembly with the NextSeq I prefer that (although it needs more sequencer runs), because most of the experiments we need to do are RNA-Seq and resequencing.

Thank you very much, I found your responses very helpful.

11-03-2015, 03:55 AM #10
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,966

Perfect.

Cost per base decreases as you go from MiSeq --> NextSeq --> HiSeq, but the capital outlay also increases significantly. If you know of local unmet needs for sequencing, you could get a HiSeq and go into the business of selling the spare sequencing capacity for profit.

11-03-2015, 08:11 AM #11
Alfonso-Rourich
Member
Location: Extremadura (Spain)
Join Date: Mar 2013
Posts: 11

Yes, you're right. Unfortunately, the rules of the project the sequencer will be associated with forbid us from offering sequencing services to amortize it.

11-03-2015, 09:22 AM #12
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA
Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by Alfonso-Rourich
The reason why I've opened this thread is because I didn't know if I could do deNovo seq. of a pig with a sequencer like NextSeq500 because apparently all people use HiSeq.
That has nothing to do with the platform itself; it's because the HiSeq came out way before the NextSeq. There has not really been enough time yet to publish major NextSeq papers.

And to second what has already been said on this thread - sequencing machines are expensive, but the total cost of owning and operating them is way more than just the price of the machine. It will not be a worthwhile investment unless you can operate it (and all the associated robots and other equipment) at a very high capacity for several years, as happens at a for-profit sequencing center you could outsource the work to.

11-03-2015, 02:19 PM #13
Alfonso-Rourich
Member
Location: Extremadura (Spain)
Join Date: Mar 2013
Posts: 11

Yes, I know there are lots of costs I have to consider. However, the sequencer will be placed in a center whose staff have a background and experience in genetics, and for the moment they consider it worthwhile.

Thank you very much for your help

Regards

11-03-2015, 05:26 PM #14
luc
Senior Member
Location: US
Join Date: Dec 2010
Posts: 407

For mammalian genomes there is a good argument for using the DISCOVAR assembler, since it can generate some spectacularly good genome assemblies. However, the assembler requires 2 x 250 bp paired-end reads, which the HiSeq 3000/4000 and the NextSeqs do not offer but the HiSeq 2500 does in rapid run mode. The best option for 2 x 250 bp sequencing would again be outsourcing, because buying a HiSeq 2500 does not make much economic sense (although used ones are cheap now).

11-04-2015, 12:08 AM #15
Alfonso-Rourich
Member
Location: Extremadura (Spain)
Join Date: Mar 2013
Posts: 11

luc, thank you for your recommendation. Correct me if I'm wrong, but the NextSeq specifications say there are high output kits that can generate 2 x 150 bp paired-end reads. Isn't that what you mean is needed for de novo assembly? (In fact, 2 x 150 bp adds up to 300 bp, which is longer.)

Anyway, although we want to do de novo assembly, purchasing the sequencer is worthwhile for us because of RNA-Seq and resequencing, and in the worst-case scenario we will outsource the de novo experiments.

11-04-2015, 03:00 AM #16
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,966

No.

@luc is referring to DISCOVAR DeNovo, which is an assembler meant for large genomes. It *requires* 2 x 250 bp reads, which currently can only be produced by a suitably equipped HiSeq 2500 (in the amounts needed, so the MiSeq practically does not count).

On a different note, DISCOVAR DeNovo also needs ~1 TB(+) of RAM to function well (e.g. if you are going to use 500 million reads for assembly). You read that right!
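The 2 x 250 bp requirement refers to the length of each mate, not the two mates added together: a 2 x 150 bp run gives 300 bases per pair but no single read longer than 150 bp. A minimal sketch, assuming plain (uncompressed) 4-line FASTQ records, that checks whether existing R1/R2 data reach a given per-mate length; the 250 bp threshold mirrors the requirement quoted above, and the function and file names are hypothetical, not part of DISCOVAR.

```python
from itertools import islice
from typing import Iterable, Iterator


def read_lengths(fastq_lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError(f"truncated or malformed FASTQ record: {record[:1]!r}")
        yield len(record[1].rstrip("\r\n"))


def max_mate_length(fastq_lines: Iterable[str]) -> int:
    """Longest read in one mate file (0 if the file is empty)."""
    return max(read_lengths(fastq_lines), default=0)


def meets_per_mate_length(r1: Iterable[str], r2: Iterable[str], min_len: int = 250) -> bool:
    """True only if BOTH mates reach min_len; the summed pair length does not count."""
    return max_mate_length(r1) >= min_len and max_mate_length(r2) >= min_len


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python check_mates.py sample_R1.fastq sample_R2.fastq
    with open(sys.argv[1]) as r1, open(sys.argv[2]) as r2:
        print("2 x 250 bp or longer:", meets_per_mate_length(r1, r2))
```

Gzipped FASTQ files would need to be opened with `gzip.open(path, "rt")` instead of `open`.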

Last edited by GenoMax; 11-04-2015 at 03:33 AM.

11-04-2015, 08:24 AM #17
westerman
Rick Westerman
Location: Purdue University, Indiana, USA
Join Date: Jun 2008
Posts: 1,104

The one time I tried out DISCOVAR DeNovo on a mammalian genome, I had to borrow a 512 GB machine. As is all too typical for bioinformatics programs, DISCOVAR at times took up all CPUs and at other times poked along using a single CPU. Relevant lines from the log file follow:

Code:
physical memory: 504.74 GB

using 708,777,836 reads
data extraction complete, peak mem = 260.85 GB
3.27 hours used extracting reads

back from buildReadQGraph
memory in use = 191.83 GB, peak = 405.28 GB

1 peak mem usage = 405.28 GB
2.42 minutes used loading stuff
2 peak mem usage = 405.28 GB
launching gap assemblies, mem usage = 179,701,415,936

now processing 411707 blobs
memory in use = 191.38 GB, peak = 405.28 GB

contig line N50: 46,487
scaffold line N50: 108,870
total bases in 1 kb+ scaffolds: 2,223,980,361
total bases in 10 kb+ scaffolds: 2,102,334,133
There are 708,777,836 reads of mean length 229.9 and mean base quality 34.3.
MPL1 = mean length of first read in pair up to first error = 199
(normal range is 175-225 for 250 base reads)
Estimated chimera rate in read pairs (including mismapping) = 0.46%.
genomic read coverage, using 1 kb+ scaffolds for genome size estimate: 73.3

peak mem usage = 405.28 GB, total time = 40.9 hours
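The coverage line in this log is consistent with reads x mean read length / genome size, using the total length of the 1 kb+ scaffolds as the genome size estimate (as the log line itself states). A quick check using only values printed in the log; the function name is hypothetical, and because the mean length is rounded to one decimal, a recomputed value could in principle differ from the log in the last digit.

```python
# Values copied from the DISCOVAR log above.
reads = 708_777_836
mean_read_length = 229.9                 # bases; rounded in the log
genome_size_estimate = 2_223_980_361     # "total bases in 1 kb+ scaffolds"


def genomic_coverage(n_reads: int, mean_length: float, genome_size: int) -> float:
    """Mean depth = total sequenced bases / genome size."""
    return n_reads * mean_length / genome_size


coverage = genomic_coverage(reads, mean_read_length, genome_size_estimate)
print(f"{coverage:.1f}x")  # prints 73.3x, matching the log line
```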
Since I had mate pairs, I followed up DISCOVAR with BESST (a scaffolder) and got a very nice 2.4 Gb genome with a maximum scaffold length of 9.7 Mb and an N50 of 1.8 Mb, with 375 scaffolds at N50 or greater.

My "go-to" default assembler (ABySS) only came up with a 2.4 Gb genome with a maximum scaffold length of 2.6 Mb and an N50 of 230 kb, with 2,689 scaffolds at N50 or greater. So DISCOVAR/BESST is a nice option if you have the reads.
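Both assemblies here are compared on N50. Under the standard definition, you sort the scaffolds from longest to shortest, walk down the list adding up their lengths, and N50 is the length of the scaffold at which the running total first reaches half of the assembly size; the number of scaffolds needed to get there (L50) is essentially the "scaffolds at N50 or greater" count quoted above (the two can differ only when several scaffolds share the N50 length). A minimal sketch of that definition; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:   # first point where half the assembly is covered
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


if __name__ == "__main__":
    # Toy example: 300 bases in total, so the halfway mark is 150 bases.
    print(n50_l50([20, 100, 40, 80, 60]))  # (80, 2)
```

Any minimum-length cutoff applied before computing N50 changes the result, so assemblies should be compared with the same cutoff.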

11-05-2015, 03:38 AM #18
Alfonso-Rourich
Member
Location: Extremadura (Spain)
Join Date: Mar 2013
Posts: 11

OK, sorry, I thought the reads needed to be 250 bp long in total, not 2 x 250 bp. We cannot afford a HiSeq (and it would make no sense for us), but we could purchase a NextSeq and try de novo assembly (in addition to RNA-Seq and resequencing, which pose no problem).

I work in a supercomputing center (although the sequencer will, of course, be used by a partner with a background in genetics), so those computational requirements should not be a problem. I'll have a look at the DISCOVAR assembler.

Thank you, best regards

All times are GMT -8.