SEQanswers

Thread: http://seqanswers.com/forums/showthread.php?t=7202

strob 10-07-2010 03:54 AM

low 454 coverage combined with high solexa coverage
 
Hi,

Does anybody have experience combining the following two datasets:

1X coverage of 454 reads (backbone)
30X coverage of solexa reads

Background: we are talking about a plant genome that has not been sequenced yet. So I would use the 1X 454 reads as a backbone for the Solexa reads to perform a de novo assembly.

Question: is 1X 454 coverage in this case a waste of money or a real help in the assembly? Does anybody have experience with this?
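
As a rough way to put numbers on the 1X versus 30X question, here is a minimal sketch of the idealized Poisson (Lander-Waterman style) estimate of how much of a genome is touched by at least one read at a given coverage. It assumes reads land uniformly at random and ignores repeats, sequencing bias, and minimum overlap requirements, so for a highly repetitive plant genome it is optimistic. The function name is made up for illustration.

```python
import math


def expected_fraction_covered(coverage):
    """Idealized estimate of the genome fraction hit by at least one read.

    Assumes uniformly random read placement (Poisson model); real data,
    especially from repetitive genomes, will do worse than this.
    """
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    for c in (1, 5, 30):
        frac = expected_fraction_covered(c)
        print(f"{c}X coverage: about {frac:.1%} of bases covered at least once")
```

Under these assumptions, 1X coverage leaves more than a third of the bases unsampled, so the 454 reads alone could not form a continuous backbone.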

jimmybee 10-07-2010 04:46 AM

How repetitive is your plant genome?

natstreet 10-07-2010 05:16 AM

I don't have a good answer but this is something of a hot topic to me as we are doing much the same, although I have higher 454 coverage.

For plants a big factor can be how polymorphic your species is as well as the repeat structure.

In general, I would be really interested to know how people are effectively integrating 454 and Illumina data. Do you assemble each data type on its own and then combine those assemblies, or do you assemble all the data together? In either case, what assemblers are you using?

strob 10-07-2010 05:29 AM

Highly repetitive.
We have the Illumina dataset available, but we are thinking of adding a low-coverage 454 set. I think we can do three things:
- all de novo (hybrid assembly)
- Illumina de novo, then map the resulting contigs back onto the 454 reads
- map the Illumina reads directly to the 454 reads

Before doing this, I want to know if a 454 run will bring additional information.
Tools? I was thinking of MIRA

jimmybee 10-07-2010 05:40 AM

If it's highly repetitive (my definition of highly would be >80%), then a 1X coverage run won't be particularly effective, nor will it complement the Illumina data in a hybrid assembly. You'll need to figure out a few things, like how finished you want the sequence to be and what information you want out of the assembly (e.g. just a good assembly of genes, or of repeats).

To answer natstreet: Hybrid assemblies with different types of data are the way to go for repetitive genomes (such as cereal crops). We've found that integrating differing types of data (paired end/fragment), different insert sizes and read lengths can be very beneficial to the assembly.

natstreet 10-07-2010 06:33 AM

Quote:

Hybrid assemblies with different types of data are the way to go for repetitive genomes (such as cereal crops). We've found that integrating differing types of data (paired end/fragment), different insert sizes and read lengths can be very beneficial to the assembly.

I have shotgun 454, paired end 454 and a range of paired end Illumina libraries as well as a mate pair library. I haven't yet found an assembler that can take all of the data for a hybrid assembly on any machine that I have access to. Velvet and MIRA both accept both types of data but have huge RAM requirements and are simply impractical to run. For hybrid cereal assemblies, what software are you using?

jimmybee 10-07-2010 06:45 AM

Velvet. I feel your pain regarding the RAM requirements. We only just got something that can handle them. I've compiled SOAPdenovo and Euler-SR but have yet to play around with them.

glacerda 10-07-2010 10:14 AM

It is crucial to correct your reads prior to assembly (using the SOAPdenovo correction tool, SHREC or another tool). This will save memory in the assembly stage.
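
To illustrate why correction saves memory, here is a minimal sketch of a generic k-mer spectrum check. It is not the method used by the SOAPdenovo correction tool or SHREC; it only shows that a single base error creates several k-mers that occur once, each of which would otherwise take up space in the assembly graph. The function names, the toy reads, and the `min_count` threshold are all assumptions for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count=2):
    """Return k-mers seen fewer than min_count times (likely error-derived)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


if __name__ == "__main__":
    # Toy data: three identical reads plus one read with a single base error.
    reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGAACGT"]
    counts = count_kmers(reads, k=4)
    weak = weak_kmers(counts)
    print(f"distinct k-mers: {len(counts)}, of which likely errors: {len(weak)}")
```

In this toy case one erroneous base doubles the number of distinct 4-mers, which is the kind of inflation that correcting or discarding such reads avoids.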

Also, SOAPdenovo uses much less memory than Velvet, although in my personal experience Velvet produces slightly better assemblies.

Don't forget to optimize the parameters, especially the k-mer size. This has a great influence on memory/time and quality of assembly.
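
One cheap way to get a feel for the k-mer size trade-off before committing to a full assembly is to scan a few values of k on a sample of reads and compare how many distinct k-mers each produces, as a very rough proxy for graph size. This is only a sketch under that assumption; it does not predict assembly quality or the actual memory use of any assembler, and the function name and the candidate values of k are hypothetical.

```python
def distinct_kmers_by_k(reads, ks):
    """Map each k to the number of distinct k-mers found in the reads."""
    result = {}
    for k in ks:
        seen = set()
        for read in reads:
            for i in range(len(read) - k + 1):
                seen.add(read[i:i + k])
        result[k] = len(seen)
    return result


if __name__ == "__main__":
    # Hypothetical read sample; in practice, use a subsample of the real reads.
    sample = ["ACGTACGTTGCA", "CGTACGTTGCAA", "GTACGTTGCAAC"]
    for k, n in distinct_kmers_by_k(sample, ks=(3, 5, 7)).items():
        print(f"k={k}: {n} distinct k-mers")
```

The final choice of k still has to come from comparing the resulting assemblies themselves.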


All times are GMT -8.
