SEQanswers

Old 10-07-2010, 03:54 AM   #1
strob
Member
 
Location: Belgium

Join Date: Nov 2008
Posts: 79
low 454 coverage combined with high solexa coverage

Hi,

Does anybody have experience with combining the following two datasets:

1X coverage of 454 reads (backbone)
30X coverage of solexa reads

Background: we are talking about an unsequenced plant genome, so I would use the 1X 454 reads as a backbone for the Solexa reads to perform a de novo assembly.

Question: is 1X 454 coverage in this case a waste of money or a real help in the assembly? Does anybody have experience with this?
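For a rough sense of what 1X versus 30X means, here is a minimal sketch using the simple Poisson (Lander-Waterman style) model. It assumes reads are sampled uniformly at random and ignores repeats and read-end effects, which a highly repetitive plant genome will violate. The function names are made up for illustration.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Assumes read starts follow a Poisson process (uniform sampling),
    so the chance that a base is missed is exp(-coverage).
    """
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    for c in (1, 30):
        print(f"{c}X: about {fraction_covered(c):.1%} of bases covered at least once")
```

Under this model, 1X leaves roughly 37% of bases with no 454 read at all, so the 454 "backbone" would have many gaps even before repeats are considered.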
Old 10-07-2010, 04:46 AM   #2
jimmybee
Senior Member
 
Location: Adelaide, Australia

Join Date: Sep 2010
Posts: 119

How repetitive is your plant genome?
Old 10-07-2010, 05:16 AM   #3
natstreet
Member
 
Location: Sweden

Join Date: Nov 2009
Posts: 83

I don't have a good answer but this is something of a hot topic to me as we are doing much the same, although I have higher 454 coverage.

For plants, a big factor can be how polymorphic your species is, as well as the repeat structure.

In general, I would be really interested to know how people are effectively integrating 454 and Illumina data. Do you assemble each dataset on its own and then combine those assemblies, or do you assemble all the data together? In either case, what assemblers are you using?
Old 10-07-2010, 05:29 AM   #4
strob
Member
 
Location: Belgium

Join Date: Nov 2008
Posts: 79

Highly repetitive...
We have the Illumina dataset available, but we are thinking of adding a low-coverage 454 set. I think we can do three things:
- all de novo (hybrid assembly)
- assemble the Illumina reads de novo and then map the contigs back onto the 454 reads
- map the Illumina reads directly to the 454 reads

Before doing this, I want to know if a 454 run will bring additional information.
Tools? I was thinking of MIRA
Old 10-07-2010, 05:40 AM   #5
jimmybee
Senior Member
 
Location: Adelaide, Australia

Join Date: Sep 2010
Posts: 119

If it's highly repetitive (my definition of highly would be >80%), then doing a 1X coverage run wouldn't be particularly effective, nor will it complement the Illumina data for the hybrid assembly. You'll need to figure out a few things, like how finished you want the sequence to be and what information you want out of the assembly (e.g. just a good assembly of the genes, or of the repeats as well).

To answer natstreet: Hybrid assemblies with different types of data are the way to go for repetitive genomes (such as cereal crops). We've found that integrating differing types of data (paired end/fragment), different insert sizes and read lengths can be very beneficial to the assembly.
Old 10-07-2010, 06:33 AM   #6
natstreet
Member
 
Location: Sweden

Join Date: Nov 2009
Posts: 83

Quote:
Hybrid assemblies with different types of data are the way to go for repetitive genomes (such as cereal crops). We've found that integrating differing types of data (paired end/fragment), different insert sizes and read lengths can be very beneficial to the assembly.
I have shotgun 454, paired end 454 and a range of paired end Illumina libraries as well as a mate pair library. I haven't yet found an assembler that can take all of the data for a hybrid assembly on any machine that I have access to. Velvet and Mira both take both types of data but have huge RAM requirements and are simply impractical to run. For hybrid cereal assemblies, what software are you using?
Old 10-07-2010, 06:45 AM   #7
jimmybee
Senior Member
 
Location: Adelaide, Australia

Join Date: Sep 2010
Posts: 119

Velvet. I feel your pain in regard to the RAM requirements. We only just got a machine that can handle them. I've compiled SOAPdenovo and Euler-SR but have yet to play around with them.
Old 10-07-2010, 10:14 AM   #8
glacerda
Member
 
Location: Brazil

Join Date: Aug 2008
Posts: 27

It is crucial to correct your reads prior to assembly (using the SOAPdenovo correction tool, SHREC or another tool). This will save memory in the assembly stage.
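To illustrate why uncorrected errors cost memory, here is a toy sketch of the k-mer spectrum idea: k-mers seen only once or twice are more likely sequencing errors than real sequence. This is not how SHREC or the SOAPdenovo correction tool is implemented; the function names, the toy reads and the `min_count` threshold are assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count=2):
    """K-mers seen fewer than min_count times (candidate errors)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


# Toy data: three identical reads plus one copy with a single substitution.
reads = ["ACGTACGGTCA"] * 3 + ["ACGTACCGTCA"]
counts = count_kmers(reads, k=5)
weak = weak_kmers(counts)
print(len(counts), "distinct 5-mers,", len(weak), "of them seen only once")
```

In this toy set the single substitution adds five extra distinct 5-mers. In a de Bruijn graph assembler each of those becomes an extra node, which is where the memory goes.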

Also, SOAPdenovo uses much less memory than Velvet, although in my personal experience Velvet produces slightly better assemblies.

Don't forget to optimize the parameters, especially the k-mer size. This has a great influence on memory/time and on the quality of the assembly.
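As a rough illustration of the k-mer size point: in de Bruijn graph assemblers such as Velvet and SOAPdenovo, the number of distinct k-mers is a reasonable first proxy for graph size. The sketch below counts distinct k-mers on made-up toy reads for a few values of k; the helper name and the reads are hypothetical, and real choices also depend on coverage, error rate and read length.

```python
def distinct_kmers(reads, k):
    """Number of different k-mers found in the reads (0 if k exceeds read length)."""
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            seen.add(read[i:i + k])
    return len(seen)


# Toy reads containing a short tandem repeat (ATAT...).
reads = ["ATATATATGC", "TATATGCGCG"]

for k in (3, 5, 7):
    print(f"k={k}: {distinct_kmers(reads, k)} distinct k-mers")
```

Small k collapses the repeat into very few k-mers (cheap, but ambiguous), while larger k separates more of the sequence at the cost of more nodes and fewer overlapping k-mers per read, so sweeping a handful of values on a subsample before a full run is a cheap sanity check.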