**ECO** · 01-22-2011, 08:45 AM

This is probably a good recent paper to read:

http://www.nature.com/nmeth/journal/v8/n1/full/nmeth.1527.html

**CC_seqanswers** · 01-22-2011, 09:06 AM

Thanks a lot!

I know it's too much to ask but could you email me a copy of the paper? I don't have access to it.

Again, thanks!

CC

**CC_seqanswers** · 01-22-2011, 10:39 AM

No worries. I got it from a friend.

Have a nice day!

CC

**flxlex** · 01-24-2011, 12:54 AM

Originally posted by CC_seqanswers View Post

N50 contig: 150 bp
Max contig: ~3 kb
Median contig: 130 bp

In my humble opinion, these are not very encouraging numbers. If you look at recent genomes (panda!, turkey, apple, cacao, strawberry, you name it), these have much better metrics with lower coverage. Also, don't you have scaffolds?
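
For anyone comparing these metrics between assemblers, here is a minimal Python sketch of how N50, max, and median contig length are usually computed from a list of contig lengths. The function names (`n50`, `summarize`) are made up for illustration and do not come from any assembler. The definition assumed is the common one: sort contigs longest first and report the length of the contig at which the running total reaches half of the total assembly size.

```python
import statistics


def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length covers half the assembly.
        if running >= half_total:
            return length


def summarize(lengths):
    """Collect the three metrics quoted in the post above."""
    return {
        "n50": n50(lengths),
        "max": max(lengths),
        "median": statistics.median(lengths),
    }
```

Some tools apply a minimum contig length cutoff before computing these numbers, so values reported by different programs are not always directly comparable.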

**rwenang** · 01-24-2011, 01:55 AM

I agree that it's not an encouraging number, though the problem most likely lies in the SOAPdenovo configuration rather than in the data itself. I have used SOAPdenovo several times and found it a bit hard to tune: most of my assemblies of simulated datasets produced a very short contig N50. I then redid them with the CLC assembler, and they turned out fine.

My point is that you should try another assembler, such as Velvet, MIRA, or CLC, or contact the authors and ask for a recommended configuration.

**natstreet** · 01-24-2011, 02:08 AM

Can you give us some basic background about what was sequenced - for example is it inbred or likely to be highly heterozygous? This will also affect your assembly results and different assemblers behave differently or require tweaking to best handle such cases.

The recent strawberry and cocoa genomes were both based on at least partially inbred lines. They both had decent amounts of 454 data and the cocoa project had some Sanger BAC sequences too.

We've found that the SOAP developers simply don't reply to emails, so I would suggest trying ABySS, or Velvet if you have enough RAM.

Are your reads all shotgun?

**CC_seqanswers** · 01-24-2011, 07:51 AM

We also tried ABySS, which only gave slightly longer contigs.

Does CLC require even more memory?

**CC_seqanswers** · 01-24-2011, 07:56 AM

1. I believe it's heterozygous, and it's a plant that is expected to have substantial repetitive sequence.

2. All the data are short reads only: a 200 bp short-insert library plus 2 kb and 5 kb mate-pair libraries.

3. We tried ABySS and it did not seem to help much.

4. Can anyone tell me whether, with currently available assemblers, it is feasible to do a de novo assembly at all from a dataset of short reads only, such as Illumina data? If not, what else can help? Will 454 reads, which are a bit longer, help at all?

**CC_seqanswers** · 01-24-2011, 07:57 AM

Sorry, another question. Is Velvet good only for small genomes? The one we are working on is estimated to be around 800 Mb.

Thanks so much!

**rwenang** · 01-24-2011, 06:24 PM

I used it on a machine with 36 GB of RAM, but I have never tried it with 100x data. There is a 30-day trial license at http://www.clcbio.com if you want to try it.
And for the record, I don't get any incentive for recommending CLC.

Anyway, your case is quite interesting. There are several things I would try if I were in your shoes:

1. Try reducing the reads to 60x or less, either by removing duplicates (I suspect you have done this already) or by simple quality-based filtering. Some studies have shown that more coverage does not necessarily mean a better assembly, because an overabundance of reads can confuse the algorithm.

2. Try ALLPATHS-LG from the Broad. I have never used it, but it's the newest assembler out there (I think).

3. Try an assembler based on overlap-layout-consensus, e.g. Celera or Phrap, with strict overlap criteria. If the 100x data are evenly distributed, the assembly should be good, though it might fail to detect repeats.
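
As a rough illustration of step 1, the arithmetic for going from 100x to 60x can be sketched in Python as below. Note that this uses plain random subsampling of read pairs, which is a different technique from the duplicate removal or quality filtering suggested above; it is only a quick way to test whether lower depth helps. The names `keep_fraction` and `subsample_pairs` are hypothetical, and `pairs` is assumed to be any in-memory list of read-pair records (mates must be kept or dropped together so the pairing is preserved).

```python
import random


def keep_fraction(current_coverage, target_coverage):
    """Fraction of read pairs to keep to go from current to target depth."""
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    # Never try to keep more than everything.
    return min(1.0, target_coverage / current_coverage)


def subsample_pairs(pairs, fraction, seed=0):
    """Keep each read pair independently with probability `fraction`.

    A fixed seed makes the subsample reproducible between runs.
    """
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]
```

For example, `keep_fraction(100, 60)` gives 0.6, so roughly 60% of the pairs would be kept. For real data sets a streaming tool would be needed rather than a list in memory.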

**Torst** · 08-01-2011, 09:56 PM

Originally posted by CC_seqanswers View Post

1. I believe it's heterozygous, and it's a plant that is expected to have substantial repetitive sequence.
2. All the data are short reads only: a 200 bp short-insert library plus 2 kb and 5 kb mate-pair libraries.
3. We tried ABySS and it did not seem to help much.
4. Can anyone tell me whether, with currently available assemblers, it is feasible to do a de novo assembly at all from a dataset of short reads only, such as Illumina data? If not, what else can help? Will 454 reads, which are a bit longer, help at all?

1. That will confuse most assemblers.

2. So you have 3 Illumina libraries: 200 bp PE, 2 kb MP, and 5 kb MP. I assume the 100x is the combined depth. What were the read lengths (100 bp?) and yields per library?

3. Did you just use default parameters? Are you sure you set up SOAPdenovo properly?

4. Yes, you should be able to do this with Velvet, but you will need a machine with about 900 GB of RAM, e.g. a 1 TB Dell R910.
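
To make the question in point 2 concrete, depth, read length, and read count are tied together by depth = reads × read length / genome size. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the 800 Mb genome size and 100 bp read length used in the example are only the estimate and the guess from this thread, not measured values.

```python
import math


def depth(read_count, read_length_bp, genome_size_bp):
    """Average sequencing depth given a number of reads of fixed length."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return read_count * read_length_bp / genome_size_bp


def reads_needed(target_depth, read_length_bp, genome_size_bp):
    """Number of reads of a given length needed to reach a target depth."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


def combined_depth(libraries, genome_size_bp):
    """Sum the depth over libraries given as (read_count, read_length_bp)."""
    return sum(depth(n, length, genome_size_bp) for n, length in libraries)
```

With those assumed numbers, `reads_needed(100, 100, 800_000_000)` comes to 800 million reads across all libraries, which gives a sense of why the per-library yields matter when estimating memory.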

**edge** · 11-24-2012, 03:11 AM

Hi,

Could you email me a copy of the paper you mentioned regarding "how to assess a de novo assembly result?"?
Many thanks.

I can't access it either.

how to assess a de novo assembly result?