I'm trying to assemble a ~70 Mb genome from a single Illumina paired-end library (100 nt reads, 160M reads per end, insert size = 300 bp).
I've tried velvet, ABySS, and SOAPdenovo with K=75. The largest N50 I get is 2 kb, with at least 77k contigs.
My questions are:
1. Generally, does one need at least two libraries (one short-insert and one long-insert) to get a good assembly?
2. With only one short-read library like this, how can I get the best possible assembly? (A rough question, I know. What I'm after is any tips on parameter tuning or preprocessing.)
3. SOAPdenovo always gives me the worst results, with a very small N50, yet publications suggest it is a good assembler for large genomes. I don't know if I did something wrong.
4. The reported coverage is about 50X, with the genome size estimated at ~70 Mb. But if I calculate coverage from my reads, it would be 160M x 2 x 100 / 70M ≈ 457x. How can it drop to 50x? FastQC did show a high duplication level in the raw reads (>70%) — could that be the reason?
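On question 4, the arithmetic can be checked directly. One hedged note: velvet's reported coverage (and its `-exp_cov`) is k-mer coverage, not per-base coverage; the velvet manual gives Ck = C × (L − k + 1) / L, so a reported 50X at k=75 on 100 nt reads corresponds to a much higher base coverage, which together with the >70% duplication may account for much of the gap. A quick sketch of both calculations, using only the numbers quoted above:

```shell
# Raw per-base coverage implied by the numbers above:
# 160M reads per end x 2 ends x 100 nt / 70 Mb genome
awk 'BEGIN { printf "raw coverage: %.0fx\n", 160e6 * 2 * 100 / 70e6 }'

# Velvet reports k-mer coverage Ck = C * (L - k + 1) / L,
# so a reported Ck = 50X at k=75, L=100 implies this base coverage:
awk 'BEGIN { printf "base coverage for Ck=50: %.0fx\n", 50 * 100 / (100 - 75 + 1) }'
```

The first command prints ~457x; the second shows that 50X k-mer coverage at k=75 corresponds to roughly 192X base coverage, so the two figures are not directly comparable.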
Here are my commands and results from each assembler:

1. velvet: (N50=2 kb, #contigs=77k, reported coverage=50X)

Code:
velveth Sample_name 75 -shortPaired -fastq Sample_R1R2_rmdp_trimmed_SOAPec.fastq
velvetg Sample_name -cov_cutoff auto -ins_length 300 -exp_cov auto

2. ABySS: (contig N50=2 kb, #contigs=460k; scaffold N50=4 kb, #scaffolds=400k)

Code:
abyss-pe n=10 name=Sample_k75 k=75 j=8 in='Sample_R1_rmdp_trimmed.fastq.cor.pair_1.fq Sample_R2_rmdp_trimmed.fastq.cor.pair_2.fq'

3. SOAPdenovo: (contig N50=170 bp; scaffold N50=300 bp, #scaffolds=47k)

Code:
SOAPdenovo-127mer all -s SOAPdenovo.config -K 75 -R -f -p 8 -F -V -o Sample_k75

Before assembly, I did the following preprocessing:

1. fastuniq to remove duplicates
2. fastq-mcf to remove adapter sequences and trim low-quality ends
3. SOAPec to error-correct the reads:

Code:
KmerFreq_HA -k 27 -f 1 -t 10 -L 101 -l fastqlistforSOAPec.lst -p Sample_k27
Corrector_HA -k 27 -l 2 -e 1 -w 1 -q 30 -r 45 -t 10 -j 1 -Q 33 -o 1 Sample_k27.freq.gz fastqlistforSOAPec.lst

I know assembly is quite case-dependent and these are open questions, but any suggestions would be highly appreciated. Thanks!
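On question 2, one common tuning step is to sweep k rather than fixing it at 75: a read of length L contributes L − k + 1 k-mers, so at k=75 each 100 nt read yields only 26, and lower k values often assemble better, especially after error correction. A minimal sketch of such a sweep, reusing the velvet flags from the run above (the `asm_k$k` output directories are placeholder names):

```shell
#!/bin/sh
# Sketch: try several k values with velvet, then compare the resulting N50s.
# Flags are the same ones used in the velvet run above; asm_k$k dirs are placeholders.
for k in 31 41 51 61 71; do
  if command -v velveth >/dev/null 2>&1; then
    velveth "asm_k$k" "$k" -shortPaired -fastq Sample_R1R2_rmdp_trimmed_SOAPec.fastq
    velvetg "asm_k$k" -cov_cutoff auto -ins_length 300 -exp_cov auto
  else
    echo "velvet not installed; skipping k=$k"
  fi
done
```

Odd k values are used because velvet requires an odd hash length; picking the k with the best N50/contig-count trade-off is then a matter of inspecting each run's stats.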