**jnfass** · 08-10-2009, 10:07 AM

Blast to related species' transcriptome?

You could also run velvetg with -amos_file yes, convert the output to ace, and view the assemblies, to get a feel for what they look like and see if you're comfortable with them.

Finally, velvet seems to work best with kmer coverages from 20-30X ... your data set may not get up to that (depending on the organism), but if it's much higher than that, you might consider sub-sampling to bring the kmer coverage down ... oddly enough, this may help your assembly.

**magick** · 08-11-2009, 08:03 PM

Yup, I ran the blast and viewed the assemblies before, but I couldn't extract any significant information from the results. Or maybe I am not good enough at analyzing the results.

Sorry, can you tell me how to calculate the k-mer coverage from transcriptome data?

**jnfass** · 08-12-2009, 09:03 AM

yah - it's all gray area (no clear lines) with blast results from assemblies

So, if you have a transcriptome size estimate, you want enough reads to have ~20-30X kmer coverage, as described here:

http://www.ebi.ac.uk/~zerbino/velvet/hash_length_choice.html

Too low is obviously bad, but I've also found that extremely high kmer coverage can kill an assembly ... but that's probably over 100-500X ...

There are other discussions of how to calculate kmer cov on seqanswers .. but let me know if it's not clear ...
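A minimal sketch of the "enough reads for ~20-30X kmer coverage" estimate, assuming the relation from the Velvet manual that is quoted later in this thread, Ck = C*(L-k+1)/L with C = n*L/G. The function name and the example numbers (a 30 Mb transcriptome, 36 bp reads, k = 21) are hypothetical, and for a transcriptome the result is only an average, since coverage varies with expression level.

```python
import math


def reads_needed(target_size_bp, read_len, k, target_kmer_cov):
    """Number of reads n so that Ck = n * (L - k + 1) / G reaches the target.

    Derived from Ck = C * (L - k + 1) / L with C = n * L / G.
    """
    kmers_per_read = read_len - k + 1
    if kmers_per_read <= 0:
        raise ValueError("k must not exceed the read length")
    return math.ceil(target_kmer_cov * target_size_bp / kmers_per_read)


# Hypothetical example: 30 Mb transcriptome estimate, 36 bp reads, k = 21
for target in (20, 30):
    print(target, reads_needed(30_000_000, 36, 21, target))
```

Swapping in your own size estimate and read length gives a rough idea of whether the data set can reach the 20-30X range at a given k.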

**bea** · 08-13-2009, 01:54 AM

Hi,

I’m struggling with a similar problem. I’ve got very high coverage (>6000; this is taken from the contig names in the velvet contigs.fa file). Does this mean that my assembly is not optimal? What is meant by “subsampling”? Dividing my reads into different subsets, doing separate assemblies, and then trying to assemble the contigs into longer contigs?

Thanks

**jnfass** · 08-13-2009, 09:05 AM

Hi Bea. Yes- if you randomly pick a smaller number of your reads, corresponding to lower coverage, then assemble ... and I would probably generate several (5? 10?) random subsamples, and assemble each, for statistical purposes (though, then you have to compare them somehow).

A clear case in which I've seen this is with phiX. I tried to assemble the control lane of phiX reads from one of our Illumina runs, and got a terrible assembly (N50 < 100?). Then, after subsampling down to ~ 20-30X kmer coverage, velvet assembled phiX174 perfectly, in one contig.

I'm not sure if other assemblers have this problem (Mira's author seems to think that could be the case), or whether it's a general issue or specific to an assembler's algorithm.
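The random subsampling described above could be sketched roughly like this. It assumes single-end reads in a plain FASTQ file with exactly four lines per record; the function names are made up for illustration, and paired-end files would need the same keep/drop decision applied to both mates. Changing the seed gives the several independent subsamples mentioned above.

```python
import random


def fraction_for_target(current_kmer_cov, target_kmer_cov):
    """Fraction of reads to keep to go from the current to the target coverage."""
    return min(1.0, target_kmer_cov / current_kmer_cov)


def subsample_fastq(in_path, out_path, fraction, seed=0):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Returns the number of records written. Assumes 4 lines per record.
    """
    rng = random.Random(seed)  # fixed seed makes a subsample reproducible
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept
```

For example, going from ~6000X down to ~25X would mean keeping a fraction of about 25/6000 of the reads, and calling `subsample_fastq` with seeds 1 to 5 would give five different subsamples to assemble and compare.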

**jnfass** · 08-13-2009, 09:06 AM

Also, Bea, note that when you see a coverage value in a velvet contig name, that's k-mer coverage ... and the length is in k-mers as well.
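To make that concrete, here is a small sketch that pulls the length and coverage out of a contig name and converts them to nucleotide units. It assumes the usual velvet header layout (NODE_<id>_length_<n>_cov_<x>), that length in bp = length in k-mers + k - 1, and the manual's relation Ck = C*(L-k+1)/L; the function names and the example header are hypothetical.

```python
import re

# Assumed header layout: >NODE_<id>_length_<kmers>_cov_<kmer coverage>
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_velvet_header(header, k):
    """Return node id, length in k-mers, length in bp, and k-mer coverage."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"not a velvet contig header: {header!r}")
    length_kmers = int(match.group(2))
    return {
        "node": int(match.group(1)),
        "length_kmers": length_kmers,
        "length_bp": length_kmers + k - 1,  # a contig of n k-mers spans n + k - 1 bases
        "kmer_cov": float(match.group(3)),
    }


def nucleotide_coverage(kmer_cov, read_len, k):
    """Invert Ck = C * (L - k + 1) / L to get nucleotide coverage C."""
    return kmer_cov * read_len / (read_len - k + 1)


info = parse_velvet_header(">NODE_12_length_480_cov_25.500000", k=21)
print(info["length_bp"], nucleotide_coverage(info["kmer_cov"], read_len=36, k=21))
```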

**magick** · 08-13-2009, 07:42 PM

I had come across those threads before, but I don't know how to calculate the coverage and thus can't get the k-mer coverage from the formula.

As discussed here,

Coverage (Velvet) - SEQanswers

http://seqanswers.com/forums/showthread.php?t=1529&highlight=coverage+velvet

Hello, all. Does anyone know what the coverage means in velvet? According to the manual of velvet, the relation between k-mer coverage Ck and standard (nucleotide-wise) coverage C is Ck = C*(L-k+1)/L. Is C calculated as follows? Read length: 36bp. Number of reads: 50,000,000 X 2 (paired-end). Size of reference

That calculation includes the size of the reference sequence. But isn't velvet a de novo assembler that doesn't use a reference?

So I need more help with calculating the coverage. Thanks.

Another problem is that most of the discussion is about genomic data. Are there any differences between transcriptomic and genomic data when calculating coverage?

**jnfass** · 08-19-2009, 01:42 PM

@magick: If you have some estimate of the size of the genome you're trying to assemble, that might be the best you can do. Of course, a run through velvetg without any parameters specified will result in some statistics (in the stats.txt file) that can help you estimate the coverage, as described in the manual.
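One rough way to get a coverage estimate out of stats.txt is a length-weighted median of the per-node coverage, so that long nodes count for more than short, noisy ones. This is only a sketch: it assumes stats.txt is tab-separated with a header row containing columns named `lgth` and `short1_cov`, which you should check against your own file, and the function name is made up. For transcriptome data a single "typical" coverage is of limited meaning, since coverage follows expression level.

```python
import csv
import io
import math


def weighted_median_coverage(stats_text, cov_column="short1_cov",
                             len_column="lgth", min_len=0):
    """Length-weighted median of node coverage from stats.txt contents.

    Column names are assumptions about the stats.txt layout.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(stats_text), delimiter="\t"):
        try:
            length = int(row[len_column])
            cov = float(row[cov_column])
        except (ValueError, TypeError):
            continue  # skip malformed or incomplete rows
        if length > 0 and length >= min_len and math.isfinite(cov):
            pairs.append((cov, length))
    if not pairs:
        raise ValueError("no usable rows found")
    pairs.sort()
    half = sum(length for _, length in pairs) / 2
    running = 0
    for cov, length in pairs:
        running += length
        if running >= half:
            return cov


# Usage, assuming the assembly directory is called "assembly_dir":
# with open("assembly_dir/stats.txt") as fh:
#     print(weighted_median_coverage(fh.read(), min_len=100))
```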

**Zigster** · 08-20-2009, 04:14 PM

the coverage cutoff will also have a huge effect on total coverage (i.e. assembly length) and contig count. Make sure you explore that setting from 2x-10x (measured in kmers)

**zhangju** · 11-04-2011, 02:54 PM

If my velvet contigs have a very broad coverage distribution, from 2 to 6000, is subsetting the data necessary to improve the assembly?

Thanks,

Justin

**sphil** · 11-08-2011, 06:57 AM

Hey,

I hope this helps.

The authors of Velvet recommend choosing k using: E(X) = C * ((l - k + 1) / l),
where E(X) = expected number of times X that a k-mer in a genome of length G
is observed in a set of n reads of length l, and
C = n * l / G is the coverage. Choose k odd and larger than 10.

best,

phil
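As a worked sketch of the formula above, the snippet below lists the odd k values larger than 10 whose expected k-mer coverage falls in the 20-30X range suggested earlier in the thread. The function names and the example numbers (2 million 36 bp reads, a 2 Mb target) are hypothetical, and G is only ever an estimate.

```python
def expected_kmer_coverage(n_reads, read_len, genome_size, k):
    """E(X) = C * (l - k + 1) / l, with C = n * l / G."""
    coverage = n_reads * read_len / genome_size
    return coverage * (read_len - k + 1) / read_len


def candidate_k_values(n_reads, read_len, genome_size,
                       min_cov=20.0, max_cov=30.0):
    """Odd k > 10 whose expected k-mer coverage lies in [min_cov, max_cov]."""
    candidates = []
    for k in range(11, read_len + 1, 2):
        ck = expected_kmer_coverage(n_reads, read_len, genome_size, k)
        if min_cov <= ck <= max_cov:
            candidates.append((k, ck))
    return candidates


# Hypothetical example: 2,000,000 reads of 36 bp, 2 Mb target (C = 36X)
for k, ck in candidate_k_values(2_000_000, 36, 2_000_000):
    print(k, ck)
```

If no k lands in the range, that suggests either too few reads (coverage too low at every k) or so many that subsampling is worth trying.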

Choice of hash length k in velvet