SEQanswers > General

02-13-2012, 10:19 PM #1
Location: Singapore

Join Date: Apr 2011
Posts: 10
Contig length, k-mer coverage, and differential expression

I'm working with data where I have a read count and a k-mer coverage (Ck) for a set of contigs and scaffolds across different conditions. I've recently heard and read a few very confusing explanations of k-mer coverage, so I would appreciate some clarification. From what I gather, Ck is directly related to base coverage. But can the length of a contig be determined if I know the Ck value, the read length, and the read count for that specific contig? Or would this calculation not work for a de novo transcriptome, where read coverage varies greatly between contigs and scaffolds?

For example, here are my numbers for contig A:

Read length = 75 bp
Read count = 185,600 reads
Ck = 63
Hash length (k) = 31

When I plug all this into Ck = C*(rL - k + 1)/rL, where rL is the read length, k is the hash length, and C is the base coverage (rL*N/cL, with N the read count and cL the contig length), I get a cL of about 127 kb. However, when I go back to the raw data and look at that contig's sequence, I find it is only 0.823 kb. I'm not sure how the total number of reads for the run figures into this, but I have ~40 million reads for this condition.
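To make the arithmetic concrete, here is a small sketch (the function names are my own, not from Velvet or any other tool) that applies the same formula to the contig A numbers. Solved this way, cL comes out at roughly 133 kb, the same order of magnitude as the figure above. Run the other way, 185,600 reads on an 823 bp contig would imply a Ck of about 10,000 rather than 63, which hints that the read count and the Ck value may not be counted over the same set of reads; that is an inference from the arithmetic, not something these numbers prove.

```python
# Velvet's coverage relation: Ck = C * (rL - k + 1) / rL,
# with base coverage C = N * rL / cL (N = reads on the contig, cL = contig length).

def base_coverage(ck: float, read_len: int, k: int) -> float:
    """Convert k-mer coverage Ck back to base (nucleotide) coverage C."""
    return ck * read_len / (read_len - k + 1)


def implied_contig_length(n_reads: int, read_len: int, ck: float, k: int) -> float:
    """Solve C = N * rL / cL for cL, taking C from the reported Ck."""
    return n_reads * read_len / base_coverage(ck, read_len, k)


def expected_kmer_coverage(n_reads: int, read_len: int, contig_len: int, k: int) -> float:
    """Ck predicted if all n_reads really fall on a contig of contig_len bp."""
    c = n_reads * read_len / contig_len
    return c * (read_len - k + 1) / read_len


rL, k, ck, n = 75, 31, 63, 185_600  # numbers quoted for contig A
print(f"C  = {base_coverage(ck, rL, k):.1f}x")                        # C  = 105.0x
print(f"cL = {implied_contig_length(n, rL, ck, k):,.0f} bp")          # cL = 132,571 bp
print(f"Ck at 823 bp = {expected_kmer_coverage(n, rL, 823, k):,.0f}")  # Ck at 823 bp = 10,148
```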

Because C depends on the read count, my best guess is that contigs and scaffolds with expression well above or below the mean will have Ck values that are unrepresentative of contig length. But I feel clueless, and my partner appears to only be acting as if he knows. I have a feeling I'm missing something completely obvious.

Any help on this matter would be greatly appreciated.
Posted by nbogard
08-03-2013, 10:41 PM #2
Location: DDN

Join Date: Mar 2013
Posts: 10

Hi, all
I am new to de novo genome assembly. I have FASTQ sequence data that I have to assemble using Velvet. I ran the VelvetOptimiser script with hash lengths from 27 to 41, and it predicted 37 as the best. The output file contigs.fa contains 260 contigs, whereas the log file reports 283 nodes; where have the rest gone? Is the length given in contigs.fa in k-mers? If so, how do I calculate the actual nucleotide length in bp? And how do I tell whether the assembly is good or bad? The final stats given after the script ran:
Final graph has 283 nodes and n50 of 347, max 2336, total 68614, using 19064/50000 reads
Why is the number of used reads so low?
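On the length question: as far as I know, Velvet labels each contig in contigs.fa with a header of the form NODE_<id>_length_<len>_cov_<cov>, where the length is counted in k-mers, so the length in bp should be len + k - 1. The sketch below assumes that header layout (check it against your own file) and converts header lengths for a given hash length. The helper name is hypothetical and the headers in the example are made up.

```python
import re

# ASSUMED header layout: >NODE_<id>_length_<len>_cov_<cov>, with <len> in k-mers.
HEADER_RE = re.compile(r">NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def contig_lengths_bp(fasta_lines, k):
    """Yield (node_id, length_in_kmers, length_in_bp) for each matching header."""
    for line in fasta_lines:
        m = HEADER_RE.match(line)
        if m:
            node_id, kmer_len = int(m.group(1)), int(m.group(2))
            # A run of n overlapping k-mers spans n + k - 1 nucleotides.
            yield node_id, kmer_len, kmer_len + k - 1


# Hypothetical headers, hash length 37 as chosen by VelvetOptimiser above.
demo = [">NODE_1_length_100_cov_5.0", "ACGT", ">NODE_2_length_463_cov_8.1"]
for node_id, kmers, bp in contig_lengths_bp(demo, k=37):
    print(node_id, kmers, bp)  # prints "1 100 136", then "2 463 499"
```

For a real run, pass `open("contigs.fa")` instead of `demo`; sequence lines simply do not match the pattern and are skipped.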
Posted by diptarka
08-04-2013, 02:05 AM #3
Senior Member
Location: uk

Join Date: Mar 2009
Posts: 667
Contig length, k-mer coverage, and differential expression

I'm pretty sure Velvet has a cutoff value for the length of the contigs listed in the contigs.fa file, although I don't remember off the top of my head what it is. So the missing contigs are probably the very short ones.
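If that explanation is right, the gap between 283 nodes in the graph and 260 entries in contigs.fa is just the nodes below the length cutoff. velvetg has a -min_contig_lgth option for this; my recollection is that it defaults to twice the hash length, but treat that default, and the exact comparison used below (bp length, `>=`), as assumptions to check against velvetg's usage text. The helper is hypothetical, not part of Velvet, and the node lengths in the example are invented.

```python
def split_by_cutoff(node_lengths_kmers, k, min_contig_bp=None):
    """Split node lengths (in k-mers) into those exported and those dropped.

    ASSUMPTIONS: the cutoff defaults to 2 * k and is compared against the
    node length in bp (k-mers + k - 1) with >=; verify against velvetg's help.
    """
    if min_contig_bp is None:
        min_contig_bp = 2 * k
    kept, dropped = [], []
    for n in node_lengths_kmers:
        bp = n + k - 1
        (kept if bp >= min_contig_bp else dropped).append(n)
    return kept, dropped


# Hypothetical node lengths with k = 37, i.e. 41 bp, 76 bp and 136 bp.
kept, dropped = split_by_cutoff([5, 40, 100], k=37)
print(len(kept), "exported,", len(dropped), "dropped")  # 2 exported, 1 dropped
```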

The formula for calculating k-mer coverage from base coverage is given in the Velvet manual. See

As to whether the assembly is good, have a look at the Nature Methods article entitled "De novo genome assembly: what every biologist should know".
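As a first contiguity check, the n50 in the velvetg log line quoted earlier in the thread is the usual starting point. Below is a plain sketch of the standard N50 definition: sort the lengths in decreasing order and report the length at which the running total first reaches half of the assembly. Whether Velvet's own n50 is computed on k-mer or bp lengths is not stated in the thread, so don't expect an exact match with the log without checking.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First contig at which the cumulative length reaches half the total.
        if 2 * running >= total:
            return length
    return 0


print(n50([2, 3, 4, 5, 6]))  # 5, since 6 + 5 = 11 is at least half of 20
```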
Posted by mastal
09-10-2013, 11:30 AM #4
Location: DDN

Join Date: Mar 2013
Posts: 10

What is the twin node in Velvet? The description says it represents the reverse series of reverse complement k-mers. How are contigs actually generated in a paired-end assembly with Velvet? Can someone show this with an example?
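On twin nodes: "the reverse series of reverse complement k-mers" means you take the node's k-mers in reverse order and reverse-complement each one, which gives the same stretch of DNA read from the other strand. The sketch below treats a node as a plain string of overlapping k-mers, a big simplification of Velvet's internals, just to show that relationship; the sequence and the helper names are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> list[str]:
    """Overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def twin_kmers(node_kmers: list[str]) -> list[str]:
    """Twin = the reverse series of the reverse-complemented k-mers."""
    return [reverse_complement(km) for km in reversed(node_kmers)]


node_seq = "ACCGTTA"  # made-up node sequence
node = kmers(node_seq, 3)
twin = twin_kmers(node)
print(node)  # ['ACC', 'CCG', 'CGT', 'GTT', 'TTA']
print(twin)  # ['TAA', 'AAC', 'ACG', 'CGG', 'GGT']
```

The twin's k-mers are exactly the k-mers of the reverse complement of the node sequence, and taking the twin of the twin gives back the original node, which is why a read from either strand can land on the same pair of nodes.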
Posted by diptarka

All times are GMT -8.
