
**winsettz** · 08-06-2013, 08:44 AM

This question rightly belongs in http://seqanswers.com/forums/forumdisplay.php?f=27
which is the de novo assembly forum.

When you say "50-300 bp", are you referring to the length of what Velvet calls inserts?

As for which k-mer length to use, I refer you back to the manual:

5.2 Choice of hash length k

The hash length is the length of the k-mers being entered in the hash table. Firstly, you must observe three technical constraints:

• it must be an odd number, to avoid palindromes. If you put in an even number, Velvet will just decrement it and proceed.
• it must be below or equal to MAXKMERHASH length (cf. 2.3.3, by default 31bp), because it is stored on 64 bits
• it must be strictly inferior to read length, otherwise you simply will not observe any overlaps between reads, for obvious reasons.
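To make the three constraints concrete, here is a minimal Python sketch that applies them in order. It is not part of Velvet: the helper name `effective_k` is hypothetical, and `max_kmer_length` simply stands in for the compile-time limit the manual calls MAXKMERHASH length (default 31). Even values are decremented, as the manual describes, before the other two checks run.

```python
# Hypothetical helper, not part of Velvet.
def effective_k(k: int, read_length: int, max_kmer_length: int = 31) -> int:
    """Apply the three hash-length constraints quoted from the Velvet manual.

    max_kmer_length stands in for the compile-time limit (default 31);
    read_length should be the length of the SHORTEST read in the data.
    Returns the k Velvet would effectively use, or raises ValueError.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if k % 2 == 0:
        # Even values are decremented by Velvet rather than rejected.
        k -= 1
    if k > max_kmer_length:
        raise ValueError(f"k={k} exceeds the compile-time limit of {max_kmer_length}")
    if k >= read_length:
        # No overlaps between reads can be observed if k is not below read length.
        raise ValueError(f"k={k} must be strictly less than the read length {read_length}")
    return k
```

With the 50-300 bp reads mentioned in this thread, `effective_k(31, 50)` passes, because even the shortest reads are longer than k; `effective_k(32, 50)` silently becomes 31.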
Now you still have quite a lot of possibilities. As is often the case, it’s a tradeoff between specificity and sensitivity. Longer kmers bring you more specificity (i.e. less spurious overlaps) but lowers coverage (cf. below)... so there’s a sweet spot to be found with time and experience.

Experience shows that kmer coverage should be above 10 to start getting decent results. If Ck is above 20, you might be “wasting” coverage. Experience also shows that empirical tests with different values for k are not that costly to run!
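The 10-20 window can be turned into a quick calculation. The sketch below assumes the usual conversion from base coverage C (total sequenced bases divided by genome size) to k-mer coverage for reads of length L, Ck = C·(L − k + 1)/L. The helper names and the default window are illustrative, not a Velvet API.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Convert base coverage C into k-mer coverage Ck = C * (L - k + 1) / L."""
    return base_coverage * (read_length - k + 1) / read_length


def candidate_ks(base_coverage: float, read_length: int,
                 max_kmer_length: int = 31, low: float = 10.0, high: float = 20.0):
    """Odd k values below both the read length and the compile-time limit
    whose k-mer coverage falls inside the [low, high] window."""
    upper = min(max_kmer_length, read_length - 1)
    return [k for k in range(1, upper + 1, 2)
            if low <= kmer_coverage(base_coverage, read_length, k) <= high]
```

For example, at C = 30 with 100 bp reads even k = 31 gives Ck = 21, just above the window, so `candidate_ks(30, 100)` returns an empty list; only a Velvet build with a larger k-mer limit (e.g. `max_kmer_length=63`) reaches k values from 35 upward.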
5.3 Choice of a coverage cutoff

Velvet was designed to be explicitly cautious when correcting the assembly, to lose as little information as possible. This consequently will leave some obvious errors lying behind after the Tour Bus algorithm (cf. 7) was run. To detect them, you can plot out the distribution of k-mer coverages (5.2), using plotting software (I use R).
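The plot the manual mentions can also be approximated in Python. This sketch assumes velvetg has written a tab-separated `stats.txt` into the output directory, with a header row containing `lgth` (node length) and `short1_cov` (k-mer coverage of the first short-read category); check the header of your own file before relying on those column names. Weighting each node by its length keeps the many tiny error nodes from dominating the histogram.

```python
import csv
from collections import Counter


def weighted_coverage_histogram(stats_lines, cov_column="short1_cov",
                                length_column="lgth", bin_width=1.0):
    """Length-weighted histogram of node k-mer coverage.

    stats_lines: any iterable of tab-separated lines with a header row
    (an open stats.txt file works). Column names are assumptions.
    Returns {bin_start: summed node length in that bin}, sorted by bin.
    """
    hist = Counter()
    for row in csv.DictReader(stats_lines, delimiter="\t"):
        cov = float(row[cov_column])
        length = int(row[length_column])
        bin_start = (cov // bin_width) * bin_width  # left edge of the bin
        hist[bin_start] += length
    return dict(sorted(hist.items()))
```

Typical use would be `with open("auto/stats.txt") as fh: hist = weighted_coverage_histogram(fh)`, where `auto` is the output directory. A low-coverage bump clearly separated from the main peak is the kind of signal that can guide a manual `-cov_cutoff` value for velvetg.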


http://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf

velvetg is then simply run on the same output directory (here `auto`):

```
velvetg auto
```
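For scripting parameter sweeps, the two steps can be assembled as argument lists. This sketch only builds the commands (it does not run Velvet); the read file name `reads.fastq` is hypothetical, and the optional `-exp_cov auto -cov_cutoff auto` flags are velvetg options that let it estimate coverage itself, not something suggested in this thread.

```python
import shlex


def velvet_commands(out_dir, k, reads, file_format="fastq",
                    read_category="short", auto_coverage=False):
    """Build (not run) velveth and velvetg argument lists for one
    single-end read file."""
    if read_category not in {"short", "long"}:
        raise ValueError("read_category must be 'short' or 'long'")
    velveth = ["velveth", out_dir, str(k), f"-{file_format}",
               f"-{read_category}", reads]
    velvetg = ["velvetg", out_dir]
    if auto_coverage:
        # Let velvetg estimate expected coverage and the coverage cutoff.
        velvetg += ["-exp_cov", "auto", "-cov_cutoff", "auto"]
    return velveth, velvetg


h, g = velvet_commands("auto", 31, "reads.fastq")
print(shlex.join(h))  # velveth auto 31 -fastq -short reads.fastq
print(shlex.join(g))  # velvetg auto
```

To actually run them, pass each list to `subprocess.run(cmd, check=True)`; keeping them as lists avoids shell-quoting problems with file names.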

This would also be a good time to ask what you are assembling, and whether you have already gotten your feet wet with a de novo assembly that has a known "answer", such as E. coli MG1655.

**nareshvasani** · 08-06-2013, 09:11 AM

Hi winsettz

My FASTQ file has sequence reads 50-300 bp long, all of them single-end.

So I was wondering which command to execute, for example:
velveth auto 31 -fastq -short -inputfile

or

velveth auto 31 -fastq -long -inputfile

**winsettz** · 08-06-2013, 10:29 AM


Again, from the Velvet manual:

5.6 What’s long and what’s short?

Velvet was pretty much designed with micro-reads (e.g. Illumina) as short and short to long reads (e.g. 454 and capillary) as long. Reference sequences can also be thrown in as long.

That being said, there is no necessary distinction between the types of reads. The only constraint is that a short read be shorter than 32kb. The real difference is the amount of data Velvet keeps on each read. Short reads are presumably too short to resolve many repeats, so only a minimal amount of information is kept. On the contrary, long reads are tracked in detail through the graph.

This means that whatever you call your reads, you should be able to obtain the same initial assembly. The differences will appear as you are trying to resolve repeats, as long reads can be followed through the graph. On the other hand, long reads cost more memory. It is therefore perfectly fine to store Sanger reads as “short” if necessary.

Illumina data are definitely short reads; for something like PacBio, you will need to decide this beforehand. 454 and Sanger reads will also likely meet Velvet's definition of a short read.
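Since the only hard constraint quoted above is the 32 kb limit on short reads, a quick scan of the FASTQ read lengths settles the question, and it also reports the shortest read length that k must stay below. A minimal sketch, assuming a standard four-line-per-record FASTQ (sequence on a single line) and treating "32kb" as 32,000 bases; the helper names are hypothetical.

```python
from itertools import islice

# The manual only says "shorter than 32kb"; 32,000 here is an assumption.
SHORT_READ_LIMIT = 32_000


def read_length_summary(fastq_lines):
    """Return (record_count, shortest, longest) for a 4-line-per-record FASTQ stream."""
    count, shortest, longest = 0, None, None
    it = iter(fastq_lines)
    while record := list(islice(it, 4)):
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        n = len(record[1].strip())  # sequence line, assumed unwrapped
        count += 1
        shortest = n if shortest is None else min(shortest, n)
        longest = n if longest is None else max(longest, n)
    return count, shortest, longest


def fits_short_category(longest_read):
    """True if the longest read is under the short-read limit quoted from the manual."""
    return longest_read < SHORT_READ_LIMIT
```

Used as `with open("reads.fastq") as fh: count, shortest, longest = read_length_summary(fh)`. For 50-300 bp Ion Torrent reads the longest read is far below the limit, which is why, per the manual, either category should give the same initial assembly.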

**nareshvasani** · 08-06-2013, 10:38 AM

winsettz

Thanks a lot.

This FASTQ file was generated on an Ion Torrent Proton instrument, so I don't know whether to treat it as short or long.

**mastal** · 08-06-2013, 01:31 PM

If you read the extract from the manual posted above, it tells you that for reads of your size it really doesn't matter whether you call them short or long: you should get the same initial assembly. Calling them long only changes how repeats are resolved later, at the cost of more memory.

**nareshvasani** · 08-07-2013, 07:19 AM

Mastal

Thanks a lot!

Velveth and velvetg use
