Hi, I am working on a de novo assembly of Illumina reads with Velvet. I first trimmed the raw reads by quality, but when I run Velvet on a subset of the reads the N50 is always very low, around 20 to 29. I have tried various parameters and nothing improved the N50. Any suggestions?
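For anyone following along, here is a minimal sketch of how N50 is usually computed from a list of contig lengths: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the example lengths are made up for illustration and are not taken from Velvet's own code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum past half the
        # total assembly size defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths, total 280, so the halfway point is 140.
print(n50([100, 80, 50, 30, 20]))  # 80
```

An N50 of 20 to 29 therefore means that half of the assembled sequence sits in contigs no longer than a single short read.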
-
What is your read length, and what is the estimated genome length? If the sequencing coverage is very high (>= 200) and uneven, the N50 from Velvet will be very low because of sequencing errors and SNPs, even if you use a subset of the reads and set the important parameters (k-mer length, exp_cov and cov_cutoff).
Last edited by gridbird; 02-09-2011, 11:25 AM.
-
How many reads do you have? How high is your estimated coverage? Which k-mers did you try? How many contigs do you get? Do you get any long contigs? What coverage is stated in the IDs of the contigs?
I really can't say where the problem might be from the amount of information you have given.
Anyway, the best way to address Velvet problems is over the mailing list:
Zerbino is also very active on this mailing list.
-
17 million reads from one lane, sequenced from one end. Coverage is near 200 (but there may be contamination from the nuclear genome). Read length is 36. K-mers from 23 to 31. Number of contigs from 300 to 2,500. N50 from 12 to 21. We tried with 1 million and 100 thousand reads, but the result was only slightly better (N50 54). Maximum contig length is near 100 nucleotides. What are the "ids"?
(454 data from this material gave a chloroplast genome map.)
Last edited by vtosha; 02-25-2011, 06:07 AM.
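With reads this short, the choice of k has a large effect on the coverage Velvet actually sees. The Velvet manual relates k-mer coverage Ck to base coverage C by Ck = C * (L - k + 1) / L, where L is the read length. Below is a small sketch of that arithmetic using the figures quoted in the post (coverage near 200, 36 bp reads, k from 23 to 31); the function name `kmer_coverage` is only illustrative.

```python
def kmer_coverage(base_coverage, read_length, k):
    """K-mer coverage as defined in the Velvet manual: C * (L - k + 1) / L."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


# Figures from the post: ~200x base coverage, 36 bp reads.
for k in (23, 27, 31):
    print(k, round(kmer_coverage(200, 36, k), 1))
# 23 77.8
# 27 55.6
# 31 33.3
```

At k = 31 each 36 bp read contributes only 6 k-mers, so the k-mer coverage is one sixth of the base coverage.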
-
I mean the tag (ID) of the contigs. They look like >NODE_length_xxxxx_cov_xxxxx.xxxxxx, so you can check what coverage Velvet assigns to the contigs.
Did you also set the parameter -unused_reads yes, to check how many reads Velvet does not use? Did you do any quality trimming before using Velvet?
A coverage near 200 should give you better results, so there seems to be something wrong. :-/ Have you tried another assembler?
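If it helps, here is a rough sketch of pulling the length and coverage out of the contig IDs with a regular expression. It assumes headers of the form described above, optionally with a node number after NODE_ (as in >NODE_1_length_75_cov_1012.500000); the example headers and the name `contig_coverages` are hypothetical. As I read the Velvet manual, the length in the header is counted in k-mers, so k - 1 has to be added to get nucleotides.

```python
import re

# Matches ">NODE_length_<len>_cov_<cov>" with an optional node number.
HEADER = re.compile(r"^>NODE_(?:\d+_)?length_(\d+)_cov_([\d.]+)")


def contig_coverages(fasta_lines):
    """Yield (length_in_kmers, kmer_coverage) for each Velvet contig header."""
    for line in fasta_lines:
        match = HEADER.match(line)
        if match:
            yield int(match.group(1)), float(match.group(2))


# Hypothetical headers, not real Velvet output.
example = [
    ">NODE_1_length_75_cov_1012.500000",
    "ACGTACGT",
    ">NODE_2_length_40_cov_3.250000",
    "TTGACCAA",
]
k = 31  # assumed k-mer length used for the assembly
for length_kmers, cov in contig_coverages(example):
    # Assumption from the Velvet manual: nucleotides = k-mers + k - 1.
    print(length_kmers + k - 1, cov)
```

To run it on a real assembly, pass an open file handle for contigs.fa instead of the `example` list.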
-
The coverage in the contig IDs is near 1000.
We didn't set the -unused_reads parameter ourselves, but Velvet reports how many reads it uses: 100 thousand to 1 million out of 17 million reads. When we assemble 1 million or 100 thousand reads, it uses 82 thousand out of 1 million and 1,000 out of 100 thousand. No, we didn't trim the reads.
We tried Edena, with no good results (a contig of 134 nucleotides with no BLAST hit to anything).
No BLAST hits to adapters or primers.
Could the problem be excessive PCR amplification?
-
I think sequencing errors are causing this problem. Velvet does not give a good N50 when high coverage is combined with sequencing errors. Did you check the Velvet paper? For error-free reads, Velvet gets a good N50 no matter how high the coverage is. For real reads, the N50 drops as coverage increases, because of sequencing errors. You can randomly subsample the reads to several coverage levels, such as 10, 50, 100, 150 and 200, assemble each subset with Velvet, and pick the one that gives a good N50.
Did you try an error-correction program, such as SHREC or Quake, to correct sequencing errors before assembly? You can also use SolexaQA to trim low-quality reads before assembly.
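The random subsampling step could be sketched like this. It assumes single-end reads in a plain (uncompressed) FASTQ file with exactly four lines per record; the function name `subsample_fastq` and the file names in the usage comment are made up. Each read is kept with probability `fraction`, so to go from roughly 200x to roughly 50x you would use a fraction of about 0.25.

```python
import random


def subsample_fastq(in_path, out_path, fraction, seed=0):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Returns the number of records written. Assumes single-end reads and
    strictly four lines per record (no wrapped sequences).
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # empty string means end of file
                break
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept


# Hypothetical usage: thin a ~200x data set to roughly 50x.
# subsample_fastq("reads.fastq", "reads_50x.fastq", 50 / 200)
```

For paired-end data the two mates would have to be kept or dropped together, which this sketch does not handle.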
-
velvet
Hi guys,
I have been working on a yeast strain. I have raw paired-end Illumina reads. Is there any way to find out the read length and coverage from the sequence data itself? I am trying to use Velvet to assemble the genome. However, I am new to Velvet and have some questions. What is the criterion for optimizing the k-mer length? Second, how can the contigs obtained after running velvetg be used to generate the full genome sequence of the organism? And how can genes then be predicted from the sequence?
-
velvet N50
You can get the read length by looking at your FASTQ files, and FastQC will also report it.
There is a script called velvetk that will calculate k-mer coverage from your FASTQ files before you run Velvet.
See
You may also find Velvet Optimiser and Velvet Advisor useful.
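If you just want a quick look without installing anything, a rough sketch of the "look at your FASTQ files" approach is below. It tallies read lengths from the sequence line of each four-line record and estimates base coverage as total bases divided by genome size. The function names and the tiny in-memory example are made up, and the genome size is something you have to supply yourself (roughly 12 Mb is a common figure for S. cerevisiae, but that is an assumption about your strain).

```python
from collections import Counter


def read_length_counts(fastq_lines):
    """Count read lengths, assuming four lines per FASTQ record."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def base_coverage(counts, genome_size):
    """Estimated coverage = total sequenced bases / genome size."""
    total_bases = sum(length * n for length, n in counts.items())
    return total_bases / genome_size


# Tiny made-up example; for real data pass an open file handle instead.
records = ["@r1", "ACGTAC", "+", "IIIIII", "@r2", "ACGT", "+", "IIII"]
counts = read_length_counts(records)
print(dict(counts))              # {6: 1, 4: 1}
print(base_coverage(counts, 5))  # 2.0
```

For paired-end data, add up the bases from both files before dividing by the genome size.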