SEQanswers

Old 04-06-2010, 02:13 AM   #1
bioenvisage
Member
 
Location: it

Join Date: Oct 2009
Posts: 40
Default velvet N50

Hi, I am working on a Velvet de novo assembly of Illumina reads. I first trimmed the raw Illumina reads based on quality, but after running Velvet on a subset of the reads the N50 is always very low, around 29 or 20. I have tried various parameters and nothing improved the N50. Any suggestions?
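
(For reference, the N50 is the contig length such that contigs of that length or longer hold at least half of the total assembled bases. A minimal Python sketch of the calculation, assuming you already have a plain list of contig lengths:)

[CODE]
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2.0
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0

# An assembly dominated by short contigs gives a low N50:
print(n50([29, 28, 25, 20, 20, 18]))  # -> 25
[/CODE]
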
Old 06-13-2010, 12:48 AM   #2
biomed
Junior Member
 
Location: Asia

Join Date: Dec 2009
Posts: 6
Default

The important parameters for Velvet are the k-mer length, exp_cov, and cov_cutoff. You could play with these three to get a better N50.
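
(Note that exp_cov and cov_cutoff are expressed in k-mer coverage, not read coverage. A rough sketch of the conversion, assuming a uniform read length; the numbers are only an illustration:)

[CODE]
def kmer_coverage(read_coverage, read_length, k):
    """Velvet's coverage values are k-mer coverage: each read of length L
    contributes L - k + 1 k-mers, so Ck = C * (L - k + 1) / L."""
    return read_coverage * (read_length - k + 1) / read_length

# e.g. 200x read coverage with 36 bp reads and k = 31 is only ~33x k-mer coverage
print(kmer_coverage(200, 36, 31))
[/CODE]
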
Old 02-09-2011, 09:43 AM   #3
gridbird
Member
 
Location: san diego

Join Date: Oct 2010
Posts: 16
Default

What is your read length, and what is the estimated genome length? If the sequencing coverage is very high (>=200x) and uneven, the N50 will be very low for Velvet because of sequencing errors and SNPs, even if you use a subset of the reads and set the important parameters (k-mer length, exp_cov, and cov_cutoff).
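
(A rough way to estimate that coverage before assembling; a sketch where the read count, read length, and genome size are placeholders for your own data:)

[CODE]
def estimated_coverage(num_reads, read_length, genome_size):
    """Rough sequencing depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size

# e.g. 10 million 36 bp single-end reads against a 5 Mb genome -> ~72x
print(estimated_coverage(10_000_000, 36, 5_000_000))
[/CODE]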

Old 02-11-2011, 12:30 AM   #4
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69
Default

As gridbird already stated, we need more information to help you. ;-)

If you have high coverage, you can choose a high k-mer length and use cov_cutoff to remove the low-coverage contigs, which are normally small.
Old 02-25-2011, 03:39 AM   #5
vtosha
Member
 
Location: Moscow

Join Date: May 2010
Posts: 36
Default

We have this problem too: we had an excellent run for our samples (a chloroplast genome), but assembly with Velvet gave an N50 lower than the read length (36). We played with all the Velvet parameters, but the maximum N50 was 29. Where might the problem be?
Old 02-25-2011, 04:31 AM   #6
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69
Default

How many reads do you have? How high is your estimated coverage? Which k-mers did you try? How many contigs do you get? Do you get some long contigs? What coverage is stated in the IDs of the contigs?

I really can't say where the problem might be with the amount of information you have given.

Anyway, the best way to address Velvet problems is via the mailing list:
http://listserver.ebi.ac.uk/mailman/...o/velvet-users

Zerbino is also very active on that mailing list.
Old 02-25-2011, 05:00 AM   #7
vtosha
Member
 
Location: Moscow

Join Date: May 2010
Posts: 36
Default

17 million reads from one lane, single-end. Coverage near 200x (but there may be contamination from the nuclear genome). Read length 36. k-mers from 23 to 31. Number of contigs from 300 to 2500. N50 of 12 to 21. We tried with 1 million and with 100 thousand reads, but the result was only slightly better (N50 of 54). The maximum contig length is near 100 nucleotides. What are the "ids"?
(Data from 454 sequencing of this material gave a chloroplast genome map.)

Old 02-25-2011, 05:16 AM   #8
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69
Default

I mean the tag (ID) of the contigs. They look like >NODE_length_xxxxx_cov_xxxxx.xxxxxx, so you can check what coverage Velvet assigns to the contigs.
Did you also set the parameter -unused_reads yes, to check how many reads Velvet does not use? Do you do any quality trimming before using Velvet?

A coverage near 200x should give you better results, so there seems to be something wrong. :-/ Did you try another assembler?
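
(A quick sketch for pulling those values out of contigs.fa; it assumes headers with _length_ and _cov_ fields as above, so adjust the parsing if your Velvet version formats them differently:)

[CODE]
# Summarise contig length and coverage from the headers in Velvet's contigs.fa.
lengths, coverages = [], []
with open("contigs.fa") as fasta:
    for line in fasta:
        if line.startswith(">"):            # e.g. >NODE_7_length_41_cov_13.123456
            fields = line.strip().lstrip(">").split("_")
            lengths.append(int(fields[fields.index("length") + 1]))
            coverages.append(float(fields[fields.index("cov") + 1]))

print("contigs:", len(lengths))
print("longest contig:", max(lengths))
print("median coverage:", sorted(coverages)[len(coverages) // 2])
[/CODE]
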
Old 02-25-2011, 05:30 AM   #9
vtosha
Member
 
Location: Moscow

Join Date: May 2010
Posts: 36
Default

The coverage in the contig IDs is near 1000.
We didn't set the -unused_reads parameter ourselves, but Velvet reports how many reads it uses: 100 thousand to 1 million out of the 17 million reads. When we used 1 million or 100 thousand reads for the assembly, 82 thousand were used out of the 1 million, and 1000 out of the 100 thousand. No, we didn't trim the reads.
We tried Edena with no good results (a 134-nucleotide contig with no BLAST hit to anything).
There are no BLAST hits to adapters or primers.
Could the problem be too much PCR amplification?
Old 02-25-2011, 05:36 AM   #10
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69
Default

Well, when Velvet uses only 1 million out of 17 million reads, there seems to be a quality issue with the reads. And when your contigs have a coverage around 1000, it looks like they come from repetitive sequences.
Old 02-25-2011, 08:46 AM   #11
gridbird
Member
 
Location: san diego

Join Date: Oct 2010
Posts: 16
Default

I think it is the sequencing errors that are giving you this problem. Velvet does not produce a good N50 with very high coverage and sequencing errors. Did you check the Velvet paper? For error-free reads, Velvet can always get a good N50 no matter how high the coverage is, but for real reads the N50 drops as coverage increases because of sequencing errors. You can randomly subsample the reads to several coverages, such as 10x, 50x, 100x, 150x, and 200x, assemble each with Velvet, and keep the one with a good N50 (see the sketch below).
Did you try an error-correction program, such as SHREC or Quake, to correct sequencing errors before assembly? You can also use SolexaQA to trim low-quality reads before assembly.
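
(A minimal sketch of that kind of subsampling for single-end reads in an uncompressed FASTQ; for paired files you would subsample both files with the same random seed so the pairs stay in sync. The file names are just examples:)

[CODE]
import random

fraction = 0.1      # keep roughly 10% of the reads to lower the coverage
random.seed(42)     # fixed seed so paired files can be subsampled identically

with open("reads.fastq") as fin, open("reads_subset.fastq", "w") as fout:
    while True:
        record = [fin.readline() for _ in range(4)]   # one FASTQ record = 4 lines
        if not record[0]:
            break
        if random.random() < fraction:
            fout.writelines(record)
[/CODE]
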
Old 07-19-2013, 03:38 AM   #12
diptarka
Member
 
Location: DDN

Join Date: Mar 2013
Posts: 10
Default velvet

Hi guys,
I have been working on a yeast strain and have raw paired-end Illumina reads. Is there any way to find out the read length and coverage from the sequence data itself? I am trying to use Velvet to assemble the genome, but I am new to Velvet and have some questions. What is the optimization criterion for the k-mer length? Secondly, how can the contigs obtained after running velvetg be used to generate the full genome sequence of the organism, and how can genes then be predicted from the sequence?
Old 07-19-2013, 05:50 AM   #13
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default velvet N50

You can get the read length by having a look at your fastq files, and FastQC will also give you the read length.

There is a script called velvetk that will calculate k-mer coverage from your fastq files before you run Velvet.

See
http://www.vicbioinformatics.com/software.velvetk.shtml

You may also find Velvet Optimiser and Velvet Advisor useful.

http://bioinformatics.net.au/softwar...ptimiser.shtml

http://dna.med.monash.edu.au/~torsten/velvet_advisor/
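
(If you just want a quick look without FastQC, a small sketch that tallies the read lengths in an uncompressed FASTQ; the file name is only an example:)

[CODE]
from collections import Counter

# Count how many reads there are of each length in a FASTQ file.
length_counts = Counter()
with open("reads.fastq") as fastq:
    for i, line in enumerate(fastq):
        if i % 4 == 1:                      # the sequence line of each 4-line record
            length_counts[len(line.strip())] += 1

for read_length, count in sorted(length_counts.items()):
    print(read_length, "bp:", count, "reads")
[/CODE]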