SEQanswers

08-16-2014, 04:10 PM   #1
MikhailFokin
Member
 
Location: NZ

Join Date: Mar 2014
Posts: 15
Velvet: adding reads ruins N50. Why? Need advice.

Thank you for reading this question. In general I understand how Velvet works, but I cannot explain a roughly 9-fold decrease in N50 (1,948,774 vs. 225,231) when I add more reads to the dataset.

DETAILS

MiSeq v3, ~300 bp reads, mate-pair libraries with 3-12 kb inserts.
(About 5% of the reads come from a paired-end library with 800 bp inserts, which was used in both assemblies.)

Assembly 1.
NextClip -> category A files only (junction adapter in both reads) -> reverse complement -> Velvet, k=91

Results, Assembly 1
Estimated Coverage = 36.798895
Pre-graph has 623415 nodes and 21026066 sequences 53626987 kmers found
Final graph has 3170 nodes and n50 of 1948774, max 3890197, total 35875507, using 14957532/21026066 reads
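
For reference, the n50 in the log line above is the usual N50 statistic: walking from the longest node down, it is the length of the node at which the running total first reaches half of the total assembly length. A minimal Python sketch, assuming a plain list of lengths (the function name `n50` is made up for illustration and is not part of Velvet; as far as I know, Velvet reports these lengths in k-mers rather than bp):

```python
def n50(lengths):
    """Return the N50 of a collection of contig/node lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest node down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` gives 8, because the two longest nodes already cover 16 of the 32 total.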

Assembly 2.
NextClip -> categories A, B (junction adapter in read 2), C (junction adapter in read 1), E (junction adapter in both reads, relaxed conditions) -> concatenate A, B, C, E with "cat" -> reverse complement -> Velvet, k=91

Results, Assembly 2
Pre-graph has 1937511 nodes and 31230404 sequences 110486096 kmers found
Estimated Coverage = 44.494662
Final graph has 8571 nodes and n50 of 225231, max 909759, total 35949035, using 21595103/31230404 reads
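
To put the two "Final graph" lines side by side, here is a small Python sketch that pulls the numbers out of them. The regular expression is written against the two log lines quoted in this post only, so it is an assumption that other Velvet versions print the same format; `parse_final_graph` is a hypothetical helper name.

```python
import re

# Pattern matched against the "Final graph" lines quoted in this post.
FINAL_GRAPH = re.compile(
    r"Final graph has (?P<nodes>\d+) nodes and n50 of (?P<n50>\d+), "
    r"max (?P<max>\d+), total (?P<total>\d+), "
    r"using (?P<used>\d+)/(?P<reads>\d+) reads"
)

def parse_final_graph(line):
    """Return the integer fields of a 'Final graph' log line as a dict."""
    m = FINAL_GRAPH.search(line)
    if m is None:
        raise ValueError("not a 'Final graph' line: %r" % line)
    return {key: int(value) for key, value in m.groupdict().items()}

a1 = parse_final_graph(
    "Final graph has 3170 nodes and n50 of 1948774, max 3890197, "
    "total 35875507, using 14957532/21026066 reads"
)
a2 = parse_final_graph(
    "Final graph has 8571 nodes and n50 of 225231, max 909759, "
    "total 35949035, using 21595103/31230404 reads"
)

for name, stats in (("Assembly 1", a1), ("Assembly 2", a2)):
    used_fraction = stats["used"] / stats["reads"]
    print(name, "n50:", stats["n50"], "reads used: %.3f" % used_fraction)
print("N50 ratio: %.2f" % (a1["n50"] / a2["n50"]))
```

On these two lines the N50 ratio comes out at about 8.7, and the fraction of reads used is about 0.71 for Assembly 1 and 0.69 for Assembly 2, while the total assembled length is nearly identical.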

PS: I have checked that no junction adapter sequence is left in the final assemblies.
PPS: My current guess is that adding many bad reads complicates the graph, so I am now experimenting with filtering (with Trimmomatic).

Last edited by MikhailFokin; 08-16-2014 at 05:20 PM.
08-16-2014, 09:43 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by MikhailFokin
PPS My guess now - adding many bad reads complicates the graph, so playing with filtering (by Trimmomatic) now.

That seems like the most likely reason. Velvet also doesn't necessarily do well when coverage is too high, though 44x is not too high and should be fine. I encourage you to look at, or post, a FastQC analysis of the data.

Also, is it possible that your read pairing got messed up at some point? Check that the read names in the read 1 and read 2 files still match for every pair.
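
One way to run that pairing check is sketched below in Python. It assumes two separate FASTQ files (read 1 and read 2) with strict 4-line records, and read names that differ only by a trailing /1 or /2 or by whatever follows the first space; the function names are made up for illustration. If the reads are interleaved in one file instead, the same name comparison would apply to consecutive records.

```python
from itertools import zip_longest

def read_name(header):
    """Return the pair-independent part of a FASTQ header line."""
    name = header.strip().split()[0].lstrip("@")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def fastq_names(handle):
    """Yield the read name of every 4-line FASTQ record in an open file."""
    for i, line in enumerate(handle):
        if i % 4 == 0 and line.strip():
            yield read_name(line)

def first_mismatch(handle1, handle2):
    """Return (record_index, name1, name2) for the first record whose
    names disagree, or None if both files pair up all the way through.
    A missing record in the shorter file shows up as None."""
    names = zip_longest(fastq_names(handle1), fastq_names(handle2))
    for i, (a, b) in enumerate(names):
        if a != b:
            return i, a, b
    return None
```

Calling `first_mismatch(open("reads_1.fastq"), open("reads_2.fastq"))` (file names are placeholders) on the concatenated files would show where the pairing first goes out of sync, if it does.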

Last edited by Brian Bushnell; 08-16-2014 at 09:45 PM.
All times are GMT -8.

