SEQanswers

08-31-2011, 02:19 AM   #1
sarbashis
Member
 
Location: India

Join Date: Jun 2010
Posts: 17
N50 less than 2000

Dear all,
I am trying to assemble a bacterial genome. I have trimmed each read and also removed low quality reads. I also removed reads having "N". The genome coverage is above 150x. I used Velvet and SOAPdenovo with different parameters and k-mers, but I always get an N50 below 2000. Any suggestions? Thanks in advance.
08-31-2011, 03:44 AM   #2
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

I recommend ABySS. Is your data paired end or single read? How long are the reads? ABySS-PE is particularly great if you iterate through a good range of kmers and find the optimum one.

Also, 150x may be too high. You might want to split your data set in half and repeat the assembly on each half. These days it is difficult to do an Illumina run that yields less than 100x on a bacterial genome, but for the most part the assemblers are not designed to work with coverage this high.
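
For illustration only, here is a minimal Python sketch of taking a random fraction of a paired-end data set while keeping both mates of every pair together. The function names and the in-memory list-of-lines input are assumptions made for the example, and it assumes plain 4-line FASTQ records in the same order in both files.

```python
import random

def read_fastq(lines):
    """Group an iterable of FASTQ lines into 4-line records (assumes no wrapped sequences)."""
    it = (line.rstrip("\n") for line in lines)
    return zip(it, it, it, it)

def subsample_pairs(reads1, reads2, fraction, seed=0):
    """Keep roughly `fraction` of the pairs; a pair is kept or dropped as a whole."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept1, kept2 = [], []
    for rec1, rec2 in zip(read_fastq(reads1), read_fastq(reads2)):
        if rng.random() < fraction:
            kept1.append(rec1)
            kept2.append(rec2)
    return kept1, kept2
```

With `fraction=0.5` this gives about half the coverage; a different `seed` gives a different half to assemble and compare.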

--
Phillip
08-31-2011, 10:53 PM   #3
sarbashis
Member
 
Location: India

Join Date: Jun 2010
Posts: 17

@pmiguel,
Thanks for your response. My reads are paired-end and 72 nucleotides long. I trimmed 12 nucleotides from the ends.
09-01-2011, 04:48 AM   #4
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Yes, try using 1/3 or 1/2 of the data set and/or using ABySS-PE. For ABySS, k-mers of 40-63 are worth trying.
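
As a rough sketch of comparing a k-mer sweep, the Python below computes N50 from a list of contig lengths and picks the k with the highest N50. It assumes you have already collected the contig lengths for each k yourself; `lengths_by_k` is a hypothetical dictionary, not the output of any assembler. N50 alone says nothing about correctness, so treat the winner as a candidate rather than the answer.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly

def best_k(lengths_by_k):
    """Return the k whose assembly has the highest N50."""
    return max(lengths_by_k, key=lambda k: n50(lengths_by_k[k]))

# Made-up contig lengths, purely to show the call:
example = {41: [100, 100, 100], 51: [250, 50], 61: [150, 150]}
print(best_k(example))  # prints 51
```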

--
Phillip
09-07-2011, 03:06 AM   #5
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156

Quote:
Originally Posted by sarbashis
I am trying to assemble a bacterial genome. I have trimmed each read and also removed low quality reads. I also removed reads having "N".
I'd suggest something like FastQC to check how good the input data really is.

Then I would recommend quality-based trimming (probably Q20 or above, given your data supply) rather than fixed-length trimming, and don't just bin a read because of an N at the end. I also strongly suggest trimming for adapters (though that may not help the N50, it will help correctness), and be careful that you don't 'unpair' the files by dropping one of a pair.
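
To make the idea concrete, here is a minimal Python sketch of quality-based 3' trimming that keeps pairs in sync. It is not how any particular tool does it: the function names, the `min_len` cutoff, and the Phred+33 `offset` are assumptions for the example (older Illumina data may use an offset of 64), and it simply clips trailing bases below the threshold.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Clip bases from the 3' end while their Phred quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

def trim_pair(read1, read2, min_q=20, min_len=30, offset=33):
    """Trim both mates, each given as (seq, qual).

    Returns the trimmed pair, or None if either mate ends up shorter than
    min_len, so that the two output files never go out of sync.
    """
    t1 = trim_3prime(*read1, min_q=min_q, offset=offset)
    t2 = trim_3prime(*read2, min_q=min_q, offset=offset)
    if len(t1[0]) < min_len or len(t2[0]) < min_len:
        return None
    return t1, t2
```

A low-quality N at the end of a read is clipped here rather than costing the whole read. Dropping both mates is only one way to stay paired; writing the surviving mate to a separate singles file is another.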

Blatant plug: Trimmomatic will do all you need - you can get it here. PM me if you need help with it.

Quote:
Originally Posted by sarbashis
The genome coverage is above 150x. I used Velvet and SOAPdenovo with different parameters and k-mers, but I always get an N50 below 2000. Any suggestions? Thanks in advance.
Are these SOAP numbers based on scaffolding or just assembly? SOAP, AFAIK, doesn't use paired information at all in the assembly stage, so it looks very weak if judged by contigs. Drop anything shorter than 2 * k if you're checking the output of SOAP - most of the smaller shrapnel is junk.
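
A small Python sketch of the 2 * k filter, assuming the contigs are in an ordinary FASTA file read in as lines. The function names are made up for the example and nothing here depends on SOAP's own output naming.

```python
def parse_fasta(lines):
    """Yield (name, sequence) records from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def drop_short_contigs(records, k):
    """Keep only contigs at least 2 * k long."""
    min_len = 2 * k
    return [(name, seq) for name, seq in records if len(seq) >= min_len]
```

Compute N50 on the filtered list rather than the raw output, so the short shrapnel does not distort the comparison between runs.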

BTW SOAP seems to work best with a k of around 45% of the read length.
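
As a quick worked example of that rule of thumb, the Python sketch below turns a read length into a starting k. The function name is hypothetical, and bumping an even result up to the next odd number is an assumption based on de Bruijn graph assemblers generally expecting an odd k. It is only a starting point for a sweep.

```python
def suggest_k(read_length, fraction=0.45):
    """Starting k at about `fraction` of the read length, forced to be odd."""
    k = round(read_length * fraction)
    if k % 2 == 0:
        k += 1
    return k

print(suggest_k(60))  # prints 27 (e.g. 72 nt reads with 12 nt trimmed off)
print(suggest_k(72))  # prints 33 (untrimmed 72 nt reads)
```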

You can also try the SOAP corrector - it helps, if sometimes marginally. If nothing else, the kmer frequency graph can be enlightening.
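
For anyone who has not seen one, a k-mer frequency graph can be sketched in a few lines of Python. This is a toy version under stated assumptions: it holds everything in memory, does not merge reverse-complement k-mers, and has nothing to do with the SOAP corrector's own output. The function name is made up.

```python
from collections import Counter

def kmer_spectrum(reads, k):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return Counter(counts.values())
```

On real data you would typically expect a spike of k-mers seen once or twice (mostly sequencing errors) and a broader peak near the k-mer coverage of the genome. A missing or smeared main peak is a hint that something is off with the input.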