**pmiguel** · 08-31-2011, 03:44 AM

I recommend ABySS. Is your data paired end or single read? How long are the reads? ABySS-PE is particularly great if you iterate through a good range of kmers and find the optimum one.

Also, 150x may be too high. You might want to split your data set in half and repeat the assembly on each half. These days it is difficult to do an Illumina run that yields less than 100x on a bacterial genome, but for the most part the assemblers are not designed to work with coverage this high.
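To make the "split it in half" suggestion concrete, here is a minimal Python sketch. It assumes the read pairs are already loaded as a list of `(read_1, read_2)` tuples; the toy read names and the 4 Mb genome size are hypothetical and only there for illustration. The point is that pairs are shuffled and split as units, so mates never end up in different halves.

```python
import random

def split_pairs(pairs, seed=0):
    """Shuffle read pairs and split them into two halves, keeping mates together."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def coverage(num_reads, read_length, genome_size):
    """Expected fold coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size

# Hypothetical toy data: one (read_1, read_2) tuple per sequenced fragment.
pairs = [(f"r{i}/1", f"r{i}/2") for i in range(10)]
half_a, half_b = split_pairs(pairs)

# Hypothetical numbers: 10 million 60 nt reads on a 4 Mb genome.
full_cov = coverage(10_000_000, 60, 4_000_000)
half_cov = coverage(5_000_000, 60, 4_000_000)
```

With those assumed numbers the full set works out to 150x and each half to 75x.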

--
Phillip

**sarbashis** · 08-31-2011, 10:53 PM

@pmiguel,
Thanks for your response. My reads are paired-end and 72 nucleotides long. I trimmed 12 nucleotides from the ends.

**pmiguel** · 09-01-2011, 04:48 AM

Yes, try using 1/3rd or 1/2 of the data set and/or using ABySS-PE. For ABySS, kmers of 40-63 are worth trying.
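A k-mer sweep like this is easy to script. The sketch below only builds the `abyss-pe` command lines (using its `k=`, `name=` and `in=` parameters) and does not run anything. The file names, the `asm_k` output prefix and the step of 4 are assumptions for illustration; adjust them to your data.

```python
def abyss_commands(reads_1, reads_2, k_min=40, k_max=63, step=4):
    """Build one abyss-pe command line per k value in [k_min, k_max]."""
    commands = []
    for k in range(k_min, k_max + 1, step):
        # Each k gets its own output name so the assemblies do not overwrite each other.
        commands.append(f"abyss-pe k={k} name=asm_k{k} in='{reads_1} {reads_2}'")
    return commands

# Hypothetical input file names.
commands = abyss_commands("reads_1.fq", "reads_2.fq")
for command in commands:
    print(command)
```

Run each assembly in its own directory or with its own `name`, then compare the contig statistics across k values to find the optimum.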

--
Phillip

**tonybolger** · 09-07-2011, 03:06 AM

Originally posted by sarbashis:

> I am trying to assemble a bacterial genome. I have trimmed each read and also removed low quality reads. I also removed reads having "N".

I'd suggest something like FastQC to check how good the input data really is.

Then I would recommend quality-based trimming (probably Q20 or above, given your data supply) rather than fixed-length trimming, and don't just bin a read because of an N at the end. I also strongly suggest trimming for adapters (though that may not help the N50, it will help correctness), and be careful that you don't 'unpair' the files by dropping one of a pair.
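To illustrate the difference from fixed-length trimming, here is a minimal Python sketch of 3' quality trimming that keeps pairs in sync. It is a simplified illustration, not Trimmomatic's algorithm. It assumes Phred+33 quality encoding (older Illumina pipelines used Phred+64, so check your files), and `min_len` is a hypothetical cutoff.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Cut bases off the 3' end while their Phred quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

def trim_pair(read_1, read_2, min_len=30, min_q=20, offset=33):
    """Trim both mates; return None for the whole pair if either mate gets too short.

    Dropping both mates together keeps the two output files in the same order.
    """
    trimmed_1 = trim_3prime(*read_1, min_q=min_q, offset=offset)
    trimmed_2 = trim_3prime(*read_2, min_q=min_q, offset=offset)
    if len(trimmed_1[0]) < min_len or len(trimmed_2[0]) < min_len:
        return None
    return trimmed_1, trimmed_2

# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed = trim_3prime("ACGTN", "IIII#")
dropped = trim_pair(("ACGTN", "IIII#"), ("ACGTA", "I####"), min_len=3)
```

Note that the trailing N goes away because of its low quality, while the rest of the read is kept. Discarding the whole pair is the simplest way to stay in sync; keeping the surviving mate in a separate unpaired file is another option.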

Blatant plug: The trimmomatic will do all you need - you can get it here. PM me if you need help with it.

Originally posted by sarbashis:

> The genome coverage is above 150. I used velvet and SOAPdenovo with different parameters and k-mers but always getting N50 less than 2000. Any suggestion ?? Thanks in advance

Are these SOAP numbers based on scaffolding or just assembly? SOAP AFAIK doesn't use paired information at all in the assembly stage, so it looks very weak if judged by contigs. Drop anything shorter than 2 * k if you're checking the output of SOAP - most of the smaller shrapnel is junk.
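A small Python sketch of that check: compute N50 from a list of contig lengths, before and after dropping everything shorter than 2 * k. The contig lengths and k = 31 are made-up values for illustration only.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input

def drop_short(lengths, k):
    """Keep only contigs that are at least 2 * k long."""
    return [length for length in lengths if length >= 2 * k]

# Hypothetical contig lengths: three real contigs plus short "shrapnel".
contigs = [5000, 3000, 2000, 60, 50, 40, 40, 30]
raw_n50 = n50(contigs)
filtered_n50 = n50(drop_short(contigs, k=31))
```

When comparing assemblers or k values, apply the same length filter to every assembly so the N50 values stay comparable.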

BTW SOAP seems to work best with a k of around 45% of the read length.
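As a quick worked example of that rule of thumb, the sketch below takes 45% of the read length and bumps the result to an odd number. The odd-k step is an assumption on my part: de Bruijn graph assemblers commonly expect an odd k so that no k-mer is its own reverse complement.

```python
def suggest_k(read_length, fraction=0.45):
    """Starting k: about `fraction` of the read length, bumped up to an odd number."""
    k = round(read_length * fraction)
    if k % 2 == 0:
        k += 1  # assumption: the assembler wants an odd k
    return k

k_untrimmed = suggest_k(72)  # 72 nt reads as sequenced
k_trimmed = suggest_k(60)    # 60 nt, if 12 nt were removed from each read
```

Treat the result as a starting point for a sweep around that value, not as a fixed answer.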

You can also try the SOAP corrector - it helps, if sometimes marginally. If nothing else, the kmer frequency graph can be enlightening.
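If you just want to see what a k-mer frequency graph looks like, a toy version is easy to compute in Python. This is not how the SOAP corrector does it: the sketch ignores reverse complements, holds everything in memory, and is only practical for small samples of reads. The two example reads are made up.

```python
from collections import Counter

def kmer_spectrum(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return Counter(counts.values())

# Hypothetical toy reads.
spectrum = kmer_spectrum(["ACGTACGT", "ACGTTTTT"], k=4)
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

On real data, a plot of this histogram typically shows a spike of low-multiplicity k-mers (mostly sequencing errors) and a peak near the k-mer coverage of the genome.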

N50 less than 2000