
**WhatsOEver** · 09-15-2014, 11:13 PM

At first glance, the mitochondrion seems enormous, with a really low GC content.
I would therefore assume that you have whole-genome sequencing data and that you didn't filter your reads in any way. Is that right?
Can you tell us which organism this is from?
Does "24GB of data" mean you have 2x12GB FASTQ read files, or 24 Gbp of sequence information?

The following paper on mitochondrial genome assembly from WGS data might also be of interest to you:

http://nar.oxfordjournals.org/content/41/13/e129

**mandar.bobade60** · 09-16-2014, 01:59 AM

Thank you WhatsOEver for your paper link.

It's only mitochondrial data, with 24GB for each end, so 48GB in total. The coverage is huge, which is why there is so much data. The only filtering was done with FastQC and FastUniq.

**Brian Bushnell** · 09-16-2014, 08:48 AM

I highly recommend subsampling that data; you have way too much to get a good assembly. It's hard to say how much you need, since mitochondrial genomes vary in size. I'd start by subsampling by a factor of 200 and assembling again to get a better idea of how big the genome is (or you could estimate the size from a k-mer frequency plot). Then, if you want to assemble with Velvet, subsample again or normalize to around 40x coverage.

You can subsample paired reads with my reformat tool, which will keep the pairing intact.
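For illustration, here is a minimal Python sketch of what pair-preserving subsampling means. It is not the reformat tool itself: the function name `subsample_pairs`, its parameters, and the assumption of plain, uncompressed 4-line FASTQ records with both files in the same read order are all hypothetical choices made for the example.

```python
import random


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, rate, seed=0):
    """Keep each read pair with probability `rate`.

    Assumes uncompressed FASTQ with exactly 4 lines per record and
    mates stored in the same order in both files (an assumption of
    this sketch, not something checked here).
    Returns the number of pairs written.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = 0
    with open(r1_path) as r1, open(r2_path) as r2, \
            open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        while True:
            rec1 = [r1.readline() for _ in range(4)]
            rec2 = [r2.readline() for _ in range(4)]
            if not rec1[0] or not rec2[0]:
                break  # end of either file
            # One random draw per pair: both mates are kept or dropped together.
            if rng.random() < rate:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept
```

Subsampling by a factor of 200 corresponds to `rate=0.005`. The important detail is the single random draw per pair; sampling each file independently would break the pairing.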

**mandar.bobade60** · 09-23-2014, 01:54 AM

Subsampling

Dear Brian Bushnell,
I did the subsampling, and afterwards the N50 value increased substantially.
I have 101,300,000 reads, with an expected mitochondrial genome size of 715,000 base pairs.
But a problem persists: even after picking the assembly with fewer contigs (around 90-100) and a good N50, the result of aligning the raw reads back to its contig file is horrible (almost 91% failure).

Can anyone suggest how to proceed from here? Since the genome is mitochondrial, I also don't have many options for multiple sequence alignment against related FASTA files.

**Brian Bushnell** · 09-23-2014, 10:10 AM

You still have ~14000x coverage, which is way too high. Like I said, you need to target closer to 40x coverage, or at least no more than 100x.
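The ~14000x figure is consistent with the read count and genome size quoted in the previous post if the reads are about 100 bp long. The read length was never stated in the thread, so `read_len = 100` below is an assumption, and the helper names are hypothetical. A rough sketch of the arithmetic, including the fraction of reads to keep for a 40x target:

```python
def coverage(n_reads, read_len, genome_size):
    """Average depth of coverage: total sequenced bases / genome length."""
    return n_reads * read_len / genome_size


def keep_fraction(current_cov, target_cov):
    """Fraction of reads to keep to go from current_cov down to target_cov."""
    return min(1.0, target_cov / current_cov)


n_reads = 101_300_000   # from the thread
genome_size = 715_000   # expected size in bp, from the thread
read_len = 100          # ASSUMPTION: not stated in the thread

cov = coverage(n_reads, read_len, genome_size)
frac = keep_fraction(cov, target_cov=40)

print(f"estimated coverage: {cov:.0f}x")
print(f"keep fraction for 40x: {frac:.5f} (about 1 read in {1 / frac:.0f})")
```

Under that read-length assumption, reaching 40x means keeping well under 1% of the reads, so a further subsampling step is still needed.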

BLAST your contigs to see what they are, and BLAST a few unaligned reads to see what those are. You could have massive contamination. And anyway, it seems unlikely that you have 24GB of data on a mitochondrion. Why would anyone do that? It's a very wasteful experimental design.


de novo assembly using velvet and Amos
