**WhatsOEver** · 09-15-2014, 11:13 PM

At first glance, the mitochondrion seems enormous, with a really low GC content.
I would therefore assume that you have whole-genome sequencing data and that you didn't filter your reads in any way. Is that right?
Can you tell us which organism this is from?
Does "24GB of data" mean you have 2x12GB FASTQ read files, or that you have 24 Gbp of sequence information?

The following paper on mitochondrial genome assembly from WGS data might also be of interest to you:

http://nar.oxfordjournals.org/content/41/13/e129

**mandar.bobade60** · 09-16-2014, 01:59 AM

Thank you, WhatsOEver, for the paper link.

It's only mitochondrial data, with 24GB for each end, so 48GB collectively. The coverage is huge, which is why there is so much data. The only filtering was done using FastQC and FastUniq.

**Brian Bushnell** · 09-16-2014, 08:48 AM

I highly recommend subsampling that data; you have way too much to get a good assembly. It's hard to say how much you need, since mitochondrial genomes vary in size. I'd start by subsampling by a factor of 200 and assembling again to get a better idea of how big the genome is (or you could estimate the size from a kmer frequency plot). Then, if you want to assemble with Velvet, subsample again or normalize to around 40x coverage.
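To illustrate the kmer-frequency idea mentioned above, here is a minimal Python sketch. It is not any particular tool's method: it counts kmers in memory, ignores reverse complements, and takes the tallest histogram bin above a hypothetical `min_depth` cutoff as the main peak. The function names and the cutoff are assumptions made for illustration, and a real 24GB dataset would need a dedicated kmer counter rather than a Python `Counter`.

```python
from collections import Counter


def kmer_depth_histogram(reads, k):
    """Return {depth: number of distinct kmers observed that many times}."""
    kmer_counts = Counter()
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    return Counter(kmer_counts.values())


def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size as total kmers divided by the peak kmer depth.

    min_depth is an assumed cutoff: kmers seen fewer times than this are
    treated as likely sequencing errors and ignored.
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no kmers at or above min_depth")
    # Depth with the largest number of distinct kmers (the main peak).
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth
```

The estimate is only as good as the peak call. With heavy contamination or several peaks, the histogram should be inspected by eye before trusting the number.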

You can subsample paired reads with my reformat tool, which will keep the pairing intact.
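For readers who want to see what pair-preserving subsampling means, here is a minimal Python sketch. It is not the reformat tool and does not reproduce its behavior; it only shows the principle of making a single keep/drop decision per pair so that both mates stay together. The function names, the `fraction` and `seed` parameters, and the assumption of uncompressed four-line FASTQ records in matching order are all illustrative.

```python
import random
from itertools import islice


def read_fastq_records(handle):
    """Yield FASTQ records as lists of four lines (assumes 4-line records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(in1, in2, out1, out2, fraction, seed=1):
    """Keep each read pair with probability `fraction`; return (kept, total).

    Assumes both input files list mates in the same order. zip() stops at
    the shorter file, so a length mismatch is not detected here.
    """
    rng = random.Random(seed)
    kept = total = 0
    with open(in1) as r1, open(in2) as r2, \
            open(out1, "w") as w1, open(out2, "w") as w2:
        pairs = zip(read_fastq_records(r1), read_fastq_records(r2))
        for rec1, rec2 in pairs:
            total += 1
            # One random draw per pair, so mates are kept or dropped together.
            if rng.random() < fraction:
                w1.writelines(rec1)
                w2.writelines(rec2)
                kept += 1
    return kept, total
```

Subsampling by a factor of 200 would correspond to `fraction=0.005` in this sketch.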

**mandar.bobade60** · 09-23-2014, 01:54 AM

Subsampling

Dear Brian Bushnell,
I did the subsampling, and after subsampling the N50 value increased substantially.
I have 101300000 reads with an expected mitochondrial genome size of 715000 base pairs.
But the problem persists: even after picking the assembly with fewer contigs (around 90-100) and a good N50, aligning the raw reads to its contig file gives horrible results (almost 91% failure).

Can anyone suggest how to proceed? Since the genome is mitochondrial, I also don't have many options for multiple sequence alignment with related FASTA files.

**Brian Bushnell** · 09-23-2014, 10:10 AM

You still have ~14000x coverage, which is way too high. Like I said, you need to target closer to 40x coverage, or at least no more than 100x.

BLAST your contigs to see what they are, and BLAST a few unaligned reads to see what those are. You could have massive contamination. And anyway, it seems unlikely that you have 24GB of data on a mitochondrion. Why would anyone do that? It's very wasteful experimental design.
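The coverage figure above follows from simple arithmetic: coverage is the number of reads times the read length, divided by the genome size. The Python sketch below works this through with the numbers given in the thread. The read length of 100 bp is an assumption (it is not stated in the thread, but it roughly reproduces the ~14000x figure), and the function names are illustrative.

```python
def coverage(num_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def sampling_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep to reach the target coverage (at most 1)."""
    return min(1.0, target_coverage / current_coverage)


reads = 101_300_000      # read count reported in the thread
genome_size = 715_000    # expected genome size reported in the thread (bp)
read_length = 100        # ASSUMPTION: not stated in the thread

current = coverage(reads, read_length, genome_size)
fraction = sampling_fraction(current, target_coverage=40)

print(f"current coverage: {current:.0f}x")
print(f"fraction to keep for 40x: {fraction:.5f}")
print(f"reads to keep: {round(fraction * reads)}")
```

Under these assumptions, reaching 40x means keeping well under 1% of the reads. If the true genome is smaller than 715000 bp, or the sample is contaminated, the fraction would need to change accordingly.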

de novo assembly using velvet and Amos

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News