SEQanswers

Old 09-15-2014, 07:29 AM   #1
mandar.bobade60
Member
 
Location: India

Join Date: Jun 2013
Posts: 14
Default de novo assembly using velvet and Amos

Hi all,

I am a newbie in NGS data analysis, and the first task to come to me is a de novo assembly. I have a plant mitochondrial genome to assemble, for which I have almost 24GB of data for each of R1 and R2.
I am through with the QC analysis and the Velvet runs. The Velvet output I have is a contigs.fa file for each of several k-mer values (55, 95, 10). The read length is 101.

I got the statistics for the k-mer 85 contigs.fa using QUAST, which are as follows:

Assembly contigs
# contigs (>= 0 bp) 2933
# contigs (>= 1000 bp) 274
Total length (>= 0 bp) 2145071
Total length (>= 1000 bp) 1433182
# contigs 441
Largest contig 62822
Total length 1548880
GC (%) 45.35
N50 7528
N75 3479
L50 51
L75 129
# N's per 100 kbp 0.00
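For reference, N50 is the length of the contig at which the cumulative length of the contigs, sorted longest first, first reaches half the total assembly length; L50 is that contig's rank. A minimal sketch of the calculation with toy lengths (not QUAST's actual implementation):

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for rank, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            # N50 = this contig's length, L50 = how many contigs it took
            return length, rank

print(n50_l50([100, 80, 60, 40, 20]))  # -> (80, 2)
```

So an N50 of 7528 with an L50 of 51 means the 51 longest contigs cover half of the 1,548,880 bp assembly, and the shortest of those 51 is 7528 bp.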

But I am stuck here now, since I have no idea whether this assembly is good enough to proceed with, or bad enough that I should try something else. It would also be a great help if anyone could suggest the further steps to take to arrive at a well-assembled genome.

Regards,
Mandar
mandar.bobade60 is offline   Reply With Quote
Old 09-16-2014, 12:13 AM   #2
WhatsOEver
Senior Member
 
Location: Germany

Join Date: Apr 2012
Posts: 215
Default

At first glance, the mitochondrion seems quite enormous in size, with a really low GC content.
I would therefore assume that you have whole-genome sequencing data and that you didn't filter your reads in any way, did you?
Can you tell us which organism this is from?
Does "24GB of data" mean you have 2x12GB fastq read files, or that you have 24Gbp of sequence information?

The following paper on mitochondrial genome assembly from WGS data might also be of interest to you:
http://nar.oxfordjournals.org/content/41/13/e129
WhatsOEver is offline   Reply With Quote
Old 09-16-2014, 02:59 AM   #3
mandar.bobade60
Member
 
Location: India

Join Date: Jun 2013
Posts: 14
Default

Thank you WhatsOEver for your paper link.

It's only mitochondrial data, with 24GB for each end, so 48GB collectively. The coverage is huge; that's why there is so much data. The only filtering was done using FastQC and FastUniq.
mandar.bobade60 is offline   Reply With Quote
Old 09-16-2014, 09:48 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

I highly recommend subsampling that data; you have way too much to get a good assembly. It's hard to say how much you need, since mitochondrial genomes vary in size. I'd start by subsampling by a factor of 200 and assembling again to get a better idea of how big the genome is (or you could estimate the size from a kmer frequency plot). Then, if you want to assemble with Velvet, subsample again or normalize to around 40x coverage.

You can subsample paired reads with my reformat tool, which will keep the pairing intact.
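The crucial property of pair-aware subsampling is that the keep/discard decision is made once per pair rather than once per read, so mates never get separated. A toy Python sketch of the idea (reformat.sh itself operates on FASTQ files; `subsample_pairs` and the read lists here are illustrative stand-ins):

```python
import random

def subsample_pairs(r1_records, r2_records, rate, seed=42):
    """Keep each read *pair* with probability `rate`: both mates or neither."""
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    kept1, kept2 = [], []
    for rec1, rec2 in zip(r1_records, r2_records):
        if rng.random() < rate:  # one draw per pair keeps mates together
            kept1.append(rec1)
            kept2.append(rec2)
    return kept1, kept2

# Stand-in "reads"; real input would be FASTQ records.
r1 = [f"read{i}/1" for i in range(1000)]
r2 = [f"read{i}/2" for i in range(1000)]
s1, s2 = subsample_pairs(r1, r2, rate=0.005)  # ~1/200, as suggested above
```

Subsampling each file independently would instead break roughly all of the pairs at low rates, which ruins a paired-end assembly.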
Brian Bushnell is offline   Reply With Quote
Old 09-23-2014, 02:54 AM   #5
mandar.bobade60
Member
 
Location: India

Join Date: Jun 2013
Posts: 14
Default Subsampling

Dear Brian Bushnell,
I did the subsampling, and after subsampling the N50 value increased substantially.
I have 101300000 reads, with an expected mitochondrial genome size of 715000 base pairs.
But a problem persists: even after picking the file with fewer contigs (around 90-100) and a good N50, the alignment of the raw reads back to that contigs file is horrible (almost 91% failure).

Can anyone suggest further processing? Since the genome is mitochondrial, I also don't have many options for multiple sequence alignment with related FASTA files.
mandar.bobade60 is offline   Reply With Quote
Old 09-23-2014, 11:10 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

You still have ~14000x coverage which is way too high. Like I said, you need to target closer to 40x coverage, or at least, no more than 100x.
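As a sanity check, the ~14000x figure follows directly from the read count and expected genome size quoted above (a back-of-envelope calculation, assuming every read is the full 101 bp):

```python
# Numbers taken from the thread above.
reads = 101_300_000     # total reads after subsampling
read_len = 101          # bp per read
genome_size = 715_000   # expected mitochondrial genome size (bp)

coverage = reads * read_len / genome_size   # ~14,300x
target = 40                                 # coverage Velvet handles well
rate = target / coverage                    # fraction of pairs to keep, ~0.3%
```

So reaching 40x from here means keeping only about 1 pair in every 350 or so.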

BLAST your contigs to see what they are, and BLAST a few unaligned reads to see what those are. You could have massive contamination. And anyway, it seems unlikely that you have 24GB of data on a mitochondrion. Why would anyone do that? It's a very wasteful experimental design.
Brian Bushnell is offline   Reply With Quote
Tags
amos, contig analysis, genome assembly, quast, velvet
