SEQanswers

Old 09-08-2009, 07:46 AM   #1
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default SOAP denovo assembly results

Hi,

I was able to assemble more than 100 million reads into contigs using SOAP. That was cool, as all other tools ran into memory issues.

However, I wish to understand some of the properties of contigs, like
- how many reads were actually used in the assembly
- what kind of depth of coverage do the contigs have from overlapping reads
- other properties to determine how confident I could be of the contigs

Any pointers? Any help with extracting information from the SOAP assembly results (beyond the contig sequences, which I already have) would be appreciated.
__________________
--
bioinfosm
Old 09-09-2009, 03:46 AM   #2
henry
Member
 
Location: china

Join Date: Sep 2009
Posts: 36
Default

This is a good question; I would like to know as well. Could anyone kindly help us?
By the way, how long did it take to assemble more than 100 million reads de novo into contigs?

Best

Jing
Old 09-09-2009, 05:17 AM   #3
lcollado
Member
 
Location: Baltimore, MD

Join Date: Jun 2009
Posts: 65
Default

How much RAM did you need with SOAP? I'm curious ^^
__________________
L. Collado Torres, Ph.D. student in Biostatistics.
Old 09-09-2009, 08:44 AM   #4
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Less than 100 GB of RAM; I did not track it exactly (as long as it was not crashing, I was happy).
As for time, it took less than a day to get through its various steps. A lot of the contigs have length 24 and are definitely not useful:
>1 length 24 cvg_0_tip_0
AAAAAAAAAAAAAAAAAAAAAAAA
>3 length 24 cvg_0_tip_0
AAAAAAAAAAAAAAAAAAAAAAAC
...
>347 length 65 cvg_10_tip_0
TTCAGTAATAACGGCAGACTAATCACCTCAGAAAACACAAAGCACAAGCTTGTGCTTGTCACTTC


Looking for some documentation to understand this better...
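
In the meantime, the header lines in the sample above already carry a length and a `cvg_` value, so they can be tabulated without any extra tooling. The following is a minimal Python sketch, assuming every header follows the `>ID length N cvg_X_tip_Y` layout shown in the sample. The exact meaning and units of the `cvg_` field are not stated in this thread, so it is treated only as a coverage-like number; the names `ContigInfo` and `parse_contig_headers` are made up for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Header layout copied from the sample in this post: ">347 length 65 cvg_10_tip_0".
# The regex is an assumption based on that sample, not on SOAP documentation.
HEADER_RE = re.compile(r"^>(\S+) length (\d+) cvg_(\d+(?:\.\d+)?)_tip_(\d+)")


class ContigInfo(NamedTuple):
    name: str
    length: int
    cvg: float  # coverage-like value from the header; units not confirmed here
    tip: int


def parse_contig_headers(lines: Iterable[str]) -> Iterator[ContigInfo]:
    """Yield one ContigInfo per FASTA header line that matches HEADER_RE."""
    for line in lines:
        m = HEADER_RE.match(line)
        if m:
            yield ContigInfo(m.group(1), int(m.group(2)),
                             float(m.group(3)), int(m.group(4)))


sample = """>1 length 24 cvg_0_tip_0
AAAAAAAAAAAAAAAAAAAAAAAA
>3 length 24 cvg_0_tip_0
AAAAAAAAAAAAAAAAAAAAAAAC
>347 length 65 cvg_10_tip_0
TTCAGTAATAACGGCAGACTAATCACCTCAGAAAACACAAAGCACAAGCTTGTGCTTGTCACTTC
""".splitlines()

contigs = list(parse_contig_headers(sample))
# Drop the length-24, zero-coverage entries that look uninformative.
kept = [c for c in contigs if c.length > 24 and c.cvg > 0]
for c in kept:
    print(c.name, c.length, c.cvg)
```

For a real run, the same function could be fed an open file handle on the contig file instead of the inline sample.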
__________________
--
bioinfosm
Old 09-11-2009, 01:15 AM   #5
node
Member
 
Location: Shenzhen,China

Join Date: Sep 2009
Posts: 10
Default Hi

A better way to get help with SOAPdenovo is to join the SOAP mailing list.

Here is the SOAP site: http://soap.genomics.org.cn . You can submit your email address on the home page.

PS: SOAP uses a Google Group as its mailing list.
Old 05-02-2011, 07:50 AM   #6
rubi
Junior Member
 
Location: North Carolina

Join Date: Apr 2011
Posts: 6
Default

Hi,

Can anyone refer me to good documentation on quality assessment of a SOAPdenovo assembly (e.g., N50 values, how to compute coverage, what the output files mean)?

Thanks
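
For the N50 part of the question, the usual definition can be computed directly from the contig lengths: sort the lengths from longest to shortest and report the length at which the running total first reaches half of all assembled bases. This is a small Python sketch of that definition, not something taken from the SOAP documentation; the example lengths are invented.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Invented example: total is 290 bases, so half is 145.
# 80 + 70 = 150 reaches the halfway point, so the N50 is 70.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

Many very short contigs pull the N50 down only slightly because they contribute few bases, which is why N50 is usually reported alongside the contig count and total length.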
Old 10-03-2011, 02:00 PM   #7
y.divyatej@gmail.com
Junior Member
 
Location: phoenix

Join Date: Feb 2011
Posts: 4
Default Denovo assembly pipeline

I'm curious whether there is a pipeline available for SOAP de novo assembly. Is there a requirement on the number of genomes needed for de novo assembly?
Old 10-13-2011, 01:14 AM   #8
narain
Member
 
Location: Washington DC

Join Date: Aug 2011
Posts: 78
Default

Dear Members

I am doing a de novo assembly of a human genome from FASTQ data files. The tools I use give me both contigs and scaffolds. I know that scaffolds are a combination of contigs with estimated gaps between them. Does this mean that downstream analysis, such as comparison against another genome like the reference, is more reliably done with contigs than with scaffolds?

Aby
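
To make the contig/scaffold relationship concrete, here is a short Python sketch that breaks a scaffold back into its contig-like pieces. It assumes the estimated gaps are written as runs of `N` in the scaffold sequence, which is a common convention but is not confirmed for any particular tool in this thread; `split_scaffold` and the toy sequence are made up for illustration.

```python
import re


def split_scaffold(seq, min_gap=1):
    """Split a scaffold into contig-like pieces at runs of N.

    Runs shorter than min_gap are left in place, so isolated
    ambiguous bases do not break a piece apart.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    return [piece for piece in gap.split(seq) if piece]


# Toy scaffold: three pieces joined by gaps of 10 and 3 Ns.
scaffold = "ACGTACGT" + "N" * 10 + "GGCCTTAA" + "N" * 3 + "TTGACA"
pieces = split_scaffold(scaffold)
print(pieces)

# With min_gap=5 only the 10-N run counts as a gap.
coarse = split_scaffold(scaffold, min_gap=5)
print(coarse)
```

Comparing results on the split pieces and on the intact scaffolds is one way to see how much a downstream analysis depends on the gap estimates.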
Old 02-01-2012, 02:39 PM   #9
darren.cullerne
Junior Member
 
Location: Australia

Join Date: May 2011
Posts: 1
Default

bioinfosm:
Quote:
However, I wish to understand some of the properties of contigs, like
- how many reads were actually used in the assembly
- what kind of depth of coverage do the contigs have from overlapping reads
- other properties to determine how confident I could be of the contigs
The way our "lab" (aka office) has looked at the number of reads used in an assembly is to take the raw reads and back align them against the newly created denovo contigs. It should give a pretty good indication. We use Kanga for our back alignments. Damned quick and efficient:
http://code.google.com/p/biokanga/
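
Once the reads are aligned back, the fraction of reads used and a rough per-contig depth can be tallied from the alignment file. The sketch below is a minimal Python example that assumes the aligner's output is available as plain-text SAM with `@SQ` header lines; whether a given aligner writes SAM is something to check in its own documentation. The function name `summarize_sam` and the toy records are invented, and depth is approximated as summed read length divided by contig length, ignoring clipping and indels.

```python
from collections import defaultdict

# FLAG bits defined by the SAM format.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_sam(lines):
    """Return (total_reads, mapped_reads, approx_depth_per_contig)."""
    contig_len = {}
    aligned_bases = defaultdict(int)
    total = mapped = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(f.split(":", 1) for f in line.split("\t")[1:])
                contig_len[tags["SN"]] = int(tags["LN"])
            continue
        cols = line.split("\t")
        flag = int(cols[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read record only once
        total += 1
        if flag & UNMAPPED:
            continue
        mapped += 1
        if cols[9] != "*":
            # Approximation: read length stands in for aligned bases.
            aligned_bases[cols[2]] += len(cols[9])
    depth = {name: aligned_bases[name] / length
             for name, length in contig_len.items()}
    return total, mapped, depth


# Toy SAM records (invented): two mapped reads, one unmapped, one secondary.
sam_lines = [
    "@SQ\tSN:347\tLN:65",
    "\t".join(["r1", "0", "347", "1", "60", "10M", "*", "0", "0", "ACGTACGTAC", "IIIIIIIIII"]),
    "\t".join(["r2", "16", "347", "20", "60", "10M", "*", "0", "0", "ACGTACGTAC", "IIIIIIIIII"]),
    "\t".join(["r3", "4", "*", "0", "0", "*", "*", "0", "0", "ACGTACGTAC", "IIIIIIIIII"]),
    "\t".join(["r1", "256", "347", "30", "0", "10M", "*", "0", "0", "*", "*"]),
]

total, mapped, depth = summarize_sam(sam_lines)
print(f"{mapped}/{total} reads mapped")
print(depth)
```

With paired-end data each mate is a separate record, so the counts here are per mate rather than per fragment.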
Old 06-13-2012, 07:22 AM   #10
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61
Default

I am confused about the contig numbers reported by SOAP. I was trying to run SOAPdenovo on small microbial genomes (sizes vary from 5-10 MB).

However, the number of contigs reported in the .contig files is very high (always in the thousands), whereas other assemblers give me fewer than 500 contigs. The SOAP scaffolding output, though, was better than that of the other assemblers.

I am not sure whether I am looking at some intermediate contig file, or whether the contig number is always high in SOAP. In my experience, contig and scaffold numbers differ by only a few hundred (at least for small microbial genomes), so there is normally no drastic change. With SOAP, however, the contigs were in the 2000-3000 range while the scaffolds were in the 200-500 range. Please explain.
Tags
assembly, coverage, denovo, soap, unused