SEQanswers

Old 01-14-2014, 04:34 AM   #1
condomitti
Member
 
Location: São Paulo - Brazil

Join Date: Sep 2013
Posts: 33
Extending contigs N50 (SOAPdenovo)?

Hello fellows,

I've been assembling a snake genome with SOAPdenovo, using a large set of paired-end reads sequenced on an Illumina HiScan.
I have already run many assemblies, tuning the parameters and using both filtered and unfiltered reads, but I always get a low contig N50 of about 1.4 kb (compared with >30 kb for the reference genomes).

I've read about breaking the scaffolds into contigs and then mapping the reads again so that one end of a pair falls into the gap, but I couldn't find a detailed explanation of how to do that. The fact is that all the assemblies that took this approach managed to extend their contig lengths and, as a consequence, the N50.

Could someone recommend something to read about this? Has any of you ever done it?


Another question I have is about evaluating scaffolds with QUAST (http://quast.bioinf.spbau.ru/manual.html). I see it analyzes the scaffold file in two ways, and one of them gives me a bigger N50 (3.5 kb), but I don't know what the difference between the two results is.
I have no such problem when evaluating contigs.
Any clues here too?


Thanks a lot in advance!

Condomitti.
Old 01-15-2014, 04:27 AM   #2
condomitti
Member
 
Location: São Paulo - Brazil

Join Date: Sep 2013
Posts: 33
Default

Does no one have any tips on this?
Old 01-16-2014, 10:33 AM   #3
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130
Default

There are no different ways of calculating N50; there is only one. Find a Perl script that does it.
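For reference, here is a minimal sketch of that single definition in Python rather than Perl: sort the lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    return 0


# Toy contig lengths (hypothetical): total is 290, so half is 145.
contig_lengths = [80, 70, 50, 40, 30, 20]
print(n50(contig_lengths))  # 80 + 70 = 150 >= 145, so this prints 70
```

The formula never changes; only the list of lengths you feed it does.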

Doesn't SOAP already provide you with scaffold and contig files? There should be no need to break scaffolds into contigs.

Also, doesn't SOAP give you the stats: the N50, the longest contig, and so on?
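As far as I understand the QUAST manual, its scaffold mode also evaluates a "broken" copy of the assembly, split wherever there is a long enough run of N's. If that is where the two numbers come from, then the N50 formula is the same and only the input sequences differ. A rough sketch of the idea, where `break_scaffold`, the `min_gap` default of 10, and the toy sequences are all assumptions for illustration:

```python
import re


def n50(lengths):
    """Standard N50: length at which the sorted running sum reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def break_scaffold(seq, min_gap=10):
    """Split a scaffold at every run of at least min_gap N's (threshold is an assumption)."""
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    return [piece for piece in gap.split(seq) if piece]


# Two toy scaffolds: the first contains a 20-N gap, the second has no gap.
scaffolds = ["A" * 60 + "N" * 20 + "C" * 40, "G" * 50]

whole_n50 = n50(len(s) for s in scaffolds)
broken_n50 = n50(len(p) for s in scaffolds for p in break_scaffold(s))
print(whole_n50, broken_n50)  # prints: 120 50
```

The unbroken scaffolds give the larger value, which would match one QUAST result being bigger than the other.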

What you are suggesting to do with the reads is called read walking, where you map reads to the ends of contigs to extend them. You will generate misassemblies due to repeats while doing that.
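To make the repeat problem concrete, here is a toy sketch of one read-walking step. It is not how any particular tool implements it; `extend_right`, `min_overlap`, and the sequences are invented for illustration. The contig is extended only when every overlapping read agrees on what comes next. At a repeat boundary the reads disagree, and a naive extender that picked one of them anyway could join the wrong copy.

```python
def extend_right(contig, reads, min_overlap=4):
    """Try to extend the 3' end of contig by one read.

    Returns (sequence, status) where status is 'extended',
    'ambiguous', or 'no_extension'.
    """
    extensions = set()
    for read in reads:
        # Longest overlap first: a suffix of the contig must equal a prefix
        # of the read, and the read must add at least one new base.
        for k in range(min(len(contig), len(read) - 1), min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                extensions.add(read[k:])
                break
    if not extensions:
        return contig, "no_extension"
    longest = max(extensions, key=len)
    # Accept only if all proposed extensions are prefixes of the longest one.
    if all(longest.startswith(ext) for ext in extensions):
        return contig + longest, "extended"
    return contig, "ambiguous"


contig = "ACGTACGGA"
print(extend_right(contig, ["CGGATTC"]))             # one consistent read
print(extend_right(contig, ["CGGATTC", "CGGAGGA"]))  # reads from two repeat copies disagree
```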

The best thing you can try is other assemblers, other parameters.
Old 01-16-2014, 10:40 AM   #4
condomitti
Member
 
Location: São Paulo - Brazil

Join Date: Sep 2013
Posts: 33
Default

Thank you for your reply, AdrianP!

Yes, SOAP gives me all the assembly stats, including the N50. My concern is that the N50 value is so low.

I have tried SGA, which increased that value by only a few units.



cheers,
Condomitti.
Old 01-16-2014, 10:42 AM   #5
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130
Default

You have Illumina data, so SGA might not be your best choice. A few questions:

What is the rough nucleotide coverage of the genome?
What about the genome size?
How long are your reads, and what is the insert size?
Old 01-16-2014, 12:46 PM   #6
condomitti
Member
 
Location: São Paulo - Brazil

Join Date: Sep 2013
Posts: 33
Default

The genome size is 2.2 Gbp.

Considering contigs > 800 bp, the nucleotide coverage is 524,993,890 bp.

Reads vary from 40 to 100 bp, and the insert size is 300 bp.

N50: 1,549 bp
Largest contig: 28,559 bp


Cheers,
Condomitti.
Old 01-16-2014, 12:52 PM   #7
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130
Default

Quote:
Originally Posted by condomitti View Post
The genome size is 2.2 Gbp.

Considering contigs > 800 bp, the nucleotide coverage is 524,993,890 bp.

Reads vary from 40 to 100 bp, and the insert size is 300 bp.

N50: 1,549 bp
Largest contig: 28,559 bp


Cheers,
Condomitti.
I think you gave me the assembly size rather than the nucleotide coverage. Nucleotide coverage is how many reads overlap any given DNA sequence from the genome.
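In practice you can estimate the average coverage as the total number of sequenced bases divided by the genome size. A quick sketch, where the read count and read length are made-up numbers and only the 2.2 Gbp genome size comes from your post:

```python
def mean_coverage(total_read_bases, genome_size):
    """Average fold coverage: sequenced bases divided by genome size."""
    return total_read_bases / genome_size


genome_size = 2_200_000_000   # 2.2 Gbp, from the post above
n_reads = 1_100_000_000       # hypothetical number of reads
mean_read_length = 100        # hypothetical mean read length in bp

coverage = mean_coverage(n_reads * mean_read_length, genome_size)
print(f"{coverage:.0f}x")  # prints: 50x
```

For trimmed reads of varying length, sum the actual read lengths instead of multiplying by a single value.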

Your reads vary in length, which is not normal for Illumina sequencing. Did you trim them, or do you have different libraries?

What kmer values did you use when assembling with SOAP?
Old 01-16-2014, 01:05 PM   #8
condomitti
Member
 
Location: São Paulo - Brazil

Join Date: Sep 2013
Posts: 33
Default

Ohh you are right, sorry about that...

I'm working with ~130x coverage.

I did trim them, and applied some filters to remove duplicates etc.

I have tried several different k-mer values, using both single and multi-kmer strategies.

With a single k-mer, the value that gave the best results was 65.
Using multi-kmer with 61-71, I got the result I posted above.
Old 01-16-2014, 02:33 PM   #9
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130
Default

Using untrimmed libraries, try SPAdes with k-mers 23,33,43,53,63,73.
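Something along these lines should build the command. This is only a sketch: the FASTQ file names and the output directory are placeholders, and `spades_command` is a helper invented here. The `-1`, `-2`, `-k`, and `-o` options are the usual SPAdes ones for paired-end input, k-mer list, and output directory.

```python
def spades_command(reads_1, reads_2, out_dir, kmers):
    """Build a spades.py argument list for one paired-end library."""
    if any(k % 2 == 0 for k in kmers):
        # SPAdes expects odd k-mer sizes.
        raise ValueError("k-mer sizes must be odd")
    return [
        "spades.py",
        "-1", reads_1,
        "-2", reads_2,
        "-k", ",".join(str(k) for k in kmers),
        "-o", out_dir,
    ]


# Placeholder file names; replace with your own untrimmed libraries.
cmd = spades_command(
    "untrimmed_R1.fastq", "untrimmed_R2.fastq", "spades_out",
    [23, 33, 43, 53, 63, 73],
)
print(" ".join(cmd))
```

You can pass `cmd` to `subprocess.run(cmd, check=True)` once SPAdes is installed and on your PATH.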
Old 01-17-2014, 09:20 AM   #10
condomitti
Member
 
Location: São Paulo - Brazil

Join Date: Sep 2013
Posts: 33
Default

Thanks AdrianP! I'll take a look.

Cheers,
Condomitti.
All times are GMT -8.