SEQanswers

07-02-2013, 01:16 PM, post #1
AdrianP (Senior Member; Location: Ottawa; Join Date: Apr 2011; Posts: 130)

SSPACE libraries

Hello,

I am currently trying to scaffold an assembly with SSPACE, and can already see some progress.

Code:
SUMMARY:
------------------------------------------------------------
        Inserted contig file;
                Total number of contigs = 5308
                Sum (bp) = 42486218
                        Total number of N's = 158184
                        Sum (bp) no N's = 42328034
                Max contig size = 134487
                Min contig size = 1000
                Average contig size = 8004
                N50 = 17056

        After scaffolding lib1:
                Total number of scaffolds = 3392
                Sum (bp) = 42503701
                        Total number of N's = 178406
                        Sum (bp) no N's = 42325295
                Max scaffold size = 190774
                Min scaffold size = 1000
                Average scaffold size = 12530
                N50 = 27904

        After scaffolding lib2:
                Total number of scaffolds = 1820
                Sum (bp) = 42986473
                        Total number of N's = 661945
                        Sum (bp) no N's = 42324528
                Max scaffold size = 365239
                Min scaffold size = 1000
                Average scaffold size = 23618
                N50 = 50457

------------------------------------------------------------
The first library is a standard paired-end library, and the second is a mate-pair library.
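The summary tracks progress mainly through N50: 17,056 for the input contigs, 27,904 after lib1 and 50,457 after lib2. As a reminder of what that statistic measures, here is a minimal Python sketch of the usual N50 definition. It is not taken from SSPACE's source, and the log does not say whether SSPACE counts N's in the lengths it uses.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases in the collection.
    """
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        raise ValueError("n50() needs at least one length")
    half_total = sum(lengths) / 2
    covered = 0
    # Add sequences from longest to shortest until half the bases are covered.
    for length in lengths:
        covered += length
        if covered >= half_total:
            return length


# Toy example: total is 54 bp; 10 + 9 + 8 = 27 reaches half, so N50 = 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Joining contigs into fewer, longer scaffolds raises N50 even when, as here, the non-N base count barely changes (42,328,034 before scaffolding, 42,324,528 after lib2).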

This is my library file:
Code:
lib1 GDR-16_65bp_R1.fastq GDR-16_65bp_R2.fastq 280 0.8 FR
lib2 MPNC_65bp_R1.fastq MPNC_65bp_R2.fastq 2411 0.5 FR
I am a bit confused about the 5th column, which is the "deviation of the mean distance". In the tutorial example, a deviation of 0.75 on a 200 bp insert accepts distances of 150 to 250. I assume that is because 0.75*200 = 150? Anyway, I am asking because in both of my libraries a large fraction of pairs fell under "calculated distances out-of-bounds".
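Judging only from the logs in this post, the window SSPACE reports seems to be the mean plus or minus (mean × deviation): lib1 prints "280 +/-224" (280 × 0.8 = 224) and lib2 prints "2411 +/-1205.5" (2411 × 0.5). Under that reading the deviation sets the half-width of the window, not its lower bound. The sketch below parses the two library lines (column order taken from the file above; the `Library` class and `parse_library_line` helper are hypothetical names, not part of SSPACE) and reproduces those windows.

```python
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    read1: str
    read2: str
    insert_size: float
    deviation: float
    orientation: str

    def accepted_window(self):
        """Return (min, max) distance, assuming insert_size +/- insert_size * deviation."""
        half_width = self.insert_size * self.deviation
        return self.insert_size - half_width, self.insert_size + half_width


def parse_library_line(line):
    # Column order as in the library file above:
    # name, read-1 file, read-2 file, mean insert size, deviation, orientation.
    name, read1, read2, insert_size, deviation, orientation = line.split()
    return Library(name, read1, read2, float(insert_size), float(deviation), orientation)


for line in [
    "lib1 GDR-16_65bp_R1.fastq GDR-16_65bp_R2.fastq 280 0.8 FR",
    "lib2 MPNC_65bp_R1.fastq MPNC_65bp_R2.fastq 2411 0.5 FR",
]:
    lib = parse_library_line(line)
    low, high = lib.accepted_window()
    print(f"{lib.name}: {lib.insert_size:g} +/- {lib.insert_size * lib.deviation:g} "
          f"-> {low:g} to {high:g}")
```

For lib1 this gives 56 to 504 bp and for lib2 1205.5 to 3616.5 bp.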

I know mate-pair libraries contain a lot of paired-end contamination, but the first library is a paired-end library, and almost 50% of its assembled pairs (1,609,082 of 3,517,598) did not satisfy the distance. Here are the library stats:

Lib1, paired end:
Code:
LIBRARY lib1 STATS:
################################################################################

MAPPING READS TO CONTIGS:
------------------------------------------------------------
        Number of single reads found on contigs = 9149900
        Number of pairs used for pairing contigs / total pairs = 3517598 / 3646662
------------------------------------------------------------

READ PAIRS STATS:
        Assembled pairs: 3517598 (7035196 sequences)
                Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 280 +/-224): 1668422
                Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 5097
                Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 7824
                ---
                Satisfied in distance/logic within a given contig pair (pre-scaffold): 240094
                Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 1596161
                ---
        Total satisfied: 1908516        unsatisfied: 1609082


        Estimated insert size statistics (based on 1673519 pairs):
                Mean insert size = 240
                Median insert size = 230
REPEATS:
        Number of repeated edges = 1665
------------------------------------------------------------

################################################################################
Lib2, mate pair:
Code:
LIBRARY lib2 STATS:
################################################################################

MAPPING READS TO CONTIGS:
------------------------------------------------------------
        Number of single reads found on contigs = 5924956
        Number of pairs used for pairing contigs / total pairs = 1560927 / 1708281
------------------------------------------------------------

READ PAIRS STATS:
        Assembled pairs: 1560927 (3121854 sequences)
                Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 2411 +/-1205.5): 129649
                Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 40359
                Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 427578
                ---
                Satisfied in distance/logic within a given contig pair (pre-scaffold): 267259
                Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 696082
                ---
        Total satisfied: 396908 unsatisfied: 1164019


        Estimated insert size statistics (based on 170008 pairs):
                Mean insert size = 1951
                Median insert size = 2237
REPEATS:
        Number of repeated edges = 1569
------------------------------------------------------------

################################################################################
For lib2 it looks like I should adjust the mean (the estimated mean is 1951 bp versus the 2411 bp I specified), but for lib1 I am wondering what is causing about 50% of pairs to fail the distance check, given that the window is already 280 +/- 224 bp.
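The "almost 50%" for lib1 can be split up using only the counts in the two stats blocks. The Python sketch below (the dictionary keys are hypothetical shorthand, not SSPACE output fields) checks that satisfied plus unsatisfied pairs add up to the assembled pairs, then separates pairs that fall within one contig from pairs that link two contigs. It assumes the accepted window is mean +/- mean × deviation, as the logs' "280 +/-224" and "2411 +/-1205.5" suggest.

```python
# Counts copied from the lib1/lib2 "READ PAIRS STATS" and insert-size sections above.
STATS = {
    "lib1": dict(assembled=3517598, sat_within=1668422, unsat_dist_within=5097,
                 unsat_logic_within=7824, sat_between=240094, unsat_dist_between=1596161,
                 specified_mean=280, deviation=0.8, estimated_mean=240),
    "lib2": dict(assembled=1560927, sat_within=129649, unsat_dist_within=40359,
                 unsat_logic_within=427578, sat_between=267259, unsat_dist_between=696082,
                 specified_mean=2411, deviation=0.5, estimated_mean=1951),
}


def summarize(s):
    unsatisfied = s["unsat_dist_within"] + s["unsat_logic_within"] + s["unsat_dist_between"]
    # Sanity check: satisfied + unsatisfied should equal the assembled pairs.
    if s["sat_within"] + s["sat_between"] + unsatisfied != s["assembled"]:
        raise ValueError("pair counts do not add up to the assembled pairs")
    within = s["sat_within"] + s["unsat_dist_within"] + s["unsat_logic_within"]
    between = s["sat_between"] + s["unsat_dist_between"]
    # Assumed window: mean +/- mean * deviation.
    half_width = s["specified_mean"] * s["deviation"]
    offset = s["estimated_mean"] - s["specified_mean"]
    return {
        "unsatisfied": unsatisfied,
        "overall_unsatisfied": unsatisfied / s["assembled"],
        "within_satisfied": s["sat_within"] / within,
        "between_out_of_bounds": s["unsat_dist_between"] / between,
        "share_of_unsatisfied_between": s["unsat_dist_between"] / unsatisfied,
        "estimated_mean_in_window": abs(offset) <= half_width,
        "mean_offset": offset / s["specified_mean"],
    }


for name, s in STATS.items():
    r = summarize(s)
    print(f"{name}: {r['overall_unsatisfied']:.1%} of assembled pairs unsatisfied; "
          f"{r['within_satisfied']:.1%} of within-contig pairs satisfied; "
          f"{r['between_out_of_bounds']:.1%} of between-contig pairs out of bounds; "
          f"estimated mean {r['mean_offset']:+.1%} vs specified")
```

For lib1, within-contig pairs satisfy the window about 99% of the time (1,668,422 of 1,681,343), while about 87% of pairs linking two contigs (1,596,161 of 1,836,255) fail it; those between-contig pairs make up about 99% of lib1's unsatisfied pairs. For lib2, 427,578 of the 597,586 within-contig pairs (about 72%) are flagged as illogical orientation. In both libraries the estimated mean still lies inside the specified window.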

Any thoughts on how I could improve my scaffolding?