SEQanswers


AdrianP 07-02-2013 01:16 PM

SSPACE libraries
 
Hello,

I am currently trying to scaffold an assembly with SSPACE, and can already see some progress.

Code:

SUMMARY:
------------------------------------------------------------
        Inserted contig file;
                Total number of contigs = 5308
                Sum (bp) = 42486218
                        Total number of N's = 158184
                        Sum (bp) no N's = 42328034
                Max contig size = 134487
                Min contig size = 1000
                Average contig size = 8004
                N50 = 17056

        After scaffolding lib1:
                Total number of scaffolds = 3392
                Sum (bp) = 42503701
                        Total number of N's = 178406
                        Sum (bp) no N's = 42325295
                Max scaffold size = 190774
                Min scaffold size = 1000
                Average scaffold size = 12530
                N50 = 27904

        After scaffolding lib2:
                Total number of scaffolds = 1820
                Sum (bp) = 42986473
                        Total number of N's = 661945
                        Sum (bp) no N's = 42324528
                Max scaffold size = 365239
                Min scaffold size = 1000
                Average scaffold size = 23618
                N50 = 50457

------------------------------------------------------------

The first library is paired-end, and the second library is mate-pair.

This is my library file:
Code:

lib1 GDR-16_65bp_R1.fastq GDR-16_65bp_R2.fastq 280 0.8 FR
lib2 MPNC_65bp_R1.fastq MPNC_65bp_R2.fastq 2411 0.5 FR

I am a bit confused about the fifth column, the "deviation of the mean distance". According to the example in the tutorial, a deviation of 0.75 on a 200 bp insert accepts distances of 150 to 250. I assume that comes from 0.75*200 = 150? I am asking because in both of my libraries a large percentage of read pairs were reported as "calculated distances out-of-bounds".
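
For what it is worth, the "+/-" values SSPACE prints in the library stats further down (280 +/-224 for lib1, 2411 +/-1205.5 for lib2) match mean * error on either side of the mean. Here is a minimal Python sketch of that arithmetic, assuming this is how the window is derived (the function name `accepted_window` is mine, not part of SSPACE):

```python
def accepted_window(mean_insert, error):
    """Return (low, high), ASSUMING the window is mean +/- mean * error."""
    tolerance = mean_insert * error
    return mean_insert - tolerance, mean_insert + tolerance


# Mean insert size and error taken from columns 4 and 5 of the library file.
libraries = {
    "lib1": (280, 0.8),
    "lib2": (2411, 0.5),
}

for name, (mean_insert, error) in libraries.items():
    low, high = accepted_window(mean_insert, error)
    print(f"{name}: {mean_insert} +/- {mean_insert * error:g} "
          f"-> {low:g} to {high:g}")
```

Under the same assumption, 0.75 on a 200 bp insert would give 50 to 350 rather than 150 to 250.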

I know mate-pair libraries carry a lot of paired-end contamination, but the first library is paired-end, and almost 50% of its pairs still did not satisfy the distance. Here are the library stats:

Lib1, paired end:
Code:

LIBRARY lib1 STATS:
################################################################################

MAPPING READS TO CONTIGS:
------------------------------------------------------------
        Number of single reads found on contigs = 9149900
        Number of pairs used for pairing contigs / total pairs = 3517598 / 3646662
------------------------------------------------------------

READ PAIRS STATS:
        Assembled pairs: 3517598 (7035196 sequences)
                Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 280 +/-224): 1668422
                Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 5097
                Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 7824
                ---
                Satisfied in distance/logic within a given contig pair (pre-scaffold): 240094
                Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 1596161
                ---
        Total satisfied: 1908516        unsatisfied: 1609082


        Estimated insert size statistics (based on 1673519 pairs):
                Mean insert size = 240
                Median insert size = 230
REPEATS:
        Number of repeated edges = 1665
------------------------------------------------------------

################################################################################

Lib2, mate pair:
Code:

LIBRARY lib2 STATS:
################################################################################

MAPPING READS TO CONTIGS:
------------------------------------------------------------
        Number of single reads found on contigs = 5924956
        Number of pairs used for pairing contigs / total pairs = 1560927 / 1708281
------------------------------------------------------------

READ PAIRS STATS:
        Assembled pairs: 1560927 (3121854 sequences)
                Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 2411 +/-1205.5): 129649
                Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 40359
                Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 427578
                ---
                Satisfied in distance/logic within a given contig pair (pre-scaffold): 267259
                Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 696082
                ---
        Total satisfied: 396908 unsatisfied: 1164019


        Estimated insert size statistics (based on 170008 pairs):
                Mean insert size = 1951
                Median insert size = 2237
REPEATS:
        Number of repeated edges = 1569
------------------------------------------------------------

################################################################################

It looks like I should adjust the mean for lib2, but for lib1 I am wondering what is causing almost 50% of pairs to not satisfy the distance, which is 280 +/-224.
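
To see where the unsatisfied pairs come from, here is a rough Python sketch that splits the counts from the READ PAIRS STATS blocks above into pairs within a contig and pairs spanning two contigs. The dictionary keys and the `summarize` helper are my own naming, not SSPACE output fields:

```python
# Pair counts copied from the READ PAIRS STATS blocks above.
STATS = {
    "lib1": {
        "within_satisfied": 1668422,
        "within_bad_distance": 5097,
        "within_bad_logic": 7824,
        "between_satisfied": 240094,
        "between_bad_distance": 1596161,
    },
    "lib2": {
        "within_satisfied": 129649,
        "within_bad_distance": 40359,
        "within_bad_logic": 427578,
        "between_satisfied": 267259,
        "between_bad_distance": 696082,
    },
}


def summarize(counts):
    """Return the pair total and unsatisfied percentages for one library."""
    within = (counts["within_satisfied"]
              + counts["within_bad_distance"]
              + counts["within_bad_logic"])
    between = counts["between_satisfied"] + counts["between_bad_distance"]
    total = within + between
    satisfied = counts["within_satisfied"] + counts["between_satisfied"]
    return {
        "total": total,
        "unsatisfied_pct": 100.0 * (total - satisfied) / total,
        "within_unsatisfied_pct":
            100.0 * (within - counts["within_satisfied"]) / within,
        "between_unsatisfied_pct":
            100.0 * counts["between_bad_distance"] / between,
    }


for name, counts in STATS.items():
    s = summarize(counts)
    print(f"{name}: {s['total']} pairs, "
          f"{s['unsatisfied_pct']:.1f}% unsatisfied overall, "
          f"{s['within_unsatisfied_pct']:.1f}% within contigs, "
          f"{s['between_unsatisfied_pct']:.1f}% between contigs")
```

For lib1 this puts nearly all of the unsatisfied pairs (1596161 of 1609082) in the contig-pair category, while pairs that map within a single contig are almost all satisfied.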

Any thoughts on how I could improve my scaffolding?

