Hello,
I am currently trying to scaffold an assembly with SSPACE, and can already see some progress.
The first library is just paired end, and the second library is mate pairs.
I am a bit confused about the 5th column, which is the "deviation of the mean distance". From the example in the tutorial, a deviation of 0.75 on a 200 bp insert accepts distances of 150 to 250. I am assuming that is because 0.75*200 = 150? Anyway, I am asking because in both of my libraries a large percentage of read pairs ended up as "calculated distances out-of-bounds".
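To make the arithmetic concrete: the "280 +/-224" and "2411 +/-1205.5" figures in the library stats are consistent with the accepted window being mean +/- (deviation * mean). Here is a minimal Python sketch of that reading, which is an assumption on my part; the function name `accepted_window` is made up for illustration and is not part of SSPACE.

```python
def accepted_window(mean_insert, deviation):
    """Return (low, high), assuming the window is mean +/- deviation * mean."""
    tolerance = mean_insert * deviation
    return mean_insert - tolerance, mean_insert + tolerance


if __name__ == "__main__":
    # (mean insert size, deviation): the two libraries in this post,
    # plus the 200 bp / 0.75 example from the tutorial.
    for mean_insert, deviation in [(280, 0.8), (2411, 0.5), (200, 0.75)]:
        low, high = accepted_window(mean_insert, deviation)
        print(f"{mean_insert} +/- {mean_insert * deviation:g} -> {low:g} to {high:g}")
```

Under that assumption the 200 bp example would come out as 50 to 350, and a window of 150 to 250 would correspond to a deviation of 0.25 instead.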
Looks like for lib2 I should adjust the mean, but for lib1 I am wondering what is causing almost 50% of reads to not satisfy the distance, which is basically 280 +/-224.
Code:
SUMMARY:
------------------------------------------------------------
Inserted contig file;
    Total number of contigs = 5308
    Sum (bp) = 42486218
    Total number of N's = 158184
    Sum (bp) no N's = 42328034
    Max contig size = 134487
    Min contig size = 1000
    Average contig size = 8004
    N50 = 17056

After scaffolding lib1:
    Total number of scaffolds = 3392
    Sum (bp) = 42503701
    Total number of N's = 178406
    Sum (bp) no N's = 42325295
    Max scaffold size = 190774
    Min scaffold size = 1000
    Average scaffold size = 12530
    N50 = 27904

After scaffolding lib2:
    Total number of scaffolds = 1820
    Sum (bp) = 42986473
    Total number of N's = 661945
    Sum (bp) no N's = 42324528
    Max scaffold size = 365239
    Min scaffold size = 1000
    Average scaffold size = 23618
    N50 = 50457
------------------------------------------------------------
This is my library file:
Code:
lib1 GDR-16_65bp_R1.fastq GDR-16_65bp_R2.fastq 280 0.8 FR
lib2 MPNC_65bp_R1.fastq MPNC_65bp_R2.fastq 2411 0.5 FR
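For reference, here is a small Python sketch of how I read the six columns of this file: library name, first read file, second read file, expected insert size, allowed deviation (as a fraction of the insert size), and read orientation. The field names in the code are my own labels, not SSPACE identifiers.

```python
from typing import NamedTuple


class Library(NamedTuple):
    name: str
    reads_1: str
    reads_2: str
    insert_size: int
    deviation: float
    orientation: str


def parse_library_line(line: str) -> Library:
    """Split one whitespace-separated library line into typed fields."""
    name, reads_1, reads_2, insert_size, deviation, orientation = line.split()
    return Library(name, reads_1, reads_2, int(insert_size), float(deviation), orientation)


# The two library lines from this post.
LIBRARY_FILE = """\
lib1 GDR-16_65bp_R1.fastq GDR-16_65bp_R2.fastq 280 0.8 FR
lib2 MPNC_65bp_R1.fastq MPNC_65bp_R2.fastq 2411 0.5 FR
"""

libraries = [parse_library_line(line) for line in LIBRARY_FILE.splitlines() if line.strip()]

if __name__ == "__main__":
    for lib in libraries:
        print(lib)
```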
I know in mate pairs you have a lot of paired end contamination, but the first library was paired end, and I had almost 50% of reads that did not satisfy the distance. I am pasting library stats:
Lib1, paired end:
Code:
LIBRARY lib1 STATS:
################################################################################

MAPPING READS TO CONTIGS:
------------------------------------------------------------
    Number of single reads found on contigs = 9149900
    Number of pairs used for pairing contigs / total pairs = 3517598 / 3646662
------------------------------------------------------------

READ PAIRS STATS:
    Assembled pairs: 3517598 (7035196 sequences)
        Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 280 +/-224): 1668422
        Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 5097
        Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 7824
        ---
        Satisfied in distance/logic within a given contig pair (pre-scaffold): 240094
        Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 1596161
        ---
        Total satisfied: 1908516 unsatisfied: 1609082

    Estimated insert size statistics (based on 1673519 pairs):
        Mean insert size = 240
        Median insert size = 230

REPEATS:
    Number of repeated edges = 1665
------------------------------------------------------------

################################################################################
Lib2, mate pair:
Code:
LIBRARY lib2 STATS:
################################################################################

MAPPING READS TO CONTIGS:
------------------------------------------------------------
    Number of single reads found on contigs = 5924956
    Number of pairs used for pairing contigs / total pairs = 1560927 / 1708281
------------------------------------------------------------

READ PAIRS STATS:
    Assembled pairs: 1560927 (3121854 sequences)
        Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 2411 +/-1205.5): 129649
        Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 40359
        Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 427578
        ---
        Satisfied in distance/logic within a given contig pair (pre-scaffold): 267259
        Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 696082
        ---
        Total satisfied: 396908 unsatisfied: 1164019

    Estimated insert size statistics (based on 170008 pairs):
        Mean insert size = 1951
        Median insert size = 2237

REPEATS:
    Number of repeated edges = 1569
------------------------------------------------------------

################################################################################
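To put numbers on the "almost 50%" figure, here is a short Python sketch that turns the READ PAIRS STATS counts above into percentages of the assembled pairs. The counts are copied from the two stats blocks; the dictionary keys and the `percentages` helper are just my own names.

```python
# Counts copied from the READ PAIRS STATS sections of lib1 and lib2.
STATS = {
    "lib1": {
        "assembled": 3517598,
        "within_contig_out_of_bounds": 5097,
        "within_contig_illogical": 7824,
        "contig_pair_out_of_bounds": 1596161,
        "total_unsatisfied": 1609082,
    },
    "lib2": {
        "assembled": 1560927,
        "within_contig_out_of_bounds": 40359,
        "within_contig_illogical": 427578,
        "contig_pair_out_of_bounds": 696082,
        "total_unsatisfied": 1164019,
    },
}


def percentages(counts):
    """Express each category as a percentage of the assembled pairs."""
    total = counts["assembled"]
    return {
        key: 100.0 * value / total
        for key, value in counts.items()
        if key != "assembled"
    }


if __name__ == "__main__":
    for name, counts in STATS.items():
        print(name)
        for key, pct in percentages(counts).items():
            print(f"  {key}: {pct:.1f}%")
```

For lib1 nearly all of the unsatisfied pairs are in the "calculated distances out-of-bounds" category between contig pairs, while for lib2 a sizeable share is also illogical pairing within contigs.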
Any thoughts on how I could improve my scaffolding?