**pmiguel** · 06-16-2011, 03:45 AM

Originally posted by boetsie

[...]
Remember, though, that one copy of the repeated element is also included in the final assembly, so the repeats should be subtracted from the final scaffolds. So if contigA, with a size of 1300 bp, is repeated 4 times, the 1300 bp should be subtracted from the final assembly, since the contig is already present within the scaffolds.
[...]
Boetsie

Hi Boetsie,
If there were a repetitive element present in 10 copies in a genome that assembled into a single contig, would SSPACE only place a single copy of that element in the final assembly? Or am I misreading you?
--
Phillip

**boetsie** · 06-16-2011, 04:10 AM

Originally posted by pmiguel

Hi Boetsie,
If there were a repetitive element present in 10 copies in a genome that assembled into a single contig, would SSPACE only place a single copy of that element in the final assembly? Or am I misreading you?
--
Phillip

In short, yes.

**pmiguel** · 06-16-2011, 05:15 AM

Originally posted by seb567

OK, problem solved.

[...]

https://github.com/sebhtml/ray/tarball/v1.6.1-rc1

seb

Hi seb,
We still get the "DetectionFailure: Yes" line in the ".LibraryStatistics.txt" file.
Also the ".RayVersion.txt" file gives "Ray version: 1.6.0". So maybe the link above is to an older version?

--
Phillip

**pmiguel** · 06-16-2011, 05:34 AM

Originally posted by boetsie

In short, yes

Just wondering what the current state of the art is in full genome assembly...

If there were 10 identical copies of a 1000 bp element scattered across an otherwise single copy genome, would an assembler be able to reconstruct the genome without gaps? Say none of the elements were near each other and sufficient mate end (ME) coverage existed. That is, 30X coverage with 2 kb ME reads.

In principle it seems it should be possible, but I don't know if modern assemblers would do so.

If not, would SSPACE reconstruct a gapless genome, or would it still produce a set of 10 scaffolds with one copy of the repetitive element?
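
A back-of-the-envelope sketch of the "in principle" argument above: how many mate pairs would be expected to jump clean over one copy of the repeat, with both reads anchored in unique sequence? The numbers below are assumptions, not part of the scenario as stated: "30X" is taken to mean sequence coverage, reads are assumed to be 100 bp, and every pair is assumed to have exactly the nominal 2 kb separation (real libraries have a spread).

```python
def expected_spanning_pairs(coverage, read_len, insert_size, repeat_len):
    """Expected number of read pairs whose two reads both lie entirely in
    unique sequence on opposite sides of a single repeat copy.

    Simplifying assumptions: uniform random fragment starts, a fixed
    insert size, and equal read lengths for both mates.
    """
    # Sequence coverage C with paired reads of length L means
    # C / (2 * L) fragments start at each genome position on average.
    pairs_per_bp = coverage / (2 * read_len)

    # A fragment starting at s spans the repeat [r, r + R) cleanly when
    # s + L <= r and s + I - L >= r + R, which leaves
    # I - R - 2L + 1 valid start positions.
    window = insert_size - repeat_len - 2 * read_len + 1
    return max(window, 0) * pairs_per_bp


if __name__ == "__main__":
    # Scenario from the post, plus the assumed 100 bp read length.
    print(expected_spanning_pairs(coverage=30, read_len=100,
                                  insert_size=2000, repeat_len=1000))
    # A repeat longer than the insert cannot be spanned by any pair.
    print(expected_spanning_pairs(coverage=30, read_len=100,
                                  insert_size=2000, repeat_len=3000))
```

Under these assumptions each repeat copy would be bridged by roughly 120 pairs, so the linking information is there; whether a given assembler or scaffolder uses it to place every copy is a separate question.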

--
Phillip

**boetsie** · 06-16-2011, 05:42 AM

If the library insert size is larger than the repeated element, SSPACE will probably generate a single scaffold, though with gaps. The repeated contig will be present only once though.
If the library insert size is smaller than the repeated element, SSPACE will generate 10 scaffolds, with the repeated contig present in only one of them.

I'm not sure how other assemblers/scaffolders handle this, or whether they include all repeats.

The remaining gaps can be filled through gap closing. Currently, I'm working on a script to do this.

Boetsie

**pmiguel** · 06-16-2011, 07:48 AM

How to span very large repetitive blocks.

Hi Boetsie,
So would "gap closing" take the form of pulling out all the mates of the reads within the library length at the ends of contigs and attempting to assemble them into a contig that can span the gap? Or would it involve looking for individual reads that span the actual junction between the repetitive region and the single copy one that flanks it?

I would like to point out that even extremely large repetitive blocks might contain (small) segments that are, effectively, single copy. This is because large repetitive blocks are often formed by nested insertions of (high copy number) transposable elements (TEs).

Just to be clear, imagine one TE inserting between two single copy genes in a genome. Then in a later generation, imagine another TE inserting into the first one. This process can, and does, continue until you might have a > 100 kb block of highly repetitive DNA separating the two genes.

Because this block may comprise many TEs, many of which have copy numbers in the hundreds or thousands, it might seem hopeless to close the sequencing gap this represents. But it may not be hopeless.

Even though the TEs have high copy numbers in the genome, their junction with the DNA into which they inserted will likely still be unique. The effect is that even if your two low copy contigs are separated by an "ocean" of repetitive sequence, there likely will be small unique insertion-site-junction sequence "stepping stones" that could allow this ocean to be traversed.

TEs are rarely longer than 20 kb. So repetitive blocks formed by clusters of them may be traversable in this manner.
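
The "stepping stone" idea can be illustrated with a toy simulation. This is only a sketch under strong assumptions: sequences are random, TE copies are identical (no divergence), there are no target-site duplications, and only the forward strand is counted. The names `build_toy_genome`, `kmer_counts` and `kmer_at` are made up for this example. Two TE families are scattered through the genome as intact copies, and one block is built by inserting TE_B into the middle of a TE_A copy that sits between two genes.

```python
import random
from collections import Counter


def random_seq(n, rng):
    """Random DNA string of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def build_toy_genome(seed=0, copies=20, te_len=300, flank=500):
    """Return (genome, junction positions, TE_B sequence)."""
    rng = random.Random(seed)
    te_a = random_seq(te_len, rng)
    te_b = random_seq(te_len, rng)

    # Intact copies of both families, each flanked by unique sequence.
    parts = []
    for _ in range(copies):
        parts += [random_seq(flank, rng), te_a, random_seq(flank, rng), te_b]
    background = "".join(parts)

    # Nested block: TE_B inserted into the middle of a TE_A copy that
    # itself sits between two single copy "genes".
    gene_left = random_seq(flank, rng)
    gene_right = random_seq(flank, rng)
    half = te_len // 2
    nested = te_a[:half] + te_b + te_a[half:]
    genome = background + gene_left + nested + gene_right

    block_start = len(background) + len(gene_left)
    junctions = {
        "gene|TE_A": block_start,
        "TE_A|TE_B": block_start + half,
        "TE_B|TE_A": block_start + half + te_len,
        "TE_A|gene": block_start + 2 * te_len,
    }
    return genome, junctions, te_b


def kmer_counts(seq, k):
    """Count every forward-strand k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_at(seq, pos, k):
    """The k-mer that straddles position pos (about k/2 on each side)."""
    start = pos - k // 2
    return seq[start:start + k]


if __name__ == "__main__":
    k = 31
    genome, junctions, te_b = build_toy_genome()
    counts = kmer_counts(genome, k)
    for name, pos in junctions.items():
        print(name, counts[kmer_at(genome, pos, k)])
    print("inside TE_B", counts[te_b[100:100 + k]])
```

In this toy genome every junction k-mer occurs exactly once, while a k-mer from the interior of TE_B occurs in every copy of that family. Reads covering the junctions are therefore the unique anchors; real TEs diverge and sometimes share insertion sites, so actual genomes will be less tidy.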

--
Phillip

**narain** · 08-26-2011, 12:41 AM

Dear Boetsie,

I tried SSPACE on a SOAPdenovo contig file with a size of 6.2 GB. SSPACE crashed with an error saying that the input exceeded 2^32-1 characters! Does SSPACE not work for huge contig files?

Also, if I specify zipped FASTA files such as .fa.gz in the library file, I get a different result (N50 = 1440) than when I provide unzipped files such as .fa (N50 = 1990). So I believe SSPACE does not handle compressed input files such as .gz properly.
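
One way to compare the two runs independently of SSPACE is to recompute N50 from the output scaffolds, and to confirm that the compressed and uncompressed inputs really contain the same sequences. The sketch below is a plain-Python helper written for this purpose, not part of SSPACE; `open_maybe_gzip`, `fasta_lengths` and `n50` are hypothetical names, and the file names in the usage lines are placeholders.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, decompressing on the fly if the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_lengths(path):
    """Return the length of each sequence in a (possibly gzipped) FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: store the previous one, if any.
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the sequence at which the running total, taken from the
    longest sequence downwards, first reaches half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Example usage (placeholder file names):
# print(n50(fasta_lengths("scaffolds_from_gz_run.fasta")))
# print(fasta_lengths("reads.fa") == fasta_lengths("reads.fa.gz"))
```

If the sequence lengths read from the .fa and .fa.gz inputs match but the scaffold N50 values still differ, the difference would have to come from how the compressed files are read during scaffolding rather than from the data.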

Aby

**James Hane** · 11-20-2011, 08:53 PM

Hi,

I've found BGI's GapCloser works fairly well in conjunction with SSPACE... but looking forward to the release of GAPCLOSURE...

SSPACE was useful to me for finishing off ABySS/Velvet assemblies using Illumina mate-pairs... the mate-paired data I obtained appeared to have quite high levels of "shadow library" contamination, so SSPACE's requirements for correct read orientation and expected separation distance appear to be a good way of reducing mistakes due to this contamination.

Cheers,
James Hane

**pmiguel** · 11-21-2011, 04:36 AM

Hi James,
What do you mean by "shadow library"?

--
Phillip

**James Hane** · 11-21-2011, 06:55 AM

Hi Phillip,

My service provider uses the term "shadow library" and the name stuck with me. I'd appreciate it if you could enlighten me as to its more common synonyms.

During the construction of Illumina mate pair libraries (as I understand it) the termini of very large fragments are circularised together. These are then fragmented, and shorter fragments containing the joined termini (which are eventually sequenced in the <-- --> orientation relative to the original genomic sequence) are purified. However, this process is inefficient and can be contaminated to various degrees by non-circularised short fragments (still in the original --> <-- orientation and not separated by a large distance).

If you were to reverse complement your mate-reads back to the --> <-- orientation (how I do it anyway) and align these back to a reference genome, the end result is some reads aligning large distances apart in the FR orientation and some contaminating reads aligning a short distance apart in the RF orientation. (I've noticed that as the mate pairs get bigger the shadow library contamination gets bigger too. I would appreciate it if anyone else noticing this would share their experiences.)

This is pretty bad for scaffolding a de novo assembly, and some assemblers, e.g. Velvet, can allow for some level of contamination. SSPACE takes read pair alignments and expected separation distances of pairs into account when it joins scaffold ends together, minimising the "shadow library" problem to some extent.
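
As a rough illustration of filtering on orientation plus separation, here is a minimal sketch that labels aligned pairs. It is not SSPACE's actual algorithm. It works on the reads as sequenced (before any reverse complementing), so true mate pairs are expected to be outward-facing (RF, <-- -->) and far apart, and shadow pairs inward-facing (FR, --> <--) and close together; after reverse complementing both reads the two labels would swap. The `tolerance` and `shadow_max` thresholds and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairAlignment:
    pos1: int        # leftmost aligned reference coordinate of read 1
    forward1: bool   # True if read 1 aligns to the forward strand
    pos2: int        # leftmost aligned reference coordinate of read 2
    forward2: bool   # True if read 2 aligns to the forward strand
    read_len: int = 100  # assumed equal for both reads


def orientation(pair):
    """Return 'FR' (inward, --> <--), 'RF' (outward, <-- -->) or 'same'."""
    if pair.forward1 == pair.forward2:
        return "same"
    # The strand of whichever read lies leftmost decides the label.
    left_forward = pair.forward1 if pair.pos1 <= pair.pos2 else pair.forward2
    return "FR" if left_forward else "RF"


def outer_distance(pair):
    """Reference span from the leftmost to the rightmost aligned base."""
    return abs(pair.pos2 - pair.pos1) + pair.read_len


def classify(pair, expected_insert, tolerance=0.25, shadow_max=1000):
    """Label a pair 'mate_pair', 'shadow' or 'other'."""
    orient = orientation(pair)
    dist = outer_distance(pair)
    low = expected_insert * (1 - tolerance)
    high = expected_insert * (1 + tolerance)
    if orient == "RF" and low <= dist <= high:
        return "mate_pair"
    if orient == "FR" and dist <= shadow_max:
        return "shadow"
    return "other"


if __name__ == "__main__":
    pairs = [
        PairAlignment(1000, False, 2900, True),  # outward, about 2 kb apart
        PairAlignment(1000, True, 1200, False),  # inward, about 300 bp apart
        PairAlignment(1000, True, 1200, True),   # both on the same strand
    ]
    for p in pairs:
        print(orientation(p), outer_distance(p), classify(p, expected_insert=2000))
```

Counting the "shadow" fraction for each library would also be one way to check the impression that contamination grows with mate pair size.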

Cheers,
James

**pmiguel** · 11-21-2011, 07:26 AM

James,
I don't have a term for this phenomenon. So "shadow library" is fine with me. BTW, near as I could tell ABySS-PE also handles shadow library contamination without problems.

--
Phillip
