A recent blog post by FlxLex, commenting on work by Sergey Koren, Michael Schatz and others, indicates that genome assemblies can be significantly improved using corrected PacBio long reads:
Evidently reads are getting longer, and let's just say for the sake of argument that ONT comes up with the goods and we get:
reads over 100 kb, very accurate, and a mountain of them
I have three questions:
1. In the PacBio dataset the correction step was processor-intensive, but what do long reads mean for the memory requirements of de novo assemblers? With very long reads, does the algorithmic problem become more manageable, without the need for 128-256 GB of RAM? (The sketch after this list shows the rough arithmetic I have in mind.)
2. Is anyone working on assemblers designed from the outset on the assumption that longer reads are inevitable, or will current tools cope with only minor modifications?
3. I'm indirectly interested in regions with a bit of transposable-element activity, and in repetitive regions more generally. If a lot of the missing data in current assemblies is down to these two factors, what length of good-quality read would be likely to resolve the majority of them?
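To make questions 1 and 3 a bit more concrete, here is the back-of-envelope arithmetic I have in mind. Every parameter in it (genome size, coverage, repeat lengths, flank size) is an assumption I've picked purely for illustration, not a figure from the posts linked above:

```python
# Back-of-envelope sketch: all parameters below are assumptions chosen
# for illustration, not figures from the posts referenced above.

GENOME_SIZE = 3.2e9  # bp; assumed human-sized genome
COVERAGE = 50        # assumed sequencing depth

# Question 1: how many reads does an assembler have to hold and overlap?
for read_len in (100, 10_000, 100_000):
    n_reads = GENOME_SIZE * COVERAGE / read_len
    print(f"{read_len:>7} bp reads -> ~{n_reads:,.0f} reads to assemble")

# Question 3: a read can resolve a repeat only if it spans the repeat
# plus some unique anchor sequence on both sides.
FLANK = 500  # bp of unique flank assumed per side
repeat_lengths = [300, 6_000, 10_000, 50_000, 100_000]  # illustrative only

for read_len in (10_000, 100_000):
    spanned = sum(read_len >= r + 2 * FLANK for r in repeat_lengths)
    print(f"{read_len:>7} bp reads span {spanned}/{len(repeat_lengths)} "
          "of the example repeats")
```

If the read count really does drop by three to four orders of magnitude like this, I'd naively expect the overlap and graph structures to shrink too, but I don't know how per-read costs actually scale with read length; that is exactly what I'm asking in question 1.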
Perhaps a comparison of the repetitive elements in the unresolved fragments of the parrot Assemblathon contigs against the corrected ones might give some clues?
I'm a bit of a novice on these issues and would be keen to hear the opinions of some experts! Perhaps this is looking too far forward, but the field seems to move very quickly!