**pmcget** · 03-26-2013, 06:31 AM

Hi yaximik,

There are two difficulties I see with this suggestion:
1. The contamination of archaeological samples with other organisms - fungi, bacteria etc. - how would you separate novel ancient DNA from contaminant organism DNA?

2. The DNA fragmentation of the ancient sample.
The de novo approaches with next-gen technology rely on long, high-quality reads. One of the problems is that as DNA degrades, the chromosomes fragment into smaller and smaller pieces.

As a practical example: in an ancient sample I worked on (~7000 years old), only about 23% of the extracted DNA could confidently be attributed to the organism concerned, and the median DNA fragment size was about 40 bases.

There are likely to be some better samples out there that could get around the fragment size issue (e.g. some of the mammoth samples). But the ambiguity between contamination and novel target sequence would still be a challenge.

**yaximik** · 03-26-2013, 07:00 AM

The problems with ancient DNA samples are well known; that is not the question. The question is whether it makes sense to try de novo, how useful it may be, and how far one can get even with two high-depth hominid samples available - with all these problems in place.

Paabo's papers mention that bacterial and other contaminants were removed, although not much detail is given. I am not sure how good DeconSeq would be for this task, but it uses BWA anyway, so I presume Paabo's group used BWA with some thresholds to identify contaminants.

**pmcget** · 03-26-2013, 08:07 AM

The reason I made those points was that they directly impact on your proposal:

1. It isn't easy to identify the reads that should be used in a de novo assembly
2. Most of the reads generated are below the size range that de novo assemblers require

You also seem to be under the misapprehension that contaminant sequences are positively identified in these projects. If you read the detailed methods you will see that they are rather what is left over after alignment to the reference genome.

http://www.sciencemag.org/content/328/5979/710/rel-suppl/3879465158348b39/suppl/DC1

You can take the unaligned reads afterwards and use software such as MEGAN to determine their composition - but the line between unaligned contaminant DNA and unaligned potential novel target sequence is by no means clear.
Also note that Paabo's group uses an enzymatic treatment to reduce the amount of contaminant DNA in their library - but it doesn't eliminate the problem (see SOM1).
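
To make the "left over after alignment" point concrete, here is a minimal sketch of how the leftover reads could be pulled out of an aligner's SAM output. It relies only on the SAM FLAG bit 0x4 (segment unmapped); the function name `split_by_mapping` and the toy records are made up for illustration, and this is not the pipeline used in the Neanderthal paper.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_by_mapping(sam_lines):
    """Return (mapped_names, unmapped_names) from SAM-formatted text lines."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if line.startswith("@"):  # header lines, not alignment records
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # skip malformed records
            continue
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            unmapped.append(name)
        else:
            mapped.append(name)
    return mapped, unmapped


# Toy SAM records (hypothetical data, not from any real sample).
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t37\t40M\t*\t0\t0\t" + "A" * 40 + "\t" + "I" * 40,
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t" + "C" * 38 + "\t" + "I" * 38,
    "read3\t16\tchr2\t500\t25\t35M\t*\t0\t0\t" + "G" * 35 + "\t" + "I" * 35,
]
mapped, unmapped = split_by_mapping(toy_sam)
print("mapped:", mapped)
print("unmapped:", unmapped)
```

Note that the unmapped list is only a starting point: it says nothing about whether a read is contaminant or novel target sequence, which is exactly the ambiguity described above.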

That's not to say that all these problems are insurmountable. I'm sure it would make an interesting paper if you could come up with a method of disentangling the target from the contamination and generate some novel hominid contigs.

**Simon Anders** · 03-26-2013, 09:50 AM

Contaminants are not the only problem. Assume you didn't have the human reference sequence (and nothing close, such as another primate sequence, either), and I gave you a set of, say, 100M reads from a human sample (guaranteed pure human, no contaminants, additives or preservatives). Could you assemble it de novo?

Maybe I underappreciate the recent progress in de novo assembly tools, but I highly doubt that it is already possible to assemble a vertebrate genome from short reads only.

**HESmith** · 03-26-2013, 10:25 AM

Simon is correct. Decent assembly of even microbial genomes requires longer reads. AFAIK, the current benchmark is a ~4 Mb Geobacter assembly from Illumina plus 454 (250-450 bp) reads.

**pmcget** · 03-27-2013, 04:51 AM

The Assemblathon project/contest (and related commentary) is a good indicator of the state of the art in de novo assembly:

The Assemblathon

http://assemblathon.org/

An offshoot of the Genome 10K project, and primarily organized by the UC Davis Genome Center, Assemblathons are contests to assess state-of-the-art methods in the field of genome assembly....

**yaximik** · 03-27-2013, 05:03 AM

a set of, say, 100M reads from a human sample (guaranteed pure human, no contaminants, additives or preservatives). Could you assemble it de novo?

That is exactly the question I am asking, in a combined form. I feel the consensus (including silent replies) is "that's not possible". But how far can one go? It is certainly true that brute-force assembly from short reads will not get far, especially for a eukaryotic genome with all its introns, repeats, and other non-coding sequences. The assembly itself is not the goal per se, as a contig of any length would not make more sense than a set of unassembled reads if no biologically meaningful information can be recovered from it. If gazillions of short reads are the only available dataset, say for Denisova or Altai, some other information must be brought in to advance the task, which is recovering new biological information, not just assembling contigs that are as long as possible. Aligning to a reference genome is one approach, but it has limitations. What other approaches can be used to pull out what could have been missed? This is what I am polling for.

**pmcget** · 03-27-2013, 05:09 AM

Also, developments in the de novo space are moving very quickly with the incorporation of long PacBio reads and Moleculo pseudo-contigs/sub-assemblies. Using this type of data it may be possible to revert to contig overlap assemblers rather than use de Bruijn based algorithms....
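
For anyone unfamiliar with the de Bruijn idea, a toy sketch may help show why read length matters so much. A read of length L contributes L - k + 1 k-mers, so a read shorter than k contributes nothing, and a 40-base fragment at k = 31 gives only 10. The function names and the values of k here are illustrative assumptions, not what any particular assembler does.

```python
from collections import defaultdict


def kmers(read, k):
    """All overlapping k-mers in a read (empty list if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy reads (hypothetical); the last one is shorter than k and adds no edges.
reads = ["ACGTAC", "GTACGG", "ACG"]
graph = de_bruijn_edges(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))

# k=31 is used here purely as an illustrative k-mer size.
print("k-mers from a 40-base fragment at k=31:", len(kmers("A" * 40, 31)))
```

The node `ACG` ends up with two outgoing edges, which is the kind of branch that repeats and sequencing errors create in real graphs.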

Interesting reports from the frontier from Mike Schatz at AGBT2013

http://schatzlab.cshl.edu/presentations/2013-02-20.AGBT.Assembling%20Crop%20Genomes.pdf

A coming of age for PacBio and long read sequencing? #AGBT13

http://www.labspaces.net/blog/1618/A_coming_of_age_for_PacBio_and_long_read_sequencing___AGBT___

Summary of the Moleculo and PacBio talks.

..and some commentary from PAGXXI

http://nextgenseek.com/2013/01/update-on-moleculo-technology-from-pagxxi/

However each of these approaches assumes high quality DNA libraries to start with. Not at all clear that they would help with ancient (highly fragmented) DNA.

**HESmith** · 03-27-2013, 06:37 AM

I've returned to your original questions, since some of the responses are not directly applicable.

Originally posted by yaximik

So, my questions are:
- Would it still make sense to do just de novo assembly, can that produce useful information?
- If it can, how would YOU approach this? Overall strategy, tools to use?
- How to evaluate quality of the assembly?
- Assuming from archeological specimens that you are likely dealing with hominoids, or at least primates, what kind of biological information would you try to extract from assembly?

For questions 1-3: De novo assembly from short reads only is unlikely to work, but it's easy enough for you to test. Download a human sequence data set, trim to 50bp single reads, use your favorite assembler, and compare the results to the reference genome (using Assemblathon metrics or one of the many published validation tools).
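
A rough sketch of the two mechanical ends of that test, assuming plain four-line FASTQ input: trimming reads to 50 bp before assembly, and computing N50 (a common contiguity statistic) from the resulting contig lengths. The helper names `parse_fastq`, `trim_fastq` and `n50` are hypothetical, the assembler step in between is left out, and a real comparison to the reference would need a proper validation tool.

```python
def parse_fastq(lines):
    """Yield (name, seq, qual) from well-formed four-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        seq = next(it)
        next(it)  # the '+' separator line
        qual = next(it)
        yield header[1:], seq, qual


def trim_fastq(records, length=50):
    """Cut each read (and its quality string) to at most `length` bases."""
    for name, seq, qual in records:
        yield name, seq[:length], qual[:length]


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


# Toy input (hypothetical reads and contig lengths).
toy = ["@r1", "A" * 60, "+", "I" * 60, "@r2", "C" * 40, "+", "I" * 40]
trimmed = list(trim_fastq(parse_fastq(toy), length=50))
print([(name, len(seq)) for name, seq, _ in trimmed])
print("N50:", n50([100, 80, 50, 30, 20]))
```

N50 only measures contiguity, not correctness, so a high value alone would not show that the contigs match the reference.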

For question 4: That's entirely up to you. The Neanderthal and Denisovan studies were interested in the evolutionary relationship to modern Homo sapiens, so alignment to that reference was appropriate. You seem to be interested in novel segments, so (at a minimum) you would need to develop strategies to filter out all contaminants and then assemble the remaining short reads. Convincing yourself (and reviewers) that you've succeeded will be challenging.


Pooling: how YOU would do that?
