  • Pooling: how YOU would do that?

    No doubt, a very valuable asset of this community is its diverse experience and the broadest expertise one could hope to reach. When you read a paper, you see an already completed path from data to conclusions, and quite often there is no practical way for you alone to examine other paths. So I am seeking help from the community, a kind of brainstorming.
    Assembly of both the Neanderthal and Denisova genomes was done by aligning short reads to the human reference. But what about fragments that did not match the reference in any meaningful way? Would that not mean that chunks of really different genetic information are simply lost? Perhaps de novo assembly could help, but the problems are the short reads and the complete lack of the other information available for DNA from living organisms, such as transcriptomes, BAC ends and so on. Although the genomes of two archaic hominids have now been sequenced at sufficiently high depth, only short reads are available.
    So, my questions are:
    - Would it still make sense to do just de novo assembly, can that produce useful information?
    - If it can, how would YOU approach this? Overall strategy, tools to use?
    - How to evaluate quality of the assembly?
    - Assuming from archeological specimens that you are likely dealing with hominoids, or at least primates, what kind of biological information would you try to extract from assembly?

  • #2
    Hi yaximik,

    There are two difficulties I see with this suggestion:
    1. The contamination of archaeological samples with other organisms (fungi, bacteria, etc.): how would you separate novel ancient DNA from contaminant organism DNA?

    2. The DNA fragmentation of the ancient sample.
    De novo approaches with next-gen technology rely on long, high-quality reads. One of the problems is that, as DNA degrades, the chromosomes fragment into smaller and smaller pieces.

    As a practical example: in an ancient sample I worked on (~7000 years old), only about 23% of the extracted DNA could confidently be attributed to the organism concerned, and the median DNA fragment size was about 40 bases.

    There are likely to be some better samples out there that could get around the fragment size issue (e.g. some of the mammoth samples). But the ambiguity between contamination and novel target sequence would still be a challenge.

    • #3
      The problems with ancient DNA samples are well known; that is not the question. The question is whether it makes sense to try de novo assembly, how useful it may be, and how far one can get even with two high-depth hominid samples available, with all these problems in place.

      Paabo's papers mention that bacterial and other contamination was removed, although not much detail is given. I am not sure how good DeconSeq would be for this task, but it uses BWA anyway, so I presume Paabo's group used BWA with some thresholds to identify contamination.

      • #4
        The reason I made those points was that they directly impact on your proposal:

        1. It isn't easy to identify the reads that should be used in a de novo assembly
        2. Most of the reads generated are below the size range that de novo assemblers require

        You also seem to be under the misapprehension that contaminant sequences are positively identified in these projects. If you read the detailed methods you will see that they are rather what is left over after alignment to the reference genome.



        You can take the unaligned reads afterwards and use software such as MEGAN to determine their composition, but the line between unaligned contaminant DNA and unaligned potential novel target sequence is by no means clear.
        Also note that Paabo's group uses an enzymatic treatment to reduce the amount of contaminant DNA in their library, but it doesn't eliminate the problem (see SOM1).
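
To make the "take the unaligned reads" step concrete, here is a minimal sketch, assuming the alignments are available as plain-text SAM records. It relies only on the SAM FLAG bit 0x4 (segment unmapped); the function name and the toy records are made up for illustration, and this is not the pipeline used in the published ancient-genome projects.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads_to_fasta(sam_lines):
    """Yield FASTA records for reads whose SAM FLAG has the unmapped bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED and seq != "*":
            yield f">{name}\n{seq}"


# Toy input (hypothetical reads): one mapped read and one unmapped read.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t37\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
]

for record in unmapped_reads_to_fasta(toy_sam):
    print(record)
```

The resulting FASTA could then be passed to a taxonomic classifier of your choice. Note that this only separates aligned from unaligned reads; it does nothing to tell contaminant from novel target sequence, which is the hard part.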

        That's not to say that all these problems are insurmountable. I'm sure it would make an interesting paper if you could come up with a method of disentangling the target from the contamination and generate some novel hominid contigs.

        • #5
          Contaminants are not the only problem. Assume you wouldn't have the human reference sequence (and nothing close, such as another primate sequence, either), and I give you a set of, say, 100M reads from a human sample (guaranteed pure human, no contaminants, additives or preservatives). Could you assemble it de novo?

          Maybe I underappreciate the recent progress in de novo assembly tools, but I highly doubt that it is already possible to assemble a vertebrate genome from short reads only.

          • #6
            Simon is correct. Decent assembly of even microbial genomes requires longer reads. AFAIK, the current benchmark is a ~4 Mb Geobacter assembly from Illumina plus 454 (250-450 bp) reads.

            • #7
              The Assemblathon project/contest (and related commentary) is a good indicator of the state of the art in de novo assembly:

              An offshoot of the Genome 10K project, and primarily organized by the UC Davis Genome Center, Assemblathons are contests to assess state-of-the-art methods in the field of genome assembly....

              • #8
                a set of, say, 100M reads from a human sample (guaranteed pure human, no contaminants, additives or preservatives). Could you assemble it de novo?
                That is exactly the question I am asking, in a combined form. I feel the consensus (including silent replies) is "that's not possible". But how far can one go? It is certainly true that brute-force assembly from short reads will not get far, especially for a eukaryotic genome with all its introns, repeats, and other non-coding sequences. The assembly itself is not the goal per se, as a contig of any length makes no more sense than a set of unassembled reads if no biologically meaningful information can be recovered from it. If gazillions of short reads are the only available dataset, say for Denisova or Altai, some other information must be brought in to advance the task, which is recovering new biological information, not just assembling contigs that are as long as possible. Aligning to a reference genome is one approach, but it has limitations. What other approaches can be used to pull out what could have been missed? This is what I am polling for.

                • #9
                  Also, developments in the de novo space are moving very quickly, with the incorporation of long PacBio reads and Moleculo pseudo-contigs/sub-assemblies. Using this type of data, it may be possible to go back to contig overlap assemblers rather than de Bruijn-based algorithms....

                  Interesting reports from the frontier from Mike Schatz at AGBT2013




                  ..and some commentary from PAGXXI


                  However, each of these approaches assumes high-quality DNA libraries to start with. It is not at all clear that they would help with ancient (highly fragmented) DNA.

                  • #10
                    I've returned to your original questions, since some of the responses are not directly applicable.

                    Originally posted by yaximik
                    So, my questions are:
                    - Would it still make sense to do just de novo assembly, can that produce useful information?
                    - If it can, how would YOU approach this? Overall strategy, tools to use?
                    - How to evaluate quality of the assembly?
                    - Assuming from archeological specimens that you are likely dealing with hominoids, or at least primates, what kind of biological information would you try to extract from assembly?
                    For questions 1-3: De novo assembly from short reads only is unlikely to work, but it's easy enough for you to test. Download a human sequence data set, trim it to 50 bp single reads, run your favorite assembler, and compare the results to the reference genome (using Assemblathon metrics or one of the many published validation tools).
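
A rough sketch of two pieces of that test, assuming reads and contigs are already loaded as Python strings and integer lengths: truncating reads to 50 bp, and computing N50, one of the contiguity statistics reported in the Assemblathon evaluations. The function names and toy values are hypothetical, and N50 alone says nothing about correctness, so comparison against the reference is still needed.

```python
def trim_reads(reads, length=50):
    """Truncate each read to `length` bases, dropping reads shorter than that."""
    return [read[:length] for read in reads if len(read) >= length]


def n50(contig_lengths):
    """Return the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Toy example (made-up values): a 60-base read is cut to 50, a 40-base read is dropped.
reads = ["A" * 60, "C" * 40]
print([len(r) for r in trim_reads(reads)])

# Toy contig lengths from a hypothetical assembly.
print(n50([10, 20, 30, 40]))
```

Running the same N50 calculation on assemblies built from full-length and from 50 bp reads gives a quick first impression of how much contiguity is lost with short reads.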

                    For question 4: That's entirely up to you. The Neanderthal and Denisovan studies were interested in the evolutionary relationship to modern Homo sapiens, so alignment to that reference was appropriate. You seem to be interested in novel segments, so (at a minimum) you would need to develop strategies to filter out all contaminants and then assemble the remaining short reads. Convincing yourself (and reviewers) that you've succeeded will be challenging.
