  • yaximik
    Senior Member
    • Apr 2011
    • 199

    Pooling: how would YOU do that?

    No doubt, a very valuable asset of this community is the diverse experience and the broadest expertise one could ever reach out for. When you read a paper, you see an already completed path from data to conclusions, and quite often there is no practical way for you alone to examine other paths. So I seek help from the community - a kind of brainstorming.
    Assembly of both the Neanderthal and Denisova genomes was done by aligning short reads to the human reference. But what about fragments that did not match the reference in any meaningful way? Would that not mean that chunks of really different genetic information are simply lost? Perhaps de novo assembly could help, but the problems are the short reads and the complete lack of the other information available for DNA from living organisms, such as transcriptomes, BAC ends, and so on. Although the genomes of two archaic hominids are now sequenced at sufficiently high depth, only short reads are available.
    So, my questions are:
    - Would it still make sense to do just de novo assembly, can that produce useful information?
    - If it can, how would YOU approach this? Overall strategy, tools to use?
    - How to evaluate quality of the assembly?
    - Assuming from archeological specimens that you are likely dealing with hominoids, or at least primates, what kind of biological information would you try to extract from the assembly?
  • pmcget
    Member
    • Nov 2007
    • 28

    #2
    Hi yaximik,

    There are two difficulties I see with this suggestion:
    1. The contamination of archaeological samples with other organisms - fungi, bacteria, etc. - how would you separate novel ancient DNA from contaminant organism DNA?

    2. The DNA fragmentation of the ancient sample.
    De novo approaches with next-gen technology rely on long, high-quality reads. One of the problems as DNA degrades is that the chromosomes fragment into smaller and smaller pieces.

    As a practical example: for an ancient sample I worked on (~7,000 years old), only about 23% of the DNA extracted could confidently be attributed to the organism concerned, and the median DNA fragment size was about 40 bases.
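
    In case it's useful as a sanity check on a given library: below is a minimal Python sketch (the file name and the 50 bp cutoff are just placeholders) for surveying the read-length distribution, to see what fraction of the reads is even in a range an assembler could use.

    # Minimal sketch: survey read lengths in a FASTQ to see how much of an
    # ancient library is in a usable size range. The file name and the 50 bp
    # cutoff are placeholders.
    import gzip
    import statistics
    import sys

    def open_fastq(path):
        # transparently handle plain or gzipped FASTQ
        return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

    fastq = sys.argv[1] if len(sys.argv) > 1 else "ancient_sample.fastq.gz"
    min_len = 50
    lengths = []
    with open_fastq(fastq) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:                    # sequence line of each FASTQ record
                lengths.append(len(line.strip()))

    if lengths:
        usable = sum(1 for n in lengths if n >= min_len)
        print(f"reads: {len(lengths)}")
        print(f"median length: {statistics.median(lengths)}")
        print(f"fraction >= {min_len} bp: {usable / len(lengths):.2%}")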

    There are likely to be some better samples out there that could get around the fragment size issue (e.g. some of the mammoth samples). But the ambiguity between contamination and novel target sequence would be a challenge.

    • yaximik
      Senior Member
      • Apr 2011
      • 199

      #3
      The problems with ancient DNA samples are well known; that is not the question. The question is whether it makes sense to try de novo assembly, how useful it may be, and how far one can get even with two high-depth hominid samples available - with all these problems in place.

      Paabo's papers mention that bacterial and other contamination was removed, although not much detail is given. I am not sure how good DeconSeq would be for this task, but it uses BWA anyway, so I presume the Paabo group used BWA for identifying contamination with some thresholds.
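
      Roughly, I imagine the procedure would be something like the sketch below: map the reads against a database of likely contaminant genomes with BWA, then set aside anything that maps with decent confidence. This is only my guess at the approach, not their actual pipeline; the file name and the MAPQ cutoff are placeholders.

      # Rough sketch (an assumption, not Paabo's actual pipeline): reads were
      # first mapped with BWA against contaminant genomes; anything mapping
      # confidently is flagged. File name and MAPQ cutoff are placeholders.
      import pysam

      MIN_MAPQ = 30

      contaminant_ids = set()
      with pysam.AlignmentFile("reads_vs_contaminants.bam", "rb") as bam:
          for read in bam:
              if read.is_unmapped or read.is_secondary or read.is_supplementary:
                  continue
              if read.mapping_quality >= MIN_MAPQ:
                  contaminant_ids.add(read.query_name)

      print(f"{len(contaminant_ids)} reads flagged as likely contaminant")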

      • pmcget
        Member
        • Nov 2007
        • 28

        #4
        The reason I made those points was that they directly impact on your proposal:

        1. It isn't easy to identify the reads that should be used in a de novo assembly
        2. Most of the reads generated are below the size range that de novo assemblers require

        You also seem to be under the misapprehension that contaminant sequences are positively identified in these projects. If you read the detailed methods, you will see that they are rather what is left over after alignment to the reference genome.

        You can take the unaligned reads afterwards and use software such as MEGAN to determine their composition - but the line between unaligned contaminant DNA and unaligned potential novel target sequence is by no means clear.
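
        For what it's worth, extracting those leftover reads for MEGAN or BLAST is straightforward - a minimal pysam sketch, assuming the reads have already been aligned to the human reference (file names are placeholders):

        # Minimal sketch: write the unaligned reads to FASTA for downstream
        # classification (MEGAN, BLAST, etc.). File names are placeholders, and
        # this assumes alignment to the human reference has already been done.
        import pysam

        with pysam.AlignmentFile("reads_vs_human_ref.bam", "rb") as bam, \
                open("unaligned_reads.fasta", "w") as out:
            for read in bam:
                if read.is_unmapped and read.query_sequence:
                    out.write(f">{read.query_name}\n{read.query_sequence}\n")
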
        Also note that Paabo's group uses an enzymatic treatment to reduce the amount of contaminant DNA in their library - but it doesn't eliminate the problem (see SOM1).

        That's not to say that all these problems are insurmountable. I'm sure it would make an interesting paper if you could come up with a method of disentangling the target from the contamination and generate some novel hominid contigs.

        • Simon Anders
          Senior Member
          • Feb 2010
          • 995

          #5
          Contaminants are not the only problem. Assume you didn't have the human reference sequence (and nothing close, such as another primate sequence, either), and I gave you a set of, say, 100M reads from a human sample (guaranteed pure human, no contaminants, additives or preservatives). Could you assemble it de novo?

          Maybe I underappreciate the recent progress in de novo assembly tools, but I highly doubt that it is already possible to assemble a vertebrate genome from short reads only.

          • HESmith
            Senior Member
            • Oct 2009
            • 512

            #6
            Simon is correct. Decent assembly of even microbial genomes requires longer reads. AFAIK, the current benchmark is a ~4 Mb Geobacter assembly from Illumina plus 454 (250-450 bp) reads.

            • pmcget
              Member
              • Nov 2007
              • 28

              #7
              The Assemblathon project/contest (and related commentary) is a good indicator of the state of the art in de novo assembly:

              An offshoot of the Genome 10K project, and primarily organized by the UC Davis Genome Center, Assemblathons are contests to assess state-of-the-art methods in the field of genome assembly....

              • yaximik
                Senior Member
                • Apr 2011
                • 199

                #8
                Originally posted by Simon Anders
                a set of, say, 100M reads from a human sample (guaranteed pure human, no contaminants, additives or preservatives). Could you assemble it de novo?
                That is exactly the question I am asking, in a combined form. I feel the consensus (including the silent replies) is "that's not possible". But how far can one go? It is certainly true that brute-force assembly from short reads will not get far, especially for a eukaryotic genome with all its introns, repeats, and other non-coding sequences.

                The assembly itself is not the goal per se, as a contig of any length would make no more sense than a set of unassembled reads if no biologically meaningful information can be recovered from it. If gazillions of short reads are the only available dataset, say for Denisova or Altai, some other information must be brought in to advance the task - which is recovering new biological information, not just assembling the longest possible contigs. Aligning to a reference genome is one approach, but it has limitations. What other approaches could be used to pull out what might have been missed? This is what I am polling for.

                • pmcget
                  Member
                  • Nov 2007
                  • 28

                  #9
                  Also, developments in the de novo space are moving very quickly with the incorporation of long PacBio reads and Moleculo pseudo-contigs/sub-assemblies. Using this type of data it may be possible to revert back to contig-overlap assemblers rather than de Bruijn-based algorithms....
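
                  For anyone unfamiliar with the distinction, the toy sketch below shows the core of the de Bruijn idea that current short-read assemblers are built on - nothing like a real assembler, just the graph construction from k-mers (the reads are made up):

                  # Toy de Bruijn graph: break reads into k-mers and connect each
                  # (k-1)-mer prefix to its suffix. Real assemblers add error
                  # correction, graph simplification, paired-end info, etc.
                  from collections import defaultdict

                  def de_bruijn_graph(reads, k):
                      graph = defaultdict(list)
                      for read in reads:
                          for i in range(len(read) - k + 1):
                              kmer = read[i:i + k]
                              graph[kmer[:-1]].append(kmer[1:])
                      return graph

                  reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]   # made-up toy reads
                  for node, successors in de_bruijn_graph(reads, k=4).items():
                      print(node, "->", ",".join(successors))

                  Overlap-layout-consensus assemblers instead work from pairwise read overlaps, which is why longer reads make that approach attractive again.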

                  There are interesting reports from the frontier from Mike Schatz at AGBT 2013, and some commentary from PAG XXI.

                  However, each of these approaches assumes high-quality DNA libraries to start with. It's not at all clear that they would help with ancient (highly fragmented) DNA.

                  • HESmith
                    Senior Member
                    • Oct 2009
                    • 512

                    #10
                    I've returned to your original questions, since some of the responses are not directly applicable.

                    Originally posted by yaximik
                    So, my questions are:
                    - Would it still make sense to do just de novo assembly, can that produce useful information?
                    - If it can, how would YOU approach this? Overall strategy, tools to use?
                    - How to evaluate quality of the assembly?
                    - Assuming from archeological specimens that you are likely dealing with hominoids, or at least primates, what kind of biological information would you try to extract from the assembly?
                    For questions 1-3: De novo assembly from short reads only is unlikely to work, but it's easy enough for you to test. Download a human sequence data set, trim to 50bp single reads, use your favorite assembler, and compare the results to the reference genome (using Assemblathon metrics or one of the many published validation tools).
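
                    If it helps, here is a minimal sketch of the trimming step (the input file name is a placeholder; a dedicated trimmer such as seqtk or Trimmomatic would do the same job):

                    # Minimal sketch: crop every read in a FASTQ to 50 bp single-end
                    # reads for a de novo assembly test. File names are placeholders;
                    # reads shorter than 50 bp are dropped.
                    import gzip

                    TRIM_TO = 50

                    def open_fastq(path):
                        return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

                    with open_fastq("human_reads.fastq.gz") as fin, \
                            open("human_reads.50bp.fastq", "w") as fout:
                        record = []
                        for line in fin:
                            record.append(line.rstrip("\n"))
                            if len(record) == 4:              # one complete FASTQ record
                                name, seq, _plus, qual = record
                                if len(seq) >= TRIM_TO:
                                    fout.write(f"{name}\n{seq[:TRIM_TO]}\n+\n{qual[:TRIM_TO]}\n")
                                record = []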

                    For question 4: That's entirely up to you. The Neanderthal and Denisovan studies were interested in the evolutionary relationship to modern Homo sapiens, so alignment to that reference was appropriate. You seem to be interested in novel segments, so (at a minimum) you would need to develop strategies to filter out all contaminants and then assemble the remaining short reads. Convincing yourself (and reviewers) that you've succeeded will be challenging.
