  • Ancient DNA adaptor removal and read merging

    Hi everyone,

    I've seen a few people doing ancient DNA work on this forum, and I thought they might be able to help me with a bioinformatics question about analysing an ancient DNA sample sequenced on an Illumina HiSeq. I've been following a protocol written by Kircher et al. (http://www.ncbi.nlm.nih.gov/pubmed/22237537) for merging paired-end sequence data from small-insert libraries. My library insert size is between 180 and 300 bp, and I'm working with read lengths of 100 bp. Is this protocol appropriate for this project?

    Alternatively, is it just easier to remove adaptors, quality trim and do single-read mapping rather than merging?

    Cheers

  • #2
    I have not read that paper, but could you describe the wet lab steps in more detail, say how many lanes of data you have, and explain what your reservations are about following what is done in that paper?

    • #3
      It's a protocols paper, so nothing biologically relevant. I was asking for advice on whether merging reads from small-insert libraries is common, whether it is appropriate given the stats in my post, and whether it is standard for ancient DNA analysis. There aren't many good bioinformatics methods papers for ancient DNA out there, so it would be great to get advice on what others use.

      I only have one lane, about 158 million pairs. Wet lab steps aren't my area, sorry.

      • #4
        Originally posted by jimmybee:
        It's a protocols paper, so nothing biologically relevant. I was asking for advice on whether merging reads from small-insert libraries is common, whether it is appropriate given the stats in my post, and whether it is standard for ancient DNA analysis. There aren't many good bioinformatics methods papers for ancient DNA out there, so it would be great to get advice on what others use.

        I only have one lane, about 158 million pairs. Wet lab steps aren't my area, sorry.

        Oh, I see, you mean merging the two paired ends, if they overlap, to yield one single-end read? I thought you meant merging multiple lanes of data. I'll look at that paper when I get to work and can access it, but my gut instinct is that since some of your insert sizes are >200 bp (too long for two 100 bp reads to overlap), there is no point in merging; I'd rather align all of the data identically. But I'll glance at that paper in a couple of hours.
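
The arithmetic behind that gut instinct can be sketched roughly as follows. This is only an illustration, assuming "insert size" means the sequenced fragment length without adapters, and using a hypothetical minimum-overlap threshold (`MIN_OVERLAP`) that is not taken from the Kircher et al. protocol or from any particular tool.

```python
def expected_overlap(read_length: int, insert_size: int) -> int:
    """Bases covered by both mates of a pair.

    A negative value means the two reads do not meet and there is an
    unsequenced gap of that many bases in the middle of the fragment.
    """
    return 2 * read_length - insert_size


# Hypothetical threshold for illustration only; real mergers define their own.
MIN_OVERLAP = 10


def can_merge(read_length: int, insert_size: int, min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the mates overlap by at least min_overlap bases."""
    return expected_overlap(read_length, insert_size) >= min_overlap


if __name__ == "__main__":
    # Insert sizes spanning the 180-300 bp range mentioned in the first post.
    for insert_size in (180, 200, 250, 300):
        overlap = expected_overlap(100, insert_size)
        print(insert_size, overlap, can_merge(100, insert_size))
```

With 100 bp reads, only fragments shorter than 200 bp give any overlap at all, so under these assumptions most of a 180-300 bp library would stay as ordinary unmerged pairs.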

        • #5
          So glancing through the paper I think it's probably fine to follow it. If I had more time I'd read it in detail so perhaps somebody else will chime in.

          • #6
            Thanks mate. Looks like a good way to go, especially considering my lack of experience in the QC of small-insert libraries in paired-end sequencing.

            • #7
              I actually wrote something similar for my master's project, although it wasn't as intelligent as this in choosing the most likely overlap. Alas, I don't have the final code (it's buried on a UCL fileserver which I no longer have access to). However, it was based on this: http://almlab.mit.edu/vibrioGenomes/SHERA_temp/ which might be worth a look.

              • #8
                OK, no problem, I'll have a look at it. All good ideas.

                • #9
                  Check out pandaseq, works pretty well:

                  Background: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.

                  Results: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.

                  Conclusions: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
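
For readers unfamiliar with what "naïve assembly" of a read pair looks like, here is a minimal sketch. It is not how PANDAseq, SHERA, or the Kircher et al. merger work; it is a deliberately simple stand-in. It assumes adapters have already been removed, that the fragment is at least as long as each read, and that qualities are given as lists of Phred integers. The function name `merge_pair` and its parameters `min_overlap` and `max_mismatch_frac` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=10, max_mismatch_frac=0.1):
    """Naively merge a read pair into one sequence, or return None.

    Tries the longest candidate overlap first and accepts the first one
    whose mismatch fraction is within max_mismatch_frac. In the overlap,
    the base with the higher quality wins (ties go to the forward read).
    Assumes min_overlap >= 1.
    """
    rev_rc = reverse_complement(rev)
    rev_rc_qual = rev_qual[::-1]
    for overlap in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        fwd_tail = fwd[-overlap:]
        rev_head = rev_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(fwd_tail, rev_head))
        if mismatches > max_mismatch_frac * overlap:
            continue
        # Non-overlapping 5' part of the forward read.
        seq = list(fwd[:-overlap])
        qual = list(fwd_qual[:-overlap])
        # Overlapping region: keep the better-supported base.
        for i in range(overlap):
            fq = fwd_qual[len(fwd) - overlap + i]
            rq = rev_rc_qual[i]
            if fq >= rq:
                seq.append(fwd_tail[i])
                qual.append(fq)
            else:
                seq.append(rev_head[i])
                qual.append(rq)
        # Non-overlapping part contributed only by the reverse read.
        seq.extend(rev_rc[overlap:])
        qual.extend(rev_rc_qual[overlap:])
        return "".join(seq), qual
    return None
```

Dedicated mergers improve on this mainly in how they score candidate overlaps and how they combine the two quality values, which is where the reduced error incorporation described in the abstract comes from.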
