  • ddRAD Sequencing failure

    Hi,

    I haven't posted before but have used the forum a lot during the construction of my libraries.

    I recently received a second library back from a sequencing facility, and after a brief look it seems that the sequence data is almost all random rather than the targeted RAD markers I was aiming for. The FastQC reports suggest the restriction cut sites are to blame: they show much lower quality scores than the rest of the read, and in most cases do not match the expected cut sequence for MseI or PstI. If I run the data through the processing pipeline, the number of stacks assembled is drastically lower than last time and the coverage has dropped from a mean of 30x to around 5x.

    My best guess is that, for some reason, the adapters (with the restriction-site-specific overhang) have ligated to pretty much anything and everything in the digest reaction rather than targeting the restriction fragments, and as a result I have sequenced a much more diverse pool of fragments at much lower coverage. Unfortunately that means most of it is useless.

    A few other details. I have checked for adapter contamination in the reads and there is very little (I checked and double-checked this throughout the library prep too), so I don't think it's adapter dimerization. This is the second library using the same method, and the first one worked fine. To further confuse matters, the sequencing facility had to resequence the library as there were issues with overclustering. They had the same issues again the second time but reckon the data is fine to use.

    It may be a case of degraded oligos used to make up the adapters (I used the same ones for both libraries), but if so, why is it just the cut site that is low quality (the rest of the adapter quality is high)? And if so, I don't understand how the ligation and ligation QC during prep could have been so successful with degraded adapters or overhangs. And even if this were the case, I would have thought that in a pool of purified digested DNA most free ends would be RADtag ends anyway, so I would expect something in my sequence data?

    Sorry for so many questions. My heart sank when I found this out, and I am still digging through the data for answers. Any help in figuring out what has gone wrong would be much appreciated.

    Many thanks

    Alex

  • #2
    Did the facility add extra PhiX to help the basecalling in the low-complexity region of the cut site? What is the cut site?

    It does seem like two issues. If you are getting lots of off-target sites then a common problem is your adapters having the overhang chewed back and blunt-ligating to random sheared ends. But then the restriction region should not be very low complexity and should have normal quality!

    Are the off-target sites all at 1X read depth or is it a larger set of sites with most around 5X? If the latter, then is the size-selection different this time and you just ended up sequencing more ddRAD sites than hoped for? It doesn't sound like it if the fragments truly aren't near cut sites, but just checking.
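
    If it helps to quantify that, one rough way to see the depth distribution is to collapse reads on their first ~40 bp and count how many reads share each start. A minimal Python sketch along those lines is below; the filename and the 40 bp prefix length are placeholders, and sequencing errors will split true loci, so treat the histogram only as a rough indicator of whether you have mostly 1x singletons or a broader set of ~5x loci.

from collections import Counter

def depth_distribution(fastq_path, prefix_len=40):
    """Approximate per-locus depth by collapsing reads on their first prefix_len bases."""
    starts = Counter()
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # sequence line of each FASTQ record
                starts[line.strip()[:prefix_len]] += 1
    return Counter(starts.values())  # depth -> number of putative loci at that depth

hist = depth_distribution("reads_R1.fastq")  # placeholder filename
for depth in sorted(hist):
    print(f"{depth}x\t{hist[depth]} putative loci")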
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



    • #3
      Hi SNPsaurus,

      Thanks for the response and apologies for the delayed reply. I've been away from the computer.

      They did add PhiX, yes, at 5% to compensate for the low-complexity region. The cut sites are for MseI and PstI. Following ligation to the adapters, the cut-site remnants should read GGTAA for MseI and TGCAG for PstI. Both appear to have been affected equally by the quality drop in FastQC and the loss of specificity in adapter ligation. The size selection, if anything, was tighter this time round, so I would have expected better coverage. I have yet to try mapping the reads to see how far they sit from the expected cut sites.
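
      For what it's worth, a quick way to put a number on the loss of specificity is to count how many reads begin with one of those remnants. A minimal sketch, assuming barcodes have already been trimmed so the remnant sits at the very start of the read; the filename is a placeholder:

EXPECTED = ("GGTAA", "TGCAG")  # MseI and PstI remnants quoted above

def cut_site_fraction(fastq_path, expected=EXPECTED):
    """Count reads whose first bases match an expected cut-site remnant."""
    total = matched = 0
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # sequence lines
                total += 1
                if line.startswith(expected):  # str.startswith accepts a tuple
                    matched += 1
    return matched, total

m, t = cut_site_fraction("reads_R1.fastq")  # placeholder filename
print(f"{m}/{t} reads ({100 * m / t:.1f}%) begin with an expected cut-site remnant")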

      I've now talked further with the sequencing facility, and they have explained that the overclustering issue was caused by high levels of polyclonal or multi-occupancy wells. The automatic filters then remove those wells from further sequencing, resulting in a much lower final read count than expected. What they are not sure about is why there are such high levels of multi-occupancy with this library.

      With regards to the low-quality cut-site signal in the FastQC files, they suggest that this is usual for ddRAD and other RAD-based libraries, where the low-complexity region results in lower call confidence from the sequencer. What is strange is that the PhiX spike should have negated this issue, and that it was not observed at all in the first library.
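
      To double-check the FastQC picture independently, the per-cycle quality drop can be measured directly from the FASTQ. A small sketch, assuming Phred+33 quality encoding; the filename is a placeholder:

def mean_quality_by_cycle(fastq_path, n_cycles=10):
    """Mean Phred score per read position over the first n_cycles cycles."""
    sums = [0] * n_cycles
    counts = [0] * n_cycles
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 3:  # quality line of each FASTQ record
                for pos, ch in enumerate(line.rstrip("\n")[:n_cycles]):
                    sums[pos] += ord(ch) - 33  # Phred+33 assumed
                    counts[pos] += 1
    return [s / c for s, c in zip(sums, counts) if c]

for cycle, q in enumerate(mean_quality_by_cycle("reads_R1.fastq"), start=1):
    print(f"cycle {cycle}: mean Q = {q:.1f}")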

      The question now is whether the multi-occupancy levels, the low-quality cut-site calls, and the poor adapter specificity are related in any way. My initial thought was that the low quality scores and the adapter specificity must be, but on second thought perhaps not. If the overhangs had been blunted and the blunt ends were ligating to random sheared fragments, I'd have thought they should still have formed coherent fragments for sequencing? Could the base pairs around the blunt-end ligation region be degraded in some way that would affect the call confidence during sequencing?

      Thanks again for the help and insight.

      Alex



      • #4
        If they loaded PhiX at 5% but the library concentration was mis-estimated and actually much higher, then the effective PhiX fraction could be lower, leading to poor quality scores.
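
        As a back-of-envelope illustration of that effect (the 2x mis-estimation below is purely hypothetical), if the library is really k times more concentrated than quantified, the PhiX fraction on the flow cell shrinks accordingly:

phix_nominal = 0.05  # 5% PhiX as loaded, per the thread
k = 2.0              # hypothetical: library actually 2x more concentrated than quantified
phix_effective = phix_nominal / (phix_nominal + (1 - phix_nominal) * k)
print(f"effective PhiX fraction ~ {phix_effective:.1%}")  # ~2.6% in this made-up case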

        Are you sequencing just one of the cut sites in read 1?

        It may be that you are still sequencing ddRAD fragments and not random genomic DNA, but the poor quality of the cut site (because of overloading) is changing the cut-site sequence. If you take 30 bp from the middle of a "bad" read, can you find it in other reads, with some of those reads starting from a good sequence that has a cut site?
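
        A rough sketch of that check: take the central 30 bp of a suspect read and look for other reads carrying the same 30-mer, then ask whether any of those carriers start with an intact cut-site remnant. The filenames and the choice of the first suspect read are placeholders:

def sequences(fastq_path):
    """Yield sequence lines from a FASTQ file."""
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip()

def find_carriers(fastq_path, query, remnants=("GGTAA", "TGCAG")):
    """Count reads containing the query 30-mer, and how many start with a cut-site remnant."""
    carriers = [s for s in sequences(fastq_path) if query in s]
    with_site = sum(1 for s in carriers if s.startswith(remnants))
    return len(carriers), with_site

bad_read = next(sequences("suspect_reads.fastq"))  # any read flagged as "bad"
mid = len(bad_read) // 2
query = bad_read[mid - 15 : mid + 15]               # central 30 bp
n, n_site = find_carriers("reads_R1.fastq", query)
print(f"{n} reads share this 30-mer; {n_site} of them start with a cut-site remnant")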

        It can be tough to amplify a very tight size selection, and that does give artifacts a chance to become significant. But I think you should characterize the "off-site" reads and determine whether they are random and scattered, or real ddRAD loci that simply have bad-quality cut sites that have changed the cut-site sequence.

        Overall, though, the core problem is the overloading. You aren't going to get a good number of reads. But you do want to figure this out, to see whether you can simply load at a better concentration and be fine, or whether other issues are present.
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com



        • #5
          Thanks SNPsaurus,

          I hadn't considered the possibility that the overloading might have affected the rest of the library; I had only looked at it the other way round. I'll definitely take a look from that perspective.

          I had intended to look for cut site 2 in the smaller fragments sequenced. The reads shouldn't contain the second cut site, as the library size targeted was 400-600 bp, the average size was 465 bp, and I was sequencing at 150 bp PE. But there will likely be some, and I will look through and see whether any show the second site but not the first. That might hint at some sort of overloading effect at the first site.
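
          A quick tally along those lines could look like the sketch below; the internal motifs for the second site are hypothetical stand-ins that would need adjusting to the actual enzyme/adapter design, and the filename is a placeholder.

FIRST = ("GGTAA", "TGCAG")   # remnants expected at the read start, from above
SECOND = ("TTAA", "CTGCAG")  # hypothetical internal motifs for the opposite cut site; adjust to your design

def second_site_without_first(fastq_path):
    """Count reads carrying a second-site motif internally but lacking the first cut site at the start."""
    hits = total = 0
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # sequence lines
                total += 1
                seq = line.strip()
                if not seq.startswith(FIRST) and any(m in seq[5:] for m in SECOND):
                    hits += 1
    return hits, total

h, t = second_site_without_first("reads_R1.fastq")  # placeholder filename
print(f"{h}/{t} reads contain a second-site motif but lack the first cut site")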

          There is a chance it was mis-estimated slightly, as they saw a very slight increase in concentration when they re-quantified the library for the second attempt at sequencing. But altogether the library was quantified three times (qPCR and Bioanalyzer each time), once by myself and twice by the facility, and it was fairly consistent.

          I really need to sit down and have a detailed look through the raw read data like you suggest. Unfortunately I'm away and won't be back in front of a workstation for a week, but I will update here once I get the chance.

          Thanks again for the suggestions,

          best,

          A.
          Last edited by alextheinnes; 06-24-2019, 11:39 PM.

