  • Problem with uneven distribution of reads among individual libraries

    Hello everyone!
    I am totally new in NGS, so I apologize for the basic question.

    I am doing ddRAD (EcoRI and SphI) for SNP analysis in a fish species with genome size ~900 Mb.

    I got my raw data from the MiSeq and found that the number of reads varies a lot among individuals (each individual is identified by a barcode). The attached image shows the total number of reads (2nd column) for each barcode (1st column) after running "process_radtags" in STACKS with a sliding window of 0.1 and a minimum quality score of 20. Some libraries got 11M reads whereas others got ~8000, which is no good.

    I wonder what else, other than incorrect normalization of the DNA concentration at the start of the protocol, might explain this outcome. Maybe the barcodes I used were not diverse enough among samples within the same sequencing run? As a side note, I used 5% PhiX. For DNA quantification I used Quant-iT PicoGreen.

    Thanks in advance for any advice!
    Last edited by apfuentes; 06-11-2014, 12:04 PM. Reason: typo in the title
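For anyone wanting to put a number on this kind of imbalance, here is a minimal Python sketch. It assumes the per-barcode read totals from the process_radtags summary have already been copied into a dictionary by hand; the barcode names and counts below are hypothetical placeholders, not the values from the attached table, and `read_balance` is an illustrative helper, not part of STACKS.

```python
from statistics import mean, pstdev


def read_balance(counts):
    """Summarize how evenly reads are spread across barcoded samples.

    counts maps a sample/barcode name to its total read count.
    Returns (share of the run per sample, coefficient of variation, max/min fold range).
    """
    values = list(counts.values())
    total = sum(values)
    shares = {name: n / total for name, n in counts.items()}
    cv = pstdev(values) / mean(values)
    # A sample with zero reads makes the fold range undefined; report infinity.
    fold_range = max(values) / min(values) if min(values) > 0 else float("inf")
    return shares, cv, fold_range


# Hypothetical counts for illustration only (NOT the poster's table).
example = {"bc01": 11_000_000, "bc02": 2_500_000, "bc03": 400_000, "bc04": 8_000}
shares, cv, fold_range = read_balance(example)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.2%} of reads")
print(f"CV = {cv:.2f}, max/min = {fold_range:.0f}x")
```

A perfectly balanced pool would give a CV of 0 and a fold range of 1, so these two numbers make it easy to compare runs before and after changing the protocol.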

  • #2
    The usual suspects are incorrect DNA quantification, and DNA quality. A lot of sheared DNA will convert into less of a library than a smaller amount of good DNA. Pico-green quantification should be reliable. Some species have PCR or restriction enzyme inhibitors that may come through genome extraction in variable ways. Any funny colors in the extracted DNA?

    Is there a file of reads unassigned to a barcode? If you do have complexity issues, you might get Ns in the index. But you aren't going to get 10M reads for each sample on one MiSeq run, so that probably isn't the issue.

    Did you amplify, pool and size select? Did you run out the samples on a gel before making libraries and post-amplification?
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


    • #3
      You quantified the input DNA with PicoGreen, so if all samples are from the same species then, as SNPsaurus mentioned, the issue could be DNA quality and digestibility of the samples. The barcode design is not optimal and the colour balance is poor, which would affect the quality of your barcode reads and increase the likelihood of barcode mis-assignment during demultiplexing. I think that without the PhiX spike-in your run would not have been successful, especially if it had been on a HiSeq without appropriate measures to tackle low diversity in the crucial cycles 1-4. Your total read number is above 30M, and combined with low diversity I would expect lower than usual quality. There is also the possibility of barcode bias, but I would not use the barcodes again just to check for that. Barcodes 3 and 10 do not destroy the enzyme recognition site, which can be a problem if the enzyme was not cleaned up or deactivated before ligation. Another possibility is a problem in annealing your adapter oligos, which can be checked with PicoGreen.
      Last edited by nucacidhunter; 06-11-2014, 10:45 PM. Reason: Added extra points


      • #4
        You have to quantify the library accurately after you make it. We use KAPA QPCR.


        • #5
          It looks like they quantified the pooled library, since it got 26M reads on a MiSeq. With RAD-Seq or ddRAD/GBS type libraries it isn't a usual practice to quantify the individuals, since many of the steps are done post-pooling (size selection, PCR). Also, a more typical scenario would be sequencing 96 or more samples in a lane, so it is hard to justify spending $5/sample (qPCR, renormalization) when you are spending $10/sample on sequencing anyway. In this case, some checking of the individuals just before pooling would have helped, as you said, NextGenSeq.

          • #6
            Hello SNPsaurus, thanks a lot for your input, it’s really appreciated! Below I will give a response to each of your questions:

            - Any funny colors in the extracted DNA?
            After extraction, I ran the total genomic DNA on agarose gels twice (see image attached). In the 1st run there was a clear smear below each band, but I thought it was not a major issue because in the 2nd run the smear was almost gone (although I can see a shadow… should I be worried about it?). (Gels were run at 100 V for 1 hour, with a different loading dye in each gel, and 4 ul of DNA loaded.) Is there any way I could clean degraded fragments out of total genomic DNA when most of it seems to be of high molecular weight?

            - Is there a file of reads unassigned to a barcode?
            Yes. After running process_radtags (STACKS), I found that around 300 000/~30 million reads did not have a barcode, but I understand this is OK (maybe some errors in the synthesis of primers…as far as it’s not a massive problem…).

            process_radtags gives you a table (see image attached) that summarizes the total number reads, how many of those did not have cut site sequence (“No Rad Tag”), how many of those were discarded because of low quality score (Q<30) (“Low Quality”) and how many passed all these filters (“Retained reads”) for each individual sample (identified with a barcode). Based on that table, I would say low quality was not my main problem, since 2 million/30 million (~7%) reads were discarded because of Q<30 (as far as I know, you always should expect to throw out 20% of reads because of low quality). So, I guess the barcodes and the 5% spike with PhiX worked well, am I right?

            In contrast, missing cut site (“No Rad Tag”) seemed to be an issue for some individuals, especially for those with the lowest number of total reads (S4R/sv053, S4R/sv053*, F4XR/sb043=30-70% of the reads did not have cut site) *Note: I sequenced 2 individuals twice within the same Miseq run (S4R/sv053 and F4XR/sb043).
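The percentages discussed above can be checked with a few lines of Python. This is only a sketch: the run-level figures are the approximate ones quoted in this post, while the per-sample rows and the 30% threshold are hypothetical values chosen for illustration (the real numbers are in the attached process_radtags table, which is not reproduced here). Both helper functions are illustrative and not part of STACKS.

```python
def fraction(part, total):
    """Return part/total as a float; total must be positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


def flag_low_cut_site(samples, threshold=0.30):
    """Return names of samples whose 'No Rad Tag' fraction exceeds threshold.

    samples maps a name to (total reads, reads without a cut site).
    """
    return [name for name, (total, no_radtag) in samples.items()
            if fraction(no_radtag, total) > threshold]


# Run-level figures quoted in the post (all approximate).
total_reads = 30_000_000
low_quality = 2_000_000
no_barcode = 300_000
print(f"low quality: {fraction(low_quality, total_reads):.1%}")  # 6.7%
print(f"no barcode:  {fraction(no_barcode, total_reads):.1%}")   # 1.0%

# Hypothetical per-sample rows: (total reads, reads without cut site).
samples = {
    "sample_A": (400_000, 8_000),
    "sample_B": (5_500, 3_300),
    "sample_C": (4_400, 1_500),
}
print(flag_low_cut_site(samples))  # ['sample_B', 'sample_C']
```

Flagging by fraction rather than by absolute count is what separates the low-yield samples here: a few thousand reads without a cut site is a small share of a large sample but most of a small one.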

            In general, all samples showed ~1-4% (~10 000 reads) of reads with missing cut site. Thus I should improve the digestion reaction conditions and make sure the DNA is digested completely. I followed this protocol: 1X NEB buffer 4, 20 U SphI-HF, 2 U EcoRI-HF, 200 ng of DNA template (900 Mb genome size), and ddH2O to a total reaction volume of 50 ul, incubated 1 h at 37C. I plan to use new digestion conditions, as follows: 1X NEB CutSmart buffer, 300 ng of DNA template (900 Mb genome size), 10 U of SphI-HF, 50 U of EcoRI, and ddH2O to a total reaction volume of 50 ul, incubated 3 h at 37C. (The suggested protocol for NEB SphI-HF is 20 U of RE for 1 ug of DNA, and for EcoRI-HF 100 U of RE for 1 ug of DNA, incubation at 37C for 5-15 min.) Any comments about this plan?
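To compare the two digestion recipes on the same scale, the enzyme amounts can be converted to units per ug of DNA. The sketch below uses only the figures quoted in this post. It assumes the "suggested" values (20 U/ug for SphI-HF, 100 U/ug for EcoRI-HF) are as stated here rather than checked against the supplier, and it treats the EcoRI in the planned recipe as equivalent to EcoRI-HF for the comparison.

```python
def units_per_ug(units, dna_ng):
    """Convert enzyme units in a reaction to units per microgram of DNA."""
    return units * 1000 / dna_ng


# Suggested amounts as quoted in the post (U per ug of DNA).
suggested = {"SphI": 20, "EcoRI": 100}

# (enzyme units, DNA input in ng) for each recipe described in the post.
recipes = {
    "original": {"SphI": (20, 200), "EcoRI": (2, 200)},
    "planned": {"SphI": (10, 300), "EcoRI": (50, 300)},
}

for recipe, enzymes in recipes.items():
    for enzyme, (units, dna_ng) in enzymes.items():
        dose = units_per_ug(units, dna_ng)
        ratio = dose / suggested[enzyme]
        print(f"{recipe:9s} {enzyme:6s} {dose:6.1f} U/ug ({ratio:.2f}x suggested)")
```

By these figures the original reaction had SphI-HF at 100 U/ug (5x the quoted suggestion) but EcoRI-HF at only 10 U/ug (0.1x), whereas the planned reaction puts both enzymes at about 1.67x the quoted suggestion.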

            As I mentioned before, samples S4R/sv053 and F4XR/sb043 were sequenced twice (using different barcodes) within the same MiSeq run. F4XR/sb043 got ~400 000 reads and ~4000 reads, whereas S4R/sv053 got ~5500 and ~4400 reads (the corresponding bands in the agarose gels are marked with a red dot in the attached image and do not seem to show much degradation, i.e. smear). This makes me think of only one thing… pipetting problems… even though I am super careful with this! I wonder if the quick spin (3000 rpm, 30 sec, in a Qiagen centrifuge) I give the DNA plate before making the DNA dilutions could cause problems by making all the DNA stick to the bottom of the well. I will also double check the PicoGreen protocol.

            - Did you amplify, pool and size select?
            I followed these steps: Quantified DNA with Picogreen, Digest 200 ng/each sample (no heat kill enzyme), clean enzyme with speed beads, picogreen before ligation (all samples to 100 ng), ligate P1 and P2 adapters, pool individual DNA samples and clean up with speed beads, fragment size selection using pippin-prep, PCR to enrich for fragments with P1 and P2 adapters and addition of Illumina adapters, clean up with speed beads, library quantification with qPCR – KAPA kit, final dilution to put into the cartridge.

            - Did you run out the samples on a gel before making libraries and post-amplification?
            Yes, after DNA extraction and after PCR (pooled samples).
            Last edited by apfuentes; 06-17-2014, 10:57 AM. Reason: Attached image


            • #7
              Hello nucacidhunter, thank you for your comments! Below I will give a response to each of your questions:

              -If the barcodes were the issue, I would expect to see a lot of reads discarded because of low quality score and missing barcode sequence. Considering that only 2 million/30 million (~7%) reads were discarded due to Q<30 (as far as I know, you always should expect to throw out 20% of reads because of low quality) and ~300 000 reads were discarded because of “Ambiguous barcode”, I think the barcodes and the 5% spike with PhiX worked well, am I right?

              - I did not heat kill the enzyme, but I did a clean up after RE digestion (before ligation) using Speed beads. How could I verify adaptor annealing issues using picogreen?


              • #8
                Hello NextGenSeq, thanks for your feedback!

                - Yes, I used KAPA qPCR, but in a smaller total volume (15 ul vs. 20 ul, the latter being the volume suggested in the KAPA handout). I used this protocol: 4 ul DNA/standard, 9 ul master mix (vs. 12 ul), 2 ul ddH2O (vs. 4 ul). Another person in the lab tested these modifications and found similar DNA quantification values, so I decided to follow this protocol (to save some $).


                • #9
                  Since you have so few samples it would be worth it to separately amplify each sample so you know that a library was generated. I think nucacidhunter has some good ideas, particularly the oligos annealing to make adapters. I know that has tripped up labs in the past. PicoGreen detects dsDNA so that would tell you if they are annealed.

                  I can't see the teeny attachment well enough to tell what is going on, but I could only see high molecular weight DNA in a subset of the wells, which could be a problem if true.

                  • #10
                    As you have noted, the issue is not caused by the barcodes in this instance, even though the design is not optimal. Ideally there should be 25% of each nucleotide at every position up to base 12 at the start of the reads. My guess is that the problem lies in annealing the oligos for the adapters, or in DNA quality. As SNPsaurus mentioned, the dsDNA PicoGreen reagent or Qubit will give an indication of annealing efficiency. You would expect ½ or less of the concentration for your Y-shaped adapter (I assume the SphI adapter) in comparison with your PstI adapter (barcoded), because it is double stranded only over a shorter region. There are other ways to check adapters as well.
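The 25%-per-nucleotide guideline can be checked for a barcode set before ordering oligos. The Python sketch below computes the base composition at each position. The barcode sequences are hypothetical (the poster's barcodes are not listed in the thread), and the calculation assumes every barcode contributes equally to the pool, which was clearly not the case in this run.

```python
from collections import Counter


def base_balance(barcodes):
    """Return, for each position, the fraction of barcodes carrying A, C, G and T.

    Only positions shared by all barcodes (up to the shortest one) are profiled.
    """
    length = min(len(b) for b in barcodes)
    profile = []
    for pos in range(length):
        counts = Counter(b[pos].upper() for b in barcodes)
        profile.append({base: counts.get(base, 0) / len(barcodes) for base in "ACGT"})
    return profile


# Hypothetical barcode set for illustration only.
example = ["ACGTA", "CGTAC", "GTACG", "TACGT"]
for i, fractions in enumerate(base_balance(example), start=1):
    summary = " ".join(f"{base}={frac:.0%}" for base, frac in fractions.items())
    print(f"cycle {i}: {summary}")
```

To cover the first 12 cycles as described above, the sequences passed in would need to include whatever follows the barcode in the read (for example the cut-site remnant), which is identical across samples and is where diversity usually collapses.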

                    I look at DNA quality from a few angles: 1) integrity and molecular weight (on a 0.8% agarose gel run slowly), 2) 260/280 and 260/230 ratios, which indicate the presence of proteins and phenolics. You can also do a test digest of your DNA to check digestibility; NEB and other suppliers have suggestions for this on their web sites. Inhibitors can be removed by cleaning up your DNA preps with the Zymo Genomic DNA Clean & Concentrator kit or AMPure XP beads (1.5x), depending on the absence or presence of sticky material in your prep.
                    Last edited by nucacidhunter; 06-18-2014, 01:16 AM.


                    • #11
                      DNA quality is essential: get rid of ethanol residues...
                      I quantify my GBS samples before pooling by qPCR with dedicated (species-specific) primers (test a few primers on loci that are always abundantly present); that works fine.
                      Directly amplifying your RAD-seq or GBS libraries with general primers for quantification won't work very well...
