**SNPsaurus** · 06-11-2014, 02:50 PM

The usual suspects are incorrect DNA quantification and poor DNA quality. A large amount of sheared DNA will convert into less library than a smaller amount of intact DNA. PicoGreen quantification should be reliable. Some species have PCR or restriction enzyme inhibitors that may come through the DNA extraction in variable amounts. Any funny colors in the extracted DNA?

Is there a file of reads unassigned to a barcode? If you do have complexity issues, you might get Ns in the index. But you aren't going to get 10M reads for each sample on one MiSeq run, so that probably isn't the issue.

Did you amplify, pool and size select? Did you run out the samples on a gel before making libraries and post-amplification?

**nucacidhunter** · 06-11-2014, 03:29 PM

You quantified the input DNA with PicoGreen, so if all samples are from the same species, the issue could be DNA quality and digestibility of the samples, as SNPsaurus mentioned. The barcode design is not optimal and the colour balance is poor, which would affect the quality of your barcode reads and also increase the likelihood of barcode mis-assignment during demultiplexing. I think that without the PhiX spike-in your run would not have been successful, especially if it had been on a HiSeq without appropriate measures to tackle low diversity in the crucial cycles 1-4. Your total read number is above 30M, and combined with low diversity I would expect lower than usual quality. There is also the possibility of barcode bias, but I would not use these barcodes again just to check for that. Barcodes 3 and 10 do not destroy the enzyme recognition site, which can be a problem if the enzyme was not cleaned up or inactivated before ligation. Another possibility is a problem in annealing your adapter oligos, which can be checked with PicoGreen.

**NextGenSeq** · 06-13-2014, 11:10 AM

You have to quantify the library accurately after you make it. We use KAPA QPCR.

**SNPsaurus** · 06-13-2014, 12:01 PM

It looks like they quantified the pooled library, since it got 26M reads on a MiSeq. With RAD-Seq or ddRAD/GBS type libraries it isn't a usual practice to quantify the individuals, since many of the steps are done post-pooling (size selection, PCR). Also, a more typical scenario would be sequencing 96 or more samples in a lane, so it is hard to justify spending $5/sample (qPCR, renormalization) when you are spending $10/sample on sequencing anyway. In this case, some checking of the individuals just before pooling would have helped, as you said, NextGenSeq.

**apfuentes** · 06-17-2014, 10:40 AM

> Originally posted by SNPsaurus:
>
> The usual suspects are incorrect DNA quantification, and DNA quality. A lot of sheared DNA will convert into less of a library than a smaller amount of good DNA. Pico-green quantification should be reliable. Some species have PCR or restriction enzyme inhibitors that may come through genome extraction in variable ways. Any funny colors in the extracted DNA?
>
> Is there a file of reads unassigned to a barcode? If you do have complexity issues, you might get Ns in the index. But you aren't going to get 10M reads for each sample on one MiSeq run, so that probably isn't the issue.
>
> Did you amplify, pool and size select? Did you run out the samples on a gel before making libraries and post-amplification?

Hello SNPsaurus, thanks a lot for your input, it’s really appreciated! Below I will give a response to each of your questions:

- Any funny colors in the extracted DNA?
After extraction, I ran the total genomic DNA on agarose gels twice (see attached image). In the first run there was a clear smear below each band, but I thought it was not a major issue because in the second run the smear was almost gone (although I can still see a shadow; should I be worried about it?). Both gels were run at 100 V for 1 hour with 4 ul of DNA loaded, but with a different loading dye in each. Is there any way to clean the degraded fraction out of total genomic DNA when most of it appears to be high molecular weight?

- Is there a file of reads unassigned to a barcode?
Yes. After running process_radtags (Stacks), I found that around 300 000 of ~30 million reads did not have a barcode, but I understand this is OK (maybe some errors in the synthesis of the primers; as long as it's not a massive problem...).

process_radtags gives you a table (see attached image) that summarizes, for each individual sample (identified by its barcode), the total number of reads, how many of those had no cut site sequence (“No Rad Tag”), how many were discarded because of a low quality score (Q<30) (“Low Quality”) and how many passed all these filters (“Retained reads”). Based on that table, I would say low quality was not my main problem, since 2 million of 30 million reads (~7%) were discarded because of Q<30 (as far as I know, you should always expect to throw out about 20% of reads because of low quality). So I guess the barcodes and the 5% PhiX spike-in worked well, am I right?
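As a rough check of those percentages, here is a minimal Python sketch. It uses only the approximate run totals quoted in this thread; the variable names are illustrative and are not process_radtags output fields.

```python
# Approximate run totals quoted in this thread (not exact process_radtags output).
total_reads = 30_000_000
discarded = {
    "low_quality": 2_000_000,      # reads dropped by the quality filter
    "ambiguous_barcode": 300_000,  # reads without a recognizable barcode
}


def percent_of_total(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


fractions = {name: percent_of_total(n, total_reads) for name, n in discarded.items()}
for name, pct in fractions.items():
    print(f"{name}: {pct:.1f}% of all reads")
```

The same calculation applied per barcode (total, “No Rad Tag”, “Low Quality”, “Retained reads”) would show which individual samples deviate from the run-wide averages.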

In contrast, a missing cut site (“No Rad Tag”) seemed to be an issue for some individuals, especially those with the lowest total read counts (S4R/sv053, S4R/sv053* and F4XR/sb043: 30-70% of their reads had no cut site). *Note: I sequenced two individuals twice within the same MiSeq run (S4R/sv053 and F4XR/sb043).

In general, all samples showed ~1-4% (~10 000 reads) of reads with a missing cut site, so I should improve the digestion conditions and make sure the DNA is digested completely. I followed this protocol: 1X NEB buffer 4, 20 U SphI-HF, 2 U EcoRI-HF, 200 ng of DNA template (900 Mb genome size) and ddH2O to a total reaction volume of 50 ul, incubated for 1 h at 37C. I plan to use the following new conditions: 1X NEB CutSmart buffer, 300 ng of DNA template (900 Mb genome size), 10 U of SphI-HF, 50 U of EcoRI and ddH2O to a total reaction volume of 50 ul, incubated for 3 h at 37C. (The suggested NEB protocol is 20 U of enzyme per 1 ug of DNA for SphI-HF and 100 U per 1 ug of DNA for EcoRI-HF, incubated at 37C for 5-15 min.) Any comments about this plan?
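To compare the two digestion set-ups on a common scale, the enzyme amounts can be converted to units per ug of DNA. This is a minimal sketch that assumes the amounts quoted above and, for the comparison only, treats the planned EcoRI like EcoRI-HF. The function and dictionary names are illustrative.

```python
def units_per_ug(units, dna_ng):
    """Convert an enzyme amount (U) and a DNA input (ng) to U per ug of DNA."""
    return units * 1000.0 / dna_ng


# Supplier suggestions as quoted in the post (U per ug of DNA).
suggested_u_per_ug = {"SphI-HF": 20, "EcoRI": 100}

# Enzyme units and DNA input for each reaction, as quoted in the post.
reactions = {
    "original": {"dna_ng": 200, "SphI-HF": 20, "EcoRI": 2},
    "planned": {"dna_ng": 300, "SphI-HF": 10, "EcoRI": 50},
}

# Ratio of the enzyme dose actually used to the quoted suggestion.
ratios = {}
for name, rxn in reactions.items():
    for enzyme, suggested in suggested_u_per_ug.items():
        dose = units_per_ug(rxn[enzyme], rxn["dna_ng"])
        ratios[(name, enzyme)] = dose / suggested
        print(f"{name:8s} {enzyme:8s} {dose:6.1f} U/ug "
              f"({ratios[(name, enzyme)]:.2f}x the quoted suggestion)")
```

With these numbers the original reaction had SphI-HF at about 5 times the quoted suggestion but EcoRI-HF at about a tenth of it, while the planned reaction puts both enzymes at roughly 1.7 times. Incubation time and buffer are not taken into account here.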

As I mentioned before, samples S4R/sv053 and F4XR/sb043 were sequenced twice (using different barcodes) within the same MiSeq run. F4XR/sb043 got ~400 000 reads and ~4000 reads, whereas S4R/sv053 got ~5500 and ~4400 reads (the corresponding bands on the agarose gels are marked with a red dot in the attached image and do not seem to show much degradation, i.e. smear). This makes me think of only one thing: pipetting problems, even though I am super careful with this! I wonder if the quick spin (3000 rpm for 30 sec in a Qiagen centrifuge) I give the DNA plate before making the DNA dilutions could cause problems by making all the DNA stick to the bottom of the well. I will also double-check the PicoGreen protocol.

- Did you amplify, pool and size select?
I followed these steps: quantified the DNA with PicoGreen; digested 200 ng of each sample (no heat kill of the enzymes); removed the enzymes with speed beads; quantified with PicoGreen before ligation (all samples adjusted to 100 ng); ligated the P1 and P2 adapters; pooled the individual samples and cleaned up with speed beads; size-selected fragments on a Pippin Prep; ran PCR to enrich for fragments with P1 and P2 adapters and to add the Illumina adapters; cleaned up with speed beads; quantified the library by qPCR (KAPA kit); and made the final dilution for loading into the cartridge.

- Did you run out the samples on a gel before making libraries and post-amplification?
Yes, after DNA extraction and after PCR (pooled samples).

Attached file: miseq run3.png (137.0 KB)

**apfuentes** · 06-17-2014, 10:43 AM

> Originally posted by nucacidhunter:
>
> You have quantified input DNA with PicoGreen so if all samples are from the same species, as SNPsaurus has mentioned the issue could be DNA quality and digestibility of samples. Barcodes design is not optimal and colour balance is poor which would affect quality of your barcode reads and increase likelihood of barcode mis-assignment during demultiplexing as well. I think without PhiX spike-in your run would have not been successful specially if it was on HiSeq without taking appropriate measures to tackle low diversity in crucial cycles of 1-4. Your total read number is above 30M and combined with low diversity I would expect lower than usual quality. There is also possibility of barcode bias, but I would not use them again to check for that. Barcodes 3 and 10 do not destroy enzyme recognition site which can be a problem if it was not cleaned or deactivated before ligation. Other possibility is issues in annealing your adapter oligos which can be checked by PicoGreen.

Hello nucacidhunter, thank you for your comments! Below I will give a response to each of your questions:

- If the barcodes were the issue, I would expect to see a lot of reads discarded because of a low quality score or a missing barcode sequence. Considering that only 2 million of 30 million reads (~7%) were discarded due to Q<30 (as far as I know, you should always expect to throw out about 20% of reads because of low quality) and ~300 000 reads were discarded because of “Ambiguous barcode”, I think the barcodes and the 5% PhiX spike-in worked well, am I right?

- I did not heat kill the enzymes, but I did a clean-up with speed beads after the RE digestion (before ligation). How could I check for adapter annealing issues using PicoGreen?

**apfuentes** · 06-17-2014, 10:44 AM

> Originally posted by NextGenSeq:
>
> You have to quantify the library accurately after you make it. We use KAPA QPCR.

Hello NextGenSeq, thanks for your feedback!

- Yes, I used KAPA qPCR but in a smaller total volume (15 ul instead of the 20 ul suggested in the KAPA handout). I used this protocol: 4 ul DNA/standard, 9 ul master mix (vs. 12 ul) and 2 ul ddH2O (vs. 4 ul). Another person in the lab tested these modifications and found similar DNA quantification values, so I decided to follow this protocol (to save some $).

**SNPsaurus** · 06-17-2014, 01:59 PM

Since you have so few samples, it would be worth amplifying each sample separately so you know that a library was generated. I think nucacidhunter has some good ideas, particularly checking that the oligos annealed properly to make adapters. I know that has tripped up labs in the past. PicoGreen detects dsDNA, so it would tell you whether they are annealed.

I can't see the teeny attachment well enough to tell what is going on, but I could only see high molecular weight DNA in a subset of the wells, which could be a problem if true.

**nucacidhunter** · 06-17-2014, 07:53 PM

As you have noted, the issue is not caused by the barcodes in this instance, even though the design is not optimal. Ideally there should be 25% of each nucleotide at every position up to base 12 at the start of the reads. I guess that the problem is in annealing the oligos for the adapters, or in DNA quality. As SNPsaurus mentioned, a dsDNA reagent such as PicoGreen or Qubit will give an indication of annealing efficiency. You would expect half or less of the concentration for your Y-shaped adapter (I assume the SphI adapter) in comparison with your PstI adapter (barcoded), because it is double stranded only over a shorter region. There are other ways to check adapters as well.
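The 25% per position guideline can be checked on a barcode set before ordering oligos. This is a minimal Python sketch; the barcodes in it are hypothetical and are not the ones used in this thread, and the function name is illustrative.

```python
from collections import Counter


def base_fractions_by_position(barcodes):
    """For each position, return the fraction of barcodes carrying A, C, G and T."""
    length = min(len(b) for b in barcodes)  # compare only the shared positions
    table = []
    for i in range(length):
        counts = Counter(b[i] for b in barcodes)
        table.append({base: counts.get(base, 0) / len(barcodes) for base in "ACGT"})
    return table


# Hypothetical 5-nt barcodes, NOT the ones used in this thread.
example_barcodes = ["ACGTA", "CGTAC", "GTACG", "TACGT"]

balance = base_fractions_by_position(example_barcodes)
for position, fractions in enumerate(balance, start=1):
    summary = " ".join(f"{base}={frac:.2f}" for base, frac in fractions.items())
    print(f"cycle {position}: {summary}")
```

This counts each barcode once. In a real run the balance also depends on how many reads each sample contributes, and the cycles after the barcode read the shared cut site, so the check is only a first approximation.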

I look at DNA quality from a few angles: 1) integrity and molecular weight (on a 0.8% agarose gel run slowly), 2) 260/280 and 260/230 ratios, which indicate the presence of proteins and phenolics. You can also do a test digest on your DNA to check its digestibility; you can find suggestions for this on the NEB or other suppliers' web sites. Inhibitors can be removed by cleaning up your DNA preps with the Zymo Genomic DNA Clean & Concentrator kit or with AMPure XP beads (1.5x), depending on the absence or presence of sticky material in your prep.

**Niels Wagemaker** · 03-22-2016, 07:18 AM

DNA quality is essential: get rid of ethanol residues...
I quantify my GBS samples before pooling by qPCR with dedicated (species-specific) primers (test a few primers on loci that are always abundantly present), and that works fine.
Directly amplifying your RAD-seq or GBS libraries with general primers for quantification won't work very well...

Problem with uneven distribution of rads among individual libraries