SEQanswers

Old 05-24-2011, 05:21 PM   #1
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19
GC Bias on Illumina Platforms

Does anyone know if Illumina sequencing is still expected to be biased against GC rich sequences?

In doing some bacterial genome resequencing, it is clear that our data do not give Poisson coverage of the genome: the variance is much higher than the mean (variance/mean of ~35 for some runs). We found that about half of this unexplained dispersion is due to a bias against GC-rich sequences, and that local GC content (within 10-20 bp) is the strongest determinant of the differences in sequencing coverage. This influence decreases out to about 100 bp away from the center base, where GC content matters little if at all.

The problem is that I have seen data from other sequencing centers that do not show any GC effect, and have much lower dispersion (variance/mean around 3.5). I would love to get data like this, but can't figure out what is different about our two attempts. Anyone have any insight? Did they change their machines or protocols to avoid this?
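
A minimal sketch of the variance/mean figure discussed above, assuming per-base coverage is available as a plain list of integer depths. The function name `dispersion_index` and the sample depths are hypothetical illustrations, not data or code from this thread. Under a Poisson model the ratio would be close to 1.

```python
from statistics import mean, pvariance


def dispersion_index(coverage):
    """Return variance/mean of per-base coverage (about 1 under a Poisson model)."""
    m = mean(coverage)
    if m == 0:
        raise ValueError("mean coverage is zero, the ratio is undefined")
    # Population variance, reusing the mean that was already computed.
    return pvariance(coverage, m) / m


# Hypothetical depths for illustration only.
even_run = [9, 10, 11, 10, 9, 11, 10, 10]
skewed_run = [1, 2, 30, 3, 1, 40, 2, 1]
print(dispersion_index(even_run))
print(dispersion_index(skewed_run))
```

Both toy runs have the same mean depth, so any difference in the printed ratio comes from how unevenly the reads are spread.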
Old 05-24-2011, 06:51 PM   #2
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Are your libraries PCR free?
Old 05-24-2011, 06:52 PM   #3
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19

Yes, as they are made from genomic preps
Old 05-24-2011, 06:57 PM   #4
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Let me rephrase. After ligation and prior to clustering, is there an amplification step of the library?
Old 05-24-2011, 07:06 PM   #5
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19

So my understanding was that there wasn't any. However, I am double checking with the technician now, should know tomorrow. Can this introduce a lot of bias?
Old 05-24-2011, 07:11 PM   #6
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19

PS Thanks for the help so far!!!
Old 05-24-2011, 07:26 PM   #7
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

All the talks/papers I've seen from Broad and Sanger seem to attribute the majority of the bias to the PCR step. Getting around this involves either eliminating PCR altogether or improving the PCR conditions.

I'll try to dig up a reference if no one else jumps in.
Old 05-24-2011, 07:50 PM   #8
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19

Alright, looks like there is an amplification step! The protocol used was the Illumina TruSeq kit protocol, which I can't get a copy of right now but hopefully will tomorrow.

I am a bit surprised if it is the amplification, because when I tried to model coverage at a site the most influential predictors were the local GC content (I used bins of 10 bp around the central base, so 0-10, 10-20, 20-30, 30-40, etc.). Effects declined for more distant GC content (so the number of G or C within 10 bp was more important than that within 20 to 30 bp). Past about the read length GC didn't seem to matter, which made me think it was the sequencing step, but I suppose the declining effect could be due to degrading quality too.

I'll try to look into avoiding GC bias in the amplification, this is rather strange! Thanks again for the help, comments from any others welcome!
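
A rough sketch of the binning described above, assuming the reference is a plain string and coverage is a list of per-base depths of the same length. It counts G/C bases in concentric 10 bp distance bins around each site and reports a simple Pearson correlation per bin. This is a stand-in for illustration, not the model that was actually fitted. The names `gc_ring_counts`, `pearson` and `gc_bin_correlations` are hypothetical.

```python
import math


def gc_ring_counts(seq, center, bin_width=10, n_bins=4):
    """Count G/C bases in distance bins (0-10, 10-20, ...) around `center`."""
    counts = [0] * n_bins
    reach = bin_width * n_bins
    for offset in range(-reach + 1, reach):
        pos = center + offset
        # Positions past either end of the sequence are simply skipped.
        if 0 <= pos < len(seq) and seq[pos] in "GCgc":
            counts[abs(offset) // bin_width] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input has no variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def gc_bin_correlations(seq, coverage, bin_width=10, n_bins=4):
    """One correlation between coverage and GC count per distance bin."""
    rows = [gc_ring_counts(seq, i, bin_width, n_bins) for i in range(len(seq))]
    return [pearson([r[b] for r in rows], coverage) for b in range(n_bins)]
```

If the bias is local, the correlation for the innermost bin should be the largest in magnitude and should shrink for the outer bins. A regression with all bins as joint predictors would be needed to separate their effects properly.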
Old 05-24-2011, 07:57 PM   #9
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Here you go, they solve it with modified cycling conditions IIRC:

http://genomebiology.com/2011/12/2/R18/abstract
Old 05-24-2011, 08:14 PM   #10
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19

Ah, interesting, also just got this response from Illumina:

"GC bias observed when sequencing standard genomes is often the result of high cluster density on the sequencing slide (much higher than the recommended specification). This is due to the differential formation of AT and GC rich clusters. For the HiSeq 2000/HiSeq 1000/HiScanSQ we have just released new chemistry (TruSeq v3 reagents) that will allow higher density whilst reducing the bias toward AT clusters. The new chemistry will also assist in the sequencing of GC or AT rich genomes."

So it seems like there are at least two steps that could be improved, going to read the paper and mull this over more, thanks!!
Old 05-25-2011, 12:28 AM   #11
henry.wood
Member
 
Location: Leeds, UK

Join Date: Apr 2010
Posts: 63

I've noticed a further complication in some of my samples. The samples with really good quality DNA have a much greater GC bias than more degraded samples. When I sequence DNA from FFPE blocks there is pretty much no bias at all. I've presumed that it's something to do with fragmentation of the DNA. The high quality DNA can have its bias reduced after a few freeze/thaw cycles if it's taken out of the freezer a few times.
Old 05-25-2011, 01:37 PM   #12
ndelaney
Member
 
Location: Cambridge, MA

Join Date: May 2011
Posts: 19

Thanks for the responses all! It looks like substantial headway can be made on this issue by:

1- Using optimal PCR settings per the paper's specification

and

2- Not overloading the cluster density.

Going to try both. Henry, your suggestion sounds applicable to DNA extraction methods not used for bacteria, but thanks for passing it on! Sounds like a great hint for somebody.

Thanks again!
Old 07-12-2011, 12:46 PM   #13
kwaraska
Senior Member
 
Location: Boston,MA

Join Date: Nov 2008
Posts: 122
Starting quantity

Due to the TruSeq not requiring PCR, does anyone know how they have modified that for CG rich regions? How much starting material does one use, and since each ug now is one prep, do you multiply by the number of ug or can you just treat it as one prep?
Old 07-13-2011, 06:14 AM   #14
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,315

Quote:
Originally Posted by kwaraska
Due to the TruSeq not requiring PCR,

Keep in mind that the standard protocol includes a 10-cycle PCR amplification. You have to go "off protocol" to produce an amplification-free library.

Quote:
Originally Posted by kwaraska
Due to the TruSeq not requiring PCR, does anyone know how they have modified that for CG rich regions?

Not sure what "that" refers to in this context. As detailed upthread, much of the coverage bias results from the PCR "enrichment" step.

Quote:
Originally Posted by kwaraska
How much starting material does one use, and since each ug now is one prep, do you multiply by the number of ug or can you just treat it as one prep?

TruSeq DNA asks for 1 ug per sample, as you state. Did you want to start with more than 1 ug for some reason?

--
Phillip
Old 07-13-2011, 06:22 AM   #15
Bioo Scientific
Registered Vendor
 
Location: Austin, Tx

Join Date: Oct 2009
Posts: 99

For GC-rich genomes, in addition to reducing overly high cluster density, we definitely recommend eliminating the PCR step too. There are biases that can be attributed to the polymerase even if you optimize your PCR steps. Reducing the number of cycles helps, but we’ve found eliminating the step completely works the best. I can send you our protocol if you are interested.
Old 07-13-2011, 11:09 AM   #16
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Another approach is linear amplification using in vitro transcription.
http://www.ncbi.nlm.nih.gov/pubmed/21720315

Amplification-free paper
http://www.ncbi.nlm.nih.gov/pubmed/21431776
Old 07-13-2011, 11:36 AM   #17
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308

Quote:
Originally Posted by ndelaney
We found that about half of this unexplained dispersion is due to a bias against GC rich sequences, and that local GC content (within 10-20 bp) is the strongest determinant of the differences in sequencing coverage, and this influence decreases to about 100 bp away from the center base, where GC content matters little if at all.

How did you determine this? Did you write your own script or did you use a software package?

On GC bias and PCR, clearly the best is to go PCR-free, but if you must do PCR, Kapa Biosystems says their polymerase reduces GC bias as compared to Phusion or the TruSeq polymerase (which they say is even worse than Phusion polymerase). The Kapa polymerase is more efficient (i.e. requires fewer cycles to achieve the same amplification) than Phusion in my hands, so it would make sense.
Old 02-12-2012, 10:35 AM   #18
Arjan
Junior Member
 
Location: Netherlands

Join Date: Feb 2012
Posts: 3

Hello. I'm working on a ChIP-Seq data set from an Illumina platform (I think GA but not sure) and I observe what looks like GA-bias in the first 5-6bp. Has anyone seen this before and does anybody have an explanation for it? So, in our data there seems to be a preference for A and G in the 5'-ends of the read.
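
One way to quantify a start-of-read preference like this is to tabulate base fractions per read position. The sketch below assumes the reads are available as plain sequence strings. The function name `base_composition` and the example reads are hypothetical.

```python
from collections import Counter


def base_composition(reads, n_positions=10):
    """Per-position A/C/G/T fractions over the first n_positions of each read."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for pos, base in enumerate(read[:n_positions].upper()):
            counts[pos][base] += 1
    fractions = []
    for c in counts:
        # Any other symbol (e.g. N) counts toward the total but is not reported.
        total = sum(c.values())
        fractions.append({b: c[b] / total for b in "ACGT"} if total else {})
    return fractions


# Hypothetical reads for illustration only.
reads = ["AAGTCC", "GACTTA", "AGCTGA", "GGTTAC"]
for pos, frac in enumerate(base_composition(reads, n_positions=3), start=1):
    print(pos, frac)
```

An excess of A and G at the first few positions relative to later positions would show up directly in this table.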


Last edited by Arjan; 02-12-2012 at 10:41 AM.
Old 02-12-2012, 10:42 AM   #19
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308

I've seen weird biases in the first base pairs. I think it is a common problem. I just trim them off before mapping. Most mapping programs give you this option exactly for this reason. I don't run the machines so I don't know where it comes from.
__________________
--------------
Ethan
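
For anyone who prefers to trim before handing reads to a mapper, here is a minimal sketch of the trimming step, assuming standard four-line FASTQ records with no line wrapping. The function name `trim_5prime` and the example record are hypothetical.

```python
def trim_5prime(fastq_lines, n_trim):
    """Yield FASTQ lines with the first n_trim bases and qualities removed."""
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        # Lines 2 and 4 of each record hold the sequence and its qualities;
        # both must be cut by the same amount to stay in register.
        if i % 4 in (1, 3):
            line = line[n_trim:]
        yield line


# Hypothetical record for illustration only.
record = ["@read1", "AGGAGCTTACGT", "+", "IIIIIIHHHHHH"]
for out in trim_5prime(record, 6):
    print(out)
```

Using the mapper's own trim option has the same effect without writing a second copy of the reads.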
Old 02-12-2012, 10:50 AM   #20
Arjan
Junior Member
 
Location: Netherlands

Join Date: Feb 2012
Posts: 3

hmm. okay. i have been trying to find literature on this but to no success so far. thanks.
Tags
coverage, gc bias, illumina, overdispersion

All times are GMT -8.