SEQanswers

Old 09-03-2010, 08:55 AM   #1
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82
base composition variation in Illumina runs

Hi everyone,

We've recently done a set of mRNA-seq runs using the Illumina platform, 75 bp reads. The library was random-primed and polyA-selected.

We notice that the GC content of the first ~12 bases fluctuates rather dramatically, suggesting a pretty strong bias in the transcripts that are being sequenced. We do not believe that these fluctuations are caused by adaptors. I've been told that random priming is not exactly random, and that this type of bias is not atypical. Attached are plots of base composition (%A, %G, etc.) for each position across all reads.
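For anyone who wants to reproduce this kind of per-position tally without a plotting tool, here is a minimal sketch. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function name per_position_composition and the demo reads are made up for illustration and are not part of FastQC.

```python
from collections import Counter
from io import StringIO


def per_position_composition(fastq_handle, max_len=None):
    """Return one dict per read position with the percentage of A, C, G, T, N.

    Assumes four-line FASTQ records, so the sequence is the 2nd line of each record.
    """
    counts = []  # counts[i] tallies the bases observed at position i
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 1:
            continue
        seq = line.strip().upper()
        if max_len is not None:
            seq = seq[:max_len]
        while len(counts) < len(seq):
            counts.append(Counter())
        for i, base in enumerate(seq):
            counts[i][base] += 1

    composition = []
    for c in counts:
        total = sum(c.values())
        composition.append({b: 100.0 * c[b] / total for b in "ACGTN"})
    return composition


# Tiny made-up example; for real data pass open("reads.fastq") instead.
demo = StringIO("@r1\nACGT\n+\nIIII\n@r2\nAGGT\n+\nIIII\n")
for pos, pct in enumerate(per_position_composition(demo), start=1):
    print(pos, pct)
```

Looking at the first ~12 positions against the rest of the read is then just a matter of comparing the leading entries of the returned list with the later ones.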



I just wanted to confirm with a broader audience that other people see things like this in their data. Or whether I should be terribly depressed about the condition of our data. Note that these images are from 5 different runs (each 1 lane) sequenced in two different batches (months apart) and that the pattern of the fluctuations is consistent in all 5.

Also, FYI, I used FastQC, a very handy little piece of software, to make these (and a number of other) QC plots.

Thanks for your input!

Chris

Last edited by chrisbala; 09-03-2010 at 09:00 AM.
Old 09-03-2010, 09:14 AM   #2
pzumbo
Member
 
Location: NY

Join Date: Mar 2009
Posts: 11

*not* seeing those patterns following illumina's poly-a, random-primed mrna-seq protocol would be disconcerting.

i don't think this should lessen, however, being terribly depressed about the conditions of the data!
Old 09-03-2010, 11:04 AM   #3
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Check out this paper:

Hansen et al. Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res (2010)

They analyze in depth the biases in the Illumina random hexamer priming method.
Old 09-03-2010, 11:14 AM   #4
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82
bias correction

Thanks!

Any thoughts on, or experience with, the bias correction proposed in the paper above? I'll give it a shot and see what happens...
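For readers who want to experiment, below is a rough sketch of the general idea of reweighting reads by their starting k-mer: k-mers that are over-represented at the read start relative to the interior of the read get a weight below 1. This is a simplified illustration, not the paper's implementation. The function name heptamer_weights, the default positions (0-based read starts 0 and 1, interior positions 23 to 28) and the input format (a list of read strings) are all assumptions here; check the paper for the exact definition before relying on it.

```python
from collections import Counter


def heptamer_weights(reads, start_positions=(0, 1),
                     reference_positions=range(23, 29), k=7):
    """Weight for each k-mer seen at the read start:
    mean frequency at the reference (interior) positions divided by
    mean frequency at the start positions.

    Positions are 0-based; the defaults are assumptions, not verified values.
    """
    def freq_at(pos):
        # Frequency of each k-mer beginning at `pos`, over reads long enough.
        c = Counter(r[pos:pos + k] for r in reads if len(r) >= pos + k)
        total = sum(c.values())
        return {h: n / total for h, n in c.items()} if total else {}

    start = [freq_at(p) for p in start_positions]
    ref = [freq_at(p) for p in reference_positions]

    weights = {}
    for h in set().union(*start):  # every k-mer observed at a start position
        s = sum(f.get(h, 0.0) for f in start) / len(start)
        r = sum(f.get(h, 0.0) for f in ref) / len(ref)
        weights[h] = r / s  # s > 0 because h was seen at a start position
    return weights
```

Each read would then be counted with the weight of its first k-mer instead of 1 when tallying per-transcript or per-position counts. Whether that actually improves downstream estimates for a given data set is something to check rather than assume.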
Old 09-07-2010, 02:30 PM   #5
chrisbala
Member
 
Location: North Carolina

Join Date: Jan 2010
Posts: 82
base composition differences

This is not exactly related to my posts above - but does anyone know of any methods to correct for base composition differences among runs?

Here I'm not talking about differences between positions within reads, as above, but about differences in GC content among libraries.
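Before trying to correct anything, it helps to put a number on how different the libraries are. The sketch below only measures per-library GC content; it does not correct it. It assumes four-line FASTQ records, and the names gc_fractions, summarize_gc and the two demo libraries are made up for illustration.

```python
import statistics
from io import StringIO


def gc_fractions(fastq_handle):
    """GC fraction of each read, ignoring N and other uncalled bases."""
    fractions = []
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 1:  # sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        called = sum(seq.count(b) for b in "ACGT")
        if called == 0:
            continue  # skip reads with no called bases
        fractions.append((seq.count("G") + seq.count("C")) / called)
    return fractions


def summarize_gc(libraries):
    """libraries: dict mapping a library name to an open FASTQ handle."""
    summary = {}
    for name, handle in libraries.items():
        fr = gc_fractions(handle)
        if not fr:
            summary[name] = {"reads": 0, "mean_gc": None, "median_gc": None}
            continue
        summary[name] = {
            "reads": len(fr),
            "mean_gc": statistics.mean(fr),
            "median_gc": statistics.median(fr),
        }
    return summary


# Made-up demo libraries; for real data pass open(path) handles instead.
demo = {
    "lib1": StringIO("@r1\nGGCC\n+\nIIII\n@r2\nAATT\n+\nIIII\n"),
    "lib2": StringIO("@r1\nGCGC\n+\nIIII\n@r2\nGCAT\n+\nIIII\n"),
}
for name, stats in summarize_gc(demo).items():
    print(name, stats)
```

Comparing the per-read distributions (not just the means) across the 5 runs would show whether the libraries differ by a uniform shift or only in particular GC ranges.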

Tags
base composition, fastqc, illumina, quality control
