09-06-2011, 08:02 AM   #1
Location: USA

Join Date: Mar 2010
Posts: 55
dealing with 5' end bias in RNASEQ

I read about random hexamer priming bias in Illumina RNASEQ (see Hansen et al., NAR 2010, vol 38 no 12). Specifically, "There is a strong distinctive pattern in the nucleotide frequencies of the first 13 positions at the 5'-end of mapped RNA-Seq reads".

I assume that is the reason why my FASTQC report does not show parallel lines in the first 13 positions for GC content.
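One way to look at this pattern directly, outside of FASTQC, is to tabulate base frequencies at each of the first 13 read positions. The following is only a minimal sketch: the function name `per_position_base_freqs` and the two toy reads are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
from collections import Counter
import io


def per_position_base_freqs(fastq_handle, n_positions=13):
    """Fraction of each base at the first n_positions of the reads.

    Hypothetical helper; assumes every FASTQ record is exactly 4 lines.
    """
    counts = [Counter() for _ in range(n_positions)]
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: (c[b] / total if total else 0.0) for b in "ACGTN"})
    return freqs


# Two made-up reads standing in for a real FASTQ file.
example = io.StringIO(
    "@r1\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
    "@r2\nACCTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
)
freqs = per_position_base_freqs(example)
for pos, f in enumerate(freqs, start=1):
    print(pos, " ".join(f"{b}={f[b]:.2f}" for b in "ACGT"))
```

With real data you would pass an open file handle instead of the `io.StringIO` object. Frequencies that drift in the first positions and then settle would be consistent with the priming bias described in the paper.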

How do you deal with it? Will trimming work?

PFS
09-06-2011, 08:16 AM   #2
Location: USA

Join Date: Mar 2010
Posts: 55

I read the paper more carefully, and I understand that trimming the 5' end does not work.

I am curious to know whether this bias is of any real concern for differential expression analysis. If so, do you use Genominator?
PFS
09-06-2011, 08:44 AM   #3
I like code
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Personally I find the bias confuses things. After you run your reads through an aligner and process the abundances of reads aligning to genes, you'll find tens of thousands of genes with some expression. Figuring out which genes are "actually expressed" can be a little confusing. It doesn't help when the coverage is skewed due to a 5' bias. I've analyzed a run that had such a strong bias that even genes with expression in the 1000 FPKM range were not fully covered by reads. Does that mean the gene wasn't there? Probably not - but it certainly confuses the issue. I've also heard of the trimming idea - didn't work for me. We just took the expression values we got and went with them.
sdriscoll
09-06-2011, 11:07 AM   #4
Senior Member
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Does anyone see this with the Roche Rapid cDNA prep and 454 sequence? I would think so, because it also uses hexamers for cDNA synthesis/2nd strand synthesis. But if not, it might be that a minor change in the cDNA synthesis protocol could remove the bias.

How about using random 9-mers?

Or "balanced" 6-mers? (Where all 4096 possible six base sequences are synthesized separately, quantitated and pooled to ensure equi-molar amounts of each 6 base sequence.)
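As a quick check of the number above, all possible six-base sequences can be enumerated in a few lines. This is just an illustrative sketch; the variable names are made up.

```python
from itertools import product

# Every six-base sequence over A, C, G, T: 4**6 of them.
hexamers = ["".join(p) for p in product("ACGT", repeat=6)]
print(len(hexamers))

# In an equimolar pool, each hexamer would make up this fraction.
equimolar_fraction = 1 / len(hexamers)
print(equimolar_fraction)
```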

pmiguel
09-06-2011, 10:27 PM   #5
Simon Anders
Senior Member
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Originally Posted by PFS:
I am curious to know whether this bias is of any real concern for differential expression analysis.
No, not really. The bias will affect estimates of absolute expression, but once you calculate a fold change for a gene by comparing several samples, it should cancel out.
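A toy calculation may make the cancellation argument concrete. It is only a sketch under a strong assumption: that the bias acts as a multiplicative, gene-specific factor on the observed count. All numbers and names below are made up.

```python
def observed_count(true_abundance, bias_factor):
    """Assumed model: observed signal = true abundance * gene-specific bias."""
    return true_abundance * bias_factor


# Made-up true abundance of one gene in two samples.
true_a, true_b = 100.0, 400.0
true_fc = true_b / true_a

# Same bias factor in both samples: it divides out of the ratio.
gene_bias = 0.6
fc_same = observed_count(true_b, gene_bias) / observed_count(true_a, gene_bias)

# Different bias factors per sample: the ratio is distorted.
fc_diff = observed_count(true_b, 0.9) / observed_count(true_a, 0.6)

print(true_fc, fc_same, fc_diff)
```

Under this model the shared factor drops out of the fold change, while the absolute estimates (60 and 240 instead of 100 and 400) are both off. With sample-specific factors of 0.6 and 0.9 the estimated fold change comes out near 6 instead of 4.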

This holds if the patterns are the same in all samples. If they are not, you might get better results when adjusting for it. This is at least what Hansen et al. claim in their follow-up paper, a preprint of which you can find here:
Simon Anders
