SEQanswers

06-22-2010, 08:57 AM   #1
lmc
Junior Member
 
Location: USA

Join Date: Jun 2010
Posts: 6
normalizing RNA-seq data to "unique transcript length" instead of "transcript length"

I've clustered expression profiles from 20 experiments for a large group of highly related genes. I have the raw read counts and normalized them as [total read counts uniquely matching to a gene] / ([total counts in experiment] * [length of transcript]). However, because these genes share a large amount of non-unique sequence, I don't think this method is correct. I'd like to try normalizing the expression data by [length of unique k-mers within transcript] rather than [length of transcript]. Is there an existing tool that can calculate this?
Thanks in advance for any help!
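
The normalization described in this post can be written out as a small function. This is only a sketch: the function name and the scaling to kilobases and millions of reads (the usual RPKM convention) are assumptions, not something stated in the post. Passing the unique length instead of the full transcript length gives the alternative being asked about.

```python
def normalized_expression(unique_reads, total_reads, length_bp):
    """Reads per kilobase of length per million reads in the experiment.

    length_bp may be the full transcript length or, as proposed in the
    post, only the length covered by unique k-mers.
    """
    if total_reads <= 0 or length_bp <= 0:
        raise ValueError("total_reads and length_bp must be positive")
    length_kb = length_bp / 1_000
    total_millions = total_reads / 1_000_000
    return unique_reads / (length_kb * total_millions)


# Same gene, same counts, two choices of length (made-up numbers).
full = normalized_expression(500, 10_000_000, 2000)   # full transcript length
unique = normalized_expression(500, 10_000_000, 500)  # unique portion only
```

With these made-up numbers the value rises from 25.0 to 100.0, which shows how strongly the choice of length can move the result for genes with a lot of shared sequence.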
06-22-2010, 10:43 AM   #2
mrawlins
Member
 
Location: Retirement - Not working with bioinformatics anymore.

Join Date: Apr 2010
Posts: 63

I'm not particularly impressed with the RPKM measure either, as it is still biased towards long transcripts (Oshlack and Wakefield, Biology Direct 2009). The method described in this paper (doi:10.1093/bioinformatics/btp692) seems to be a more intelligent way of addressing this issue, though I haven't tested it yet. I'm not sure that [length of unique k-mers within transcript] will be any better than [length of transcript] at eliminating the bias you think is there.
If you're set on doing it, though, I think you'll have to roll your own script to determine the length of unique k-mers, which may be fairly computationally intensive depending on how you go about it.
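
A minimal sketch of such a script, under several assumptions that are not in the thread: "length of unique k-mers" is taken to mean the number of k-mer start positions whose k-mer occurs exactly once across the whole transcript set, only the forward strand is considered, and everything fits in memory. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def unique_kmer_lengths(transcripts, k):
    """For each transcript, count the start positions whose k-mer occurs
    exactly once across all transcripts (forward strand only)."""
    # First pass: count every k-mer over the whole transcript set.
    counts = Counter()
    for seq in transcripts.values():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1

    # Second pass: per transcript, count positions with a unique k-mer.
    result = {}
    for name, seq in transcripts.items():
        seq = seq.upper()
        result[name] = sum(
            1 for i in range(len(seq) - k + 1) if counts[seq[i:i + k]] == 1
        )
    return result


# Toy input: two short, related sequences.
lengths = unique_kmer_lengths({"a": "ACGTACGG", "b": "ACGTTT"}, k=3)
```

Holding every k-mer in a Counter is where the cost comes from; for a full transcriptome with k near the read length, memory use grows quickly, and uniqueness against the whole genome rather than just the transcript set would need a different approach.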
06-23-2010, 11:45 AM   #3
steven
Senior Member
 
Location: Southern France

Join Date: Aug 2009
Posts: 269

Quote:
Originally Posted by lmc
I've clustered expression profiles from 20 experiments for a large group of highly related genes. I have the raw read counts and normalized them as [total read counts uniquely matching to a gene] / ([total counts in experiment] * [length of transcript]). However, because these genes share a large amount of non-unique sequence, I don't think this method is correct. I'd like to try normalizing the expression data by [length of unique k-mers within transcript] rather than [length of transcript]. Is there an existing tool that can calculate this?
Thanks in advance for any help!
Although it may not remove all the biases, subtracting non-unique k-mers from the total transcript length before normalization makes sense. You can find some precomputed data for this purpose in the "mappability" tracks of the UCSC Genome Browser.
Another thing: because of biases in read coverage at the ends of transcripts, it is common to disregard the initial and terminal exons and/or the UTRs.
good luck,
s.
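
One way to turn per-base mappability scores into a length for normalization is sketched below. It assumes the scores have already been extracted into transcript coordinates as a plain list, and that a score of 1.0 marks a position whose k-mer is unique; both are assumptions about the input, and the function name is hypothetical. Trimming a fixed number of positions from each end is a crude stand-in for dropping the first and last exons or the UTRs.

```python
def mappable_length(mappability, trim=0, threshold=1.0):
    """Count positions whose mappability score is at least `threshold`,
    ignoring `trim` positions at each end of the transcript."""
    if trim < 0:
        raise ValueError("trim must be non-negative")
    # Drop `trim` positions from both ends; empty if the transcript is too short.
    core = mappability[trim:len(mappability) - trim]
    return sum(1 for score in core if score >= threshold)


# Made-up scores for a six-base transcript.
scores = [1.0, 1.0, 0.5, 1.0, 0.25, 1.0]
whole = mappable_length(scores)            # all positions considered
trimmed = mappable_length(scores, trim=1)  # first and last position ignored
```

The resulting count can be used in place of the transcript length in the normalization from the first post.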

Tags
expression, normalize, rna-seq

All times are GMT -8.