SEQanswers

Old 08-19-2010, 11:33 PM   #1
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264
The Best way to normalize miRNA HTS data

Hi,

I'm currently working on HTS miRNA data. I have 8 samples with about 20 M reads per file.
After adapter trimming, the number of reads differs between samples (each sample has a different quality of preparation and sequencing).

Example :

file 1 : 200 000 reads
file 2 : 1 000 000 reads
...

Now I want to analyze differential expression. The problem is the normalization step. What is the best way to normalize the data for comparing multiple samples?

Thanks a lot,

N.

Old 08-22-2010, 11:11 PM   #2
NicoBxl

After searching, I found some methods:

- RPM (reads per million)
- RPKM (reads per kilobase per million mapped reads): Mortazavi et al., Nat. Methods, 2008
- Trimmed mean of M-values (TMM): Robinson and Oshlack, Genome Biology, 2010
- Upper-quartile: http://www.ncbi.nlm.nih.gov/pubmed/20167110

Which method is the best for HTS miRNA data?
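To make the first and last items in that list concrete, here is a minimal sketch in plain Python (not R, so that it needs no packages). The counts and miRNA names are made up, and the upper-quartile version is simplified: it takes the 75th percentile of the non-zero counts in each sample and rescales the factors to a mean of 1, which may differ in detail from the published procedure.

```python
from statistics import quantiles

# Hypothetical raw miRNA read counts: rows are miRNAs, columns are two samples.
# Sample 2 is simply sequenced five times deeper than sample 1.
counts = {
    "miR-A": [100, 500],
    "miR-B": [50, 250],
    "miR-C": [10, 50],
    "miR-D": [40, 200],
}

def column(table, j):
    """Return the counts of sample j as a list, in row order."""
    return [row[j] for row in table.values()]

def rpm(table):
    """Reads per million: scale each sample by its total read count."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(column(table, j)) for j in range(n_samples)]
    return {name: [c / totals[j] * 1e6 for j, c in enumerate(row)]
            for name, row in table.items()}

def upper_quartile_factors(table):
    """Per-sample 75th percentile of the non-zero counts, rescaled to mean 1."""
    n_samples = len(next(iter(table.values())))
    q3 = []
    for j in range(n_samples):
        nonzero = [c for c in column(table, j) if c > 0]
        q3.append(quantiles(nonzero, n=4, method="inclusive")[2])
    mean_q3 = sum(q3) / len(q3)
    return [q / mean_q3 for q in q3]

rpm_table = rpm(counts)
uq = upper_quartile_factors(counts)
print(rpm_table["miR-A"])  # identical RPM in both samples for this toy data
print(uq)                  # scaling factors; their ratio reflects the depth difference
```

In this toy data the two samples differ only in depth, so both approaches agree. They start to disagree once a few highly expressed miRNAs change between conditions.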

Last edited by NicoBxl; 08-23-2010 at 12:37 AM.
Old 08-23-2010, 12:49 AM   #3
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

Dividing by the number of sequenced or mapped reads is a bad idea, for the reasons explained by Robinson and Oshlack.

I'd advise against quantile normalization (as suggested by Bullard et al., the fourth in your list). RNA-Seq is known to be linear, and quantile normalization will only distort this. This leaves you with TMM (Robinson and Oshlack) or with the method that we implemented in DESeq (preprint here), which is similar in spirit to TMM but uses slightly different math.
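As a rough illustration of the difference, the sketch below computes size factors in the way the DESeq paper describes: for each sample, the median over genes of the ratio between the count and that gene's geometric mean across samples. It is plain Python rather than the DESeq code, the counts are invented, and skipping rows that contain a zero is a simplification made here.

```python
from math import exp, log
from statistics import median

# Hypothetical counts: sample 2 is sequenced twice as deep as sample 1,
# and in addition miR-X is strongly up-regulated in sample 2.
counts = {
    "miR-A": [100, 200],
    "miR-B": [30, 60],
    "miR-C": [50, 100],
    "miR-D": [20, 40],
    "miR-X": [100, 2000],
}

def size_factors(table):
    """Median-of-ratios size factors, one per sample (column)."""
    n_samples = len(next(iter(table.values())))
    # Log of the geometric mean of each row; rows with a zero are skipped.
    log_geo_means = {
        name: sum(log(c) for c in row) / len(row)
        for name, row in table.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [log(table[name][j]) - lg
                      for name, lg in log_geo_means.items()]
        factors.append(exp(median(log_ratios)))
    return factors

sf = size_factors(counts)
totals = [sum(row[j] for row in counts.values()) for j in range(2)]

print(sf[1] / sf[0])          # about 2: the true depth difference
print(totals[1] / totals[0])  # 8: the total is dominated by miR-X
```

Dividing by the totals would make miR-A to miR-D look four-fold down-regulated in sample 2, although only miR-X changed. The median is not affected by that single outlier.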

This all is said assuming that you want to make comparisons between samples, i.e., see whether a given gene's expression depends on your experimental conditions. If you want to compare different genes within the same sample, you need a very different approach. In that case, look at cufflinks (Trapnell et al, 2010).

Simon

Old 08-23-2010, 12:54 AM   #4
Simon Anders

Quote:
Originally Posted by Simon Anders
This all is said assuming that you want to make comparisons between samples, i.e., see whether a given gene's expression depends on your experimental conditions. If you want to compare different genes within the same sample, you need a very different approach. In that case, look at cufflinks (Trapnell et al, 2010).
Sorry, this paragraph was nonsense; I forgot that you are dealing with miRNA. The whole point of cufflinks (and its FPKM measure) is to deal with splicing variants and differing transcript lengths. This is not an issue if the read length exceeds the transcript length, as is the case with miRNA.
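A small sketch of why the length term matters for long transcripts but not for miRNA. The function below is the usual RPKM definition (reads per kilobase of transcript per million mapped reads); all counts and lengths are invented for illustration.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count / (transcript_length_bp / 1e3) / (total_mapped_reads / 1e6)

def rpm(read_count, total_mapped_reads):
    """Reads per million mapped reads, with no length term."""
    return read_count / (total_mapped_reads / 1e6)

# Long transcripts are fragmented, so a 3000 bp transcript yields about three
# times as many reads as a 1000 bp transcript at the same abundance.
# Dividing by length removes that effect.
print(rpkm(300, 1000, 2_000_000))  # 150.0
print(rpkm(900, 3000, 2_000_000))  # 150.0

# A mature miRNA is shorter than one read, so each molecule gives one read.
# Two equally abundant miRNAs get the same RPM, but dividing by their
# (hypothetical) lengths of 21 and 23 nt would make them look different.
print(rpm(300, 2_000_000), rpm(300, 2_000_000))
print(rpkm(300, 21, 2_000_000), rpkm(300, 23, 2_000_000))
```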

Simon

Old 08-23-2010, 01:31 AM   #5
NicoBxl

ok thanks Simon,

I'll try DESeq

Nicolas

Old 08-24-2010, 12:27 AM   #6
NicoBxl

Is it possible with DESeq to get the normalized count matrix? For example, to draw a heatmap with the normalized data.

Old 08-24-2010, 12:35 AM   #7
Simon Anders

Quote:
Originally Posted by NicoBxl
Is it possible with DESeq to get the normalized count matrix?
Yes, just divide the counts by the size factors:

t( t(counts(cds)) / sizeFactors(cds) )

(The two calls to 't' make sure that R divides by column, not by row.)
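The reason for the double transpose: when R divides a matrix by a vector, it recycles the vector down the columns, so element i of the vector is paired with row i. Transposing first puts the samples in the rows, and transposing back restores the layout. In base R, sweep(counts(cds), 2, sizeFactors(cds), "/") should give the same result. The same column-wise division, written out in plain Python with invented counts and size factors:

```python
# Hypothetical raw counts (rows = miRNAs, columns = samples).
raw_counts = [
    [100, 200],
    [30, 60],
    [50, 100],
]

# Assumed per-sample size factors, one per column.
size_factors = [0.5, 1.0]

def normalize(matrix, factors):
    """Divide every column j of the matrix by factors[j]."""
    return [[c / f for c, f in zip(row, factors)] for row in matrix]

normalized = normalize(raw_counts, size_factors)
for row in normalized:
    print(row)
```

After normalization the two columns agree, because in this toy example the samples differ only by the factor of two captured in the size factors.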

Quote:
to draw a heatmap with the normalized data, for example
You might want to take the log or use DESeq's variance-stabilizing transformation for such a heatmap. (Careful with the latter; I've just found a bug in 'getVarianceStabilizedData'. It's fixed in the devel version (1.1.11) of DESeq but not yet in the release branch.)
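"Taking the log" here simply means applying a logarithm to every entry of the normalized count matrix before plotting. A minimal sketch in plain Python follows. The pseudocount of 1 is an assumption added here so that zero counts do not produce minus infinity, and the values are invented. In R the equivalent would be something like log2(normalized + 1).

```python
from math import log2

# Hypothetical normalized counts (rows = miRNAs, columns = samples).
normalized = {
    "miR-A": [255.0, 1023.0, 0.0],
    "miR-B": [15.0, 31.0, 63.0],
}

def log2_transform(table, pseudocount=1.0):
    """Return log2(value + pseudocount) for every entry of the table."""
    return {name: [log2(v + pseudocount) for v in row]
            for name, row in table.items()}

logged = log2_transform(normalized)
print(logged["miR-A"])  # [8.0, 10.0, 0.0]
print(logged["miR-B"])  # [4.0, 5.0, 6.0]
```

On the log scale a doubling is one unit everywhere, so a heatmap is no longer dominated by the few most highly expressed miRNAs.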

Simon

Old 08-24-2010, 12:43 AM   #8
NicoBxl

ok thanks,

I'll try that.

For the heatmap, is it log(normalized counts) that you are talking about?

For the variance-stabilizing transformation, I'll download the devel version.

Old 11-05-2010, 02:00 AM   #9
NicoBxl

Quote:
Originally Posted by Simon Anders
Yes, just divide the counts by the size factors:
You might want to take the log or use DESeq's variance-stabilizing transformation for such a heatmap.

I don't understand how to get the log. Can you explain that to me, Simon?

Thanks a lot

N.
Old 10-31-2011, 03:03 PM   #10
cascoamarillo
Senior Member
 
Location: MA

Join Date: Oct 2010
Posts: 160

Hi guys,

I'm also trying to normalize the differential expression data for my small RNAs. I'd like to try DESeq in R, but after looking at its information page:
http://www-huber.embl.de/users/anders/DESeq/
I can't understand how to do this analysis. Is there a DESeq guide for dummies? Does it take my BAM files as input?

Sorry to bother you guys with these silly questions.

Old 10-31-2011, 11:29 PM   #11
Simon Anders

Quote:
Originally Posted by cascoamarillo
I'm also trying to normalize the differential expression data for my small RNAs. I'd like to try DESeq in R, but after looking at its information page:
http://www-huber.embl.de/users/anders/DESeq/
Have you even tried to read the manual ("vignette")? I ask because the URL you posted contains little useful information beyond the link to the manual, and it does not sound as if you followed that link.

Old 11-01-2011, 06:34 AM   #12
cascoamarillo

Quote:
Originally Posted by Simon Anders
Have you even tried to read the manual ("vignette")? I ask because the URL you posted contains little useful information beyond the link to the manual, and it does not sound as if you followed that link.
Thanks for the reply.

Yes, I've been looking at the manual. Maybe the problem is my limited experience with R. I have made some plots and done some HTS data analysis in R, but I'm not an expert in it. Sorry for venting my frustration in the post. I'll look at the manual further and see what happens.

Old 11-01-2011, 11:59 AM   #13
cascoamarillo

OK, after reading the manual ("vignette") further, I have some questions. First of all, I don't work with Drosophila, so I do not need the pasilla library, right? What I did was take my SAM and GFF files and create the count table for my treated/untreated conditions (HTSeq). But how can I load this table into DESeq? Do I need my own "pasilla" library?

Thanks

Old 11-01-2011, 12:23 PM   #14
Simon Anders

No, the 'pasilla' library is example data. It comes with another vignette which, among other things, tells you how to get the data into R.

In the long run, it may be well worth your while to read a short introductory tutorial on R. Reading in data tables and writing them out again is such a common procedure that hardly any documentation for a specific R package will explain how it is done. When I write things like the DESeq vignette, I assume that the reader has already familiarized himself with these basics.
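For readers stuck at this step, the task is only this: each htseq-count run writes a two-column, tab-separated file (feature name, count), and the per-sample files have to be merged into one table with a column per sample. In R this is normally done with read.table. The sketch below shows the same merge in plain Python on in-memory example data. The sample names and counts are invented, and the list of summary rows at the end of the htseq-count output is an assumption that may differ between HTSeq versions.

```python
import csv
import io

# Summary rows that htseq-count appends after the per-feature counts.
# Assumed names; newer HTSeq versions prefix them with "__".
SPECIAL = {"no_feature", "ambiguous", "too_low_aQual",
           "not_aligned", "alignment_not_unique"}

def read_count_file(handle):
    """Read one tab-separated 'feature<TAB>count' file into a dict."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2 or row[0].lstrip("_") in SPECIAL:
            continue  # skip blank lines and summary rows
        counts[row[0]] = int(row[1])
    return counts

def merge_counts(per_sample):
    """Combine {sample: {feature: count}} into (samples, {feature: [counts]})."""
    samples = list(per_sample)
    features = sorted(set().union(*(c.keys() for c in per_sample.values())))
    table = {f: [per_sample[s].get(f, 0) for s in samples] for f in features}
    return samples, table

# In-memory stand-ins for two htseq-count output files.
untreated = io.StringIO("miR-1\t23\nmiR-2\t0\nno_feature\t500\n")
treated = io.StringIO("miR-1\t45\nmiR-2\t7\n__no_feature\t600\n")

samples, table = merge_counts({
    "untreated1": read_count_file(untreated),
    "treated1": read_count_file(treated),
})
print(samples)
print(table)
```

With real data, the io.StringIO objects would be replaced by open(path) on the actual files.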

(I know that a lot of people here hope for single-click push-button solutions. You will get them --- in ten years or so, when analysis methods are no longer new and constantly changing, but well-matured textbook knowledge with fixed consensus recipes that can be followed blindly without a need to understand them. Until then, you will need some basic statistics and bioinformatics knowledge to make informed choices about the many different possibilities to analyse your data.)

Old 11-01-2011, 02:58 PM   #15
cascoamarillo

OK, there's another DESeq manual. I guess this is the one you are talking about:
http://bioconductor.org/packages/2.8.../doc/DESeq.pdf

The other is:
http://www.bioconductor.org/packages.../doc/DESeq.pdf

They look the same but are not; sorry for the confusion. "Analysing RNA-Seq data with the “DESeq” package" is the one that I need to start working with DESeq. Thanks a lot for this package and the complete documentation that you've provided.

Old 11-01-2011, 10:50 PM   #16
Simon Anders

They are two versions of the same document. We completely overhauled the vignette this summer, rewrote parts of it, and used different example data. Also, we took out the explanation of how to read in the data and moved it to the pasilla vignette. Maybe we should change this back.
Old 03-04-2012, 09:56 PM   #17
anurupa
Member
 
Location: india

Join Date: Jan 2012
Posts: 14

Hello everyone, I am also new to this area. I have a file containing something like this, e.g.:
name celltype1 celltype2
mirna1 23 45

In the example above, "name" is the name of the miRNA, and the number of reads in each cell type is given. What would be the best way to normalize this kind of data? Can I use DESeq?

Last edited by anurupa; 03-04-2012 at 09:58 PM.

Old 03-04-2012, 10:04 PM   #18
Simon Anders

Sure, you can use DESeq. However, you seem to have only a single sample per cell type. You won't get far with that.
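The underlying issue is that a test for differential expression has to compare the difference between conditions with the variation within a condition, and with one sample per condition there is nothing to estimate that variation from. A tiny illustration in plain Python, using the counts from the example table plus one invented replicate value:

```python
from statistics import StatisticsError, variance

def replicate_variance(values):
    """Sample variance across replicates, or None if it cannot be estimated."""
    try:
        return variance(values)
    except StatisticsError:
        # Raised when fewer than two data points are given.
        return None

single = replicate_variance([23])          # one sample for this cell type
duplicates = replicate_variance([23, 31])  # two replicates (second value invented)

print(single)      # None: no within-condition variance available
print(duplicates)  # 32
```

Two replicates already give an estimate, although a noisy one. More replicates make it more reliable.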

Old 03-04-2012, 10:10 PM   #19
anurupa

Hi,
Thank you for the reply. What I showed was just an example, not my actual data. Could you kindly explain what you meant, and why? If you are talking about replicates, we have two for each cell type (e.g. celltype1a and celltype1b).

Last edited by anurupa; 03-04-2012 at 10:27 PM. Reason: less information
Old 03-04-2012, 11:52 PM   #20
stoker
Member
 
Location: Poland

Join Date: Oct 2010
Posts: 17

Dear all,
I have been working on short RNA sequencing for some time. I agree with Simon Anders that RNA-Seq is linear and that DESeq or TMM are good solutions. On the other hand, our experience is that RPM normalization correlates well with microarray data and qPCR, at least in the range of medium and high expression. I bet none of the previously mentioned methods will deal well with low RNA expression, since there are quantization effects and simply noise. My personal opinion: if you can do things in many ways that you can validate, choose the simplest one.
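The point about low expression can be put in numbers under a simple assumption: if read counts were pure Poisson (counting noise only, which understates the variability of real data), the standard deviation would be the square root of the mean, so the relative noise would be 1/sqrt(mean). A short sketch in plain Python:

```python
from math import sqrt

def poisson_cv(expected_count):
    """Coefficient of variation (sd / mean) of a Poisson count."""
    return 1 / sqrt(expected_count)

for mean in (4, 100, 10000):
    print(mean, round(poisson_cv(mean), 3))
```

At an expected count of 4 the relative noise is 50%, at 100 it is 10%, and at 10000 it is 1%. No choice of normalization removes this, which is why every method struggles with weakly expressed miRNAs.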
__________________
Tomasz Stokowy
www.sequencing.io.gliwice.pl

Tags
microrna, mirna, normalization
