SEQanswers

Old 08-18-2011, 03:07 AM   #1
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 108
Default what statistics and tool to use?

Dear all,
Since I'm not very experienced in statistics, I have mostly used DESeq.
I have 2 conditions (control and treated) with 2 replicates each, and I want to see which genes change the most, along with their p-values.
The problem is that in this particular case I do not have simple read counts, but values that I have already normalized and calculated in my own way for a specific task, e.g.
c = control
s = treated
Quote:
name c1 s1 c2 s2
gene1 0.03 0.1 0.01 0.6
What statistical test and what tool can I use, and which is most appropriate in this case?

Thanks!!!!
Old 08-18-2011, 05:11 AM   #2
chenyao
Member
 
Location: Beijing

Join Date: Jul 2011
Posts: 74
Default

You can try the SAM method or a moderated t-test.
Old 08-18-2011, 05:13 AM   #3
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 108
Default

Thanks,
Do you know which software tool I could use for this moderated t-test, since I'm not good at statistics?
Old 08-18-2011, 05:16 AM   #4
chenyao
Member
 
Location: Beijing

Join Date: Jul 2011
Posts: 74
Default

Do you know "R"?

It has many tools you can use for this.
Old 08-18-2011, 05:18 AM   #5
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 108
Default

Poorly, since I have only used DESeq...

But if there is no other way, I will look into this and try to find which R package can do it for me.
Old 08-18-2011, 05:36 AM   #6
chenyao
Member
 
Location: Beijing

Join Date: Jul 2011
Posts: 74
Default

It's very simple, much simpler than DESeq.

You can use the "sam" or "dchip" module.

It depends on the distribution of your data. If they follow a normal distribution, the simplest method is a t-test.
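
For illustration only, here is a minimal sketch of what a plain t-test looks like on the gene1 row quoted in the first post (c1, c2 versus s1, s2). It assumes equal variances and exactly two values per group, which lets it use the closed-form p-value of the t distribution with 2 degrees of freedom. The function name `pooled_t_test_2v2` is hypothetical, and this is not the moderated t-test or SAM mentioned above.

```python
import math
import statistics


def pooled_t_test_2v2(control, treated):
    """Student's two-sample t-test (equal variances) for exactly 2 vs 2 values."""
    if len(control) != 2 or len(treated) != 2:
        raise ValueError("this sketch only handles two values per group")
    diff = statistics.mean(treated) - statistics.mean(control)
    # With equal group sizes the pooled variance is the mean of the two sample variances.
    pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
    # Standard error of the difference: sqrt(pooled_var * (1/2 + 1/2)).
    se = math.sqrt(pooled_var * (1 / 2 + 1 / 2))
    if se == 0:
        raise ValueError("no variation within groups; t is undefined")
    t = diff / se
    # For 2 degrees of freedom the t CDF is 0.5 + t / (2 * sqrt(t^2 + 2)),
    # so the two-sided p-value is 1 - |t| / sqrt(t^2 + 2).
    p = 1 - abs(t) / math.sqrt(t * t + 2)
    return t, p


# gene1 from the first post: c1 = 0.03, c2 = 0.01, s1 = 0.1, s2 = 0.6
t_stat, p_value = pooled_t_test_2v2([0.03, 0.01], [0.1, 0.6])
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.3f}")
```

With only two replicates per group the variance estimate is very unstable, which is the usual motivation for moderated tests that borrow information across genes.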
Old 08-18-2011, 05:38 AM   #7
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 108
Default

Sorry for the stupid question, but what is meant by "normal distribution"?
Old 08-18-2011, 05:48 AM   #8
chenyao
Member
 
Location: Beijing

Join Date: Jul 2011
Posts: 74
Default

See the wiki:
http://en.wikipedia.org/wiki/Normal_distribution

Put simply, you can plot your data to see whether the distribution is symmetric and bell-shaped.

But I doubt it is; most data do not follow it. Still, it's good to look at the distribution of your data first.
Old 08-18-2011, 05:49 AM   #9
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 108
Default

Thanks a lot!
Old 08-18-2011, 05:56 AM   #10
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 263
Default

If your data are HTS reads, they don't follow a normal distribution. You have to use a non-parametric test such as the Wilcoxon test or the Kruskal-Wallis test.
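
As a rough sketch of how a rank-based test behaves with this design, the snippet below computes an exact two-sided permutation p-value for the rank-sum statistic on the gene1 values from the first post. The function name `exact_rank_sum_p` is hypothetical and the code assumes there are no tied values. With 2 values per group there are only 6 possible rank assignments, so the smallest achievable two-sided p-value is 2/6.

```python
from itertools import combinations


def exact_rank_sum_p(group_a, group_b):
    """Exact two-sided permutation p-value for the rank-sum statistic (no ties)."""
    pooled = sorted(group_a + group_b)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    ranks = {value: i + 1 for i, value in enumerate(pooled)}
    n_a = len(group_a)
    observed = sum(ranks[v] for v in group_a)
    # Expected rank sum of group A under the null hypothesis.
    expected = n_a * (len(pooled) + 1) / 2
    # Enumerate every way of assigning n_a of the ranks to group A.
    all_sums = [sum(c) for c in combinations(range(1, len(pooled) + 1), n_a)]
    extreme = sum(
        1 for s in all_sums if abs(s - expected) >= abs(observed - expected)
    )
    return extreme / len(all_sums)


# gene1: controls 0.03 and 0.01, treated 0.1 and 0.6
p_gene1 = exact_rank_sum_p([0.03, 0.01], [0.1, 0.6])
print(f"exact two-sided p = {p_gene1:.3f}")
```

Even when both treated values exceed both controls, as here, the p-value cannot drop below about 0.33, so a rank test on 2 versus 2 replicates cannot reach conventional significance thresholds.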
Old 08-18-2011, 06:05 AM   #11
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 108
Default

My data are not pure read counts, but rather values derived from read counts normalized by the total number of reads in the library and by how many times a read maps to the genome.
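
A minimal sketch of the kind of value described here, assuming it is the read count divided by the library total and by the number of genomic locations the read maps to. The per-million scaling, the function name `normalized_value`, and the example numbers are assumptions for illustration; the exact formula was not given in the post.

```python
def normalized_value(read_count, library_total, genome_hits, scale=1_000_000):
    """Read count per `scale` library reads, divided by the number of mapping locations."""
    if library_total <= 0 or genome_hits <= 0:
        raise ValueError("library_total and genome_hits must be positive")
    return read_count * scale / (library_total * genome_hits)


# Hypothetical numbers: 10 reads, a library of 2 million reads, 2 mapping locations.
example = normalized_value(10, 2_000_000, 2)
print(example)
```

Values like these are no longer integer counts, which is why count-based tools cannot take them directly as input.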
Old 08-18-2011, 06:08 AM   #12
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 263
Default

Quote:
Originally Posted by vebaev View Post
My data are not pure read counts, but rather values derived from read counts normalized by the total number of reads in the library and by how many times a read maps to the genome.
So you should use raw read counts with DESeq or edgeR. That seems the best solution to your problem.
Old 08-18-2011, 06:35 AM   #13
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 108
Default

Yes, but I wanted to take into account how many times a sequence maps to the genome, and DESeq does not take this into account, does it?
If I have a sequence with a read count of 10 that maps perfectly to 2 locations, and I want to see whether location 1 changes between samples, it does not mean that all 10 reads come from location 1, does it?

Or am I wrong somewhere in my logic?

Anyway, I will run this with DESeq to see the result!
Old 08-18-2011, 09:13 AM   #14
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 108
Default

NicoBxl,

I have run DESeq with the binomial test on my 4 samples (2 controls and 2 treatments) and I got a list.

What about the genes that have 0 reads in one group and a non-zero count in the other group? I get =+Inf or =-Inf. Should I keep them or discard them from the table of most differentially altered genes?

Thanks
Old 08-19-2011, 12:28 AM   #15
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 263
Default

Quote:
Originally Posted by vebaev View Post
NicoBxl,

I have run DESeq with the binomial test on my 4 samples (2 controls and 2 treatments) and I got a list.

What about the genes that have 0 reads in one group and a non-zero count in the other group? I get =+Inf or =-Inf. Should I keep them or discard them from the table of most differentially altered genes?

Thanks
That's a tough question. Maybe check the p-value. What's the read count for the "other" group?

Last edited by NicoBxl; 08-19-2011 at 12:30 AM.
Old 08-19-2011, 01:30 AM   #16
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

Quote:
Originally Posted by vebaev View Post
What about the genes that have 0 reads in one group and a non-zero count in the other group? I get =+Inf or =-Inf. Should I keep them or discard them from the table of most differentially altered genes?
Thanks
If you have 0 counts in one condition and hundreds of counts in the other, this is most likely a valid signal, and DESeq should indicate this with a small p-value. Of course, the fold change estimate is not too useful, but that is a general problem when one of the conditions has very low counts.
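
To show where the infinite values come from, here is a small sketch of a log2 fold change between two group means. The function name `log2_fold_change` and its `pseudocount` parameter are hypothetical. Adding a pseudocount is a common workaround for display purposes; it is shown here as an assumption and is not what DESeq itself reports.

```python
import math


def log2_fold_change(mean_control, mean_treated, pseudocount=0.0):
    """log2(treated / control); +/-inf when exactly one side is zero."""
    a = mean_control + pseudocount
    b = mean_treated + pseudocount
    if a == 0 and b == 0:
        return math.nan   # no reads in either group: ratio undefined
    if a == 0:
        return math.inf   # reads only in the treated group
    if b == 0:
        return -math.inf  # reads only in the control group
    return math.log2(b / a)


raw_lfc = log2_fold_change(0, 500)                      # infinite
shrunk_lfc = log2_fold_change(0, 500, pseudocount=1.0)  # finite, about 9
low_lfc = log2_fold_change(0, 5, pseudocount=1.0)       # finite, much smaller
print(raw_lfc, shrunk_lfc, low_lfc)
```

The raw estimate is infinite whether the non-zero group has 5 reads or 500, which is why the p-value is the more informative column for such rows.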
Old 08-19-2011, 01:37 AM   #17
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 108
Default

Yes Simon,
the =Inf values come from rows where one of the groups is 0 and the other has some reads (sometimes 5, sometimes 500).

I also found your post in another topic about these =Inf values, saying that if the values in the last 2 columns are too big or close to zero, I should discard these rows from further analysis. Is that right?

Last edited by vebaev; 08-19-2011 at 01:43 AM.