SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   RNA Sequencing (http://seqanswers.com/forums/forumdisplay.php?f=26)
-   -   Binary characters in cuffcompare result & Questions on cuffdiff (http://seqanswers.com/forums/showthread.php?t=7633)

nkwuji 11-02-2010 04:45 AM

Binary characters in cuffcompare result & Questions on cuffdiff
 
Hi,

I am using the TopHat/Cufflinks packages to analyze my RNA-seq data, and I found a small bug in cuffcompare.

After comparing my reference GTF with transcript.gtf, I got combined.gtf. But sometimes the strand field of some transcripts contains a binary character instead of "+", "-" or ".". For example, when I view combined.gtf with "less", the strand for some transcripts shows as "^@" (a NUL byte). If I upload this combined.gtf to the UCSC Genome Browser, it reports "cannot read xxx.gtf file". After I replace these binary characters with ".", it works fine.
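The manual fix described above can be scripted. Below is a minimal Python sketch, not part of cuffcompare; the script name `fix_gtf_strand.py` is hypothetical. It assumes a tab-separated GTF with nine columns and rewrites the strand column (column 7) to "." whenever it holds anything other than "+", "-" or ".", which covers the NUL byte that `less` displays as ^@.

```python
import sys


def fix_strand_field(line: str) -> str:
    """Return the GTF line with an invalid strand (column 7) replaced by '.'."""
    if line.startswith("#"):
        return line  # leave comment/header lines untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 9 and fields[6] not in ("+", "-", "."):
        # Anything else in the strand column, e.g. a NUL byte ("\x00"),
        # is treated as "strand unknown".
        fields[6] = "."
        return "\t".join(fields) + "\n"
    return line


def main(in_path: str, out_path: str) -> None:
    # latin-1 maps every byte to one character, so stray binary bytes never
    # cause a decoding error and other fields are written back unchanged.
    with open(in_path, encoding="latin-1") as src, \
         open(out_path, "w", encoding="latin-1") as dst:
        for line in src:
            dst.write(fix_strand_field(line))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: fix_gtf_strand.py <in.gtf> <out.gtf>", file=sys.stderr)
```

Usage: `python fix_gtf_strand.py combined.gtf combined.fixed.gtf`, then upload the fixed file.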

Another question: does anyone know how to set the minimum threshold cuffdiff uses to decide whether to test a gene? For example, I have a gene expressed at a modest level in one sample (FPKM 8) but not expressed in the other (FPKM 0). It is actually one of the most interesting genes I was looking for. But cuffdiff marks it "NOTEST", so its significance is "no". Can anyone help with this? Can I manually call such genes differentially expressed, since they are clearly expressed and their p-value is actually 0?
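A NOTEST status means cuffdiff skipped the test for that locus, so there is no significance call to rely on; what can be done is to pull such on/off genes into a shortlist for follow-up. A minimal Python sketch, assuming a tab-separated cuffdiff gene-level table whose header contains `status`, `value_1` and `value_2` columns (the names used in `gene_exp.diff` by later releases such as Cufflinks 2.x; older releases may differ). The cut-off `min_on_fpkm` is a hypothetical parameter, not a cuffdiff option.

```python
import csv
from typing import Iterable, List


def read_diff(path: str) -> List[dict]:
    """Load a tab-separated cuffdiff table into a list of dicts keyed by header."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def on_off_notest(rows: Iterable[dict], min_on_fpkm: float = 1.0) -> List[dict]:
    """Return NOTEST rows where one sample is 0 and the other reaches min_on_fpkm."""
    hits = []
    for row in rows:
        if row["status"] != "NOTEST":
            continue
        a, b = float(row["value_1"]), float(row["value_2"])
        if (a == 0 and b >= min_on_fpkm) or (b == 0 and a >= min_on_fpkm):
            hits.append(row)
    return hits
```

Usage: `on_off_notest(read_diff("gene_exp.diff"))`. The result is a list of candidates, not a set of statistically supported differentially expressed genes.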

Also, can I manually remove genes expressed at low levels, e.g. genes with FPKM < 1? These genes don't look very promising...
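Removing low-FPKM genes afterwards is just a filter on the output table. A minimal sketch under the same column-name assumptions (`value_1`, `value_2`); the 1.0 FPKM cut-off is the example from the post, not a recommended value. It keeps a gene if either sample reaches the cut-off, so an on/off gene such as FPKM 8 vs 0 survives.

```python
import csv
from typing import Iterable, List


def drop_low_expression(rows: Iterable[dict], min_fpkm: float = 1.0) -> List[dict]:
    """Keep rows where at least one of the two samples reaches min_fpkm."""
    return [r for r in rows
            if max(float(r["value_1"]), float(r["value_2"])) >= min_fpkm]


def filter_file(in_path: str, out_path: str, min_fpkm: float = 1.0) -> int:
    """Write only the rows that pass drop_low_expression; return how many were kept."""
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames  # header row, reused for the output
        kept = drop_low_expression(reader, min_fpkm)
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

Note that filtering the table this way does not recompute the q-values cuffdiff already assigned.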

Cheers,
Jun

sdarko 11-02-2010 06:01 AM

I'm glad I found this post. I was having the exact same problem and changing the binary character to "." fixed my (current) issues as well.

Sam

RockChalkJayhawk 11-02-2010 06:57 AM

In reply to nkwuji (Post 28311):

The cuffdiff -c option might be what you are looking for:
Code:

-c/--min-alignment-count <int>

This limits differential testing based on alignment counts rather than FPKM. However, do you think it is wise, or even necessary, to use this feature if what you want to say is that the gene is present in one condition and absent in the other?
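To see what a count threshold means on the FPKM scale, one can invert the standard FPKM formula, FPKM = `count * 1e9 / (length_bp * total_fragments)`. The sketch below assumes the threshold is compared with a locus's fragment count and ignores Cufflinks' effective-length correction; the library size of 10 million fragments and the threshold of 10 are hypothetical.

```python
def fpkm_floor(min_count: int, length_bp: int, total_fragments: int) -> float:
    """FPKM of a transcript of length_bp that has exactly min_count fragments.

    Textbook definition only; Cufflinks' effective-length correction is ignored.
    """
    return min_count * 1e9 / (length_bp * total_fragments)


if __name__ == "__main__":
    # Hypothetical library of 10 million mapped fragments, count threshold of 10.
    for length in (500, 1_000, 5_000, 20_000):
        floor = fpkm_floor(10, length, 10_000_000)
        print(f"{length:>6} bp -> FPKM floor {floor:.3f}")
```

Under these assumptions a 500 bp transcript needs FPKM 2 to reach 10 fragments, while a 20 kb transcript needs only 0.05, so a single count cut-off corresponds to very different FPKM floors.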

nkwuji 11-03-2010 03:01 AM

Thx RockChalkJayhawk.

I will think about this, though the results look a little odd for genes expressed at low levels. For example, the gene with FPKM 8 in one sample and 0 in the other is marked NOTEST, whereas another gene with FPKM 0.25 in one sample and 0 in the other was tested (status OK) and called significant.

Possibly this is because the second gene is longer, so its alignment count exceeds the default min-alignment-count and it gets tested. But I think it would be better to gate testing on FPKM (or average coverage) rather than on total fragments (or reads); otherwise the threshold is biased toward longer genes.
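The length effect suggested above can be checked with a little arithmetic. In this sketch (hypothetical numbers: 10 million mapped fragments, 200 bp fragments, two genes at the same FPKM of 0.5), a count gate of 10 rejects the 1 kb gene but accepts the 20 kb gene, while an FPKM gate treats both alike, and so does mean coverage, which at fixed FPKM does not depend on length. The gate functions are illustrative only and are not cuffdiff's implementation.

```python
def expected_count(fpkm: float, length_bp: int, total_fragments: int) -> float:
    """Fragments expected for a transcript at a given FPKM (inverse of the FPKM formula)."""
    return fpkm * length_bp * total_fragments / 1e9


def mean_coverage(count: float, fragment_len: int, length_bp: int) -> float:
    """Average per-base depth, assuming each fragment covers fragment_len bases."""
    return count * fragment_len / length_bp


def passes_count_gate(count: float, min_count: float) -> bool:
    """Hypothetical count-based gate, in the spirit of a minimum alignment count."""
    return count >= min_count


def passes_fpkm_gate(fpkm: float, min_fpkm: float) -> bool:
    """Hypothetical FPKM-based gate, as proposed in the post."""
    return fpkm >= min_fpkm


if __name__ == "__main__":
    total = 10_000_000  # hypothetical mapped fragments
    frag_len = 200      # hypothetical fragment length (bp)
    fpkm = 0.5          # same expression level for both genes
    for name, length in (("1 kb gene", 1_000), ("20 kb gene", 20_000)):
        count = expected_count(fpkm, length, total)
        print(name, count,
              passes_count_gate(count, 10),
              passes_fpkm_gate(fpkm, 0.5),
              mean_coverage(count, frag_len, length))
```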


All times are GMT -8.