I would really appreciate help understanding my cuffdiff output.
I ran cuffdiff through Galaxy on my RNA-seq reads (50x50 paired-end, Illumina HiSeq, no replicates: 1 control + 3 treated), pre-aligned to hg19 using cufflinks. I first ran cuffcompare and then input my BAM files to cuffdiff, using Ensembl as the reference annotation. I am confused by the output:
1. Several genes I am interested in, such as SFRP1, XAGE1 and SPANXA12, are missing from my transcript differential expression file, and I am unsure why. Is there an early filtering step that determines which genes are included in the output? I thought cuffdiff performed the statistical test on all transcripts and listed every one of them, whether or not it was significant.
2. After attempting qRT-PCR, I realized that a low FPKM does not seem to correspond to low expression. If I am interested in genes that are not expressed in the control but are expressed in the treated samples, how should I set an FPKM cutoff to pull them out? How do FPKM values relate to actual expression levels?
3. Unexpectedly, cuffdiff reported that each of my treatments significantly upregulated a similar number of genes. This is not what I expected, and my attempts to validate the results in the lab largely failed. I also cannot see any large differences when I look at the alignments in IGV, even though cuffdiff reports significant, several-fold changes between treated and control. Why is the agreement so poor?
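As a rough sketch of the filtering asked about in question 2, and of one way to check why genes from question 1 are missing, the Python below reads a cuffdiff differential expression table (for example gene_exp.diff or isoform_exp.diff). It assumes the tab-separated layout with a header row and the columns `gene`, `sample_1`, `sample_2`, `status`, `value_1` and `value_2` (FPKM); check these names against the header of your own file. The helper names (`read_diff`, `status_of`, `switched_on`) and the two FPKM thresholds are hypothetical placeholders, not recommended values. A gene whose status is not OK (e.g. NOTEST or LOWDATA) was not tested, and a symbol that never appears may simply be named differently in the annotation.

```python
import csv
import io

# Hypothetical placeholder thresholds (FPKM); calibrate them against your own
# qRT-PCR results rather than trusting these numbers.
OFF_MAX_FPKM = 0.5  # at or below this, treat a gene as "off" in the control
ON_MIN_FPKM = 5.0   # at or above this, treat a gene as "on" in the treated sample


def read_diff(handle):
    """Parse a cuffdiff *_exp.diff table: tab-separated with a header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


def status_of(rows, genes):
    """For each gene symbol, list the status values of its rows.

    An empty list means the symbol never appears in the 'gene' column,
    e.g. because the annotation uses a different name for it.
    """
    found = {g: [] for g in genes}
    for row in rows:
        if row["gene"] in found:
            found[row["gene"]].append(row["status"])
    return found


def switched_on(rows, control, off_max=OFF_MAX_FPKM, on_min=ON_MIN_FPKM):
    """Return (gene, treated_sample, control_fpkm, treated_fpkm, status)
    for comparisons against `control` where the gene goes from off to on."""
    hits = []
    for row in rows:
        if row["sample_1"] == control:
            treated = row["sample_2"]
            ctrl, trt = float(row["value_1"]), float(row["value_2"])
        elif row["sample_2"] == control:
            treated = row["sample_1"]
            ctrl, trt = float(row["value_2"]), float(row["value_1"])
        else:
            continue  # comparison that does not involve the control sample
        if ctrl <= off_max and trt >= on_min:
            hits.append((row["gene"], treated, ctrl, trt, row["status"]))
    return hits


# Toy rows invented for illustration only; real files have more columns.
TOY = (
    "test_id\tgene\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\n"
    "XLOC_1\tGENE_A\tcontrol\ttreated1\tOK\t0.1\t12.0\n"
    "XLOC_2\tGENE_B\tcontrol\ttreated1\tNOTEST\t0\t0.3\n"
    "XLOC_3\tGENE_C\ttreated1\ttreated2\tOK\t0\t9.0\n"
)

if __name__ == "__main__":
    rows = read_diff(io.StringIO(TOY))
    print(status_of(rows, ["GENE_B", "SFRP1"]))
    print(switched_on(rows, "control"))
```

The status column is reported alongside each hit on purpose: with no replicates, a large fold change in a low-count gene can still be unreliable, so the thresholds alone should not be treated as a validated cutoff.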
I would really appreciate your input on these questions so that I know how to proceed with my analyses.