I am starting to run RNA-seq analysis using TopHat and Cufflinks, and I am a beginner in this type of analysis. I have paired-end RNA-seq reads from around 400 samples. I would be grateful if you could address the following questions:
1. As far as I know, Cuffmerge is needed when we want to run Cuffdiff. In my study I want to classify the samples based on the expression levels obtained from Cufflinks. Is the Cuffmerge step still needed if we are only interested in expression levels (no differential expression analysis will be performed)?
2. I thought that Cuffmerge would provide a matrix of FPKM values for all samples, but the Cuffmerge output is not what I expected. How can we combine the FPKM values for all samples into a single matrix, as we usually have for microarray gene expression data?
3. I am still confused about the results of the -G and -g options in Cufflinks. As I understand it, if we specify -G reference.gtf, Cufflinks calculates FPKM only for known transcripts, but if we specify -g reference.gtf, the output includes the reference transcripts as well as novel genes and isoforms. Based on that I have the following questions:
a. If we specify -g, how can we distinguish which isoforms/genes come from the supplied reference annotation and which ones are novel?
b. I compared the results from -g and -G, and the FPKM values for the isoforms common to both runs are different. Is this expected?
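To make question 2 concrete, this is roughly the kind of merge I have in mind. It is only a minimal Python sketch, assuming every sample was run with the same -G reference.gtf so that the tracking_id values match across samples, and that each Cufflinks output directory contains a tab-delimited genes.fpkm_tracking file with tracking_id and FPKM columns. The function names and the directory layout in the comments are my own placeholders, not part of Cufflinks.

```python
import csv


def read_fpkm(path):
    """Return {tracking_id: FPKM} from one Cufflinks *.fpkm_tracking file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["tracking_id"]: float(row["FPKM"]) for row in reader}


def build_matrix(sample_files):
    """Combine per-sample FPKM values into one genes-by-samples table.

    sample_files maps a sample name to the path of its fpkm_tracking file.
    Returns (ids, samples, rows), where rows[i][j] is the FPKM of ids[i]
    in samples[j], or None if that ID is absent from the sample's file.
    """
    per_sample = {name: read_fpkm(path) for name, path in sample_files.items()}
    samples = sorted(per_sample)
    # Union of all IDs seen in any sample, in a stable order.
    ids = sorted(set().union(*per_sample.values()))
    rows = [[per_sample[s].get(i) for s in samples] for i in ids]
    return ids, samples, rows


def write_matrix(ids, samples, rows, out_path):
    """Write the table as tab-delimited text, with NA for missing values."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["tracking_id"] + samples)
        for tracking_id, values in zip(ids, rows):
            writer.writerow([tracking_id] + ["NA" if v is None else v for v in values])


# Hypothetical usage, assuming one output directory per sample:
#   files = {"sample1": "cufflinks_out/sample1/genes.fpkm_tracking",
#            "sample2": "cufflinks_out/sample2/genes.fpkm_tracking"}
#   ids, samples, rows = build_matrix(files)
#   write_matrix(ids, samples, rows, "fpkm_matrix.tsv")
```

Is something like this a reasonable way to get the matrix, or is there a standard tool that already does it?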