I am a Galaxy user, and I am currently using TopHat and Cufflinks to analyze my RNA-Seq data. I have some questions about these tools and hope you can help me figure out how to get proper output data.
At the end of the analysis, I expect two lists: differentially expressed transcripts and differentially expressed genes. In these two lists, I would like to see the gene name, gene ID and transcript ID.
What I did:
After mapping with TopHat, I ran Cufflinks using the reference gene set (GTF file) from Ensembl. I modified the Ensembl GTF file according to http://main.g2.bx.psu.edu/u/jeremy/p...lysis-faq#faq4 so that Cufflinks can recognize the chromosome column. I got the file "cufflinks assembled transcript", which nicely shows the gene ID and transcript ID, but the gene name was lost in this file.
Then I ran Cuffcompare using the same reference gene set (GTF file) from Ensembl. In the output file the gene name appears, but Galaxy assigned new IDs to the genes and transcripts.
Then I ran Cuffdiff. The output file contains only the gene name.
My question is: how can I keep the information (gene ID, transcript ID, gene name) from the reference gene set throughout the whole analysis, so that the final output is meaningful? Or is it possible to retrieve the gene ID, transcript ID and gene name afterwards, starting from the Cuffdiff output file "Cuffdiff transcript differential expression"?
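To make the second option concrete, this is roughly the kind of lookup I have in mind. It is only a minimal Python sketch, assuming the reference GTF carries Ensembl-style `gene_id`, `transcript_id` and `gene_name` attributes in its ninth column. The function name and the sample lines are made up for illustration, and it would only help if the IDs in the final output still match the reference IDs.

```python
import re

# Matches GTF attribute pairs such as: gene_id "G1";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def build_annotation_lookup(gtf_lines):
    """Map transcript_id -> (gene_id, gene_name) from reference GTF lines.

    Hypothetical helper: assumes tab-separated GTF with the attributes
    in column 9. A missing gene_id or gene_name becomes an empty string.
    """
    lookup = {}
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        transcript_id = attrs.get("transcript_id")
        if transcript_id:
            lookup[transcript_id] = (
                attrs.get("gene_id", ""),
                attrs.get("gene_name", ""),
            )
    return lookup


# Made-up sample lines, not real annotation data.
sample_gtf = [
    "# example header\n",
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "ABC1";\n',
    'chr1\tsrc\texon\t300\t400\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n',
]

lookup = build_annotation_lookup(sample_gtf)
print(lookup["T1"])
```

With such a table, any output row that still carries a reference transcript ID could be annotated with its gene ID and gene name.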
I hope you can help me.
Thanks in advance.