I'm trying to establish the best differential gene expression analysis for my purpose: 2 genotypes, 2 experimental conditions, 3 biological replicates, 25 million reads per sample (sequenced RNA-seq libraries).
I'm currently using TopHat and Cufflinks, following the protocol published by Trapnell:
Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks
I'm working with a well-annotated model organism, Arabidopsis thaliana.
I have two goals:
First: look for differential expression in the already annotated transcriptome (TAIR10).
Second: I'm interested in the possibility that previously unannotated genes are differentially expressed between the two genotypes in one of the experimental conditions.
I have a protocol in mind, but I would like advice from the experts in this community.
Remember: two genotypes, two experimental conditions, and triplicates mean 12 independent libraries (25 million reads each).
FIRST PROTOCOL FOR DIFFERENTIAL EXPRESSION:
1) Run TopHat on each library.
2) Merge all the libraries in a single Cufflinks assembly (do I need to include the TAIR10.gtf?).
3) Use the final assembly from step 2 together with the 12 accepted_hits files from step 1 in cuffdiff.
4) Use cuffcompare to identify the locations of new genes.
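To make the steps concrete, here is a rough sketch of the command lines I have in mind. It is only one possible reading of the plan: it follows the cited protocol in assembling each library with Cufflinks and then combining the assemblies with cuffmerge, which is where TAIR10.gtf would be passed with -g. The sample labels (wt, mut, control, treated), the index prefix, the FASTA name, the single-end .fastq inputs, and the output directory names are all placeholders I made up for illustration. The script only builds the commands; it does not run them.

```python
from itertools import product

# Hypothetical labels and file names; replace with the real ones.
GENOTYPES = ["wt", "mut"]
CONDITIONS = ["control", "treated"]
REPLICATES = [1, 2, 3]
ANNOTATION = "TAIR10.gtf"
GENOME_INDEX = "tair10_index"   # assumed Bowtie index prefix
GENOME_FASTA = "tair10.fa"      # assumed genome FASTA


def sample_names():
    """One name per library: 2 genotypes x 2 conditions x 3 replicates."""
    return [f"{g}_{c}_rep{r}"
            for g, c, r in product(GENOTYPES, CONDITIONS, REPLICATES)]


def tophat_cmd(sample):
    # Step 1: align each library, guided by the reference annotation.
    return ["tophat", "-G", ANNOTATION, "-o", f"{sample}_thout",
            GENOME_INDEX, f"{sample}.fastq"]


def cufflinks_cmd(sample):
    # Step 2a: assemble transcripts for each library separately.
    return ["cufflinks", "-o", f"{sample}_clout",
            f"{sample}_thout/accepted_hits.bam"]


def cuffmerge_cmd(assembly_list="assemblies.txt"):
    # Step 2b: merge the per-library assemblies with the annotation.
    # assemblies.txt lists one transcripts.gtf path per line.
    return ["cuffmerge", "-g", ANNOTATION, "-s", GENOME_FASTA, assembly_list]


def cuffdiff_cmd(merged_gtf="merged_asm/merged.gtf"):
    # Step 3: one comma-separated group of BAM files per condition.
    groups = [f"{g}_{c}" for g, c in product(GENOTYPES, CONDITIONS)]
    bams = [",".join(f"{grp}_rep{r}_thout/accepted_hits.bam"
                     for r in REPLICATES)
            for grp in groups]
    return ["cuffdiff", "-o", "diff_out", "-L", ",".join(groups),
            merged_gtf] + bams


if __name__ == "__main__":
    for name in sample_names():
        print(" ".join(tophat_cmd(name)))
        print(" ".join(cufflinks_cmd(name)))
    print(" ".join(cuffmerge_cmd()))
    print(" ".join(cuffdiff_cmd()))
```

Treating each genotype and condition combination as its own cuffdiff group gives four groups of three replicates, which matches the 12 libraries described above.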
How can I automatically extract all the new genes that are also differentially expressed?
I would appreciate feedback on the protocol I have in mind and answers to my questions.