Hi All,
I am doing differential gene expression analysis on a non-model organism. The draft genome and annotation have been around for a while but have not been published yet. Our RNA-seq data cover three samples (5 replicates each for the first two and 4 replicates for the last one).
First I tried it this way (following the Nature Protocols workflow): run cufflinks without a GTF to assemble the transcripts of each replicate, run cuffmerge with the reference GTF to combine the per-replicate assemblies into merged.gtf, then run cuffdiff on the 3 samples based on merged.gtf. But we got far too many features at the gene level. We have only annotated 8307 genes so far, but the cuffdiff output gives us
126579 genes
136248 isoforms
133271 TSS
8659 CDS
126579 promoters
133271 splicing
7428 relCDS
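For reference, the workflow above can be written down roughly as the command lines below. This is only a sketch: the helper `build_pipeline`, the sample labels, the BAM file names and the output directory names are all hypothetical, and nothing is executed; it only assembles argument lists. The optional `mode` argument adds the cufflinks `--GTF` or `--GTF-guide` option.

```python
from typing import Dict, List, Optional, Tuple


def build_pipeline(
    samples: Dict[str, List[str]],
    ref_gtf: str,
    mode: Optional[str] = None,
    threads: int = 4,
) -> Tuple[List[List[str]], List[str]]:
    """Build cufflinks -> cuffmerge -> cuffdiff command lines (not executed).

    samples: hypothetical mapping of condition label -> replicate BAM files.
    mode:    None (no annotation), "GTF" or "GTF-guide".
    """
    if mode not in (None, "GTF", "GTF-guide"):
        raise ValueError("mode must be None, 'GTF' or 'GTF-guide'")

    commands: List[List[str]] = []
    assemblies: List[str] = []

    # One cufflinks run per replicate.
    for label, bams in samples.items():
        for rep, bam in enumerate(bams, start=1):
            out_dir = f"cl_{label}_rep{rep}"  # hypothetical directory name
            cmd = ["cufflinks", "-p", str(threads), "-o", out_dir]
            if mode is not None:
                cmd += [f"--{mode}", ref_gtf]
            cmd.append(bam)
            commands.append(cmd)
            assemblies.append(f"{out_dir}/transcripts.gtf")

    # cuffmerge takes a text file listing the assemblies, one per line;
    # "assemblies.txt" is assumed to hold the paths collected above.
    commands.append(
        ["cuffmerge", "-g", ref_gtf, "-p", str(threads),
         "-o", "merged_asm", "assemblies.txt"]
    )

    # cuffdiff takes one comma-separated list of replicate BAMs per condition.
    cuffdiff = ["cuffdiff", "-p", str(threads), "-o", "diff_out",
                "-L", ",".join(samples), "merged_asm/merged.gtf"]
    cuffdiff += [",".join(bams) for bams in samples.values()]
    commands.append(cuffdiff)
    return commands, assemblies


if __name__ == "__main__":
    demo = {"S1": ["s1_r1.bam", "s1_r2.bam"], "S2": ["s2_r1.bam"]}
    for command in build_pipeline(demo, "ref.gtf")[0]:
        print(" ".join(command))
```

Writing `assemblies.txt` from the returned list and running the commands (for example with `subprocess.run`) is left out on purpose.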
We suspect this is because cufflinks assembles too many novel transcripts. So I reran cufflinks, once with --GTF and once with --GTF-guide. The cuffmerge step was the same as in the previous attempt. Then I used cuffcompare to compare merged.gtf to ref.gtf. The two approaches yield different results: (1) with the --GTF approach, the *.tmap file gives us
j 201
= 8278
which is pretty reasonable, because the program ignores alignments that are not structurally compatible with any reference transcript. (2) But with the --GTF-guide approach I still got tons of unmatched transcripts in the *.tmap file.
u 106326
j 4235
x 3
o 51
= 7210
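The class-code counts above can be tallied from the cuffcompare output with a few lines of Python. This is a minimal sketch which assumes that the *.tmap file is tab-delimited and has a header row containing a `class_code` column; the function name `count_class_codes` and the file name in the usage example are made up for illustration.

```python
import csv
from collections import Counter


def count_class_codes(tmap_path: str) -> Counter:
    """Count how many transcripts fall into each cuffcompare class code.

    Assumes a tab-delimited file whose header row has a "class_code" column.
    """
    counts: Counter = Counter()
    with open(tmap_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            counts[row["class_code"]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file name; replace with the real cuffcompare output.
    tally = count_class_codes("cuffcmp.merged.gtf.tmap")
    total = sum(tally.values())
    for code, n in tally.most_common():
        print(f"{code}\t{n}\t{n / total:.1%}")
```

With the numbers listed above, the `u` class makes up 106326 of 117825 transcripts, which is about 90% of the merged assembly.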
Also, I got only a few SD genes in all of the attempts, fewer than 30. The expectation is around 3,000. The number of SD genes only goes up to 300 when I use just one replicate per sample. There is no difference between cuffdiff 1.3 and cuffdiff 2.0.
My question is: how can I bring the total number of genes in the merged transcript set down, and the number of SD genes in the cuffdiff result up?
Any suggestions and comments are welcome.
Liu