SEQanswers (
-   Bioinformatics (
-   -   issues found in using cufflinks/cuffcompare/cuffdiff (

sterding 12-02-2010 11:25 AM

issues found in using cufflinks/cuffcompare/cuffdiff
I already wrote to Cole for the following questions. Post here in case others have the relevant questions (or answer).

Hi, Cole

THanks first for the great tools you designed.

I have found several issues during using the tools. I'm not sure it's reported by others already or not. Just let you know in case:

1. Cufflinks -M mask.gtf seems not really mask reads mapped to mask.gtf file, when -G is set.
I have checked the results from the following command:
cufflinks -o \$HOME/scratch/HD_$inputfile/cufflinks_allknown -j 0.05 -p 8 -Q 0 -G $BOWTIE_INDEXES/hg19.gtf -M $BOWTIE_INDEXES/hg19.rRNA_MtRNA.gtf -r $BOWTIE_INDEXES/hg19.fa \$HOME/scratch/HD_$inputfile/tophat.mm2.accepted_hits.hg19.sam

In the output file ../cufflinks_allknown/transcripts.gtf, I still can see rRNA and mitochondria RNAs there. Shouldn't they be masked out from the output gtf when -M is set?

2. What's the suggested parameters for detecting novel transcripts using cufflinks?

I think detecting novel transcripts is one of the popular tasks for RAN-seq processing. But I found only very short "transcripts" were assembled by cufflinks if without setting option -G. I compared the output from cufflinks (without -G) with annotated transcript in UCSC, and it's obviously seen that only a small fraction of reads are used to assemble the 'transcripts' (and they are shorter than the annotated ones). OK, if I switched to use -G in cufflinks, the output will of course show FPKM for each annotated transcript in the GTF; in this case, I cannot know how many reads are used to calculated that FPKM value (I can't find way to track this. Educate me if there is a way, please) So, by comparing these two results, I have an intuitive concern: even I use -G option, it might use only a small fraction of reads for calculating the FPKM? Can the fraction of used reads be increased if I turn on the -G option? I am slightly doubting this :)

3. Option of "-o <outprefix>" in Cuffcompare should not contain '.' in the outprefix path.

For example, if I set "cuffcompare -o /home/projects/HD.neuro/results/cuffcompare ...", the output of cuffcompare will be output to

I guess this is because you wrongly stop until the first '.' in the path.

4. How is the Test status NOTEST decided?

In my cuffdiff output file gene_exp.diff, I saw many comparisons with test status NOTEST. It says "not enough alignments for testing" in the manual, but in my result file, I saw some genes with comparable FPKM values in both samples, but they are marked as NOTEST; while other genes with 0 FPKM in both sample, and they are marked as OK (e.g. the 2nd record in below example)

Another confusion for the 'yes/no' significant column, It says "depending on whether p is greater then the FDR after Benjamini-Hochberg correction for multiple-testing". However, for cases with 0 uncorrected p-value, some are marked as 'yes', some are marked as 'no'. I am confused which one I should use to select differentially expression genes. The last column ?

$ tail gene_exp.diff
XLOC_048056 J01415.24 chrM:8293-8364 HD_B4430 HD_B4189 NOTEST 0 0 0 0 1 no
XLOC_048057 J01415.19,J01415.2,J01415.21,J01415.23,J01415.5,J01415.8,MT-ATP6,MT-CO3,MT-ND3,MT-ND4,MT-ND4L,MT-ND5 chrM:8364-14742 HD_B4430 HD_B4189 OK 0 0 0 0 1no
XLOC_048058 J01415.15,MT-CYB chrM:14745-15953 HD_B4430 HD_B4189 NOTEST 0 0 0 0 1 no
XLOC_048059 J01415.22 chrM:3305-4400 HD_B4430 HD_B4189 NOTEST 3695.47 2047.85 -0.590317 21.4283 0 no
XLOC_048060 J01415.1 chrM:5585-5655 HD_B4430 HD_B4189 NOTEST 1949.56 1234.71 -0.456771 12.5587 0 no
XLOC_048061 J01415.18 chrM:5655-5729 HD_B4430 HD_B4189 NOTEST 970.36 667.756 -0.373745 7.43324 1.05915e-13 no
XLOC_048062 J01415.11,J01415.14 chrM:5759-5891 HD_B4430 HD_B4189 OK 7858.87 5882.96 -0.289583 16.7969 0 yes
XLOC_048063 J01415.13 chrM:5902-7514 HD_B4430 HD_B4189 OK 14129.9 10391.4 -0.30732 23.7808 0 yes
XLOC_048064 J01415.16,MT-ND6 chrM:8364-14742 HD_B4430 HD_B4189 OK 55722.5 25892.8 -0.766417 77.8562 0 yes
XLOC_048065 J01415.20 chrM:15954-16023 HD_B4430 HD_B4189 OK 14668.5 5896.61 -0.911327 59.1021 0 yes

So many questions, I know. :) Sorry for that.



sterding 12-08-2010 02:46 PM

I hope Cole have time for these questions :)

labrat73 01-12-2011 11:36 AM

annotated rRNA/MtRNA .gtf file
Greetings sterding-

I noticed that you have an annotated human rRNA_MtRNA file. I am trying to acquire such a file so that I may use the -M option when running cufflinks. Could you please advise on how to acquire/generate the file? Any assistance you could offer would be greatly appreciated.

Many thanks

sterding 01-12-2011 01:48 PM

I simply extracted the record with keywords 'rRNA' or 'Mt' from hg19.gtf file by using command 'grep'.

labrat73 01-12-2011 02:25 PM

Many, many, thanks sterding for your quick reply! Works great!:D

edge 06-01-2011 09:04 PM

Hi sterding,

Is it better you using -g instead of -G in your situation?

All times are GMT -8. The time now is 12:44 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.