Recently, our lab followed the tophat -> cuffdiff protocol to analyze RNA-seq gene expression. When we looked into the cuffdiff gene_exp.diff output, we found a problem:
one gene with very few mapped reads has an extremely high FPKM, while other genes' expression values seem fine.
Here are the details:
1 RNA-Seq libraries: a) A1 (100 bp paired-end, no replicates) b) C1-2 (100 bp paired-end, no replicates)
2 Genome data: Homo_sapiens.GRCh37.69.dna.toplevel.fa
3 GTF : Homo_sapiens.GRCh37.69.gtf
4 The problematic gene in the cuffdiff "gene_exp.diff" file
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
ENSG00000200170 ENSG00000200170 Y_RNA 10:73856277-73995472 A1 C1-2 OK 4.75944e+06 0 -inf 0 0.1507 0.999526 no
5 The problematic gene in the cuffdiff "genes.count_tracking" file
tracking_id A1_count A1_count_variance A1_count_uncertainty_var A1_count_dispersion_var A1_status C1-2_count C1-2_count_variance C1-2_count_uncertainty_var C1-2_count_dispersion_var C1-2_status
ENSG00000200170 1.19092 31 0 30.2743 OK 0 1 0 0 OK
6 tophat version: 2.0.4
7 cufflinks version: 2.0.0 or 2.1.0 (similar results)
8 Commands (following the Nature Protocols paper):
tophat: tophat-2.0.4.Linux_x86_64/tophat -r 500 -o /tophatmapping/A1_Remapping -p 8 -G /tophatmapping/gtf/Homo_sapiens.GRCh37.69.gtf /tophatmapping/index/Homo_sapiens.GRCh37.69.dna.toplevel /dou_reads/A1_1.fq /dou_reads/A1_2.fq
tophat-2.0.4.Linux_x86_64/tophat -r 500 -o /tophatmapping/C1_Remapping -p 8 -G /tophatmapping/gtf/Homo_sapiens.GRCh37.69.gtf /tophatmapping/index/Homo_sapiens.GRCh37.69.dna.toplevel /dou_reads/C1_1.fq /dou_reads/C1_2.fq
cuffdiff: ~/bin/cufflinks-2.1.0.Linux_x86_64/cuffdiff -o /tophatmapping/A1_C1-2_Recuffdiff/diff_out -b /tophatmapping/index/Homo_sapiens.GRCh37.69.dna.toplevel.fa -p 8 -L A1,C1-2 -u /tophatmapping/gtf/Homo_sapiens.GRCh37.69.gtf /tophatmapping/output_A1/accepted_hits.bam ~/fudan/maobin2/tophatmapping/C1_Remapping/accepted_hits.bam
Questions:
1 How can a gene with only about 1 fragment mapped have an FPKM of 4.75944e+06?
2 The locus of this gene (ENSG00000200170) in the gene_exp.diff file differs from its locus in Homo_sapiens.GRCh37.69.gtf. How could this happen?
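
For reference, here is a rough back-of-the-envelope check of question 1. It is only a sketch using the textbook FPKM definition (fragments x 1e9 / (length in bp x total mapped fragments)); cuffdiff applies its own effective-length and normalization corrections, so this is not its exact calculation. The function names and the sequencing depth of 20 million fragments are assumptions made up for illustration; only the count (1.19092) and the FPKM (4.75944e+06) come from the output above.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def implied_length_times_depth(fragments, reported_fpkm):
    """Invert the formula: the (length_bp * total_mapped_fragments) product
    that would be needed to produce the reported FPKM."""
    return fragments * 1e9 / reported_fpkm


count = 1.19092        # A1_count from genes.count_tracking
reported = 4.75944e6   # value_1 from gene_exp.diff

product = implied_length_times_depth(count, reported)
print(f"implied length * depth: {product:.2f}")

# Hypothetical depth, NOT taken from our data: 20 million mapped fragments.
assumed_depth = 20_000_000
implied_length = product / assumed_depth
print(f"implied length at that depth: {implied_length:.3g} bp")
```

Under this simple formula, the reported FPKM would require the length term to be a tiny fraction of a base pair at any realistic sequencing depth, which makes us suspect that the length used for this short gene is close to zero.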