10-07-2015, 08:26 AM   #1
Location: New York City

Join Date: Mar 2012
Posts: 14
Annotation difference between RefSeq and GENCODE

Hi all,

I am trying to set up an RNA-seq workflow:

1. Generated the genome files for STAR using .fna files from the NCBI FTP site and GTF files from GENCODE.

2. Aligned the FASTQ files using STAR, converted the SAM output to BAM, and sorted the BAM files.

3. Then I used the sorted BAM files to test Cufflinks, comparing different GTF files for the -G option. The Cufflinks outputs somehow report different positions for the same genes:

gene_id gene_short_name locus
PDIA3 - chr15:44038589-44064804
CD276 - chr15:73976621-74006859
PROM2 - chr2:95940200-95957055

gene_id gene_short_name locus
ENSG00000167004.12 PDIA3 chr15:43746391-43773279
ENSG00000103855.17 CD276 chr15:73683965-73714518
ENSG00000155066.15 PROM2 chr2:95274452-95291308
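
To put a number on the difference, here is a minimal sketch that subtracts the loci in the second output from those in the first. The loci are copied from the two listings above; the names `run_a`, `run_b`, `parse_locus`, and `locus_offsets` are just labels I made up for this example, and I match genes by symbol, which assumes the symbols refer to the same gene in both annotations.

```python
import re

# Loci copied from the two Cufflinks outputs above, keyed by gene symbol.
# run_a: first listing (gene symbols as gene_id).
# run_b: second listing (ENSG identifiers as gene_id).
run_a = {
    "PDIA3": "chr15:44038589-44064804",
    "CD276": "chr15:73976621-74006859",
    "PROM2": "chr2:95940200-95957055",
}
run_b = {
    "PDIA3": "chr15:43746391-43773279",
    "CD276": "chr15:73683965-73714518",
    "PROM2": "chr2:95274452-95291308",
}

LOCUS_RE = re.compile(r"^(\S+):(\d+)-(\d+)$")


def parse_locus(locus):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = LOCUS_RE.match(locus)
    if match is None:
        raise ValueError(f"unrecognised locus: {locus!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def locus_offsets(a, b):
    """Return {gene: (start_a - start_b, end_a - end_b)} for shared genes."""
    offsets = {}
    for gene in sorted(a.keys() & b.keys()):
        chrom_a, start_a, end_a = parse_locus(a[gene])
        chrom_b, start_b, end_b = parse_locus(b[gene])
        if chrom_a != chrom_b:
            # Offsets across different chromosomes are not meaningful.
            continue
        offsets[gene] = (start_a - start_b, end_a - end_b)
    return offsets


if __name__ == "__main__":
    for gene, (d_start, d_end) in locus_offsets(run_a, run_b).items():
        print(f"{gene}\tstart offset: {d_start}\tend offset: {d_end}")
```

For these three genes the offsets come out in the hundreds of kilobases, while the length of each gene changes by less than 1 kb. To me that looks more like a shift in coordinates than a difference in the gene models themselves, but that is only my reading of three genes.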

As a result, the FPKM values are very different between the two outputs.

What am I missing here, and how can I fix it? If the two GTF files inherently differ in their gene loci, which one should I trust?

graceqy