(Many new threads from me here, but I'm still new to RNA-seq and all the analysis, so I'm hoping that's alright!)
My current goal is to get FPKM values for a test data set I have, and if I have understood the instructions and papers correctly, Cufflinks (after TopHat) is the way to do that. I've taken the "accepted_hits.bam" from TopHat and run it through Cufflinks (with no special options), and then run the resulting "transcripts.gtf" through Cuffcompare with a reference annotation (hg19: genome.gtf, taken from iGenomes).
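For reference, FPKM (fragments per kilobase of transcript per million mapped fragments) for a feature is, in its simplest form, 10^9 × C / (N × L), where C is the number of fragments assigned to the feature, N is the total number of mapped fragments, and L is the feature length in bases; for paired-end data one fragment is one read pair. The Python sketch below only illustrates that arithmetic with made-up numbers. Cufflinks assigns fragments to isoforms statistically and uses an effective transcript length, so its reported values should not be expected to equal this simple formula.

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Simplified FPKM: fragments per kilobase per million mapped fragments.

    Textbook definition only; Cufflinks uses estimated fragment counts and an
    effective length, so its output will differ from this.
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Multiply by 1e9 = 1e3 (per kilobase) * 1e6 (per million fragments).
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical numbers: 500 fragments (read pairs) on a 2,000 bp transcript,
# with 10 million mapped fragments in the library.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```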
This is just a single biological replicate of paired-end reads (i.e. two files, *_1.fq and *_2.fq), and I just want the FPKM values for the genes in that replicate. So, first of all, is the workflow I described above correct for this?
Next, looking at cuffcmp.stats (the full output is below), it seems that the analysis was somehow unsuccessful, although I'm not really sure how to interpret these numbers. I can also look at cuffcmp.transcripts.gtf.tmap (its first lines are also below), and that seems (to me) to give me the FPKM values I'm looking for.
Is this so? If so, what does the cuffcmp.stats file mean? If not, how are the two files related, and how can I fix it? I found this thread (http://seqanswers.com/forums/showthread.php?t=3972) that seems related, but I don't really understand it. It seems to say I should change some things in the reference genome, but as far as I can tell the one I'm using is already formatted as described in the thread...
Code:
# Cuffcompare v2.2.1 | Command line was:
#cuffcompare -p 2 -r /Users/erikfasterius/rna_seq/bowtie_indexes//hg19.gtf transcripts.gtf
#
#= Summary for dataset: transcripts.gtf :
#     Query mRNAs :    1099 in    1070 loci  (708 multi-exon transcripts)
#            (27 multi-transcript loci, ~1.0 transcripts per locus)
# Reference mRNAs :   49570 in   26031 loci  (45413 multi-exon)
# Super-loci w/ reference transcripts:      857
#--------------------|   Sn   |  Sp   |  fSn |  fSp
        Base level:     1.6     93.0      -       -
        Exon level:     1.3     67.4     1.3     71.3
      Intron level:     1.6     99.0     1.6     99.5
Intron chain level:     0.3     19.1     0.3     20.5
  Transcript level:     0.0      0.0     0.0      0.0
       Locus level:     0.5     12.3     0.5     12.5
     Matching intron chains:     135
              Matching loci:     132
          Missed exons:  243810/248560  ( 98.1%)
           Novel exons:      97/4701    (  2.1%)
        Missed introns:  224610/228634  ( 98.2%)
         Novel introns:       0/3623    (  0.0%)
           Missed loci:   25103/26031   ( 96.4%)
            Novel loci:      87/1070    (  8.1%)
 Total union super-loci across all input datasets: 945
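The Sn and Sp columns can be recomputed from the counts in the same file, which may make them easier to read. This is a minimal Python sketch, assuming Sn (sensitivity) = matching features / reference features and Sp (specificity, in effect precision) = matching features / query features; that is an interpretation of the output above, not something taken from the Cuffcompare documentation. With those definitions the locus-level and intron-chain-level rows are reproduced (0.5 and 12.3 for loci, 0.3 and 19.1 for intron chains). The last line shows that with only 1,070 query loci against 26,031 reference loci, locus-level Sn could not exceed about 4.1% even if every query locus matched.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as printed in cuffcmp.stats."""
    return round(100.0 * numerator / denominator, 1)


# Counts copied from the cuffcmp.stats output above.
ref_loci, query_loci, matching_loci = 26031, 1070, 132
ref_multi_exon, query_multi_exon, matching_chains = 45413, 708, 135

# Assumed definitions: Sn = matches / reference, Sp = matches / query.
print("Locus level Sn, Sp:", pct(matching_loci, ref_loci), pct(matching_loci, query_loci))
print("Intron chain level Sn, Sp:",
      pct(matching_chains, ref_multi_exon), pct(matching_chains, query_multi_exon))
print("Missed loci %:", pct(25103, ref_loci))

# Upper bound on locus Sn if every query locus matched a distinct reference locus.
print("Max possible locus Sn:", pct(query_loci, ref_loci))
```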
Code:
head cuffcmp.transcripts.gtf.tmap
ref_gene_id  ref_id        class_code  cuff_gene_id  cuff_id   FMI  FPKM         FPKM_conf_lo  FPKM_conf_hi  cov        len   major_iso_id  ref_match_len
GNB1         NM_002074     c           CUFF.1        CUFF.1.1  100  228.270145   132.599651    323.940640    4.059528   1359  CUFF.1.1      3194
RPL22        NM_000983     c           CUFF.2        CUFF.2.1  100  231.819459   103.317386    360.321532    3.710384   876   CUFF.2.1      2085
PARK7        NM_001123377  =           CUFF.3        CUFF.3.1  100  496.677254   301.998070    691.356439    8.342134   832   CUFF.3.1      903
ENO1         NM_001428     c           CUFF.4        CUFF.4.1  100  844.654267   645.987522    1043.321012   14.260660  1250  CUFF.4.1      2187
TARDBP       NM_007375     c           CUFF.5        CUFF.5.1  100  169.652613   109.552539    229.752686    2.928344   2357  CUFF.5.1      4217
UQCRHL       NM_001089591  c           CUFF.6        CUFF.6.1  100  1198.040186  632.905455    1763.174917   13.612039  385   CUFF.6.1      538
-            -             u           CUFF.7        CUFF.7.1  100  325.223511   212.645155    437.801867    5.269625   1402  CUFF.7.1      -
SDHB         NM_003000     c           CUFF.8        CUFF.8.1  100  313.861645   139.762529    487.960760    5.058395   685   CUFF.8.1      1145
CDC42        NM_001039802  c           CUFF.9        CUFF.9.1  100  230.544754   121.904615    339.184893    3.416650   1126  CUFF.9.1      2294
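To pull gene-level numbers out of the .tmap file, here is a minimal Python sketch. It assumes the file is tab-delimited with the header line shown above, keeps only rows whose class_code is "=" (complete intron-chain match) or "c" (contained), and sums FPKM over all Cufflinks transcripts assigned to the same ref_gene_id. The class-code filter and the summing are illustrative choices rather than anything Cuffcompare prescribes, and the function name fpkm_by_gene is hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def fpkm_by_gene(tmap: TextIO, class_codes: Iterable[str] = ("=", "c")) -> Dict[str, float]:
    """Sum FPKM per ref_gene_id from a Cuffcompare .tmap file (assumed tab-delimited).

    Rows with ref_gene_id "-" (no reference gene, e.g. class "u") or with a
    class_code outside class_codes are skipped.
    """
    keep = set(class_codes)
    totals: Dict[str, float] = {}
    for row in csv.DictReader(tmap, delimiter="\t"):
        gene = row["ref_gene_id"]
        if gene == "-" or row["class_code"] not in keep:
            continue
        totals[gene] = totals.get(gene, 0.0) + float(row["FPKM"])
    return totals


# Three rows copied from the head output above, rebuilt as tab-separated text.
header = ["ref_gene_id", "ref_id", "class_code", "cuff_gene_id", "cuff_id", "FMI",
          "FPKM", "FPKM_conf_lo", "FPKM_conf_hi", "cov", "len", "major_iso_id", "ref_match_len"]
rows = [
    ["GNB1", "NM_002074", "c", "CUFF.1", "CUFF.1.1", "100", "228.270145",
     "132.599651", "323.940640", "4.059528", "1359", "CUFF.1.1", "3194"],
    ["PARK7", "NM_001123377", "=", "CUFF.3", "CUFF.3.1", "100", "496.677254",
     "301.998070", "691.356439", "8.342134", "832", "CUFF.3.1", "903"],
    ["-", "-", "u", "CUFF.7", "CUFF.7.1", "100", "325.223511",
     "212.645155", "437.801867", "5.269625", "1402", "CUFF.7.1", "-"],
]
sample = "\n".join("\t".join(r) for r in [header] + rows) + "\n"
print(fpkm_by_gene(io.StringIO(sample)))
# {'GNB1': 228.270145, 'PARK7': 496.677254}
```

On the real file the same function would be called with a handle from open("cuffcmp.transcripts.gtf.tmap"); the unmatched CUFF.7 row ("u") is dropped by design.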
Thanks in advance!