Hi there.
I would like to to get for a list of exons their expression level from my RNA-seq data.
I thought about using cufflinks but the problem is that cufflinks generate FPKM at the isoform level. I usually feed cufflinks with a refseq gtf file downloaded from UCSC table browser (under genes and genes prediction), and cufflinks use that to build locus.
I thought to switch back from locus to exons (using the ref ID NM/R_somenumber) but it's a wrong way to go by since:
1) many exons participate in more than one transcript, and each has a different FPKM value
2) FPKM number are probably calculated considering fragment length and each transcripts has its own, so I cannot just sum up the FPKM from all transcripts for a particular exon
If I was in a world in which every gene had only one transcripts I guess my life would be easier and I would be blonde and handsome , but reality is crueler than that.
All tables I can download on ucsc include variants so I got my hands on illumina truseq exome file containing all exons per gene in a definitively matter, meaning each exon appear once , even if some belongs to different variants. But it seems this file is no good for cufflinks and it was not able to build locus right.
I thought of just using samtools for each exon:
samtools view accpeted_hits.bam chr1:exonstrat-exonend | wc -l
and just get and number and calculate my own FPKM using the exon's length (and total number of reads)
Do you think using samtools is fine ?
I would like to to get for a list of exons their expression level from my RNA-seq data.
I thought about using cufflinks but the problem is that cufflinks generate FPKM at the isoform level. I usually feed cufflinks with a refseq gtf file downloaded from UCSC table browser (under genes and genes prediction), and cufflinks use that to build locus.
I thought to switch back from locus to exons (using the ref ID NM/R_somenumber) but it's a wrong way to go by since:
1) many exons participate in more than one transcript, and each has a different FPKM value
2) FPKM number are probably calculated considering fragment length and each transcripts has its own, so I cannot just sum up the FPKM from all transcripts for a particular exon
If I was in a world in which every gene had only one transcripts I guess my life would be easier and I would be blonde and handsome , but reality is crueler than that.
All tables I can download on ucsc include variants so I got my hands on illumina truseq exome file containing all exons per gene in a definitively matter, meaning each exon appear once , even if some belongs to different variants. But it seems this file is no good for cufflinks and it was not able to build locus right.
I thought of just using samtools for each exon:
samtools view accpeted_hits.bam chr1:exonstrat-exonend | wc -l
and just get and number and calculate my own FPKM using the exon's length (and total number of reads)
Do you think using samtools is fine ?
Comment