![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Bug? duplicated genes in cufflinks output genes.expr | silin284 | Bioinformatics | 3 | 05-18-2014 12:19 AM |
transcripts missing from cufflinks | djm2007 | RNA Sequencing | 3 | 05-30-2013 12:39 AM |
PubMed: Quantification of Gene Transcripts with Deep Sequencing Analysis of Gene Expr | Newsbot! | Literature Watch | 0 | 01-13-2011 03:00 AM |
cuffdiif or cufflinks expr comparisons | middlemale | Bioinformatics | 0 | 11-05-2010 03:44 AM |
inconsistent gene names in genes.expr - Cufflinks | Boel | Bioinformatics | 2 | 04-14-2010 06:16 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Senior Member
Location: Rockville, MD Join Date: Jan 2009
Posts: 126
|
![]()
Hello,
I am looking for a way to derive the raw count of reads mapping to individual Cufflinks transcripts. My closest guess is that the "coverage" value (column 13) in the transcripts.expr file is what I need (the manual explains this as being an "Estimate for the absolute depth of read coverage across the whole transcript"). This number is non-integer for most transcripts, but I guess that may come from averaging across isoforms. Does anyone have an insight into this? Any advice would be much appreciated. Thanks, Shurjo |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Boston, MA Join Date: Nov 2008
Posts: 212
|
![]()
Multiplying the average depth of coverage by the transcript length will give you the *estimated* number of reads assigned to each transcript. Note that Cufflinks' statistical model probabilistically assigns reads to individual isoforms, because when isoforms share exons, reads from those exons can't be unambiguously assigned to one particular isoform.
|
![]() |
![]() |
![]() |
#3 |
Senior Member
Location: Rockville, MD Join Date: Jan 2009
Posts: 126
|
![]()
OK, cool, thanks.
|
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: Southern France Join Date: Aug 2009
Posts: 269
|
![]()
Watch out when considering these read densities if they come from overlapping transcripts though: although it is counter-intuitive, they cannot be directly compared to each other as if they could indicate a relative "expression" or "splicing" value.
Indeed, since a read that overlaps a common exonic part of two splicing variants is used to compute both density values (although it comes from one of the variants only), it would be mathematically incorrect to assume that read densities can be compared between such overlapping splicing variants. If reads are assigned to each variant independently, cases can appear where a variant A that is present at a lower number of copy than a variant B is given a higher density.. so RPKM are only relevant across "genes" (non overlapping transcripts). Therefore, reads should actually be assigned across transcripts (considering the whole set of variants) in a discriminative way, and that is why proper transcriptome reconstruction requires linear programming algorithms in order to find the optimal distribution. Unless of course i am totally wrong ![]() |
![]() |
![]() |
![]() |
#5 | |
Member
Location: Carlsbad,CA Join Date: Jan 2010
Posts: 94
|
![]() Quote:
exon1| exon2|exon3 Lets say that Sample A exhibits exon1|exon3 only 20% of the times exon1|exon2|exon3 only 80% of the times while Sample B exhibits exon1|exon3 100% of the times how will you determine average depth of this gene in sampleA? I understand sampleB should be easier as you should see no reads spanning exon1| exon2 or exon2|exon3 How does cufflinks handle or calculate relative abundance of isoforms within a sample? |
|
![]() |
![]() |
![]() |
#6 | |
Senior Member
Location: Boston, MA Join Date: Nov 2008
Posts: 212
|
![]() Quote:
For now, let me roughly describe how it works to give you some intuition. Suppose you have two isoforms A and B of a gene (we'll assume they are the same length for now). We want to calculate the abundances of A and B by counting up how many fragments came from A and how many came from B, and then dividing each count by the total number of fragments. This gives us the relative abundances for A and B. Now consider a fragment that could have come from either of two isoforms, A or B. We can't just assign this fragment to A or B - we need to "probabilistically assign" to both of them. The probability it came from A in this example is just equal to the relative abundance of A (and the same goes for B). Now we're in a bind, because to get the abundances we need the counts, and to get the (probabilistic) counts, we need the abundances! There's an additional complication that happens because of course transcripts have differing lengths. Cufflinks calculates the abundances of transcripts by assuming that the fragments in an experiment were generated according to a basic model of how RNA-Seq works. With these assumptions one can write down a function that takes an assignment of relative abundances to each transcript as its argument. This function outputs a single number - the likelihood of observing the data in the experiment given the abundances supplied as the argument. The function is built by writing down the probability of observing each fragment in the actual data set, given some assignment of abundances to the transcript. It turns out that with this model, there's a single assignment of abundances to transcripts that *best* explains the data that came off of the sequencer. Cufflinks is able to find this value by systematically exploring the space of possible abundances for each transcript. There are a lot of details in the paper, but this is the core idea. Once we have the abundances, we can use them to estimate how many fragments came from each transcript, and thus we can estimate the average depth of sequencing coverage for each one. |
|
![]() |
![]() |
![]() |
#7 | |
Member
Location: Carlsbad,CA Join Date: Jan 2010
Posts: 94
|
![]() Quote:
Did I get this right? I think, it takes into account what are all the possible isoform for this gene by taking in the gene annotation that we provide (is this where the GTF file comes in as input). It then taken the counts of reads and determines which combination of exons (or which isoform) is most probable. |
|
![]() |
![]() |
![]() |
#8 | |
Member
Location: Pasadena, CA USA Join Date: Jan 2010
Posts: 52
|
![]() Quote:
![]() if a transcript looks like this in transcripts.gtf: Code:
chr1 Cufflinks transcript 3204563 3661579 1000 - . gene_id "Xkr4"; transcript_id "Xkr4"; FPKM "0.0459178326"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.110855"; cov "0.020363"; chr1 Cufflinks exon 3204563 3207049 1000 - . gene_id "Xkr4"; transcript_id "Xkr4"; exon_number "1"; FPKM "0.0459178326"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.110855"; cov "0.020363"; chr1 Cufflinks exon 3411783 3411982 1000 - . gene_id "Xkr4"; transcript_id "Xkr4"; exon_number "2"; FPKM "0.0459178326"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.110855"; cov "0.020363"; chr1 Cufflinks exon 3660633 3661579 1000 - . gene_id "Xkr4"; transcript_id "Xkr4"; exon_number "3"; FPKM "0.0459178326"; frac "1.000000"; conf_lo "0.000000"; conf_hi "0.110855"; cov "0.020363"; transcript length = sum of length of exons = 3,631 cov = 0.020363 number of mapped reads/transcript = 3,631 * 0.020363 = 74 Is doesn't seem to be correct! It would be great if cufflinks could report not only normalized FPKM values but also the raw number of mapped reads per transcript and exon (to use in software like DESeq, which I am using since I couldn't figure out how to use cuffcompare/cuffdiff with biological replicates for each conditition ![]() Thanx in advance for your help and consideration. Last edited by marcora; 05-18-2010 at 04:59 AM. |
|
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: Mexico Join Date: Mar 2011
Posts: 137
|
![]()
marcora,
it's been a while since you posted this, but it appears to me that 74 is the number or *bases* that mapped to that transcript, as cufflinks coverage is i think on a base-level. so to get the number of reads you would have to divide this number by your read length, i.e, by 50 or 75, or the length of your reads. Does this make sense? Carmen ![]() |
![]() |
![]() |
![]() |
#10 | |
Junior Member
Location: china Join Date: Oct 2013
Posts: 5
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|