

01-18-2013, 01:38 AM #1
bharat_iyengar, Member. Location: Delhi, India. Join Date: Dec 2012. Posts: 20

Cufflinks statistical model

I need a little help understanding abundance estimation in Cufflinks. Please refer to the Cufflinks supplementary methods. Let me restate some of the key definitions for convenience of explanation:

ρ(t) = abundance of transcript t
α(t) = probability of choosing a transcript t [identified by abundance and length]
β(g) = ∑ α(t) over all t belonging to g = probability of choosing a transcript from locus g
γ(t) = probability that the chosen transcript has the given abundance and length

Question 1: Does that mean that a transcript is fully identified by its length and abundance?

Question 2: In the parameter estimation section, I didn't quite understand how the MLE of β becomes X(g)/M. Shouldn't it be the solution of ∑ ∂(X(g)·log(β(g)))/∂β(g) = 0?

Question 3: I don't understand the importance sampling method well, but is there an intuitive way of understanding how γ is estimated from the input, i.e. the reads?

The FPKM calculation has l(t) in the denominator. Cufflinks should accept any SAM/BAM file regardless of whether it has been passed through TopHat. If I pass Cufflinks reads aligned to the transcriptome (RefSeq) and I don't provide any annotations, then:

Question 4: How is a locus designated?

Question 5: How is l(t) estimated for the FPKM calculation? Shouldn't the length of a transcript be smaller than that of a locus?

Finally, how can I use Cufflinks without involving genome alignments?
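On Question 2, a small numerical sanity check may help. This is only a sketch, not taken from the Cufflinks supplement or code. It assumes that the β part of the likelihood is the multinomial term ∏ β(g)^X(g), that M = ∑ X(g), and that the β(g) must sum to 1. Under those assumptions the unconstrained equation ∂(X(g)·log β(g))/∂β(g) = X(g)/β(g) = 0 has no solution, so the constraint has to be included (for example with a Lagrange multiplier λ): X(g)/β(g) = λ for every g, and summing over g gives λ = M, hence β(g) = X(g)/M. The locus names and counts below are made up for illustration.

```python
import math

def multinomial_mle(counts):
    """Constrained MLE for multinomial probabilities: beta(g) = X(g) / M."""
    total = sum(counts.values())  # M, the total number of fragments
    return {g: x / total for g, x in counts.items()}

def log_likelihood(counts, beta):
    """Multinomial log-likelihood up to a constant: sum of X(g) * log(beta(g))."""
    return sum(x * math.log(beta[g]) for g, x in counts.items() if x > 0)

# Hypothetical fragment counts per locus (illustrative only).
counts = {"locus_A": 600, "locus_B": 300, "locus_C": 100}
beta_hat = multinomial_mle(counts)

# Move some probability mass between loci while keeping the sum equal to 1.
shifted = dict(beta_hat)
shifted["locus_A"] -= 0.05
shifted["locus_B"] += 0.05

print(beta_hat)
print(log_likelihood(counts, beta_hat), log_likelihood(counts, shifted))
```

Any other valid probability vector, such as `shifted` here, should give a lower log-likelihood than `beta_hat`, which is what one would expect if X(g)/M is indeed the maximizer.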