Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • RockChalkJayhawk
    Senior Member
    • Mar 2009
    • 192

    #16
    Found the same thing

    Hey All,

    I noticed the same thing as konrad98 a while ago. I ran CuffDiff and DESeq on 2x36 bp PE-Reads from replicate samples. Cuffdiff says that ~30% are differentially expressed, whereas DESeq found 0. To find out why, I made MA plots for my samples. The X-axis is the average RPKM for the two samples and the Y-axis is the log2 Fold Change.

    As you can see, CuffDiff (for some reason) is much more variable in its calculation of RPKM. Note, these are the RPKM values from the genes.expr file and gene-level counting in exons.
    Attached Files

    Comment

    • lpachter
      Member
      • Feb 2010
      • 40

      #17
      Dear RCJ,

      Thanks for attaching the MA plot. Do you recall what version of Cuffdiff you were running? And also whether you compared on the same gene sets? Which organism was this from? It would also be really useful for us (Cufflinks people) if you could share the actual data so we can take a closer look at what is going on.
      I'm just a bit confused because although I would expect differences in the M axis, I don't understand differences in the A axis. Also, I don't understand the raw x-axis values. Was the RPKM for most genes in your experiment less than 1?

      Thanks!

      Comment

      • RockChalkJayhawk
        Senior Member
        • Mar 2009
        • 192

        #18
        Originally posted by lpachter View Post
        Dear RCJ,

        Thanks for attaching the MA plot. Do you recall what version of Cuffdiff you were running? And also whether you compared on the same gene sets? Which organism was this from? It would also be really useful for us (Cufflinks people) if you could share the actual data so we can take a closer look at what is going on.
        I'm just a bit confused because although I would expect differences in the M axis, I don't understand differences in the A axis. Also, I don't understand the raw x-axis values. Was the RPKM for most genes in your experiment less than 1?

        Thanks!
        These are from human data with the newest version of Cufflinks/Cuffdiff. I left out the transformation step for the last MA plot. I am posting a new one along with the R code I used to generate it. Sorry about that mess up.

        Notice, since these are replicates, no genes (or very few) should be differentially expressed. Since DESeq uses raw counts, the x-axis units will be larger than FPKM of CuffDiff.

        Red areas are those that were called as significant in the gene_exp.diff file (CuffDiff) or padj (DESeq). No genes were differentially expressed by DESeq.
        Code:
        DMSO_1vsDMSO_2.Gx<-read.table("DMSO_1vsDMSO_2.Gx",header=T)
        CuffDiff<-read.table("CuffDiff/DMSO1vDMSO2/gene_exp.diff",header=T)
        jpeg(file="Comparison.jpeg",width = 620, height = 280,quality = 70)
        
        par(mfrow=c(1,2))
        M<-log2(CuffDiff$value_2)-log2(CuffDiff$value_1)
        A<-.5*(log2(CuffDiff$value_2)+log2(CuffDiff$value_1))
        plot(A,M,main="CuffDiff",col=ifelse(CuffDiff$significant == "yes" ,"white","black"))
        text(A,M,col=c("red"),cex=.8,label=ifelse(CuffDiff$significant == "yes" ,CuffDiff$gene,""))
        
        M<-log2(DMSO_1vsDMSO_2.Gx$baseMeanB)-log2(DMSO_1vsDMSO_2.Gx$baseMeanA)
        A<-.5*(log2(DMSO_1vsDMSO_2.Gx$baseMeanB)+log2(DMSO_1vsDMSO_2.Gx$baseMeanA))
        plot(A,M,main="DESeq",col=ifelse(DMSO_1vsDMSO_2.Gx$padj <.05 ,"white","black"))
        text(A,M,col=c("red"),cex=.8,label=ifelse(DMSO_1vsDMSO_2.Gx$padj <.05 ,DMSO_1vsDMSO_2.Gx$id,""))
        dev.off()
        Attached Files

        Comment

        • adarob
          Member
          • Jul 2010
          • 71

          #19
          By "newest version" do you mean 0.9.2? There was a small bug in that release which in some cases caused some of the alignments to be ignored. If it is not too much trouble, could you rerun with 0.9.3 (released yesterday)?

          Comment

          • RockChalkJayhawk
            Senior Member
            • Mar 2009
            • 192

            #20
            My bad. It is 0.9.2. I'll have to update next week.

            Comment

            • RockChalkJayhawk
              Senior Member
              • Mar 2009
              • 192

              #21
              Update on Cufflinks v0.9.3

              Originally posted by adarob View Post
              By "newest version" do you mean 0.9.2? There was a small bug in that release which in some cases caused some of the alignments to be ignored. If it is not too much trouble, could you rerun with 0.9.3 (released yesterday)?
              This figure was generated using the same protocol as above.
              Using v0.9.3 really made things better. Now I have a FDR of 2.13% (427/20014), much better than the 30% I had previously. There is still a bit more variation than from the counting methods, but at least I can work with this.

              Thanks for the updates!
              Attached Files

              Comment

              • lahoman
                Member
                • Jan 2011
                • 12

                #22
                How did you count the reads from the assembly file?

                Hi, xinchen,

                Once I get the sam file using TopHat, how should be we count the reads for each gene? You know, I am new to RNA-Seq analysis. My question might be naive

                Thank you so much,

                Lahoman

                Comment

                • nkwuji
                  Member
                  • Mar 2010
                  • 19

                  #23
                  Originally posted by lahoman View Post
                  Hi, xinchen,

                  Once I get the sam file using TopHat, how should be we count the reads for each gene? You know, I am new to RNA-Seq analysis. My question might be naive

                  Thank you so much,

                  Lahoman
                  Hi, Lahoman,

                  Here is the introduction for cufflinks.


                  It is better to read through it. I thought there might be some shortcuts or easy ways to use it. But later on, I found it really necessary to read all the instructions carefully, and make sure about the parameters you should use.

                  Cheers,
                  Jun

                  Comment

                  • lahoman
                    Member
                    • Jan 2011
                    • 12

                    #24
                    Cufflink results filtering?

                    Hi,RockChalkJayhawk,

                    After I use tophat to map Human RNA-Seq to the genome, then cufflinks for the transcript analysis, I checked the file of transcripts.expr. There are
                    263,506 transcripts. That's a lot. Do I need to filter the results based on FPKM? What criteria should I use? 1.0 or 0.5? I have no idea about it.

                    Thanks,

                    Lahoman

                    Comment

                    • RockChalkJayhawk
                      Senior Member
                      • Mar 2009
                      • 192

                      #25
                      Originally posted by lahoman View Post
                      Hi,RockChalkJayhawk,

                      After I use tophat to map Human RNA-Seq to the genome, then cufflinks for the transcript analysis, I checked the file of transcripts.expr. There are
                      263,506 transcripts. That's a lot. Do I need to filter the results based on FPKM? What criteria should I use? 1.0 or 0.5? I have no idea about it.

                      Thanks,

                      Lahoman
                      Unfortunately, there is no right answer. Everything is subjective. Just use good judgement...

                      Comment

                      • chrisbala
                        Member
                        • Jan 2010
                        • 82

                        #26
                        I think this issue of excessive cufflinks predicted transcripts is one many people are wrestling with. I have not figured out a satisfactory way to deal with this other than just using ONLY existing Ensembl gene models (boo!). My interpretation of the problem is that pre-mRNA reads (and perhaps other forms of "background") give cufflinks trouble (understandably). I don't know if using paired-end data (which I did not do) and 100s of millions of reads (like the cufflinks paper did) as opposed to 10s of millions of read would help. If others have thoughts I would love to hear them..

                        Comment

                        • lpachter
                          Member
                          • Feb 2010
                          • 40

                          #27
                          The reason for excessive transcripts is that in genes with low coverage, Cufflinks will likely predict multiple disjoint transcripts- that is in fact the only thing it can do because it is not a gene finder and has no reason to glue them together (its also true that "background" causes problems and makes assembly difficult).

                          To address these issues, and also the fact that many users want to discover novel isoforms with respect to a known annotation (e.g. ENSEMBL), we have developed a new reference annotation guided assembly mode that will take as input your annotation and assemble with respect to that. It results in much cleaner assemblies by merging partial transcript fragments into the known annotations, yet is able to reveal novel isoforms. It sounds like it addresses the issue (and desired feature) you are reporting on.

                          It will be released in the next week or two as soon as we finish some more testing, so please check back with us before the end of the month.

                          Comment

                          • lahoman
                            Member
                            • Jan 2011
                            • 12

                            #28
                            the mapping position of RNA-Seq spanning junction

                            Hi,
                            I checked the accepted_hits.sam generated by TopHat. For one reads pair, their mapping results are:

                            USI-EAS376_0001:3:112:7215:15873#0 129 chr1 14790 255 40M140N35M = 155253852 0 CCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTGTTGAAGAGATCC bbbbbabbbbbbbbbbbbbbbababbbbbbbb^bbbbcbbbb`cR`\aaY`\a`bbbcbb_\_^baa[`Wa]X_a NM:i:0 XS:A:- NS:i:0


                            The CIGAR code is 40M140N35M. We can see it spans two exons. 14790 on chr1 is the position in the first exon. How can we know the information about the second exon?

                            Thanks a lot,

                            Lahoman

                            Comment

                            • lahoman
                              Member
                              • Jan 2011
                              • 12

                              #29
                              Hi,

                              Suddenly I realize that the second exon is from the same chromosome. The distance between the first and second exon is 140bp.
                              The one more question is, if a RNA-Seq reads is spliced and aligned to different chromsome, TopHat won't report it. Am I right? In other words, TopHat can't be used for genefusion analysis.

                              Thanks,

                              Lahoman

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                                Here are nine questions we think about, in roughly the order they matter, before...
                                06-18-2026, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, 06-17-2026, 06:09 AM
                              0 responses
                              33 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              97 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              117 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              111 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...