Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Found the same thing

    Hey All,

    I noticed the same thing as konrad98 a while ago. I ran CuffDiff and DESeq on 2x36 bp PE-Reads from replicate samples. Cuffdiff says that ~30% are differentially expressed, whereas DESeq found 0. To find out why, I made MA plots for my samples. The X-axis is the average RPKM for the two samples and the Y-axis is the log2 Fold Change.

    As you can see, CuffDiff (for some reason) is much more variable in its calculation of RPKM. Note, these are the RPKM values from the genes.expr file and gene-level counting in exons.
    Attached Files

    Comment


    • #17
      Dear RCJ,

      Thanks for attaching the MA plot. Do you recall what version of Cuffdiff you were running? And also whether you compared on the same gene sets? Which organism was this from? It would also be really useful for us (Cufflinks people) if you could share the actual data so we can take a closer look at what is going on.
      I'm just a bit confused because although I would expect differences in the M axis, I don't understand differences in the A axis. Also, I don't understand the raw x-axis values. Was the RPKM for most genes in your experiment less than 1?

      Thanks!

      Comment


      • #18
        Originally posted by lpachter View Post
        Dear RCJ,

        Thanks for attaching the MA plot. Do you recall what version of Cuffdiff you were running? And also whether you compared on the same gene sets? Which organism was this from? It would also be really useful for us (Cufflinks people) if you could share the actual data so we can take a closer look at what is going on.
        I'm just a bit confused because although I would expect differences in the M axis, I don't understand differences in the A axis. Also, I don't understand the raw x-axis values. Was the RPKM for most genes in your experiment less than 1?

        Thanks!
        These are from human data with the newest version of Cufflinks/Cuffdiff. I left out the transformation step for the last MA plot. I am posting a new one along with the R code I used to generate it. Sorry about that mess up.

        Notice, since these are replicates, no genes (or very few) should be differentially expressed. Since DESeq uses raw counts, the x-axis units will be larger than FPKM of CuffDiff.

        Red areas are those that were called as significant in the gene_exp.diff file (CuffDiff) or padj (DESeq). No genes were differentially expressed by DESeq.
        Code:
        DMSO_1vsDMSO_2.Gx<-read.table("DMSO_1vsDMSO_2.Gx",header=T)
        CuffDiff<-read.table("CuffDiff/DMSO1vDMSO2/gene_exp.diff",header=T)
        jpeg(file="Comparison.jpeg",width = 620, height = 280,quality = 70)
        
        par(mfrow=c(1,2))
        M<-log2(CuffDiff$value_2)-log2(CuffDiff$value_1)
        A<-.5*(log2(CuffDiff$value_2)+log2(CuffDiff$value_1))
        plot(A,M,main="CuffDiff",col=ifelse(CuffDiff$significant == "yes" ,"white","black"))
        text(A,M,col=c("red"),cex=.8,label=ifelse(CuffDiff$significant == "yes" ,CuffDiff$gene,""))
        
        M<-log2(DMSO_1vsDMSO_2.Gx$baseMeanB)-log2(DMSO_1vsDMSO_2.Gx$baseMeanA)
        A<-.5*(log2(DMSO_1vsDMSO_2.Gx$baseMeanB)+log2(DMSO_1vsDMSO_2.Gx$baseMeanA))
        plot(A,M,main="DESeq",col=ifelse(DMSO_1vsDMSO_2.Gx$padj <.05 ,"white","black"))
        text(A,M,col=c("red"),cex=.8,label=ifelse(DMSO_1vsDMSO_2.Gx$padj <.05 ,DMSO_1vsDMSO_2.Gx$id,""))
        dev.off()
        Attached Files

        Comment


        • #19
          By "newest version" do you mean 0.9.2? There was a small bug in that release which in some cases caused some of the alignments to be ignored. If it is not too much trouble, could you rerun with 0.9.3 (released yesterday)?

          Comment


          • #20
            My bad. It is 0.9.2. I'll have to update next week.

            Comment


            • #21
              Update on Cufflinks v0.9.3

              Originally posted by adarob View Post
              By "newest version" do you mean 0.9.2? There was a small bug in that release which in some cases caused some of the alignments to be ignored. If it is not too much trouble, could you rerun with 0.9.3 (released yesterday)?
              This figure was generated using the same protocol as above.
              Using v0.9.3 really made things better. Now I have a FDR of 2.13% (427/20014), much better than the 30% I had previously. There is still a bit more variation than from the counting methods, but at least I can work with this.

              Thanks for the updates!
              Attached Files

              Comment


              • #22
                How did you count the reads from the assembly file?

                Hi, xinchen,

                Once I get the sam file using TopHat, how should be we count the reads for each gene? You know, I am new to RNA-Seq analysis. My question might be naive

                Thank you so much,

                Lahoman

                Comment


                • #23
                  Originally posted by lahoman View Post
                  Hi, xinchen,

                  Once I get the sam file using TopHat, how should be we count the reads for each gene? You know, I am new to RNA-Seq analysis. My question might be naive

                  Thank you so much,

                  Lahoman
                  Hi, Lahoman,

                  Here is the introduction for cufflinks.


                  It is better to read through it. I thought there might be some shortcuts or easy ways to use it. But later on, I found it really necessary to read all the instructions carefully, and make sure about the parameters you should use.

                  Cheers,
                  Jun

                  Comment


                  • #24
                    Cufflink results filtering?

                    Hi,RockChalkJayhawk,

                    After I use tophat to map Human RNA-Seq to the genome, then cufflinks for the transcript analysis, I checked the file of transcripts.expr. There are
                    263,506 transcripts. That's a lot. Do I need to filter the results based on FPKM? What criteria should I use? 1.0 or 0.5? I have no idea about it.

                    Thanks,

                    Lahoman

                    Comment


                    • #25
                      Originally posted by lahoman View Post
                      Hi,RockChalkJayhawk,

                      After I use tophat to map Human RNA-Seq to the genome, then cufflinks for the transcript analysis, I checked the file of transcripts.expr. There are
                      263,506 transcripts. That's a lot. Do I need to filter the results based on FPKM? What criteria should I use? 1.0 or 0.5? I have no idea about it.

                      Thanks,

                      Lahoman
                      Unfortunately, there is no right answer. Everything is subjective. Just use good judgement...

                      Comment


                      • #26
                        I think this issue of excessive cufflinks predicted transcripts is one many people are wrestling with. I have not figured out a satisfactory way to deal with this other than just using ONLY existing Ensembl gene models (boo!). My interpretation of the problem is that pre-mRNA reads (and perhaps other forms of "background") give cufflinks trouble (understandably). I don't know if using paired-end data (which I did not do) and 100s of millions of reads (like the cufflinks paper did) as opposed to 10s of millions of read would help. If others have thoughts I would love to hear them..

                        Comment


                        • #27
                          The reason for excessive transcripts is that in genes with low coverage, Cufflinks will likely predict multiple disjoint transcripts- that is in fact the only thing it can do because it is not a gene finder and has no reason to glue them together (its also true that "background" causes problems and makes assembly difficult).

                          To address these issues, and also the fact that many users want to discover novel isoforms with respect to a known annotation (e.g. ENSEMBL), we have developed a new reference annotation guided assembly mode that will take as input your annotation and assemble with respect to that. It results in much cleaner assemblies by merging partial transcript fragments into the known annotations, yet is able to reveal novel isoforms. It sounds like it addresses the issue (and desired feature) you are reporting on.

                          It will be released in the next week or two as soon as we finish some more testing, so please check back with us before the end of the month.

                          Comment


                          • #28
                            the mapping position of RNA-Seq spanning junction

                            Hi,
                            I checked the accepted_hits.sam generated by TopHat. For one reads pair, their mapping results are:

                            USI-EAS376_0001:3:112:7215:15873#0 129 chr1 14790 255 40M140N35M = 155253852 0 CCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTGTTGAAGAGATCC bbbbbabbbbbbbbbbbbbbbababbbbbbbb^bbbbcbbbb`cR`\aaY`\a`bbbcbb_\_^baa[`Wa]X_a NM:i:0 XS:A:- NS:i:0


                            The CIGAR code is 40M140N35M. We can see it spans two exons. 14790 on chr1 is the position in the first exon. How can we know the information about the second exon?

                            Thanks a lot,

                            Lahoman

                            Comment


                            • #29
                              Hi,

                              Suddenly I realize that the second exon is from the same chromosome. The distance between the first and second exon is 140bp.
                              The one more question is, if a RNA-Seq reads is spliced and aligned to different chromsome, TopHat won't report it. Am I right? In other words, TopHat can't be used for genefusion analysis.

                              Thanks,

                              Lahoman

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              18 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              22 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              17 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X