  • RNAseq time course data

    Greetings,
    I am new to RNAseq data analysis, so I am hoping that someone can offer helpful feedback. We have generated RNAseq data (Illumina platform) for a time-course experiment (one reference point at the start of a culture followed by 8 successive samples at timed intervals). The data have been processed using TopHat, Bowtie, and Cufflinks, and we now have RPKM values. My question concerns further analysis of the data, in particular how to deal with genes whose RPKM values start at zero and increase substantially: how do we represent expression changes (e.g., fold change) when the denominator is zero? Using log2 ratios, as we would with microarray data, doesn't seem to be an option. Similarly, what tools can be used to identify genes that are significantly differentially expressed? With microarrays, we'd use something like SAM, where we could set a false discovery rate. Is SAM an acceptable tool for RNAseq data? Is anyone out there dealing with data like this who has a pipeline that's working? Many thanks for your help.
    Last edited by rmberka; 03-11-2010, 02:08 PM.
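    One common workaround for the zero-denominator problem, sketched here in Python, is to add a small pseudocount to every value before taking the log2 ratio, so that a gene starting at zero gets a large but finite fold change. This is a minimal sketch, not the method of any particular package: the function names are hypothetical, the pseudocount of 1 is an arbitrary assumption, and the gene values are made up. Count-based tools such as edgeR use a similar idea in spirit (adding a small prior count before computing log fold changes), but on raw counts rather than RPKM values.

```python
import math


def log2_fold_change(baseline, later, pseudocount=1.0):
    """log2 ratio of `later` to `baseline` after adding a pseudocount to both.

    Without the pseudocount, a baseline of 0 makes the ratio undefined
    (division by zero); with it, the result is finite but depends on the
    (arbitrary) pseudocount chosen.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    return math.log2((later + pseudocount) / (baseline + pseudocount))


def time_course_log2fc(values, pseudocount=1.0):
    """Fold change of each later time point relative to the first (reference) point."""
    baseline = values[0]
    return [log2_fold_change(baseline, v, pseudocount) for v in values[1:]]


# Hypothetical expression values for one gene: reference point, then later samples.
gene = [0.0, 1.0, 3.0, 7.0, 15.0]
print(time_course_log2fc(gene))  # [1.0, 2.0, 3.0, 4.0]
```

    The result depends on the pseudocount: a larger value shrinks the fold changes of low-expression genes toward zero, and on an RPKM scale a pseudocount of 1 does not correspond to one read.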

  • #2
    Hi, rmberka,
    I'm new here too. Our data were analyzed by the company that sequenced them for us (including identifying the differentially expressed genes); they used software they had developed themselves. But I think you can use other methods or software to analyze your data, such as DEGseq (http://www.bioconductor.org/packages...ml/DEGseq.html).
    Good luck!

    • #3
      Thanks, Dennis.
      I actually found a very recent paper on this topic, and other readers may also find it of interest: Bullard JH, et al. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11:94. doi:10.1186/1471-2105-11-94

      • #4
        DE analysis of RNA-seq data

        Hi all

        A couple of quick points:

        1. Tools such as SAM, which were designed for analysing microarray data, will not be optimal for RNA-seq data. The discrete nature of RNA-seq data (read counts, as opposed to the continuous responses in microarray data) requires different statistical models and therefore new tools for the best assessment of differential expression. There are currently four software tools available for DE analysis of RNA-seq data: baySeq, DEGseq, DESeq and edgeR. I am one of the developers of edgeR, so I recommend this one.

        2. These tools will give sensible fold changes even when comparing groups with some zero counts. They work on raw read counts, so typically you should not do RPKM normalization before using them, although some normalization is recommended, in particular TMM normalization (Robinson and Oshlack, 2010), which is supported in edgeR.
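        To make the TMM idea concrete, here is a simplified Python sketch of the trimmed mean of M-values (Robinson and Oshlack, 2010). It is an illustration, not edgeR's implementation: the function names are hypothetical, the reference library is chosen by hand (ref_index), ties in the trimming ranks are broken by position, and edge cases such as libraries with no shared non-zero genes are not handled. The trimming fractions (30% of M-values and 5% of A-values from each tail) and the inverse-variance weights are assumed to follow edgeR's calcNormFactors defaults.

```python
import math


def _ranks(values):
    """0-based rank of each value; ties are broken by position (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks


def tmm_factor(sample, ref, m_trim=0.30, a_trim=0.05):
    """Simplified TMM scaling factor of one library relative to a reference library.

    `sample` and `ref` are raw read counts for the same genes, in the same order.
    """
    n_s, n_r = float(sum(sample)), float(sum(ref))  # library sizes
    # M- and A-values are undefined when either count is zero, so drop those genes.
    pairs = [(s, r) for s, r in zip(sample, ref) if s > 0 and r > 0]
    m = [math.log2((s / n_s) / (r / n_r)) for s, r in pairs]  # log ratios (M)
    a = [0.5 * math.log2((s / n_s) * (r / n_r)) for s, r in pairs]  # log abundance (A)
    n = len(pairs)
    m_rank, a_rank = _ranks(m), _ranks(a)
    lo_m, lo_a = int(n * m_trim), int(n * a_trim)  # genes trimmed from each tail
    num = den = 0.0
    for i, (s, r) in enumerate(pairs):
        if lo_m <= m_rank[i] < n - lo_m and lo_a <= a_rank[i] < n - lo_a:
            # Inverse of the approximate variance of M, used as a weight.
            var = (n_s - s) / (n_s * s) + (n_r - r) / (n_r * r)
            num += m[i] / var
            den += 1.0 / var
    return 2.0 ** (num / den)  # weighted trimmed mean of M, back on the linear scale


def tmm_factors(libraries, ref_index=0):
    """Factors for several libraries, rescaled so that they multiply to 1."""
    raw = [tmm_factor(lib, libraries[ref_index]) for lib in libraries]
    log_mean = sum(math.log(f) for f in raw) / len(raw)
    return [f / math.exp(log_mean) for f in raw]


# Hypothetical counts: library B is library A plus one very highly expressed gene.
lib_a = [10] * 20
lib_b = [10] * 19 + [1000]
print(tmm_factors([lib_a, lib_b]))
```

        In this toy example library B gets the smaller factor: multiplying each library size by its factor gives an effective library size that discounts the reads absorbed by the single dominant gene, which raw totals (or the per-million scaling in RPKM) would not.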

        I have discussed a possible RNA-seq analysis pipeline here: http://seqanswers.com/forums/showthread.php?t=5248

        I hope the discussion is useful.

        Best regards
        Davis

        • #5
          Thank you

          Dear Davis,

          Many thanks for your reply. This is very helpful information indeed. We had actually come across the paper describing edgeR and we were just about to give it a try!

          Cheers and thanks again!!!

          • #6
            Great to hear that you're planning to give edgeR a try - I hope you find the software helpful and of course get in touch if there is anything further I might be able to help with.

            Cheers
            Davis

            • #7
              FDR

              Hi,

              I'm not sure if this is the right place to ask this question but I'll give it a try and see if anyone can help me out.

              I've analysed my data for DE using edgeR, but I have a question about the false discovery rate. I am comparing two treatments, each with 6 biological replicates. I understand how the Benjamini-Hochberg (BH) FDR correction works, but I would like to know whether anyone thinks it is absolutely necessary, even to the detriment of biological significance. The problem is that a number of genes are significantly differentially expressed at uncorrected p-values < 0.01, but when I control the FDR at < 0.05, I end up with very few genes (~50). I have investigated this and used GOseq to correct for length bias. I have also run these data through a pathway analysis tool and found that the genes selected with uncorrected p-values appear to be biologically significant.

              I don't want to report misleading results, but I also don't want to mask any biological effect. What is common practice here: do many people use FDR, or do some report raw p-values (< 0.01)? Is it acceptable to publish without FDR correction?

              Thanks in advance!
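              For reference, the BH adjustment itself can be sketched in a few lines of Python. This is an illustrative reimplementation of the standard step-up procedure (intended to match R's p.adjust(..., method = "BH")), not code from edgeR, and the p-values in the usage lines are made up: each sorted p-value is multiplied by m/rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone and capped at 1.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(raw)
print([round(x, 4) for x in q])  # [0.02, 0.04, 0.04, 0.02]
print(sum(1 for x in q if x < 0.05), "genes pass FDR < 0.05")
```

              Because each p-value is scaled by m/rank, with tens of thousands of genes tested a raw p-value of 0.01 only survives correction when a large share of the other genes also have small p-values, which is why the list can shrink sharply.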

              • #8
                Can you please tell me whether I can use edgeR for a time-course analysis (control vs. mutant) at different time points? I want to test for differential expression throughout the time course. I have tag count data. Thanks in advance.
