
  • RNA-Seq Quantification and Differential Expression Analysis

    I am currently analyzing RNA-Seq data from a bacterial transcriptome, and I have several questions about gene expression quantification and differential expression analysis.

    1. Partially Overlapping Reads
    If I use RPKM for quantification, how should I count reads that only partially overlap the annotated gene regions in the reference genome? Should I count each read as 1 regardless of how much of it overlaps the gene, weight each read by the fraction that overlaps, or discard those reads and count only the reads that lie completely within a gene annotation?

    2. Paired-End reads
    Since my data are paired-end reads, should I treat the gap between the two ends as contributing to coverage? For background, my RNA-Seq data do not come from a pure bacterial culture. The sample may contain several closely related bacterial species whose genomes are similar but not identical.

    3. Differential Expression Analysis Methodology
    a. I've seen some posts discussing DE methods. T-tests were recommended when there are "many" biological replicates. Does 5 vs. 5 count as "many", or at least an acceptable number of replicates?
    b. Suppose it is OK to use a t-test. Since it is not clear whether the t statistic actually follows a t-distribution for these data, would it be more appropriate to use a permutation method to generate the null distribution?

    These questions have bothered me for a long time. I would appreciate any help!

    Thanks in advance,
    Dezhi

  • #2
    Can anybody kindly address some of these questions? My post has been ignored.


    • #3
      Hi Dezhi

      To use something like a t-test, you need enough replicates to estimate a variance for each gene. With two groups of five samples, you are already entering the regime where this should work well. For comparison, also try a tool that pools information across genes to get more reliable variance estimates, such as our DESeq or the Smyth group's edgeR. (Of course, we like to claim that DESeq is better than edgeR, and for only two or three replicates, I do think so, but for five or more replicates, edgeR's "moderation" feature really pays off. So, even though I don't like admitting this, for your set-up, edgeR should work better than DESeq.)

      I wonder whether ten samples are sufficient to get a good permutation null. If you try it, be sure to let us know about your experiences.
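
      A rough way to see the concern about the permutation null, sketched in Python. With 5 vs. 5 samples there are only C(10, 5) = 252 distinct ways to assign the group labels, so an exact one-sided permutation p-value for a single gene cannot fall below 1/252. The figure of 4000 genes and the Bonferroni threshold are assumptions made here for illustration only, not numbers from this thread.

```python
from math import comb


def min_permutation_pvalue(n_a: int, n_b: int) -> float:
    """Smallest one-sided p-value an exact label-permutation test can return.

    The observed labeling is one of comb(n_a + n_b, n_a) equally likely
    labelings under the null, so the p-value is at least 1 / that count.
    """
    return 1.0 / comb(n_a + n_b, n_a)


n_relabelings = comb(10, 5)           # distinct 5-vs-5 group assignments
min_p = min_permutation_pvalue(5, 5)  # floor on any per-gene p-value

# Hypothetical multiple-testing setting (assumed, not from the thread).
n_genes = 4000
bonferroni_threshold = 0.05 / n_genes

print(f"distinct relabelings:  {n_relabelings}")
print(f"smallest p-value:      {min_p:.5f}")
print(f"Bonferroni threshold:  {bonferroni_threshold:.2e}")
print(f"threshold reachable:   {min_p <= bonferroni_threshold}")
```

      Under these assumptions the per-gene floor is far above a genome-wide threshold, which is one reason a gene-by-gene permutation test with ten samples may struggle after multiple-testing correction.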

      About RPKM: The raw integer number of counts gives you a lot of information about the expected Poisson noise, and this is why I recommend against using RPKM values in differential expression analysis. Two genes with the same RPKM value can have very different accuracies if one is a long gene with few counts and the other a short gene with many counts. This is why DESeq and edgeR require unnormalized counts as input.
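
      The point about RPKM hiding count information can be made concrete with a small Python sketch. All numbers are made up for illustration. To let the long gene have the lower count, the two genes are assumed here to come from libraries of different sequencing depth. The relative noise uses the Poisson rule that a count of n has standard deviation sqrt(n), so the relative error is 1/sqrt(n).

```python
from math import isclose, sqrt


def rpkm(count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


def poisson_relative_sd(count: int) -> float:
    """Relative standard deviation of a Poisson count: sqrt(n) / n."""
    return 1.0 / sqrt(count)


# Hypothetical long gene, few reads, shallow library.
long_gene = {"count": 30, "length": 3000, "library": 1_000_000}
# Hypothetical short gene, many reads, deep library.
short_gene = {"count": 300, "length": 300, "library": 100_000_000}

for name, g in (("long gene", long_gene), ("short gene", short_gene)):
    value = rpkm(g["count"], g["length"], g["library"])
    noise = poisson_relative_sd(g["count"])
    print(f"{name}: RPKM = {value:.1f}, count = {g['count']}, "
          f"relative Poisson noise = {noise:.1%}")

rpkm_long = rpkm(long_gene["count"], long_gene["length"], long_gene["library"])
rpkm_short = rpkm(short_gene["count"], short_gene["length"], short_gene["library"])
```

      Both genes come out at the same RPKM, yet the estimate based on 30 reads is roughly three times noisier than the one based on 300 reads. That difference is visible only in the raw counts.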
