  • differential expression analysis in non-model species - best practice?

    Hi All,

    I've trawled the forums but have not found a complete discussion around this question: For RNA-seq DE analysis in non-model species, where a de novo transcriptome is the only mapping reference available, what's the most legitimate approach for DE testing? Transcript-level or 'gene'-level analysis?

    This is my understanding: In most cases, the non-model species community uses the Trinity pipeline to assemble a reference transcriptome de novo (typically from the same reads used for downstream DE analysis), then uses RSEM for alignment-based abundance estimation to generate the count tables for DE analysis in whatever software you choose. Obviously, the success of DE analysis hinges on the accuracy of the count data used as input.
    There's a choice between using counts for Trinity transcripts, i.e., contigs in the de novo assembly that are theoretically equivalent to isoforms (RSEM.isoforms.results), or counts at the level of Trinity 'components', which are a proxy for genes (RSEM.genes.results). (Compared to mapping against a genome, there are obvious inaccuracies in assembling genes and isoforms de novo, but it's what we have.)
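To make the transcript-versus-'gene' distinction concrete, here is a minimal sketch of how transcript-level counts roll up into component-level counts. The identifiers, the counts, and the helper `sum_to_gene_level` are all hypothetical and only illustrate the idea; in practice RSEM writes both tables itself.

```python
def sum_to_gene_level(transcript_counts, transcript_to_gene):
    """Sum per-transcript counts into per-gene counts, sample by sample.

    transcript_counts: dict of transcript ID -> list of counts (one per sample)
    transcript_to_gene: dict of transcript ID -> gene/component ID
    """
    gene_counts = {}
    for transcript, counts in transcript_counts.items():
        gene = transcript_to_gene[transcript]
        if gene not in gene_counts:
            gene_counts[gene] = [0.0] * len(counts)
        # Add this transcript's counts to the running total for its gene.
        gene_counts[gene] = [g + c for g, c in zip(gene_counts[gene], counts)]
    return gene_counts


# Hypothetical toy data: three samples, two components.
transcript_counts = {
    "comp1_seq1": [10.0, 12.0, 8.0],
    "comp1_seq2": [30.0, 28.0, 32.0],
    "comp2_seq1": [5.0, 7.0, 6.0],
}
transcript_to_gene = {
    "comp1_seq1": "comp1",
    "comp1_seq2": "comp1",
    "comp2_seq1": "comp2",
}

gene_counts = sum_to_gene_level(transcript_counts, transcript_to_gene)
print(gene_counts)
```

Note that in this toy data the two comp1 transcripts fluctuate across samples while their sum stays constant, which is the kind of ambiguity in read assignment that gene-level summarisation absorbs.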

    Obviously, a transcript-level analysis is preferable biologically but tricky in practice.
    * I'm aware that transcript-level analysis in the popular edgeR and DESeq2 packages violates key assumptions of these programs. Many people go ahead anyway and publish such results.
    * DEXSeq is recommended for exon-level analysis, but appears to require mapping to a genome.
    * Alternatively, the 'gene'-level counts from RSEM can be used in e.g. DESeq2, although this brings its own issues because the Trinity components are only a proxy for genes. Is this nevertheless the most legitimate approach for counts derived from de novo transcriptome mapping?
    * I've recently read about the alignment-free, k-mer based approach of kallisto, with downstream DE analysis in sleuth, which is suitable at the transcript level. Is this new approach perhaps the best yet for non-model species?

    Like most, I'm relatively new to RNA-seq and am not a biostatistician. I realise there are issues with all of the above options, but I'm hoping some of the program developers and those with statistical minds can share some advice on what might be the most legitimate approach for non-model species.

    Many thanks.

  • #2
    Differential expression analysis at the gene level is always more reliable, regardless of the organism.

    More often than not, there is no reliable way to determine which isoform a read belongs to when isoforms overlap. Less importantly, the counts are lower for individual isoforms than for genes.

    I like computing the coefficient of variation between replicates for isoforms vs genes to illustrate the tremendous gap in reliability in the results.
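As a rough illustration of that comparison, the sketch below computes the coefficient of variation (standard deviation divided by mean) across replicates, once per isoform and once per gene. The data and the names `isoform_counts`, `gene_counts` and `coefficient_of_variation` are made up for the example, and it assumes the counts have already been normalised for library size.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return sample standard deviation / mean for one feature across replicates."""
    m = mean(values)
    if m == 0:
        return float("nan")  # CV is undefined when the mean is zero
    return stdev(values) / m


# Hypothetical normalised counts for three replicates of one condition.
isoform_counts = {
    "geneA_iso1": [10.0, 30.0, 20.0],
    "geneA_iso2": [90.0, 70.0, 80.0],
}
# Gene-level counts are the per-replicate sums of the isoform counts.
gene_counts = {"geneA": [100.0, 100.0, 100.0]}

isoform_cv = {k: coefficient_of_variation(v) for k, v in isoform_counts.items()}
gene_cv = {k: coefficient_of_variation(v) for k, v in gene_counts.items()}

print("isoform CV:", isoform_cv)
print("gene CV:", gene_cv)
```

In this contrived case the isoform-level CVs are 0.5 and 0.125 while the gene-level CV is 0, because reads shift between the two isoforms from replicate to replicate but the gene total is stable. Real data will be noisier, but comparing the two CV distributions is one way to see the gap described above.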

    Given the biological relevance of differential expression at the isoform level, researchers will often request isoform-level results, but end up using the gene-level analysis after seeing how unreliable the isoform-level results are. There may be individual cases where the isoform-level analysis gives clear results, but this is generally not the case, especially at loci with many overlapping isoforms or low coverage.

    • #3
      Kallisto can quantify transcript-level abundance against a de novo assembled transcriptome, for transcript-level differential expression testing downstream (e.g. in sleuth). It takes similarities between transcript sequences into account when counting, and has a stupidly fast bootstrapping mode for estimating the uncertainty in the isoform abundance estimates.
