SEQanswers

Old 12-07-2015, 01:22 PM   #1
evt8
Junior Member
 
Location: New Zealand

Join Date: Aug 2014
Posts: 7
differential expression analysis in non-model species - best practice?

Hi All,

I've trawled the forums but have not found a complete discussion around this question: For RNA-seq DE analysis in non-model species, where a de novo transcriptome is the only mapping reference available, what's the most legitimate approach for DE testing? Transcript-level or 'gene'-level analysis?

This is my understanding: in most cases, the non-model species community uses the Trinity pipeline to assemble a reference transcriptome de novo (typically from the same reads used for downstream DE analysis), then uses RSEM for alignment-based abundance estimation to generate the count tables for DE analysis in whatever software you choose. Obviously, the success of DE analysis hinges on the accuracy of the count data used as input.
There's a choice between using counts for Trinity transcripts (i.e., contigs in the de novo assembly, theoretically equivalent to isoforms; RSEM.isoforms.results) and counts at the level of Trinity 'components', which are a proxy for genes (RSEM.genes.results). (Compared with mapping against a genome, there are obvious inaccuracies in assembling genes and isoforms de novo, but it's what we have.)
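To make the transcript-versus-'gene' distinction concrete, here is a minimal Python sketch of how transcript-level counts collapse into component-level counts. It assumes Trinity-style IDs in which the isoform is marked by a trailing `_i<N>`; the IDs and counts below are made up for illustration, and this is not a reimplementation of what RSEM does internally.

```python
import re

# Assumption: transcript IDs end in "_i<N>", and everything before that
# suffix identifies the Trinity 'gene' (component).
_ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id(transcript_id):
    """Strip the trailing isoform suffix to get the 'gene' identifier."""
    return _ISOFORM_SUFFIX.sub("", transcript_id)


def sum_to_genes(transcript_counts):
    """Sum per-sample counts over all transcripts sharing a 'gene' ID.

    transcript_counts maps transcript ID -> list of per-sample counts.
    """
    gene_counts = {}
    for tid, counts in transcript_counts.items():
        gid = gene_id(tid)
        if gid not in gene_counts:
            gene_counts[gid] = list(counts)
        else:
            gene_counts[gid] = [a + b for a, b in zip(gene_counts[gid], counts)]
    return gene_counts


# Hypothetical expected counts for two samples.
example = {
    "TRINITY_DN10_c0_g1_i1": [12.0, 8.0],
    "TRINITY_DN10_c0_g1_i2": [3.0, 7.0],
    "TRINITY_DN10_c0_g2_i1": [5.0, 5.0],
}
gene_level = sum_to_genes(example)
```

In this toy table the two isoforms of `g1` differ between samples, but their sum does not, which is exactly the kind of signal a 'gene'-level analysis cannot see.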

Obviously, a transcript-level analysis is preferred biologically but tricky in practice.
*I'm aware that transcript-level analysis in the popular edgeR and DESeq2 packages violates key assumptions of these programs. Many people go ahead anyway, and publish such results.
*DEXSeq is recommended for exon-level analysis, but appears to require mapping to a genome.
*Alternatively, the 'gene'-level counts from RSEM can be used in e.g. DESeq2, although this brings its own issues because the Trinity components are only a proxy for genes. Is this nevertheless the most legitimate approach for counts derived from de novo transcriptome mapping?
*I've recently read about the alignment-free, k-mer-based approach of kallisto, with downstream DE analysis in sleuth, which is suitable at the transcript level. Is this new approach perhaps the best yet for non-model species?

Like most, I'm relatively new to RNA-seq and am not a biostatistician. I realise there are issues with all of the above options, but I'm hoping some of the program developers and those with statistical minds can share some advice on what might be the most legitimate approach for non-model species.

Many thanks.
Old 12-07-2015, 06:41 PM   #2
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

Differential expression analysis at the gene level is always more reliable, regardless of the organism.

More often than not, there is no reliable method of determining which isoform a read belongs to when isoforms overlap. Less importantly, the counts are lower for the individual isoforms than for the genes.

I like computing the coefficient of variation between replicates for isoforms vs genes to illustrate the tremendous gap in reliability in the results.
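For anyone who wants to try this comparison, a minimal Python sketch follows. It assumes you already have normalised abundances (not raw counts) for the replicates of one condition; the numbers are invented purely to show the calculation and are not real data.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean; None if the mean is 0."""
    m = mean(values)
    if m == 0:
        return None
    return stdev(values) / m


def median_cv(table):
    """Median CV over all features (rows) that have a non-zero mean."""
    cvs = [coefficient_of_variation(v) for v in table.values()]
    return median(cv for cv in cvs if cv is not None)


# Hypothetical normalised abundances for three replicates of one condition.
isoforms = {
    "g1_i1": [10, 30, 20],
    "g1_i2": [40, 20, 30],
}
# Gene-level values are the per-replicate sums of the isoforms above.
genes = {
    "g1": [a + b for a, b in zip(isoforms["g1_i1"], isoforms["g1_i2"])],
}

isoform_cv = median_cv(isoforms)
gene_cv = median_cv(genes)
```

On a real dataset you would compare the two distributions of CVs (for example as histograms) rather than a single median, and restrict the comparison to genes with more than one isoform.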

Given the biological relevance of determining differential expression at the isoform level, researchers will often request the results at the isoform level, but will end up using the gene-level analysis after seeing how unreliable the isoform-level results are. There may be individual cases where the differential expression analysis at the isoform level gives clear results, but this is generally not the case, especially at locations with many overlapping isoforms or low coverage.
Old 12-08-2015, 11:22 PM   #3
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Kallisto, with sleuth handling the downstream testing, can do transcript-level differential expression using a de novo assembled transcriptome. It takes similarities between transcript sequences into account when assigning reads, and has a stupidly fast bootstrapping mode for estimating the uncertainty of the transcript abundance estimates.
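As a rough sketch of what that looks like in practice, the Python below only builds the `kallisto index` and `kallisto quant` command lines for one paired-end sample. The file names, output directory, and the choice of 100 bootstraps are placeholders for illustration, not recommendations; check `kallisto quant` with no arguments for the options in your installed version.

```python
def kallisto_index_cmd(fasta, index):
    """Command to build a kallisto index from a transcriptome FASTA."""
    return ["kallisto", "index", "-i", index, fasta]


def kallisto_quant_cmd(index, out_dir, reads_1, reads_2, bootstraps=100):
    """Command to quantify one paired-end sample with bootstrap resampling."""
    return [
        "kallisto", "quant",
        "-i", index,
        "-o", out_dir,
        "-b", str(bootstraps),  # number of bootstrap samples
        reads_1, reads_2,
    ]


# Hypothetical paths: a Trinity assembly and one sample's read files.
commands = [
    kallisto_index_cmd("Trinity.fasta", "trinity.idx"),
    kallisto_quant_cmd("trinity.idx", "quant/sampleA",
                       "sampleA_1.fastq.gz", "sampleA_2.fastq.gz"),
]

for cmd in commands:
    # Print only; to execute, pass each list to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

The bootstrap samples written by the quantification step are what sleuth reads to separate technical from biological variance in the downstream test.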

Tags
deseq2, differential expression, kallisto, non-model organism, sleuth

All times are GMT -8.

