Hey Everyone!
I'm assembling RNA-Seq data from six related taxa in the basal eudicots and running statistical analyses on the abundance profiles of the resulting contigs (edgeR, WGCNA, etc.). Folks in my lab are kicking around ideas for how to eliminate Trinity contigs that are likely to be false isoforms or redundant sequences within the data set. So far we've tried digital normalization and removing contigs with a large number of frameshifts, and we're comparing results on the first assembled taxon before moving on to the others.
My question for you, the SEQanswers community: what steps (if any) do you take to filter out potentially bad contigs from your Trinity assemblies before moving on to abundance estimation and differential expression analysis?