Hi all,
I plan to use the DESeq2 package for differential expression analysis between two conditions, and I'm wondering which transcriptome(s) I should use as the reference: a single consensus assembly or the per-sample assemblies. I don't have a genome for my species.
The samples (regardless of condition) show different numbers of genes and isoforms, as the annotation of the assembled contigs suggests. Some people suggest generating a single assembly from the combined reads of all samples and then aligning each sample's reads separately back to that single ("consensus") assembly for the downstream differential expression analysis. But I really don't know if this is the best way to proceed. For example, for samples that have only one isoform of a given gene, the read count would be overestimated if the consensus transcriptome included other isoforms of the same gene (with shared exons) from other samples. I don't want to discard the multimapping reads.
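To make the multimapping concern concrete, here is a toy Python sketch. The read names, isoform names, and both counting functions are made up for illustration and are not taken from any aligner or from DESeq2. It only shows how reads that align to two isoforms sharing an exon give different gene-level totals depending on whether they are tallied once per alignment or once per gene.

```python
from collections import defaultdict

# Hypothetical toy data: each read lists every isoform it aligns to
# in the consensus assembly.
alignments = {
    "read1": ["geneA.iso1", "geneA.iso2"],  # lies in an exon shared by both isoforms
    "read2": ["geneA.iso1", "geneA.iso2"],  # same shared exon
    "read3": ["geneA.iso1"],                # unique to isoform 1
}

def gene_of(isoform):
    """Assumed naming scheme: '<gene>.<isoform>'."""
    return isoform.split(".")[0]

def count_per_alignment(alignments):
    """Add 1 to the gene for every alignment, so shared-exon reads count twice."""
    counts = defaultdict(int)
    for isoforms in alignments.values():
        for iso in isoforms:
            counts[gene_of(iso)] += 1
    return dict(counts)

def count_per_read(alignments):
    """Add 1 to each gene a read hits, however many of its isoforms it aligns to."""
    counts = defaultdict(int)
    for isoforms in alignments.values():
        for gene in {gene_of(iso) for iso in isoforms}:
            counts[gene] += 1
    return dict(counts)

naive = count_per_alignment(alignments)
per_read = count_per_read(alignments)
print(naive, per_read)
```

With three reads in the toy data, the per-alignment tally gives geneA a count of 5, while the per-read tally gives 3. This is the kind of inflation I am worried about.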
The second option is simply to align the reads of each sample to its own assembly. I do not know to what extent the heterogeneity of the individual assemblies (different numbers of genes and isoforms, differences in transcript lengths, etc.) can affect the differential gene expression analysis.
The third option I have in mind is to use the exons obtained from all the samples as the reference (I can obtain them from my transcripts and by using exons of a related species). I think that this could be the best option.
Which option do you think would be best for differential expression analysis with DESeq2 (or DEXSeq)?
Thanks in advance,
Facundo