Dear all,
we have sequenced the transcriptome of different varieties of the same species (lacking complete genome information) using different read types, and obtained contigs with a respectable N50 for each of the varieties.
Now we want to use the reads to perform a differential expression analysis of the transcripts, and I can think of two possible strategies.
1) Map each variety's reads to that variety's own contigs, then group the contigs across varieties via some homology procedure, and perform the analysis by comparing the counts within these groups.
Issue 1: the same transcript is sometimes fragmented to different degrees across the assemblies (e.g. three fragments in one, the full sequence in another), making a direct one-to-one comparison inappropriate.
Issue 2: robust orthology assignment methods (OrthoMCL, InParanoid etc.) are, as far as I know, tuned for and work mainly on protein sequences.
Issue 3: all the differential expression tools that I know (e.g. edgeR) assume that the feature being counted has the same length in every sample, which does not hold when the contigs being compared differ in length between varieties.
2) An alternative is to build a single joint assembly from the reads of all varieties, and then map each variety's reads separately to these contigs, which would solve all of the issues above. However, it feels dirty, and the joint assembly is much more fragmented than the variety-specific ones.
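To make strategy 1 more concrete, here is a minimal Python sketch of what I mean by comparing counts within groups. It is only an illustration: the function names, contig IDs, group IDs and numbers are all made up, and it assumes that the fragments of one transcript do not overlap, so that their lengths can simply be summed.

```python
from collections import defaultdict


def group_counts(contig_counts, contig_lengths, contig_to_group):
    """Sum read counts and contig lengths per homology group for one variety."""
    counts = defaultdict(int)
    lengths = defaultdict(int)
    for contig, n_reads in contig_counts.items():
        group = contig_to_group.get(contig)
        if group is None:
            continue  # contigs without a homology assignment are skipped
        counts[group] += n_reads
        # Assumption: fragments in a group do not overlap each other.
        lengths[group] += contig_lengths[contig]
    return dict(counts), dict(lengths)


def reads_per_kb(counts, lengths):
    """Length-normalised read density per group (reads per kilobase)."""
    return {g: counts[g] * 1000.0 / lengths[g] for g in counts}


# Hypothetical toy data: variety A has the full transcript as one contig,
# variety B has the same transcript split into three fragments.
counts_a = {"A_c1": 300}
lengths_a = {"A_c1": 3000}
groups_a = {"A_c1": "g1"}

counts_b = {"B_c1": 80, "B_c2": 100, "B_c3": 60}
lengths_b = {"B_c1": 900, "B_c2": 1000, "B_c3": 500}
groups_b = {"B_c1": "g1", "B_c2": "g1", "B_c3": "g1"}

rate_a = reads_per_kb(*group_counts(counts_a, lengths_a, groups_a))
rate_b = reads_per_kb(*group_counts(counts_b, lengths_b, groups_b))
print(rate_a, rate_b)
```

In this toy case the group has the same read density in both varieties even though the raw counts (300 vs 240) and the assembled lengths (3000 vs 2400 bp) differ, which is exactly why Issues 1 and 3 worry me. I assume a count-based tool would want the raw per-group counts plus the lengths as some kind of offset rather than these normalised values, but I am not sure how best to do that.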
How would you tackle a case like this? Would you favour one approach over the other? I may be missing some major strategy (and perhaps I'm duplicating another post on the issue), but please forgive me, I'm new to this.
Thank you!
Federico