Hi. I'd like to discuss a situation that has been partially discussed in this thread:
I have RNAseq data from two cultivars of a plant which is polyploid. I've taken the approach of doing a de novo transcriptome assembly separately for each cultivar.
Comparing the two transcriptomes shows that about 1/3 of the transcripts are unique to one cultivar, while the other 2/3 have a very close or identical BLAST hit in the other transcriptome. However, these hits are rarely full-length: very few transcripts have a 100% full-length match in the other cultivar/genotype.
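The unique / partial / full-length split described above can be counted from tabular BLAST output. A minimal sketch, assuming BLAST+ was run with `-outfmt "6 qseqid sseqid pident length qlen slen"` (cultivar A transcripts as queries against the cultivar B transcriptome). The function name `classify_hits`, the 98% identity cutoff and the 95% coverage cutoff are hypothetical choices for illustration, not values from this post. "Full-length" is taken here to mean the best hit covers both the query and the subject.

```python
import csv
import io
from collections import defaultdict


def classify_hits(blast_tsv, all_query_ids, min_pident=98.0, full_cov=0.95):
    """Label each query transcript as 'unique', 'partial' or 'full-length'.

    Assumes BLAST+ tabular output with the columns
    qseqid sseqid pident length qlen slen (one HSP per line).
    Coverage of an HSP = alignment length / longer of (qlen, slen),
    so a full-length hit must span both transcripts.
    """
    best_cov = defaultdict(float)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        qseqid, _sseqid, pident, length, qlen, slen = row
        if float(pident) < min_pident:
            continue  # too divergent to count as "very close or identical"
        cov = min(int(length) / int(qlen), int(length) / int(slen))
        best_cov[qseqid] = max(best_cov[qseqid], cov)

    labels = {}
    for q in all_query_ids:
        if q not in best_cov:
            labels[q] = "unique"  # no hit above the identity cutoff
        elif best_cov[q] >= full_cov:
            labels[q] = "full-length"
        else:
            labels[q] = "partial"
    return labels


# Made-up example rows, not real data.
example = (
    "A1\tB7\t100.0\t1500\t1500\t1520\n"
    "A2\tB3\t99.5\t400\t1800\t2100\n"
    "A3\tB9\t90.0\t1200\t1200\t1200\n"
)
labels = classify_hits(example, ["A1", "A2", "A3", "A4"])
print(labels)
```

Only the single best HSP per query is used, so a match that BLAST splits into several HSPs will be counted as partial; merging HSPs per query/subject pair would be a natural refinement.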
Differential expression analysis between conditions within the same cultivar is no problem; I can do that with standard approaches. However, I'm not sure how to do differential expression analysis across cultivars/genotypes.
I know that I somehow need to produce a "combined" reference transcriptome, using one of two approaches:
(1) Simply throw all RNAseq reads from both cultivars into a new de novo assembly (which I'm doing now)
(2) Combine the two existing de novo assemblies into a new assembly using an OLC-based method like CAP3 or MIRA
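Approach (2) relies on overlap-based (OLC) merging of already-assembled contigs. As a rough illustration only, a greedy exact-overlap merge might look like the sketch below. This is not how CAP3 or MIRA are implemented: real OLC tools tolerate mismatches, check reverse complements, handle contained sequences and build a consensus. `longest_overlap`, `greedy_merge` and `min_overlap` are hypothetical names.

```python
def longest_overlap(a, b, min_overlap):
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`
    (0 if no such overlap is at least `min_overlap` long)."""
    for olen in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:olen]):
            return olen
    return 0


def greedy_merge(contigs, min_overlap=20):
    """Toy overlap-layout step: repeatedly join the ordered pair of contigs
    with the largest exact suffix/prefix overlap until no overlap reaches
    `min_overlap`. Exact matches only; no consensus is computed."""
    contigs = list(contigs)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                olen = longest_overlap(a, b, min_overlap)
                if olen > best[0]:
                    best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical contigs: one from cultivar A, one from cultivar B.
contig_a = "ACGTACGTTTGCA"
contig_b = "TTGCAGGCCTAA"
print(greedy_merge([contig_a, contig_b], min_overlap=4))
```

Because the inputs are whole contigs rather than k-mers, each merge joins two sequences that were each assembled from one cultivar, which is the property the question is relying on. A merge can still join contigs from different cultivars when they overlap, as in this example.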
To me, the main thing to avoid is an assembly containing "fake" transcripts that are half from one cultivar and half from the other. I can see approach (1) producing those a lot, because de novo assembly breaks every read into k-mers, so you lose the information about which full-length transcripts come from which cultivar/genotype. I think approach (2) is better at avoiding "fake" fusion transcripts, since it starts from long transcripts that are known to come from just one cultivar/genotype.
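The k-mer concern can be illustrated with a toy example. This assumes a simplified picture of de Bruijn graph assembly, in which any sequence whose k-mers all occur in the pooled reads is a candidate path. Real assemblers also use read pairs and coverage to prune paths, so this is a worst case. If two hypothetical cultivar transcripts share an internal stretch of at least k-1 bases, a chimera made of cultivar A's 5' end and cultivar B's 3' end is fully supported by the pooled k-mers even though neither cultivar contains it. The same per-k-mer bookkeeping (`kmer_provenance`, a hypothetical helper) could flag assembled transcripts whose ends are supported by different cultivars only.

```python
def kmers(seq, k):
    """Set of all k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_kmer_supported(seq, kmer_set, k):
    """True if every k-mer of `seq` occurs in `kmer_set`, i.e. `seq` is
    spellable from those k-mers (the simplified de Bruijn view)."""
    return all(km in kmer_set for km in kmers(seq, k))


def kmer_provenance(seq, kmers_a, kmers_b, k):
    """Label each k-mer of `seq` (5' to 3') as 'A', 'B', 'both' or 'none'
    according to which cultivar's k-mer set contains it."""
    labels = []
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        in_a, in_b = km in kmers_a, km in kmers_b
        if in_a and in_b:
            labels.append("both")
        elif in_a:
            labels.append("A")
        elif in_b:
            labels.append("B")
        else:
            labels.append("none")
    return labels


k = 5
shared = "TGCATGCA"                 # stretch present in both cultivars
cultivar_a = "GATCC" + shared + "AGGTC"
cultivar_b = "CTTAG" + shared + "GACCT"
chimera = "GATCC" + shared + "GACCT"  # A's 5' end + B's 3' end

ka, kb = kmers(cultivar_a, k), kmers(cultivar_b, k)
print(is_kmer_supported(chimera, ka | kb, k))  # supported by the pooled k-mers
print(is_kmer_supported(chimera, ka, k))       # but not by cultivar A alone
print(kmer_provenance(chimera, ka, kb, k))     # A-only 5' end, B-only 3' end
```

A provenance pattern that switches from A-only to B-only along a transcript, with no k-mers unique to either cultivar spanning the junction, is the signature of the "half from one cultivar, half from the other" transcripts described above.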
PS. Did I mention it's a horrible polyploid and there's no genome?
Does anyone have an opinion or similar experience?