I am about to use Trinity for de novo transcriptome assembly prior to differential expression analyses.
I have 11 individuals (5 control, 6 treated) sampled across 3 tissue types = 33 samples, each with ~20 million ~80 bp single-end reads (after trimming and QC)... so that's about 660 million single-end reads in total!
To reduce what is likely to be a LONG Trinity run, would you suggest using Trinity's in silico normalization script or a similar tool (e.g. khmer's digital normalization) prior to assembly?
Or should I just use a small subset of samples for the assembly?
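For anyone unfamiliar, my understanding of what these tools do is roughly the following: discard a read when the k-mers it contains have already been seen "enough" times, so high-coverage transcripts are down-sampled while rare ones are kept. A minimal sketch of that idea (this is illustrative only, not the actual code of khmer or Trinity; the function name, k-mer size, and coverage cutoff are placeholders):

```python
# Toy digital normalization: keep a read only while the median count of its
# k-mers (among reads kept so far) is below a coverage cutoff.
# Purely illustrative; khmer/Trinity use probabilistic counting structures.
from collections import Counter
from statistics import median

def normalize_reads(reads, k=20, cutoff=20):
    counts = Counter()  # running k-mer abundance over kept reads
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read shorter than k
        if median(counts[km] for km in kmers) < cutoff:
            kept.append(read)
            for km in kmers:
                counts[km] += 1
    return kept

# A highly duplicated read is down-sampled to ~cutoff copies;
# the single rare read survives.
reads = ["ACGTACGTACGTACGTACGTACGT"] * 50 + ["TTTTGGGGCCCCAAAATTTTGGGG"]
kept = normalize_reads(reads, k=20, cutoff=20)
```

In this toy run the 50 identical reads are reduced to 20 copies while the one rare read is retained, which is exactly why normalization shouldn't lose rare transcripts the way subsetting samples might (errors and allelic variants complicate the real picture, of course).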
I don't know how much individual genetic variability there is, so I'm worried that using a subset for assembly would miss rarer transcripts.
Does anyone here have experience with normalization? Are there any downsides to it compared with using a subset of samples?
Any advice or experiences much appreciated!