Hello everybody,
I am interested in the number of reads needed to perform DE analysis, detection of fusion genes, and detection of new transcripts. I searched the forum but have not found a clear answer. A good starting point is the ENCODE Standards:
- DE analysis:
"Experiments whose purpose is to evaluate the similarity between the transcriptional profiles of two polyA+ samples may require only modest depths of sequencing (e.g. 30M pair-end reads of length > 30NT, of which 20-25M are mappable to the genome or known transcriptome)."
- Gene fusion & novel transcript detection:
"Experiments whose purpose is discovery of novel transcribed elements and strong quantification of known transcript isoforms… a minimum depth of 100-200M 2 x 76 bp or longer reads is currently recommended."
So for DE analysis, 20-25M reads mappable to the genome or known transcriptome are recommended. However, "genome" and "known transcriptome" are quite different targets, especially when we are interested in mRNA only and the libraries are ribodepleted!
Here are some details about my experiment:
My data are 101 bp paired-end reads. The libraries were ribodepleted. To determine what fraction of my reads aligns to the transcriptome, I filtered with Trimmomatic, aligned with TopHat2, then used htseq-count in intersection-nonempty mode to count the reads in each gene. Finally, I summed the counts over all genes.
Do you think this is a good way to estimate the proportion of reads mappable to the transcriptome?
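For what it's worth, the summing step can be sketched as below. This is a minimal sketch assuming the standard htseq-count two-column output (gene_id<TAB>count), where the special counters at the end (__no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique) must be excluded from the total; the toy counts are made up for illustration.

```python
import io

def sum_gene_counts(lines):
    """Sum counts over genes, skipping htseq-count's __ special counters."""
    total = 0
    for line in lines:
        gene_id, count = line.rstrip("\n").split("\t")
        if not gene_id.startswith("__"):  # exclude __no_feature etc.
            total += int(count)
    return total

# Toy example standing in for a real counts file:
example = io.StringIO(
    "ENSG00000000003\t120\n"
    "ENSG00000000005\t0\n"
    "ENSG00000000419\t340\n"
    "__no_feature\t5000\n"
    "__ambiguous\t200\n"
    "__alignment_not_unique\t800\n"
)
print(sum_gene_counts(example))  # 460
```

In practice you would pass `open("sample_counts.txt")` instead of the StringIO; note that with paired-end data htseq-count reports fragment (read-pair) counts, not individual reads.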
For the moment, I have between 1M and 20M reads mappable to the transcriptome per sample. We will sequence all the samples one more time.
Here are my questions:
1) Is it reasonable to aim for 25M reads mappable to the transcriptome for DE analysis?
2) Since I have 2 x 101 bp reads, is 100M filtered reads mappable to the genome enough for gene fusion and novel transcript detection?
Thank you for your help,
Jane