Hi all,
I am wondering what people usually do for quality control of RNA-Seq expression data, in particular how to check whether there are outlier samples or certain genes that should be removed, when working with output from Cufflinks, RSEM, or HTSeq-count. Is it common to do quality control on the expression data for RNA-Seq? I don't see much mention of it online or in the manuals. One thing we were thinking of doing is removing the genes that have extremely high FPKM values in Cufflinks, like to the 10th power or greater. We suspect that these are mostly very short transcripts whose expression values get inflated by the length normalization (many were miRNAs), so we thought it might be good to remove them in case they are distorting the normalization and the overall differential expression results. What are your thoughts on this? I am not an expert on statistics or on how these programs work...
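To make the concern concrete, here is a minimal Python sketch of what we have in mind. It uses the plain textbook FPKM definition (fragments per kilobase of transcript per million mapped fragments), not Cufflinks's actual estimation, which also applies an effective-length correction. The function names, the fragment counts, the library size, and the threshold are all hypothetical values chosen only for illustration.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Plain FPKM definition: fragments * 1e9 / (length in bp * total mapped fragments).

    This is the textbook formula only; it is an assumption that it approximates
    what a given tool reports.
    """
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def flag_extreme_genes(fpkm_by_gene, threshold):
    """Return the sorted gene IDs whose FPKM is at or above the threshold."""
    return sorted(gene for gene, value in fpkm_by_gene.items() if value >= threshold)


# Hypothetical library size and counts: the same number of fragments
# assigned to a miRNA-sized transcript and to a 2 kb transcript.
total_mapped = 20_000_000
short_fpkm = fpkm(100, 22, total_mapped)
long_fpkm = fpkm(100, 2000, total_mapped)

# With equal fragment counts, the ratio of the two FPKM values is simply
# the inverse ratio of the transcript lengths (2000 / 22).
length_ratio = short_fpkm / long_fpkm

# Hypothetical gene IDs and cutoff, only to show the filtering step.
example = {"mirna_like": short_fpkm, "long_gene": long_fpkm}
flagged = flag_extreme_genes(example, threshold=100.0)
```

With identical fragment counts, the short transcript ends up with an FPKM roughly 90 times higher than the long one, which is the kind of inflation we are worried about. Whether filtering on a cutoff like this is statistically sound is exactly what we are unsure of.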
Thanks for your help!
Best,
Julia