I recently started doing some RNA-seq variant calling (using the GATK Best Practices pipeline), but I'm wondering how I should handle replicate data. If I have three replicates for a sample, for example, should I merge that data to get better-quality variant calls, and if so, at what level should the merging be done?
I can imagine merging the FASTQ files before alignment, for example, but I'm not sure whether that would adversely affect the variant calling step in the end. Or should the merging be done after alignment, on the resulting BAM files? I read that for aligners such as BWA the two options are (more or less) equivalent, but since the RNA-seq Best Practices workflow uses STAR, I'm not sure the same holds. Or should it be done at some other level?
I imagine that merging the data at some level would increase the statistical power of the variant calls because of the added depth from several replicates, but maybe that's wrong? Is merging replicates for RNA-seq variant calling something that could or should be done, and if so, at what level?
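To make the depth intuition concrete, here is a minimal sketch of what I have in mind. It assumes a simple binomial model in which each read at a heterozygous site shows the alternate allele with probability 0.5, and it ignores sequencing errors, allele-specific expression, and mapping bias. The depth of 8 reads per replicate and the threshold of 4 supporting reads are hypothetical numbers chosen for illustration, not values taken from GATK.

```python
from math import comb


def prob_at_least_k_alt_reads(depth: int, k: int, alt_fraction: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(depth, alt_fraction).

    X is the number of reads carrying the alternate allele at one site.
    """
    # Sum the probabilities of seeing fewer than k alternate reads,
    # then take the complement.
    p_fewer = sum(
        comb(depth, i) * alt_fraction**i * (1 - alt_fraction) ** (depth - i)
        for i in range(min(k, depth + 1))
    )
    return 1.0 - p_fewer


# Hypothetical numbers for illustration only.
per_replicate_depth = 8
n_replicates = 3
min_alt_reads = 4  # assumed evidence threshold, not a GATK parameter

single = prob_at_least_k_alt_reads(per_replicate_depth, min_alt_reads)
pooled = prob_at_least_k_alt_reads(per_replicate_depth * n_replicates, min_alt_reads)

print(f"one replicate:     {single:.3f}")
print(f"pooled replicates: {pooled:.3f}")
```

Under these assumptions the pooled depth gives a higher chance of seeing enough supporting reads, which is the gain I would hope to get from merging. Whether a real caller behaves this way on merged RNA-seq data is exactly what I'm unsure about.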