I have some pre-computed BAM files on which I need to run a TopHat2 fusion search. Can anyone guide me on how to use these RNA metrics to determine TopHat2 parameters (e.g. --mate-inner-dist, --mate-std-dev, --fusion-anchor-length) for the fusion search, and which other metrics I should obtain in general for QC of RNA-Seq data?
1. From an earlier post, I calculated using the same formula:
samtools view -F 0x4 Sample1_GRCh37-lite_rnaseq.bam | awk '{if ($9 >0) {sum+=$9;sumsq+=$9*$9;N+=1}} END {print "mean = " sum/N " SD=" sqrt(sumsq/N - (sum/N)^2)}'
mean = 26950.7 SD=1.55517e+06 (these numbers do not make sense to me). Aren't these too big?
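One possible reason for the inflated values, offered as a guess rather than a diagnosis: column 9 (TLEN) is computed on genomic coordinates, so pairs that span introns or map to distant loci can contribute very large values. A minimal sketch of a more restrictive calculation follows. It keeps only the leftmost mate of properly paired reads (FLAG bit 0x2) and drops TLEN values above a cutoff. The function name `insert_stats`, the 1000 bp cutoff, and the 75 bp read length default are assumptions for illustration, not recommended settings. The last field uses the usual rule of thumb that the inner distance is the mean fragment length minus twice the read length.

```shell
# Sketch: mean/SD of TLEN over properly paired reads, ignoring outliers.
# Reads SAM records (no header lines) on stdin.
# $1 = maximum TLEN to keep (assumed cutoff, default 1000)
# $2 = read length (default 75, taken from Mean_Read_Length)
insert_stats() {
  awk -v max="${1:-1000}" -v readlen="${2:-75}" '
    # Column 2 is FLAG, column 9 is TLEN.
    # int(FLAG / 2) % 2 == 1 tests the "properly paired" bit (0x2);
    # TLEN > 0 keeps one record per pair (the leftmost mate).
    int($2 / 2) % 2 == 1 && $9 > 0 && $9 <= max {
      sum += $9; sumsq += $9 * $9; n++
    }
    END {
      if (n == 0) { print "no usable pairs"; exit 1 }
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0   # guard against floating-point rounding
      printf "mean=%.1f sd=%.1f inner_dist=%.1f\n", mean, sqrt(var), mean - 2 * readlen
    }'
}
```

Assuming samtools is on the PATH, it could be used as `samtools view -f 0x2 -F 0x4 Sample1_GRCh37-lite_rnaseq.bam | insert_stats 1000 75`. The resulting `inner_dist` and `sd` would then be candidate values for --mate-inner-dist and --mate-std-dev, to be checked against an insert-size histogram before use.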
2. Then I converted the BAM to FASTQ (paired-end data), ran bwa on 1.fastq and 2.fastq, ran samtools view, sort, and index, and finally ran the following Picard tools to get some metrics:
CollectMultipleMetrics.jar: Mean_Read_Length=75 for all pairs; Total_reads_pf_reads=85000421; MEAN_QUALITY=36.08
MarkDuplicates: Percent_duplicated=30.7%
Thanks.