Hello everyone,
I have around 150 RNA-Seq datasets prepared with the Lexogen SENSE mRNA-Seq Library Prep Kit for Illumina, as well as around 50 Illumina TruSeq samples. I aligned one sample with both HISAT2 and TopHat2 against both Hg19 and Hg38. I want to process all samples with HISAT2 against Hg38 (since it is the newer reference), but I get a far better alignment rate with Hg19:
Lexogen Sample:
HISAT2 (Hg19): Paired Rate = 82.68%, Overall Rate = 90.31%
HISAT2 (Hg38): Paired Rate = 73.87%, Overall Rate = 81.01%
TopHat2 (Hg19): Paired Rate = 74.74%, Overall Rate = 87.4%
TopHat2 (Hg38): Paired Rate = 77.68%, Overall Rate = 87.7%
Interestingly, TopHat2 does not seem to be negatively affected by the change of reference. On the contrary, it actually seems to "like" Hg38.
This got even stranger when I ran a control sample prepared with the Illumina TruSeq kit and got the following results:
TruSeq Sample:
HISAT2 (Hg19): Paired Rate = 94.86%, Overall Rate = 97.22%
HISAT2 (Hg38): Paired Rate = 93.15%, Overall Rate = 95.47%
TopHat2 (Hg19): Paired Rate = 93.46%, Overall Rate = 96.50%
TopHat2 (Hg38): Paired Rate = 88.07%, Overall Rate = 97.00%
I can accept a difference in alignment rate as "random" if it is less than 5%, but I cannot accept a drop from 90.31% to 81.01%. Has anyone tested HISAT2 on these two reference genomes, and if so, did you get similar results? I have been struggling with this for a long time, so any help is greatly appreciated!
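To make the comparison explicit, here is a small Python sketch that computes the Hg38 minus Hg19 change for each kit/aligner pair from the rates listed above and flags overall-rate drops beyond the threshold. It assumes the "5%" is meant as 5 percentage points on the overall rate; the names `RATES`, `rate_changes` and `flag_large_drops` are purely illustrative.

```python
# Alignment rates (%) copied from the results above: (paired, overall).
RATES = {
    ("Lexogen", "HISAT2"): {"hg19": (82.68, 90.31), "hg38": (73.87, 81.01)},
    ("Lexogen", "TopHat2"): {"hg19": (74.74, 87.40), "hg38": (77.68, 87.70)},
    ("TruSeq", "HISAT2"): {"hg19": (94.86, 97.22), "hg38": (93.15, 95.47)},
    ("TruSeq", "TopHat2"): {"hg19": (93.46, 96.50), "hg38": (88.07, 97.00)},
}

# Assumption: "5%" is interpreted as 5 percentage points, not a relative change.
THRESHOLD_PP = 5.0


def rate_changes(rates):
    """Return {(kit, aligner): (paired_delta, overall_delta)} as hg38 minus hg19,
    rounded to two decimals (percentage points)."""
    changes = {}
    for key, by_ref in rates.items():
        paired19, overall19 = by_ref["hg19"]
        paired38, overall38 = by_ref["hg38"]
        changes[key] = (round(paired38 - paired19, 2), round(overall38 - overall19, 2))
    return changes


def flag_large_drops(changes, threshold=THRESHOLD_PP):
    """Return the (kit, aligner) pairs whose overall rate dropped by at least
    `threshold` percentage points when switching to hg38."""
    return sorted(key for key, (_, overall) in changes.items() if overall <= -threshold)


if __name__ == "__main__":
    changes = rate_changes(RATES)
    for (kit, aligner), (d_paired, d_overall) in changes.items():
        print(f"{kit:8s} {aligner:8s} paired {d_paired:+6.2f} pp  overall {d_overall:+6.2f} pp")
    print("Overall drops beyond threshold:", flag_large_drops(changes))
```

With these numbers only Lexogen/HISAT2 exceeds the threshold on the overall rate (-9.30 pp). The TruSeq/TopHat2 paired rate also drops by more than 5 pp (-5.39), even though its overall rate rises slightly.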
Additional Info:
- The whole analysis was run on the Galaxy Platform
- The Lexogen samples are strand-specific (second strand) and the TruSeq samples are unstranded
- I tested 4 additional samples (2 Lexogen + 2 TruSeq) and got similar results
- The references were downloaded directly from UCSC
Thanks in advance!
Sbamo