From what I understand, running TopHat with a GTF file helps the mapping and makes it a little cleaner, but shouldn't make a huge difference. I tried running TopHat with and without a GTF on some mouse data.
Alignment summary with GTF:

```
Left reads:
          Input     :  76922617
           Mapped   :  37116408 (48.3% of input)
            of these:   1976558 ( 5.3%) have multiple alignments (515208 have >20)
Right reads:
          Input     :  76672086
           Mapped   :  35646606 (46.5% of input)
            of these:   1395593 ( 3.9%) have multiple alignments (748805 have >20)
47.4% overall read alignment rate.

Aligned pairs:  32699739
     of these:    745858 ( 2.3%) have multiple alignments
          and:    353721 ( 1.1%) are discordant alignments
42.2% concordant pair alignment rate.
```

Alignment summary without GTF:

```
Left reads:
          Input     :  76922617
           Mapped   :   4455809 ( 5.8% of input)
            of these:    212949 ( 4.8%) have multiple alignments (481436 have >20)
Right reads:
          Input     :  76672086
           Mapped   :   3360721 ( 4.4% of input)
            of these:    128254 ( 3.8%) have multiple alignments (732174 have >20)
5.1% overall read alignment rate.

Aligned pairs:    817083
     of these:      5615 ( 0.7%) have multiple alignments
          and:       720 ( 0.1%) are discordant alignments
1.1% concordant pair alignment rate.
```

The difference between the alignments with and without the GTF is huge: 47.4% vs. 5.1% overall read alignment, and 42.2% vs. 1.1% concordant pair alignment. Is this normal? What would explain such a big discrepancy?

If this is normal, then providing a GTF is important. By that logic, can TopHat really be trusted to detect novel transcripts if it has so much trouble with transcripts not described by a GTF file?
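As a sanity check that the gap is real in the counts and not a reporting quirk, here is a short Python sketch that recomputes the percentages from the raw numbers in the two summaries. The `runs` dictionary and the `rates` function are hypothetical names for illustration. The concordant-pair denominator (the smaller of the two input read counts) is an assumption inferred because it reproduces the reported 42.2% and 1.1%; it is not taken from TopHat's source.

```python
# Raw counts copied from the two TopHat alignment summaries above.
runs = {
    "with GTF": {
        "left_input": 76922617, "left_mapped": 37116408,
        "right_input": 76672086, "right_mapped": 35646606,
        "aligned_pairs": 32699739, "discordant_pairs": 353721,
    },
    "without GTF": {
        "left_input": 76922617, "left_mapped": 4455809,
        "right_input": 76672086, "right_mapped": 3360721,
        "aligned_pairs": 817083, "discordant_pairs": 720,
    },
}


def rates(c):
    """Return (left %, right %, overall %, concordant pair %) for one run."""
    left = 100 * c["left_mapped"] / c["left_input"]
    right = 100 * c["right_mapped"] / c["right_input"]
    # Overall rate: all mapped reads over all input reads (both mates pooled).
    overall = 100 * (c["left_mapped"] + c["right_mapped"]) / (
        c["left_input"] + c["right_input"]
    )
    # Assumption: concordant pairs = aligned pairs minus discordant ones,
    # divided by the smaller input count. Inferred from the reported numbers.
    concordant = 100 * (c["aligned_pairs"] - c["discordant_pairs"]) / min(
        c["left_input"], c["right_input"]
    )
    return left, right, overall, concordant


if __name__ == "__main__":
    for name, counts in runs.items():
        left, right, overall, concordant = rates(counts)
        print(f"{name:12s} left {left:5.1f}%  right {right:5.1f}%  "
              f"overall {overall:5.1f}%  concordant {concordant:5.1f}%")
```

Rounded to one decimal, the recomputed values match both summaries, so the roughly ninefold drop in overall alignment rate without the GTF comes straight from the mapped-read counts.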