Unlike other people here, I don't seem to have any problems actually running TopHat2, and the actual mapping results look pretty similar to those from the previous version of TopHat we were running.
However, it is reporting a lot more events in the insertions/deletions BED files. I'll admit upfront that I'm using a home-built Bowtie index between the two versions rather than the iGenomes ones, but I don't understand why that would be responsible for this kind of output in insertions.bed:
1 567684 567684 A 13
1 567684 567684 AT 10
1 567684 567684 GTG 12
1 567684 567684 AATC 2
1 567684 567684 GATCG 8
1 567684 567684 GATCGG 2
1 567684 567684 CCTTACTACC 1
1 567684 567684 CCTTACTACCAG 1
1 567685 567685 T 1
1 567685 567685 TG 12
1 567685 567685 AGATG 1
1 567685 567685 TATGATAGTGA 6
1 567685 567685 TGAATATGATAGTGA 4
1 567686 567686 G 2
1 567686 567686 TC 2
1 567686 567686 GAG 1
1 567686 567686 TCAA 3
1 567686 567686 GATGA 2
1 567686 567686 ACCACACCTC 1
1 567687 567687 CAGA 2
1 567688 567688 A 3
1 567688 567688 GGTGAAACC 1
1 567688 567688 TAGATCGGAAGAG 1
1 567689 567689 AAA 1
1 567689 567689 CCCGC 1
1 567689 567689 ATACTG 1
1 567689 567689 GGAAGAG 3
1 567689 567689 GGAAGAGCGT 1
1 567690 567690 A 4
1 567690 567690 TAG 1
1 567690 567690 ATCAAACACA 1
And this situation is repeated throughout the file, to the tune of about 4 million entries.
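One way to gauge how much of this is low-support noise is to count how many different inserted sequences are called at each site and how many calls survive a minimum read-count cutoff. Below is a minimal Python sketch. It assumes the five whitespace-separated columns shown above (chromosome, start, end, inserted sequence, supporting read count) and that any header line starts with "track" or "#". The function name and the min_reads threshold are hypothetical illustrations, not TopHat options.

```python
import collections


def summarize_insertions(lines, min_reads=1):
    """Summarize insertions.bed-style lines.

    Assumed column layout (as in the excerpt above):
    chrom, start, end, inserted sequence, supporting read count.
    """
    total = 0
    kept = 0
    seqs_per_site = collections.defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED track/comment headers.
        if not line or line.startswith(("track", "#")):
            continue
        chrom, start, _end, seq, reads = line.split()[:5]
        total += 1
        # Record each distinct inserted sequence called at this position.
        seqs_per_site[(chrom, int(start))].add(seq)
        # Count calls supported by at least min_reads reads.
        if int(reads) >= min_reads:
            kept += 1
    # Sites where more than one different inserted sequence was called.
    multi = sum(1 for seqs in seqs_per_site.values() if len(seqs) > 1)
    return {"total": total, "kept": kept,
            "sites": len(seqs_per_site), "multi_call_sites": multi}


sample = """1 567684 567684 A 13
1 567684 567684 AT 10
1 567684 567684 GTG 12
1 567684 567684 AATC 2
1 567685 567685 T 1
1 567685 567685 TG 12
1 567686 567686 G 2""".splitlines()

print(summarize_insertions(sample, min_reads=10))
# {'total': 7, 'kept': 4, 'sites': 3, 'multi_call_sites': 2}
```

On the real file, something like `with open("insertions.bed") as fh: print(summarize_insertions(fh, min_reads=5))` would show how many of the ~4 million entries pile up at the same positions and how many are backed by only one or two reads.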
Any insight appreciated.