We are searching for noncanonical splice sites.
We generated RNA-seq data for a set of 60 human samples and aligned the reads against the human genome (Ensembl) using TopHat2 (versions 2.0.4 to 2.0.10).
From these alignments, we collected the splice sites cumulatively across all files (resulting in about 1E6 = 1 million distinct sites). We annotated these sites using Ensembl (74) by searching for the best-matching hit for each splice site, and reverse-complemented the sequences on the (-) strand (some of which are obviously assigned to the wrong strand...).
When looking at the first two intronic nucleotides at the donor site, we find the following distribution:
AT      CT      GC      GT
15887   72358   44801   825650
No other dinucleotides appear at all, so we are missing any noncanonical splice sites. Does TopHat filter them out? Or is there another explanation?
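To make the counting step concrete, here is a minimal sketch of a donor-dinucleotide tally. It is an illustration under stated assumptions, not our actual pipeline: the function names (`reverse_complement`, `donor_dinucleotide`, `tally_donors`) are hypothetical, intron coordinates are assumed to be 0-based and half-open on the forward strand, and the genome is assumed to be available as a plain string.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def donor_dinucleotide(genome_seq, intron_start, intron_end, strand):
    """Return the first two intronic nucleotides at the donor site.

    Assumes 0-based, half-open intron coordinates on the forward strand.
    On the (-) strand the donor lies at the right-hand end of the intron,
    so the last two bases are taken and reverse-complemented.
    """
    if strand == "+":
        return genome_seq[intron_start:intron_start + 2].upper()
    return reverse_complement(genome_seq[intron_end - 2:intron_end]).upper()


def tally_donors(genome_seq, introns):
    """Count donor dinucleotides over (start, end, strand) intron tuples."""
    return Counter(
        donor_dinucleotide(genome_seq, start, end, strand)
        for start, end, strand in introns
    )


# Toy examples: a GT-AG intron on each strand.
plus_genome = "AAGTCCCCAGAA"    # intron 2..10 on (+): GTCCCCAG
minus_genome = "TTCTGGGGACTT"   # intron 2..10 on (-): reads GTCCCCAG after reverse complement

print(tally_donors(plus_genome, [(2, 10, "+")]))
print(tally_donors(minus_genome, [(2, 10, "-")]))
# The same (-) strand intron, wrongly treated as (+):
print(donor_dinucleotide(minus_genome, 2, 10, "+"))
```

In this toy case, a (-) strand GT-AG intron that is read as if it were on the (+) strand shows CT at the donor position, which would be consistent with the CT column coming from wrongly assigned strands.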