Having run casava and tophat on the same libraries, we found that the tophat output had about 40% more mapped reads than the casava output.
Looking across genes, we also found that tophat aligned reads to many more genes. In addition, when we looked only at genes that appeared in both alignments, most genes had approximately the same number of counts from the two programs, but there was a population of genes for which tophat gave much higher counts than casava.
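For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how the per-gene counts could be split into "seen by only one program" and "seen by both but discordant". The function name `compare_counts`, the log2 threshold, the pseudocount, and the toy counts are all hypothetical and chosen only for illustration; they are not our actual data or pipeline.

```python
import math


def compare_counts(counts_a, counts_b, min_log2_ratio=1.0, pseudocount=1.0):
    """Compare two {gene: count} tables.

    Returns genes found only in A, genes found only in B, and a dict of
    shared genes whose |log2(B/A)| is at least min_log2_ratio.
    """
    genes_a, genes_b = set(counts_a), set(counts_b)
    only_a = sorted(genes_a - genes_b)
    only_b = sorted(genes_b - genes_a)

    discordant = {}
    for gene in sorted(genes_a & genes_b):
        # The pseudocount (an assumption) avoids division by zero.
        ratio = math.log2(
            (counts_b[gene] + pseudocount) / (counts_a[gene] + pseudocount)
        )
        if abs(ratio) >= min_log2_ratio:
            discordant[gene] = ratio
    return only_a, only_b, discordant


# Hypothetical toy counts, not real results.
casava_counts = {"geneA": 100, "geneB": 50, "geneC": 10}
tophat_counts = {"geneA": 104, "geneB": 210, "geneC": 11, "geneD": 35}

only_casava, only_tophat, discordant = compare_counts(casava_counts, tophat_counts)
print("only in casava:", only_casava)
print("only in tophat:", only_tophat)
print("discordant shared genes:", discordant)
```

The discordant list is the population worth inspecting by hand, for example by checking whether those genes belong to multi-copy families or overlap other annotated features.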
We are trying to understand where the difference is coming from. Is there a difference in how they count multiply-aligned reads? Is there a difference in how the programs define an overlap?
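One way to test the multiply-aligned hypothesis is to count how many aligned reads report more than one hit. The sketch below assumes the alignments are available as SAM text and that the aligner writes the standard SAM `NH` tag (number of reported alignments for the read); output that lacks the tag would need a different check. The function name `count_multimappers` and the toy records are hypothetical.

```python
def count_multimappers(sam_lines):
    """Return (unique_reads, multi_reads) from SAM text lines.

    Reads are identified by name, so both mates of a pair count once.
    A record with no NH tag is treated as a single hit (an assumption).
    """
    unique, multi = set(), set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & 0x4:
            continue  # FLAG bit 0x4: read is unmapped
        hits = 1
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                hits = int(tag[5:])
                break
        (multi if hits > 1 else unique).add(fields[0])
    return len(unique), len(multi)


# Hypothetical toy records: r1 maps once, r2 maps twice, r3 is unmapped.
toy_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r2\t256\tchr2\t300\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

n_unique, n_multi = count_multimappers(toy_sam)
print("uniquely aligned reads:", n_unique)
print("multiply aligned reads:", n_multi)
```

If the multi-mapped fraction is large, and the genes with inflated counts are mostly paralogs or repeats, that would point to multi-read handling rather than the overlap definition.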
Can someone who has dealt with this issue explain their findings?
Also, does anyone have a sample or simulated dataset that they have used to explore the differences between algorithms?
Thanks