Hi all,
I am using Ion Proton mRNA data and testing which TopHat parameters work best. Because of the higher error rate in Proton data I have been increasing the read-gap-length and read-mismatches parameters. As expected, the % of reads aligned increases as these are raised; however, the % of aligned reads with multiple alignments decreases! (i.e. the number of reads with multiple alignments increases at a slower rate than the number of reads with unique alignments).
It would appear that, to get the most out of this data, read-gap-length and read-mismatches should be set extremely high, but I assume that might raise questions when the manuscript is reviewed. What would you say is the maximum value these parameters can reasonably be set to?
Is my interpretation of these values correct?
I have pasted the alignment stats below for a range of read-gap-length values, all at read-mismatches=5, for a 100,000-read test subset.
(%ReadsWithMultiAlignments is ReadsWithMultiAlignments / ReadsAligned)
read-gap-length  ReadsAligned  %ReadsAligned  ReadsWithMultiAlignments  %ReadsWithMultiAlignments
1                71727         71.73%         9488                      13.23%
2                74605         74.61%         9782                      13.11%
3                75476         75.48%         9872                      13.08%
4                75815         75.82%         9894                      13.05%
5                76060         76.06%         9901                      13.02%
6                76259         76.26%         9906                      12.99%
7                76412         76.41%         9891                      12.94%
8                76518         76.52%         9831                      12.85%
9                76621         76.62%         9797                      12.79%
10               76729         76.73%         9792                      12.76%
11               76821         76.82%         9791                      12.75%
12               76898         76.90%         9791                      12.73%
13               76977         76.98%         9805                      12.74%
14               77059         77.06%         9805                      12.72%
15               77106         77.11%         9803                      12.71%
16               77151         77.15%         9802                      12.70%
17               77232         77.23%         9804                      12.69%
18               77269         77.27%         9798                      12.68%
19               77342         77.34%         9803                      12.67%
20               77363         77.36%         9794                      12.66%
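
In case it helps anyone check my arithmetic, here is a minimal Python sketch of how the percentages are derived from the raw counts. The function name `summarize` and the "unique" figure (aligned reads minus multi-aligned reads) are my own illustrative choices, not something TopHat reports under those names; only a few rows from the table are included.

```python
# Sketch only: recompute the derived columns from the raw counts in the table.
# TOTAL_READS is the size of the 100,000-read test subset.
TOTAL_READS = 100_000

# (read_gap_length, reads_aligned, reads_with_multi_alignments), copied from the table
rows = [
    (1, 71727, 9488),
    (5, 76060, 9901),
    (10, 76729, 9792),
    (20, 77363, 9794),
]


def summarize(reads_aligned, reads_multi, total=TOTAL_READS):
    """Return the derived statistics for one row of the table."""
    # Assumption: "unique" means aligned reads that are not multi-aligned.
    unique = reads_aligned - reads_multi
    return {
        "pct_aligned": 100.0 * reads_aligned / total,
        # Denominator is aligned reads, not total reads.
        "pct_multi": 100.0 * reads_multi / reads_aligned,
        "unique": unique,
    }


for gap, aligned, multi in rows:
    s = summarize(aligned, multi)
    print(f"gap={gap:>2}  aligned={s['pct_aligned']:.2f}%  "
          f"multi={s['pct_multi']:.2f}%  unique={s['unique']}")
```

Because the multi-aligned percentage uses aligned reads as its denominator, it can fall even when the multi-aligned count stays roughly flat, as long as the aligned count keeps rising.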
Thank you.
-Liz