Hi All,
I am in the process of building my own short-read aligner for a lab that works on cancer genome sequencing, and I have the following questions:
1. Our reads are 18-22 bp long.
I am aware of several aligners that are currently available, and I was wondering whether there are any issues with using them on reads in this length range. In the aligner papers I have gone through, the read lengths considered usually start at 30 bp and go upwards.
It would be really helpful if members who have worked with similar read lengths could advise me.
2. Are there any known species-specific statistical heuristics for short-read alignment? To be more specific: for a non-mammalian sequence source, for example, I would like to use a different set of parameters than I would for a mammalian source.
3. As I mentioned, I am in the process of developing my own aligner, so any advice or suggestions from members who have embarked on the same would be very valuable. I am still in the brainstorming phase and trying to scope the range of problems this short-read aligner would address. For now, the aligner should be able to:
a. Align reads of length 18-22 bp to a reference genome
b. Perform ungapped alignment
c. Apply a species-specific statistical scoring heuristic (if it makes sense to use one in the first place)
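Requirements (a) and (b) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a finished design: the function names (`build_index`, `align_ungapped`), the seed length `k`, and the `max_mismatches` threshold are all hypothetical, the reference is held as a plain in-memory string, and only the forward strand is searched. It uses a k-mer hash index with non-overlapping seeds, relying on the pigeonhole argument that a read with m mismatches split into m + 1 seeds must contain at least one exactly matching seed.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_ungapped(read, reference, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) pairs for ungapped hits of `read`.

    Assumption: len(read) >= k * (max_mismatches + 1), so that at least one
    non-overlapping seed is mismatch-free for every valid hit.
    """
    candidates = set()
    # Look up each non-overlapping seed and project back to the read start.
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    # Verify each candidate with a full-length ungapped comparison.
    for start in candidates:
        mismatches = hamming(read, reference[start:start + len(read)])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return sorted(hits)
```

With `k = 9`, an 18-22 bp read yields two non-overlapping seeds, which is enough to guarantee finding every hit with at most one mismatch. A real genome-scale aligner would need a compressed index and reverse-complement handling, both of which this sketch leaves out.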
I am looking forward to any advice or suggestions, or perhaps even a general discussion on the process of developing a short-read aligner from scratch.
Thank you.
Regards,
Rupinder