Hey all,
Basically, for any given base, I want to determine how many unique reads cover it, how many reads appearing in duplicate cover it, how many reads appearing in triplicate cover it, and so on.
For example, for a base at, say, chr1 position 1000, if there are ten 100 bp reads covering it, starting at:
1. chr1:920
2. chr1:935
3. chr1:950
4. chr1:950
5. chr1:950
6. chr1:978
7. chr1:980
8. chr1:989
9. chr1:989
10. chr1:996
I want to know that the base 1000 is covered by reads starting at 5 sites uniquely (1,2,6,7,10 above), 1 site in duplicate (8,9 above), and 1 site in triplicate (3,4,5 above). I only want to consider one strand at a time. Thus, the output could be like:
Chr,Base,Start_sites_covering_1x,Start_sites_covering_2x,Start_sites_covering_3x,etc
1,1000,5,1,1
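To make the desired calculation concrete, here is a minimal Python sketch of what I mean for a single position. It assumes all reads are on one strand and have a fixed length of 100 bp, and that the start coordinates have already been pulled out of the alignment file; the function name `start_site_multiplicity` is made up for illustration and is not from any existing tool.

```python
from collections import Counter

def start_site_multiplicity(starts, position, read_length=100):
    """Return {multiplicity: number of start sites} for reads covering `position`.

    Assumes same-strand reads of fixed length (an assumption for this sketch).
    """
    # A read starting at s covers bases s .. s + read_length - 1 (inclusive).
    covering = [s for s in starts if s <= position < s + read_length]
    # Number of reads sharing each start site.
    reads_per_site = Counter(covering)
    # Number of start sites seen once, twice, three times, ...
    return Counter(reads_per_site.values())

# The ten read starts from the example above.
starts = [920, 935, 950, 950, 950, 978, 980, 989, 989, 996]
hist = start_site_multiplicity(starts, 1000)

# Build a row in the Chr,Base,1x,2x,3x layout.
columns = [hist.get(k, 0) for k in range(1, max(hist, default=0) + 1)]
print(",".join(str(x) for x in ["1", 1000] + columns))
```

For the example reads this prints `1,1000,5,1,1`. Doing this base by base over a whole genome would be slow, which is why I am asking about an efficient approach.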
Does anybody have any insight on an efficient way to do this? Are there any tools designed to do this, and if not, any thoughts on the easiest tools to use to write a program to pull this off?
I would greatly appreciate any help.