Hi,
I am developing a new C++-based open-source tool (peak-tool) to annotate peaks from a BED file using the GENCODE annotation database.
Once the GENCODE database has been loaded and processed, each BED line lookup takes constant time thanks to an indexing data structure.
Here are some example annotations:
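The post does not say which indexing structure peak-tool uses. As a purely hypothetical illustration, one common way to get constant-time position lookups is to split each chromosome into fixed-size bins and store, per bin, the features whose windows touch it. `Feature`, `BinIndex` and `binSize` are illustrative names, not peak-tool's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical annotation record (not peak-tool's actual type).
struct Feature {
    std::string gene;
    char strand;          // '+' or '-'
    std::int64_t start;   // window start (inclusive)
    std::int64_t end;     // window end (exclusive)
};

// Hypothetical fixed-size bin index: chromosome -> bins -> feature ids.
class BinIndex {
public:
    explicit BinIndex(std::int64_t binSize) : binSize_(binSize) {}

    // Register a feature in every bin that its [start, end) window touches.
    void add(const std::string& chrom, const Feature& f) {
        features_.push_back(f);
        const std::size_t id = features_.size() - 1;
        auto& bins = bins_[chrom];
        const auto first = static_cast<std::size_t>(f.start / binSize_);
        const auto last = static_cast<std::size_t>((f.end - 1) / binSize_);
        if (bins.size() <= last) bins.resize(last + 1);
        for (std::size_t b = first; b <= last; ++b) bins[b].push_back(id);
    }

    // One hash lookup plus one vector index finds the bin in O(1); the rest
    // of the work is proportional to the number of features in that bin.
    // Returned pointers stay valid only until the next add().
    std::vector<const Feature*> query(const std::string& chrom, std::int64_t pos) const {
        std::vector<const Feature*> hits;
        auto it = bins_.find(chrom);
        if (it == bins_.end() || pos < 0) return hits;
        const auto b = static_cast<std::size_t>(pos / binSize_);
        if (b >= it->second.size()) return hits;
        for (std::size_t id : it->second[b]) {
            const Feature& f = features_[id];
            if (f.start <= pos && pos < f.end) hits.push_back(&f);  // exact overlap check
        }
        return hits;
    }

private:
    std::int64_t binSize_;
    std::vector<Feature> features_;
    std::unordered_map<std::string, std::vector<std::vector<std::size_t>>> bins_;
};
```

A direct-indexed layout like this trades memory for speed: every bin on every chromosome is allocated up front, but a query never has to search.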
Bed:
chr14 57735721 57735722 MACS_peak_13241 1159.23
Annotated:
chr14 57735721 MACS_peak_13241 1159.23 PROMOTER AP5M1 + AP5M1-001 protein_coding 57735627 94
chr14 57735721 MACS_peak_13241 1159.23 PROMOTER EXOC5 - EXOC5-001 protein_coding 57735726 5
Bed:
chr1 12556659 12556660 MACS_peak_330 1733.05
Annotated:
chr1 12556659 MACS_peak_330 1733.05 PROMOTER VPS13D + VPS13D-012 retained_intron 12557280 -621
Bed:
chr1 1778750 1778751 MACS_peak_51 102.12
Annotated:
chr1 1778750 MACS_peak_51 102.12 INTRON GNB1 - GNB1-001 protein_coding 1822495 43745
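The annotated columns appear to be: chromosome, peak start, peak name, score, region type, gene, strand, transcript, transcript biotype, TSS coordinate, and signed distance to the TSS. This reading is inferred from the three examples above, not from peak-tool's documentation. In every row the last column equals peak start minus TSS on the + strand and TSS minus peak start on the - strand, so positive values seem to mean "downstream of the TSS in the direction of transcription". The sketch below reproduces that arithmetic; `BedPeak`, `parseBedLine` and `tssDistance` are hypothetical helpers, not part of peak-tool.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical BED record: only the five columns used in the examples above.
struct BedPeak {
    std::string chrom;
    std::int64_t start = 0;
    std::int64_t end = 0;
    std::string name;
    double score = 0.0;
};

// Parse a whitespace-separated BED line (chrom, start, end, name, score).
BedPeak parseBedLine(const std::string& line) {
    std::istringstream in(line);
    BedPeak p;
    if (!(in >> p.chrom >> p.start >> p.end >> p.name >> p.score))
        throw std::invalid_argument("malformed BED line: " + line);
    return p;
}

// Signed peak-to-TSS distance in the direction of transcription
// (sign convention inferred from the example rows):
// positive = peak downstream of the TSS, negative = upstream.
std::int64_t tssDistance(std::int64_t peakPos, std::int64_t tss, char strand) {
    if (strand == '+') return peakPos - tss;
    if (strand == '-') return tss - peakPos;
    throw std::invalid_argument("strand must be '+' or '-'");
}
```

For example, `tssDistance(57735721, 57735627, '+')` gives 94 and `tssDistance(1778750, 1822495, '-')` gives 43745, matching the AP5M1 and GNB1 rows.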
Can you please test this tool and provide feedback for further improvements?
Here's the link to GitHub:
https://github.com/goxed/peak-tool
Right now the tool can only annotate BED files (human / hg19), using annotations from the GENCODE database (included in the git repo).
The tool needs 16 GB of RAM on Mac OS X (10.9.x or later) and at least 20 GB of RAM on Linux (16 GB if you use zram memory compression or a very fast SSD swap).