Hey all,
A recent task I was working on involved looking up overlapping genes (UCSC/RefSeq) and DGV overlaps for many (2k+) regions of interest. The output was simply the region information in CSV format, with the overlapping genes and the number of overlapping DGV regions appended to each row. Long story short, I accomplished this goal. What I'm curious about is whether my approach was a good one and how I might expand on it.
With this many regions a script was the obvious choice, and I built mine on a SQLite database containing the UCSC genes, RefSeq genes, and DGV. I populated the database from a few BED files I downloaded from their website. I did it this way because I didn't want to hit their site that many times, and it sounded like "fun". I didn't set up a MySQL instance because it felt too heavyweight for what I wanted to accomplish here.
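For anyone curious what the SQLite side of this can look like, here is a minimal sketch, not my actual mini-library. It assumes BED-style input (tab-separated chrom, start, end, optional name) and one table per source; the table names `genes` and `dgv` and every function name below are hypothetical. BED coordinates are 0-based and half-open, so two intervals overlap when each starts before the other ends.

```python
import csv
import sqlite3

# Hypothetical layout: one table per annotation source. Coordinates follow the
# BED convention: 0-based, half-open [chrom_start, chrom_end).
TABLES = ("genes", "dgv")

def create_db(path=":memory:"):
    conn = sqlite3.connect(path)
    for table in TABLES:  # fixed internal names, so f-string SQL is safe here
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(chrom TEXT, chrom_start INTEGER, chrom_end INTEGER, name TEXT)"
        )
        # Lets SQLite jump to one chromosome and a range of start positions.
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS idx_{table} "
            f"ON {table} (chrom, chrom_start, chrom_end)"
        )
    return conn

def load_bed(conn, table, bed_lines):
    """Insert chrom, start, end and (optional) name from BED lines into `table`."""
    rows = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\r\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""
        rows.append((fields[0], int(fields[1]), int(fields[2]), name))
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def overlaps(conn, table, chrom, start, end):
    """Return rows of `table` that overlap the half-open interval [start, end)."""
    # Two half-open intervals overlap iff each one starts before the other ends.
    return conn.execute(
        f"SELECT chrom, chrom_start, chrom_end, name FROM {table} "
        "WHERE chrom = ? AND chrom_start < ? AND chrom_end > ?",
        (chrom, end, start),
    ).fetchall()

def annotate(conn, regions, out):
    """Write a CSV row per region: coordinates, overlapping gene names, DGV count."""
    writer = csv.writer(out)
    writer.writerow(["chrom", "start", "end", "genes", "dgv_count"])
    for chrom, start, end in regions:
        # A set collapses genes that appear several times (e.g. multiple transcripts).
        genes = sorted({row[3] for row in overlaps(conn, "genes", chrom, start, end)})
        dgv_count = len(overlaps(conn, "dgv", chrom, start, end))
        writer.writerow([chrom, start, end, ";".join(genes), dgv_count])
```

The index narrows each lookup to one chromosome and the intervals starting before the region's end; the `chrom_end` condition is then checked on those candidates. For a couple of thousand regions that should be plenty fast, and passing a file path to `create_db` keeps the database around between runs.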
It occurred to me that this approach may be helpful to others, but I wanted to ask a few questions about it first.
1) Is there a better approach to the querying? For example, was building a SQLite database and a mini-library totally unnecessary?
2) If I want to download a BED file from within a script, is there a better way than using something like Mechanize? Say I wanted the UCSC genes BED file to be downloaded automatically by a script: is there a location I can fetch it from directly, without Mechanize?
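On question 2: when a file is published at a stable URL (UCSC serves bulk files from its download server, hgdownload.soe.ucsc.edu), a plain HTTP GET is enough and no browser emulation is needed. Below is a minimal sketch using only Python's standard library; the function name `download` is hypothetical, the exact URL for a given assembly and track is left as a parameter to be copied from the downloads page, and the files there may be table dumps rather than BED, so check the format before loading.

```python
import gzip
import shutil
import urllib.request

def download(url, dest, decompress=None):
    """Stream `url` to the local file `dest`, optionally gunzipping it.

    If `decompress` is None, gzip decompression is applied when the URL ends in ".gz".
    """
    if decompress is None:
        decompress = url.endswith(".gz")
    # Open the URL first so a failed request does not leave an empty `dest` behind.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        if decompress:
            with gzip.GzipFile(fileobj=response, mode="rb") as source:
                # Copy in chunks so large annotation files never sit fully in memory.
                shutil.copyfileobj(source, out)
        else:
            shutil.copyfileobj(response, out)
    return dest
```

The same function accepts `file://` URLs, which makes it easy to test against a local copy before pointing it at the real server.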
I'm considering continuing this little project if it proves useful and if the data can be obtained more easily than by downloading everything manually.
Thoughts?
Thanks for any help people can give.
-David