Hey there, SEQanswers people! I have just started my PhD after a year of "project work", and it seems that I will be doing a lot of bioinformatics - especially RNA-seq. Having searched the SEQanswers forum and found some interesting things to read (this, among other things), I still have some questions I'd like some opinions on, so here goes...
My background is a Master's in biotechnology with a fair share of math, basic programming (Matlab) and some slightly more advanced programming from a bioinformatics course (Python). Over the last year I've continued working on my Master's thesis project, which was absolute protein quantification with mass spectrometry, and I wrote a fair number of small and medium-sized Python scripts for handling and analysing the resulting data. I've been working in OSX (as that was the computer I was given), and feel more or less comfortable in the shell. Having shown an aptitude for programming, I had the opportunity to start a PhD with RNA-seq as the main focus, while keeping the MS work as a little side-project.
Now, I've been at it for about two months, trying to read up on RNA-seq, alignment tools (Tophat, ...), differential expression (DESeq, ...), splice variant analysis, and quite a bit of everything, really. I've come to a point where I can admit that I know very little, and that I have an opportunity to really learn things the right way, from the beginning. My PI knows nothing about coding and related stuff, so I was assigned a bioinformatics post-doc at a nearby department as a "trainer" (or whatever you want to call it). He seems to work almost exclusively with RNA-seq, we get along just fine, and he's very good at answering my questions and helping however he can. So far, all my DE analyses have been done "manually" (i.e. without a scripted pipeline of any sort), running the various tools (Tophat -> featureCounts -> DESeq) one by one in the OSX terminal, and I've just reached the point where I feel I should script my own pipeline.
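For anyone in a similar spot, here is a minimal sketch of how the manual Tophat -> featureCounts steps could be wrapped in one Python script (Python 3.8+ for `shlex.join`). It only builds and prints the shell commands unless `dry_run=False` is passed. The sample names, file paths, the `genome_index` Bowtie index prefix and `genes.gtf` are hypothetical placeholders; the flags used are the usual TopHat (`-p`, `-G`, `-o`) and featureCounts (`-T`, `-a`, `-o`) options, and the DESeq step is left out because it runs in R.

```python
import os
import shlex
import subprocess


def tophat_command(index_prefix, gtf, fastq, out_dir, threads=4):
    # tophat [options] <bowtie_index> <reads>; -G gives known transcripts,
    # -o the output directory, -p the number of threads.
    return ["tophat", "-p", str(threads), "-G", gtf, "-o", out_dir,
            index_prefix, fastq]


def featurecounts_command(gtf, bam_files, counts_file, threads=4):
    # featureCounts -T <threads> -a <annotation> -o <output> <BAM files...>
    return ["featureCounts", "-T", str(threads), "-a", gtf,
            "-o", counts_file, *bam_files]


def build_pipeline(samples, index_prefix, gtf, work_dir="results"):
    """Return every command, in order: one TopHat run per sample, then one
    featureCounts run over all resulting BAM files."""
    commands, bam_files = [], []
    for name, fastq in samples.items():
        out_dir = os.path.join(work_dir, name)
        commands.append(tophat_command(index_prefix, gtf, fastq, out_dir))
        # TopHat writes its alignments to accepted_hits.bam in out_dir.
        bam_files.append(os.path.join(out_dir, "accepted_hits.bam"))
    counts_file = os.path.join(work_dir, "counts.txt")
    commands.append(featurecounts_command(gtf, bam_files, counts_file))
    return commands


def run_pipeline(commands, dry_run=True):
    for cmd in commands:
        print(shlex.join(cmd))  # log each step so the run can be reproduced
        if not dry_run:
            # check=True raises CalledProcessError if a step fails,
            # so later steps never run on missing input.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical sample sheet: sample name -> FASTQ file.
    samples = {"ctrl_1": "ctrl_1.fastq", "treat_1": "treat_1.fastq"}
    run_pipeline(build_pipeline(samples, "genome_index", "genes.gtf"))
```

Keeping command construction separate from execution makes it easy to check the commands before spending hours on alignment, and to rerun a single step by hand if something breaks.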
Having said all this (thanks if you read it!), I still feel that I'm lacking A LOT of knowledge. Not strange, since I'm new to the field, but that's why I'm here! So, some questions:
1) Other than my work in RNA-seq, is there anything I should be learning in terms of bioinformatics or the field in general? I feel that more statistics would be a good bet, but are there other things?
2) My PI and I are currently looking for courses in statistics, bioinformatics or both, but it seems the courses for the fall haven't been posted yet in most places. Do you have any good courses to recommend?
3) Good programming practices... What are they, and where can I read about them? I'm currently scripting in the PyCharm IDE with the Anaconda Python distribution, and I like it quite a bit. I've read some stuff about naming variables and the like, but what about scripting in general? For example, if I make a pipeline as above, should I just write a script that I execute in PyCharm (like I've done so far), or should I make the scripts executable and run them from the shell?
4) I keep reading that bioinformaticians work in Linux - I have Linux Mint on my home laptop (which barely sees any use at all these days), so I have some minor experience with it (and with the way it works, from OSX), but is it something you really need? Could I continue working in OSX, or should I set up a dual-boot install of some Linux distribution (if so, which one)?
5) I see a lot of people blogging about their bioinformatics work and uploading their code. I read somewhere that this is a good way to get your code "out there" and to practice; is this something I should start doing?
6) Since I know Python (mostly self-taught), should I learn R? To what extent? What about SQL?
7) Other than SEQanswers, are there other communities out there for bioinformatics, maybe even ones suited to a beginner like myself?
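On the PyCharm-versus-shell part of question 3, a common pattern is a script with a shebang line, `argparse` for its options, and a `main()` that only runs under `if __name__ == "__main__":`. The sketch below is hypothetical (the name `summarise_counts.py` and the `--min-count` option are made up for illustration); it counts genes whose total reads across samples reach a threshold in a featureCounts table, assuming the usual layout of one `#` comment line, a `Geneid` header, six annotation columns and then one count column per BAM file.

```python
#!/usr/bin/env python3
"""summarise_counts.py (hypothetical): genes passing a read-count threshold."""
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Count genes with enough reads in featureCounts tables.")
    parser.add_argument("counts", nargs="*",
                        help="featureCounts output table(s)")
    parser.add_argument("--min-count", type=int, default=10,
                        help="minimum total reads across samples (default: 10)")
    return parser.parse_args(argv)


def genes_above_threshold(lines, min_count):
    """Yield gene IDs whose summed counts across samples are >= min_count."""
    for line in lines:
        if line.startswith("#") or line.startswith("Geneid"):
            continue  # skip the program comment line and the header
        fields = line.rstrip("\n").split("\t")
        # Columns: Geneid, Chr, Start, End, Strand, Length, then counts.
        counts = [int(value) for value in fields[6:]]
        if sum(counts) >= min_count:
            yield fields[0]


def main(argv=None):
    args = parse_args(argv)
    for path in args.counts:
        with open(path) as handle:
            n = sum(1 for _ in genes_above_threshold(handle, args.min_count))
        print(f"{path}: {n} genes with >= {args.min_count} reads")


if __name__ == "__main__":
    main()
```

After `chmod +x summarise_counts.py` it can be run as `./summarise_counts.py results/counts.txt --min-count 10`, while the same file can still be run from PyCharm or imported from another script, since nothing executes on import.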
Thanks a lot for reading, any help, tips or opinions are greatly appreciated!
Erik