Hi All,
I am starting a possible thesis project that involves sequencing a virus passaged through different mouse genotypes. It should be a great learning experience regardless of the outcome (which I can only assume will be success). Anywho, I am having a couple of problems. While I develop and optimize a wetlab protocol, I am also trying to get familiar with Linux and the algorithms that will be applicable.
So, the project will be to call variants in passaged retroviral samples from mice. The wetlab work will involve separating viral RNA from host RNA, because there are putative expressed endogenous retroviruses in the mouse strains we have used. Viral blood titers are very low with this specific infection, so I will be using a tissue homogenate, and any RNA prep from that will also bring along a large amount of host message. My current workflow tries to separate the sequences with specific priming in the RT reaction and in the subsequent PCR. Right now I am at the RT-to-PCR step and getting wacky results: using my RT product as the PCR template gives no amplicon, while the -RT control gives a strong band where I expect it. This is reproducible over 4 replicates with 2 different kinds of -RT control (no primer and no reverse transcriptase). This is confusing the bejesus out of me, since any contamination that could be amplified in the -RT should also be present in the +RT. I am at a loss for the moment. On to the other half...
While the wetlab protocol is being worked out, I am spending a lot of time trying to set up a pipeline for the data that will hopefully be generated soon. This will be a daunting task for me. The only bioinformatics work I have done was digging into some metagenomic data to find a very conserved biosynthetic pathway, and all the infrastructure was set up for me: the Cygwin install I was using already had all the appropriate modules, and the scripts were more or less plug and play. My situation now is different. I will have to set up the infrastructure and pipelines myself, which is way harder than looking up in the velvet manual how to change a couple of parameters on an assembly. That said, I am very excited for the chance to learn a new skill set.
On to my bioinformatics problem...
When I get server access, hopefully this afternoon, I want to start compiling the programs I will need to analyze my data and give them some test runs on small datasets. Umm...this is where I am going to sound very silly. I don't really know what the pipeline will look like. As I understand it, the pipeline will entail:
alignment -> sort -> dedup -> clean -> indel realignment -> variant calling
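To make the sort and dedup steps concrete, here is a minimal pure-Python sketch of what they do conceptually. It assumes a hypothetical simplified read record (`Read`, with the read's 5' reference position, strand and base qualities); `sort_reads` and `mark_duplicates` are illustrative names, not the API of any real tool. Reads sharing a 5' position and strand are treated as PCR duplicates, and the one with the highest summed base quality is kept.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not a real BAM/SAM API)."""
    name: str
    chrom: str
    pos: int                 # 5' end of the read on the reference, 0-based
    reverse: bool            # True if aligned to the reverse strand
    quals: Tuple[int, ...]   # Phred base qualities


def sort_reads(reads: Iterable[Read]) -> List[Read]:
    """Coordinate-sort reads, as is done before duplicate marking."""
    return sorted(reads, key=lambda r: (r.chrom, r.pos, r.reverse))


def mark_duplicates(sorted_reads: List[Read]) -> Set[str]:
    """Return the names of reads judged to be duplicates.

    Reads with the same chrom, 5' position and strand form one group;
    within a group, the read with the highest summed base quality is kept.
    """
    best: Dict[Tuple[str, int, bool], Read] = {}
    dups: Set[str] = set()
    for r in sorted_reads:
        key = (r.chrom, r.pos, r.reverse)
        kept = best.get(key)
        if kept is None:
            best[key] = r
        elif sum(r.quals) > sum(kept.quals):
            dups.add(kept.name)  # the previously kept read now loses
            best[key] = r
        else:
            dups.add(r.name)
    return dups


if __name__ == "__main__":
    reads = [
        Read("r1", "virus", 100, False, (30, 30, 30)),
        Read("r2", "virus", 100, False, (20, 20, 20)),  # same 5' end/strand as r1
        Read("r3", "virus", 100, True, (30, 30, 30)),   # opposite strand
        Read("r4", "virus", 50, False, (35, 35, 35)),
    ]
    ordered = sort_reads(reads)
    print([r.name for r in ordered])          # ['r4', 'r1', 'r2', 'r3']
    print(sorted(mark_duplicates(ordered)))   # ['r2']
```

One design caveat: with PCR amplicons, many reads share start positions by design, so position-based deduplication may throw away real data; whether to run this step at all on amplicon data is worth checking.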
I am trying to figure out the best packages for a small template (~9 kb) that are optimized for pooled data.
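With pooled samples, the variant-calling step estimates allele frequencies across the population instead of assigning diploid genotypes. Below is a minimal sketch of the simplest version of that idea: a depth and frequency threshold over per-position base counts. The input format (a dict from 0-based position to a string of observed bases) and the function name `call_pooled_variants` are hypothetical, and the default thresholds are illustrative only. It ignores base quality, strand bias and any sequencing-error model, which dedicated low-frequency callers (LoFreq is one example) handle explicitly.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_pooled_variants(
    pileup: Dict[int, str],
    reference: str,
    min_depth: int = 100,    # illustrative threshold, not a recommendation
    min_freq: float = 0.01,  # illustrative threshold, not a recommendation
) -> List[Tuple[int, str, str, float]]:
    """Report non-reference bases whose frequency passes the thresholds.

    pileup maps a 0-based reference position to the string of bases
    observed there across all reads (a hypothetical, simplified input).
    Returns (position, ref_base, alt_base, frequency) tuples.
    """
    calls = []
    for pos in sorted(pileup):
        bases = pileup[pos].upper()
        depth = len(bases)
        if depth < min_depth:
            continue  # too little coverage to estimate a frequency
        ref = reference[pos].upper()
        for alt, count in Counter(bases).items():
            if alt == ref:
                continue
            freq = count / depth
            if freq >= min_freq:
                calls.append((pos, ref, alt, freq))
    return calls


if __name__ == "__main__":
    pileup = {0: "A" * 95 + "G" * 5, 2: "G" * 50 + "T" * 50, 3: "T" * 10}
    print(call_pooled_variants(pileup, "ACGT"))
    # [(0, 'A', 'G', 0.05), (2, 'G', 'T', 0.5)]; position 3 is skipped (depth 10)
```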
If anyone has suggestions, advice or hints I would love to hear them.
BTW this is a sweet forum. I would hate to have to piece all this information together from Google searches. Thanks, everyone, for being a part of it. I look forward to being able to contribute someday.
Thanks a bunch,
Earl
Wow, this post is entirely too long.