Hi, I was wondering if anyone could help me.
I have a diploid plant genome of around 10Gb split up into around 500,000 contigs.
I ran into a problem indexing it with Bowtie2, because the genome exceeds the limit on the number of characters. So unless I recompile bowtie2-build as 64-bit, it won't be possible.
I resorted to using STAR, but I have the same problem because of the genome size. STAR tells me to set its memory limit to at least 65 GB of RAM and to make sure that much is available. I have 65 GB of RAM in total (around 55 GB free), but of course I can't use all of the system's resources, and I'm not able to free up any more. I cannot upgrade the RAM, so I'm stuck with what I have.
The other alternative is splitting the genome up, perhaps in half, and indexing each half.
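To make the splitting idea concrete, here is a minimal Python sketch, assuming the genome is a plain uncompressed FASTA file. The function names and the output file naming are made up for illustration. It only balances the parts by total bases (largest contig first, into whichever part is currently smallest); it does nothing about merging indexes or about the bias from aligning to each part separately.

```python
import sys


def contig_lengths(lines):
    """Yield (header, length) for each record in an iterable of FASTA lines."""
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:].strip(), 0
        elif line:
            length += len(line)
    if header is not None:
        yield header, length


def assign_bins(lengths, n_parts=2):
    """Greedy balancing: place each contig, largest first, in the smallest part."""
    totals = [0] * n_parts
    assignment = {}
    for header, length in sorted(lengths, key=lambda r: r[1], reverse=True):
        i = totals.index(min(totals))
        assignment[header] = i
        totals[i] += length
    return assignment, totals


def write_parts(fasta_path, assignment, n_parts, prefix):
    """Second pass: stream the FASTA and copy each record to its assigned part."""
    outs = [open(f"{prefix}.part{i + 1}.fa", "w") for i in range(n_parts)]
    try:
        current = None
        with open(fasta_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    current = outs[assignment[line[1:].strip()]]
                if current is not None:
                    current.write(line)
    finally:
        for out in outs:
            out.close()


if __name__ == "__main__":
    # Hypothetical usage: python split_fasta.py genome.fa genome_split
    fasta_path, prefix = sys.argv[1], sys.argv[2]
    with open(fasta_path) as fh:
        assignment, totals = assign_bins(contig_lengths(fh), n_parts=2)
    write_parts(fasta_path, assignment, 2, prefix)
    print("bases per part:", totals)
```

Only contig names and lengths are held in memory, so the split itself should be cheap even with around 500,000 contigs; the sequence is streamed in both passes.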
Is there a way to do this and merge the indexes? Or, even if this is possible, will I simply hit the same problem when I load the merged index into memory and run short of RAM again?
I could also just align to each half separately, but this will bias the alignments and increase false positives, which I would prefer to avoid. My aim is to identify novel isoforms, so this would cast doubt on any novelty I find. Unfortunately, this may be my only option if I cannot get more RAM, merge indexes, or tweak STAR or Bowtie2 parameters to work with the large genome.
Thanks in advance for any help. It is much appreciated.