large number of contigs
As @NGSfan pointed out, this is indeed a RAM problem. STAR bins the genome sequence so that each chromosome (contig) starts at a new bin, which creates an overhead of Nchromosomes*BinSize, where BinSize=2^genomeChrBinNbits. By default --genomeChrBinNbits = 18, so BinSize = 2^18 ≈ 256 kb, and with 300,000 contigs you would need ~75 GB of RAM; that is most likely what killed your job.
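To make the arithmetic above concrete, here is a minimal sketch of the padding estimate Nchromosomes*BinSize. It assumes one byte of RAM per padded base and treats every contig as costing a full bin, so it is a rough upper bound rather than STAR's exact memory accounting; the function name is hypothetical.

```python
def chr_bin_overhead_bytes(n_contigs: int, genome_chr_bin_nbits: int = 18) -> int:
    """Rough upper bound on bin padding: each contig starts a new bin of
    2**genome_chr_bin_nbits bases (assumes 1 byte per base, full bin per contig)."""
    bin_size = 2 ** genome_chr_bin_nbits
    return n_contigs * bin_size


if __name__ == "__main__":
    for nbits in (18, 12):
        overhead = chr_bin_overhead_bytes(300_000, nbits)
        # 2**30 bytes = 1 GiB
        print(f"--genomeChrBinNbits {nbits}: ~{overhead / 2**30:.1f} GiB")
```

Run as a script, this prints about 73.2 GiB for the default of 18 (roughly the ~75 GB quoted above) and about 1.1 GiB for 12.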
I suggest that you try a much smaller value, --genomeChrBinNbits 12 (BinSize = 2^12 = 4 kb). This should require only a few GB of RAM and allow you to generate the genome files. I have not tried STAR with more than 50,000 contigs, and I suspect mapping may slow down significantly when the number of contigs is very large.
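For picking a value other than 12, the STAR manual suggests scaling --genomeChrBinNbits for genomes with many contigs to min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). The sketch below evaluates that rule of thumb; rounding down to an integer is my assumption, and the function name and example genome sizes are hypothetical.

```python
import math


def recommended_chr_bin_nbits(genome_length: int, n_references: int, read_length: int) -> int:
    """min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))),
    rounded down to an integer (the rounding is an assumption)."""
    avg_contig_length = genome_length / n_references
    nbits = math.log2(max(avg_contig_length, read_length))
    return min(18, math.floor(nbits))


if __name__ == "__main__":
    # Hypothetical assembly: 600 Mb split into 300,000 contigs, 100 bp reads.
    print(recommended_chr_bin_nbits(600_000_000, 300_000, 100))
```

For that hypothetical 600 Mb assembly the average contig is 2 kb, so the rule gives 10, even smaller than 12; for a chromosome-level assembly the cap of 18 applies.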
Originally posted by JonB
I am performing targeted resequencing (also known as amplicon sequencing) of an RNA-seq library, and the approach is also semi-quantitative. My end goal is to count how many times each given sequence (e.g. for a gene target) or genetic variant (alleles ranging from a single-base substitution up to 6 substituted bases) occurs. I want to assign each sequence read the "name" of the consensus sequence it best matches.
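The read-to-consensus assignment described above can be sketched as a brute-force nearest-match search. This assumes reads are already trimmed to the same length as the amplicon consensus sequences, that every allele is listed as its own reference, and that only substitutions occur (Hamming distance does not handle indels). The names and the default tolerance of 2 mismatches are hypothetical; ties and reads too far from every reference are counted as "unassigned".

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read: str, references: dict, max_mismatches: int = 2):
    """Return the name of the uniquely closest reference, or None if the best
    match is ambiguous (tie) or has more than max_mismatches mismatches."""
    best_name, best_dist, tie = None, None, False
    for name, seq in references.items():
        if len(seq) != len(read):
            continue  # this sketch only compares equal-length sequences
        d = hamming(read, seq)
        if best_dist is None or d < best_dist:
            best_name, best_dist, tie = name, d, False
        elif d == best_dist:
            tie = True
    if best_dist is None or tie or best_dist > max_mismatches:
        return None
    return best_name


def count_assignments(reads, references: dict, max_mismatches: int = 2) -> Counter:
    """Tally reads per reference name, lumping unmatched reads together."""
    counts = Counter()
    for read in reads:
        counts[assign_read(read, references, max_mismatches) or "unassigned"] += 1
    return counts


if __name__ == "__main__":
    refs = {"geneA_wt": "ACGTACGT", "geneA_alt1": "ACGTTCGT"}  # hypothetical alleles
    reads = ["ACGTACGT", "ACGTTCGT", "ACGTACGA", "TTTTTTTT"]
    print(count_assignments(reads, refs))
```

Comparing every read against every reference is simple but slow for large libraries; an aligner or a k-mer index would scale better.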