ckseq 02-06-2013 04:12 AM


I am new to the forum. I'm a postgrad student in South Africa, currently working on viral genomics. I'm trying to reconstruct quasispecies from Illumina short-read data using ShoRAH.

I'm looking for information about ShoRAH from users but can't seem to find anything anywhere.

We have an analysis running on Illumina short-read data, and it has been going for two days now. We picked the parameter values more or less at random, and I fear the analysis might take weeks and weeks! :) Can anyone help me by telling me what the following parameters mean?

-i <input data file>
-j <sampling iterations>
-a <alpha>
-K <startvalue for number of clusters> not compat. with -k
-k <avg. number of reads in each startcluster> not compat. with -K
-t <history time>
-R <randomseed>
-h this help!

The analysis is currently running with -j = 100, -t = 1000 and -K = 10. We left out the rest. The input data consists of 90 000 reads mapped to an 11 kb reference genome.

What is the significance/function of -a (alpha), -K (start value for number of clusters), -k (average number of reads per start cluster), -t (history time) and -R (random seed)? If I knew what these do, it might be easier to try to optimise the run!

Note: complete bioinformatics novice but trying

