Hi,
Could someone please tell me what the output of the SOAP aligner looks like (e.g. one sample line, with an explanation of what the various fields are)? I want to feed it into SOAPsnp, but I can't run the aligner because its executables only run on a Unix platform, which I don't have (easy) access to. SOAPsnp comes as source code, but I'm not proficient enough in C to sort through it and see what input it expects. I've tried extracting the alignments from my bowtie .map file and reshuffling them in various ways, like <position> \t <sequence> \t <quality string> \n, but SOAPsnp always gives a "Bus error".
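To make the reshuffling concrete, here is a minimal sketch of that kind of conversion. It assumes bowtie's default tab-separated output (read name, strand, reference name, 0-based offset, sequence, qualities, ...); the function name and the sample line are made up for illustration, and the three-column layout it writes is just one of the guesses mentioned above, not a format SOAPsnp is known to accept.

```python
def reshuffle_bowtie_line(line, one_based=False):
    """Turn one bowtie default-output line into '<position>\\t<sequence>\\t<quality>\\n'.

    Assumes bowtie's default columns: name, strand, reference,
    0-based offset, sequence, qualities, then optional extra fields.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated fields")
    offset, seq, quals = int(fields[3]), fields[4], fields[5]
    # bowtie reports a 0-based offset; optionally shift to 1-based
    position = offset + 1 if one_based else offset
    return f"{position}\t{seq}\t{quals}\n"


# Hypothetical sample line, not taken from a real .map file
sample = "read1\t+\tchr1\t100\tACGT\tIIII\t0\t\n"
print(reshuffle_bowtie_line(sample), end="")
```

Whether the position should be 0-based or 1-based is one of the things I could not determine, hence the `one_based` switch.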
On a related matter, does anybody know of a simple, intuitive metric for the quality of a consensus sequence? For example, suppose a given position (in haploid DNA from a single source, so the real sequence is a single allele) is covered by 5 reads. It reads A in 3 of them, with Phred scores of 15, 25 and 30, and G in the other two, with qualities 20 and 30. What is the probability that the actual base is A? And what is the probability that it's G?
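One naive way to frame the question, as a sketch only: assume the reads are independent, each Phred score Q means an error probability of 10^(-Q/10), an erroneous call is equally likely to be any of the other three bases, and all four bases are equally likely a priori. These are simplifying assumptions for illustration, not how SOAPsnp or any other caller is known to work. Under them, the posterior for each candidate base follows from Bayes' rule:

```python
from math import prod

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def base_posteriors(observations):
    """Posterior probability of each true base at one haploid position.

    observations: list of (called_base, phred_quality) pairs.
    Assumes independent reads, a uniform prior over A/C/G/T, and that
    a miscall is equally likely to be any of the three other bases.
    """
    likelihoods = {}
    for true_base in BASES:
        likelihoods[true_base] = prod(
            (1 - phred_to_error(q)) if called == true_base
            else phred_to_error(q) / 3
            for called, q in observations
        )
    total = sum(likelihoods.values())
    return {base: lik / total for base, lik in likelihoods.items()}


# The example from the question: three A calls and two G calls
pileup = [("A", 15), ("A", 25), ("A", 30), ("G", 20), ("G", 30)]
posteriors = base_posteriors(pileup)
for base in BASES:
    print(f"P({base}) = {posteriors[base]:.6f}")
```

With these assumptions A comes out strongly favoured over G, because the two G calls must both be errors if the true base is A, while three calls must be errors if it is G. What I don't know is whether this is a reasonable model in practice.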
Thanks!