I posted the following at another forum and have not yet had a response. Is this what mpileup and/or GATK are for? Or do those analyze aligned data for indels and SNPs, rather than look at the raw data?
====
I don't know if I'm using the proper terminology, but I have ab1 (Sanger) sequencing chromatograms and I was wondering if anyone knows of software that can deconvolve two overlapped chimeric reads (e.g. overlaid indel-type mutations) and output two or more sequences in FASTA format.
For example, say I have a read that is unambiguously ACTGGCGA, but then 60% of the population has an A and 40% has that A deleted, followed by GCGTGA. phred will likely give me ACTGGCGAAGCGTGA, but is there a way for a basecaller to give me ACTGGCGAGCGTGA as a secondary call? Or, if I have a 50/50 mix and I know that one version is ACTGGCGAAGCGTGA, can I supply that as a comparison file for subtraction, leaving me with a residual signal of either the full ACTGGCGAGCGTGA or of GCGTGAx?
Obviously I would have to be able to set a threshold below which I consider the result "noise" - either a fixed value of signal strength or e.g. 5% of the main peak strength - in order to filter out illegitimate base calls in non-chimeric sequence.
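To make the idea concrete, here is a rough Python sketch of the kind of secondary call I have in mind. It is only a toy model, not how phred or any existing basecaller works: it assumes one peak height per base position for each of the four channels, and that the two populations simply add. The names `mix_traces`, `call_two`, `min_ratio` and `min_height` are made up for illustration, and getting real peak heights out of an ab1 file is left out entirely.

```python
BASES = "ACGT"

def mix_traces(seq_a, seq_b, height_a=600, height_b=400):
    """Toy mixed trace: one peak per base position, heights simply add.

    Hypothetical helper; a real chromatogram would come from the ab1 file.
    """
    length = max(len(seq_a), len(seq_b))
    peaks = [dict.fromkeys(BASES, 0) for _ in range(length)]
    for seq, height in ((seq_a, height_a), (seq_b, height_b)):
        for i, base in enumerate(seq):
            peaks[i][base] += height
    return peaks

def call_two(peaks, min_ratio=0.05, min_height=0):
    """Return (primary, secondary) calls for each position.

    A secondary base is reported only if its peak is above min_height and
    at least min_ratio of the main peak; otherwise '-' is written.
    """
    primary, secondary = [], []
    for p in peaks:
        ranked = sorted(BASES, key=lambda b: p[b], reverse=True)
        top, second = ranked[0], ranked[1]
        primary.append(top)
        keep = p[second] > min_height and p[second] >= min_ratio * p[top]
        secondary.append(second if keep else "-")
    return "".join(primary), "".join(secondary)

# The 60/40 example from above
peaks = mix_traces("ACTGGCGAAGCGTGA", "ACTGGCGAGCGTGA")
primary, secondary = call_two(peaks)
print(primary)    # ACTGGCGAAGCGTGA
print(secondary)  # --------GCGTGA-
```

In this toy case the secondary track is the "GCGTGAx" residual: it is empty wherever the two populations agree and only shows the shifted sequence after the indel. Stitching that back onto the shared prefix to get the full ACTGGCGAGCGTGA would be a separate step.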
As a separate but related issue, is there a way to get phred (or any other program) to "filter" noise spikes in chromatograms? For example, sometimes I see spikes in pyrimidine signal strength that are way out of proportion to legitimate regions of call. The peak height, viewed in an ab1 viewer like BioEdit or Consed (I'm not sure if the scale is linear or log), is well over double that of any nearby base, or even of any other place in the file. These spikes typically have a "width" of about 5 base calls.
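For the spikes, what I am picturing is something like a running-median comparison. The sketch below is an assumption about how such a filter could work, not a feature of phred that I know of: it takes a plain list of per-base peak heights for one channel, flags any position that is more than `factor` times the median of a surrounding window, and replaces flagged values with that local median. The function names and the defaults (`window=21`, `factor=2.0`) are hypothetical; the window just needs to be comfortably wider than the ~5 base spike.

```python
from statistics import median

def local_median(heights, i, window):
    """Median of the heights in a window centred on position i."""
    half = window // 2
    lo, hi = max(0, i - half), min(len(heights), i + half + 1)
    return median(heights[lo:hi])

def flag_spikes(heights, window=21, factor=2.0):
    """Indices whose height exceeds factor times the local median."""
    return [i for i, h in enumerate(heights)
            if h > factor * local_median(heights, i, window)]

def clip_spikes(heights, window=21, factor=2.0):
    """Copy of heights with flagged positions replaced by the local median."""
    flagged = set(flag_spikes(heights, window, factor))
    return [local_median(heights, i, window) if i in flagged else h
            for i, h in enumerate(heights)]

# Made-up data: ordinary peaks of 100-120 with a 5-base spike at 12..16
heights = [100 + (i % 3) * 10 for i in range(30)]
for i in range(12, 17):
    heights[i] *= 5

print(flag_spikes(heights))  # [12, 13, 14, 15, 16]
cleaned = clip_spikes(heights)
```

Whether clipping is the right thing to do is another question; just flagging the region so the calls there can be treated as low quality might be enough.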
====