Hi,
Apologies if this has been discussed previously; if so, could you point me to the thread?
I've been given 454 amplicon data, from which I would like to find SNPs: not just the dominant SNP but also the less frequently called SNPs. The aim is to identify mixed populations based on SNP type and frequency.
I have used gsMapper to align my SFF file against a reference, outputting, amongst other files, the variant lists and BAM alignments.
The allvariant file seems to pull out only a few SNPs, primarily the most dominant ones. On closer inspection of one SNP, for example, the reference is C and the called SNP is T (65% of reads), but A is also present to a lesser extent (say 5%). However, only the C --> T change was called.
How can I pull that data out of the 454 mapper output?
That is, I want the most frequent variant and also the lesser variants at the same position, as in the example above: both C --> T and C --> A, with the number of reads and the position for each.
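To make the output I'm after concrete, here is a minimal Python sketch of the per-position tally I have in mind. It assumes the bases covering one reference position have already been extracted from the alignment (for instance from a pileup); the function name `minor_variants`, the `min_freq` threshold and the example column are all hypothetical, and nothing here is part of gsMapper or any of the callers.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}

def minor_variants(ref_base, bases, min_freq=0.01):
    """Return (alt_base, read_count, frequency) for every non-reference
    base seen at one position, most frequent first.

    min_freq is an arbitrary illustrative cutoff, not a recommended value.
    """
    # Count only unambiguous bases; N, gaps, etc. are ignored.
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    depth = sum(counts.values())
    if depth == 0:
        return []
    calls = []
    for base, n in counts.most_common():
        freq = n / depth
        if base != ref_base.upper() and freq >= min_freq:
            calls.append((base, n, freq))
    return calls

# Hypothetical column of 100 reads: 30 C (reference), 65 T, 5 A
column = "C" * 30 + "T" * 65 + "A" * 5
for alt, n, freq in minor_variants("C", column):
    print(f"C-->{alt}\treads={n}\tfreq={freq:.2f}")
```

A plain count like this ignores base quality and 454 homopolymer errors, so it only illustrates the kind of table I want, not a way to decide which low-frequency calls are real.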
I have tried numerous SNP callers that can make use of BAM files, such as VarScan, Bambino, Atlas and FreeBayes.
- FreeBayes produces by far the most comprehensive output, but I find it difficult to determine a cutoff and to identify the lesser variants. Also, strangely, when I view the BAM alignment against the reference FASTA, the SNP identified by FreeBayes doesn't exist at the stated position (perhaps a separate question?).
So, in short, any advice, pipelines or methodologies that anybody can suggest to help me identify SNPs (both the most frequent and the less frequent at the same position) would be greatly appreciated!
Thanks in advance