![]() |
finding the base at a position in a bam
Hi All,
I have seen some other threads on this subject, sorry I don't have the links. But none of them seem to resolve the problem. I am trying to do something very simple but no tools for bam files seem to have the facility. I have tried samtools and GATK to no avail. All I need is to get the consensus read base at particular positions of the reference in a bam file. I am trying to check the state of SNPs from a particular bacterial SNP typing scheme in my NGS data. Not all of the positions will actually be a SNP, in which cae they have the same base as the reference, and I want that reported, so .vcf is no good to me. Greatly appreciate any suggestions. I suppose I can just use samtools to make the vcf and then if a position is not listed then assume it is the reference base. I do find samtools almost impenetrable.. would this line be appropriate? samtools mpileup -I -uBf ref.fasta 1_sorted.bam | bcftools view -bcg - > 1.bcf && bcftools view 1.bcf > 1.vcf but I don't understand the output vcf file because it seems to have a '.' for the reference base and then the proper A T C or G for the 'alt' base(s). Thanks for any help, S. |
I do think the mpileup command is what you want. I am not sure what your pipeline is making, but it sounds like the pileup format (. or , for the reference base with ATCG with alternates).
You can output a vcf with genotype compared to the reference for each position: samtools mpileup -gu -f ref.fasta sorted.bam | bcftools call -c - > all.vcf The genotype will be 0/0 for homozygous reference, 0/1 for heterozygous with the alternate listed allele, and 1/1 for homozygous alternate. bcftools call -cv would output the variants only. You might want to use bcftools consensus to create a modified fasta file with your variants: http://www.htslib.org/doc/bcftools.html#consensus |
Thanks SNPsaurus, I will try that. But I've now found that the published SNP locations I have been using have mistakes so have to go back a step..
S. |
samtools mpileup should also take as a commend line entry a position, or a bed file of positions, so you don't need to generate a line for every base in the genome.
|
All times are GMT -8. The time now is 04:41 PM. |
Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.