SEQanswers (
-   Bioinformatics (
-   -   finding the base at a position in a bam (

susanklein 06-30-2015 06:10 PM

finding the base at a position in a bam
Hi All,

I have seen some other threads on this subject, sorry I don't have the links. But none of them seem to resolve the problem. I am trying to do something very simple but no tools for bam files seem to have the facility. I have tried samtools and GATK to no avail.

All I need is to get the consensus read base at particular positions of the reference in a bam file. I am trying to check the state of SNPs from a particular bacterial SNP typing scheme in my NGS data. Not all of the positions will actually be a SNP, in which cae they have the same base as the reference, and I want that reported, so .vcf is no good to me.

Greatly appreciate any suggestions. I suppose I can just use samtools to make the vcf and then if a position is not listed then assume it is the reference base. I do find samtools almost impenetrable.. would this line be appropriate?

samtools mpileup -I -uBf ref.fasta 1_sorted.bam | bcftools view -bcg - > 1.bcf && bcftools view 1.bcf > 1.vcf

but I don't understand the output vcf file because it seems to have a '.' for the reference base and then the proper A T C or G for the 'alt' base(s).

Thanks for any help,


SNPsaurus 06-30-2015 10:33 PM

I do think the mpileup command is what you want. I am not sure what your pipeline is making, but it sounds like the pileup format (. or , for the reference base with ATCG with alternates).
You can output a vcf with genotype compared to the reference for each position:
samtools mpileup -gu -f ref.fasta sorted.bam | bcftools call -c - > all.vcf
The genotype will be 0/0 for homozygous reference, 0/1 for heterozygous with the alternate listed allele, and 1/1 for homozygous alternate.

bcftools call -cv would output the variants only.

You might want to use bcftools consensus to create a modified fasta file with your variants:

susanklein 06-30-2015 10:39 PM

Thanks SNPsaurus, I will try that. But I've now found that the published SNP locations I have been using have mistakes so have to go back a step..


swbarnes2 07-01-2015 11:54 AM

samtools mpileup should also take as a commend line entry a position, or a bed file of positions, so you don't need to generate a line for every base in the genome.

All times are GMT -8. The time now is 04:41 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.