01-09-2016, 08:02 AM   #1
thickrick99
Amino Acid Sequence from Exome Data?

Hi Everyone,

I am working on a project that requires the amino acid sequence encoded by exome sequencing data. My first approach was to use samtools' mpileup command to get a consensus sequence from the exome sequencing data (BAM file), followed by bcftools. Here are the commands that I used:

Code:
samtools mpileup -g -f [reference.fa] -r 11:5225466-5227071 [sorted .bam file] > [intermediate.bcf]

bcftools view [intermediate.bcf] > output.txt
However, when I checked the sequence I got from this consensus, it didn't match any of the sequence in the input region that I passed to mpileup. Moreover, when translated, the sequence has a stop codon immediately after four amino acids, which is not correct. The region is the HBB gene, if that helps.
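
One possible contributor to the premature stop codon, offered as a guess rather than a diagnosis: HBB lies on the reverse strand of chromosome 11 and contains introns, so translating the forward-strand genomic sequence of the region directly would not be expected to give the protein. Below is a minimal Python sketch of the translation step, assuming a consensus sequence for the region is already in hand. The function names, the toy sequence, and the exon coordinates are all made up for illustration; they are not real HBB coordinates.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def spliced_cds(genomic, exons, strand):
    """Join the exon slices and orient them to the coding strand.

    genomic: forward-strand sequence of the region
    exons:   0-based, half-open (start, end) offsets into `genomic`,
             restricted to coding bases only (hypothetical values here)
    strand:  "+" or "-"
    """
    joined = "".join(genomic[start:end] for start, end in sorted(exons))
    return reverse_complement(joined) if strand == "-" else joined.upper()


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy minus-strand gene: two coding exons separated by a short intron.
sense = "ATGGTGCAT" + "GTAAGTCCAG" + "CTGACTTAA"  # exon, intron, exon
genomic = reverse_complement(sense)               # what the reference shows
toy_exons = [(0, 9), (19, 28)]                    # made-up forward offsets

print(translate(spliced_cds(genomic, toy_exons, "-")))  # MVHLT
```

With real data, the exon offsets would have to come from a gene annotation for the same reference build that the BAM file was aligned to, which is worth double-checking against the region passed to mpileup.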

Also, I used HG00096.mapped.illumina.mosaik.GBR.exome.20110411.bam as my exome BAM file and the 1000 Genomes Project reference file as the FASTA reference input.

Any suggestions on how I can extract the amino acid sequence of a gene from the exome sequence data?

Thanks in advance!