SEQanswers

Old 11-12-2011, 02:38 PM   #1
mez
Junior Member
 
Location: georgia

Join Date: Nov 2011
Posts: 5
FASTA sequence from large BAM file

I hope I can get some help, even though my question may sound very basic; I am new to sequencing formats.

I am working on a project that involves obtaining reference sequences from the .bam files in the 1000 Genomes database, to perform disease studies for a group. I used samtools view to download the genomic region I am interested in, but the sequence column in the .bam files seems to contain reads from both the forward and reverse strands. How can I get the plain nucleotide sequence of the reference strand from a .bam file? I am guessing I need some kind of consensus sequence.
Old 11-13-2011, 12:01 PM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Are you trying to get the consensus sequence from a BAM file?

The reference sequence would be the same standard human genome used for all the 1000 genomes, and can probably be downloaded from their site as FASTA.
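If only a region of the reference is needed, the slice can be taken directly from the downloaded FASTA. The following is a minimal sketch, not a definitive implementation: the function names and the toy sequences are hypothetical, and it assumes the whole file fits in memory. For a full human genome an indexed lookup such as `samtools faidx ref.fa chr:start-end` is likely the more practical route.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a dict mapping sequence name -> sequence string."""
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the record collected so far before starting a new one.
            if name is not None:
                sequences[name] = "".join(chunks)
            # The name is the first whitespace-delimited token after '>'.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def fetch_region(sequences, chrom, start, end):
    """Return bases start..end of chrom (1-based, inclusive coordinates)."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    return sequences[chrom][start - 1:end]


# Toy reference (hypothetical); a real file would be read with open("ref.fa").
toy = StringIO(">chr1 toy contig\nACGTACGTAC\nGGGTTTAAAC\n>chr2\nTTTT\n")
ref = read_fasta(toy)
print(fetch_region(ref, "chr1", 9, 12))  # ACGG
```

The coordinates are 1-based and inclusive, matching the region syntax used by samtools.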
Old 11-13-2011, 01:50 PM   #3
mez
Junior Member
 
Location: georgia

Join Date: Nov 2011
Posts: 5

Yes, I want to get the consensus FASTA sequence from the BAM file for a given chromosome region (not necessarily the whole chromosome sequence).
Old 11-14-2011, 03:34 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Have you tried searching the forum (or Google) for the terms BAM consensus?

Look at the samtools mpileup command (which replaced the older samtools pileup command).
Old 11-14-2011, 09:13 AM   #5
ardmore
Member
 
Location: USA

Join Date: Jun 2011
Posts: 51

I have the same question. I used the command
Code:
samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
However, I got an error:
Code:
the 'Argument f isnt numeric in numeric
Old 11-14-2011, 01:12 PM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

You're not the only person to hit that:
http://biostar.stackexchange.com/que...ric-in-numeric
Old 11-14-2011, 02:18 PM   #7
ardmore
Member
 
Location: USA

Join Date: Jun 2011
Posts: 51

Well, I think it is a bug in the samtools manual.
I used this command instead:
Code:
samtools pileup -cf ref.fa aln.bam | samtools.pl pileup2fq -D100 > cns.fastq
With this, everything works fine.
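The commands above write FASTQ, while the original question asked for FASTA. Below is a minimal conversion sketch. It assumes the consensus FASTQ may wrap the sequence and quality strings over several lines, so it counts quality characters instead of relying on four lines per record. The function names and the toy record are hypothetical.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples, allowing wrapped records."""
    lines = (ln.rstrip("\n") for ln in handle)
    for line in lines:
        if not line:
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' header, got: " + line[:20])
        name = line[1:].split()[0]
        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in lines:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        # Quality lines run until as many characters as bases have been read,
        # because a quality line may itself start with '@' or '+'.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            line = next(lines, None)
            if line is None:
                raise ValueError("truncated quality string for " + name)
            qual_parts.append(line)
            qual_len += len(line)
        yield name, seq, "".join(qual_parts)


def fastq_to_fasta(fastq_handle, fasta_handle, width=60):
    """Write each FASTQ record as FASTA, wrapping sequence lines at width."""
    for name, seq, _qual in read_fastq(fastq_handle):
        fasta_handle.write(">" + name + "\n")
        for i in range(0, len(seq), width):
            fasta_handle.write(seq[i:i + width] + "\n")


# Toy consensus record (hypothetical) with wrapped sequence and quality lines.
toy_fastq = "@chr1\nACGTn\nnACG\n+\nIIII!\n!@II\n"
out = StringIO()
fastq_to_fasta(StringIO(toy_fastq), out)
print(out.getvalue(), end="")
```

With real files, the two `StringIO` objects would be replaced by `open("cns.fastq")` and `open("cns.fasta", "w")`.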
Old 11-14-2011, 05:35 PM   #8
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Quote:
Originally Posted by ardmore View Post
Well, I think it is a bug in the samtools manual.
I used this command instead:
Code:
samtools pileup -cf ref.fa aln.bam | samtools.pl pileup2fq -D100 > cns.fastq
Everything is fine.
Pileup is deprecated. Everyone uses mpileup. Don't expect a lot of help if you stay with pileup.

And you do realize that neither method will touch putative indels, right?
Old 11-15-2011, 12:27 AM   #9
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by swbarnes2 View Post
And you do realize that neither method will touch putative indels, right?
What do you recommend for getting an indel-aware consensus from a BAM file?
Old 01-13-2013, 05:42 AM   #10
kriikku
Junior Member
 
Location: Estonia

Join Date: Jan 2013
Posts: 5

See here for one way to get a .vcf file with SNPs and indels from the .bam file, or a consensus sequence:
http://samtools.sourceforge.net/mpileup.shtml

The consensus sequence generated by this method has a limitation: it applies only the SNPs to the reference sequence, not the indels.
The .vcf file is the better starting point, since it includes both SNPs and indels.

The .vcf file can be converted to a .fasta sequence using this tool:
https://www.broadinstitute.org/gatk/...Reference.html
However, note that this tool will only take into account indels of length up to 2 bases (as of January 2013). You may want to write your own script to insert all the indels (including the longer ones) from the .vcf into the .fasta.
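A script of that kind could look roughly like the sketch below. It is only an illustration under loud assumptions: the function names and toy data are hypothetical, the first ALT allele of every record is applied regardless of genotype, records must not overlap, and positions are taken relative to the sequence passed in (so a sub-region would need its coordinates shifted first).

```python
def parse_vcf_lines(lines, chrom):
    """Collect (pos, ref, alt) tuples for one chromosome from VCF text lines."""
    variants = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        pos, ref = int(fields[1]), fields[3]
        # Assumption: use the first ALT allele and ignore the genotype.
        alt = fields[4].split(",")[0]
        if alt == "." or alt.startswith("<"):
            continue  # skip records with no ALT or a symbolic allele
        variants.append((pos, ref, alt))
    return variants


def apply_variants(reference, variants):
    """Return reference with SNPs and indels applied (VCF-style 1-based pos)."""
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"variant at {pos} overlaps the previous one")
        end = start + len(ref)
        if reference[start:end].upper() != ref.upper():
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(reference[cursor:start])  # untouched bases before the variant
        pieces.append(alt)                      # ALT replaces the REF allele
        cursor = end
    pieces.append(reference[cursor:])
    return "".join(pieces)


# Toy data (hypothetical): one SNP, one 2-base insertion, one 2-base deletion.
toy_reference = "ACGTACGTAC"
toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t2\t.\tC\tT\t50\tPASS\t.\n",
    "chr1\t4\t.\tT\tTGG\t50\tPASS\t.\n",
    "chr1\t7\t.\tGTA\tG\t50\tPASS\t.\n",
]
consensus = apply_variants(toy_reference, parse_vcf_lines(toy_vcf, "chr1"))
print(consensus)  # ATGTGGACGC
```

Building the result from left to right out of slices avoids having to track how earlier indels shift later coordinates.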

Last edited by kriikku; 01-13-2013 at 05:46 AM.
All times are GMT -8.