Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
Extract aligned reads from a BAM file above a certain threshold The Snow Bioinformatics 4 07-29-2013 03:02 AM
Extract reads from multiple regions from bam file meher Bioinformatics 7 12-14-2012 04:18 AM
Extract aligned sequence coordinates from SAM or BAM file pirates.genome Bioinformatics 5 08-20-2012 09:06 AM
Generate consensus sequence from BAM alignment Adam Witney General 2 09-14-2011 03:49 AM
Extract perfectly mapped reads from SAM/BAM file Graham Etherington Bioinformatics 2 07-21-2011 08:27 AM

Thread Tools
Old 03-11-2013, 08:13 AM   #1
Junior Member
Location: US

Join Date: Mar 2013
Posts: 9
Default How to extract a regional consensus sequence of reads from BAM file

Deal all,

How do I get a regional consensus sequence, say ch01: 77295611-77295961, from a BAM file by Samtools?

I extracted bam file for this region: ch01: 77295611-77295961 (region.bam) from all_aln.bam and used the recommanded command:

samtools mpileup -uf ref.fa region.bam | bcftools view -cg - | vcf2fq > cns.fq

However, the cns.fq seems contain sequence for the whole reference (took longer time), not just the region ch01: 77295611-77295961.

How to fix this?


jeffseq is offline   Reply With Quote
Old 03-13-2013, 04:30 AM   #2
Junior Member
Location: US

Join Date: Mar 2013
Posts: 9

anyone knows or do I need to write a script to parse it out?
jeffseq is offline   Reply With Quote
Old 03-13-2013, 09:38 AM   #3
Senior Member
Location: San Diego

Join Date: May 2008
Posts: 912

Looking at the command line options for mpileup, why didn't you use -r there to specify a region?

Usage: samtools mpileup [options] in1.bam [in2.bam [...]]

Input options:

-6 assume the quality is in the Illumina-1.3+ encoding
-A count anomalous read pairs
-B disable BAQ computation
-b FILE list of input BAM files [null]
-C INT parameter for adjusting mapQ; 0 to disable [0]
-d INT max per-BAM depth to avoid excessive memory usage [250]
-E extended BAQ for higher sensitivity but lower specificity
-f FILE faidx indexed reference sequence file [null]
-G FILE exclude read groups listed in FILE [null]
-l FILE list of positions (chr pos) or regions (BED) [null]
-M INT cap mapping quality at INT [60]
-r STR region in which pileup is generated [null]
-R ignore RG tags
-q INT skip alignments with mapQ smaller than INT [0]
-Q INT skip bases with baseQ/BAQ smaller than INT [13]
swbarnes2 is offline   Reply With Quote
Old 05-23-2013, 11:52 AM   #4
Junior Member
Location: US

Join Date: Mar 2013
Posts: 9

Thanks. Tried:

samtools mpileup -uf ref.fa region.bam -r ch01: 77295611-77295961 | bcftools view -cg - | vcf2fq > cns.fq

the consensus is indeed calculated for region 77295611-77295961 on ch01, however, the cns.fq contains sequence sprend over all ch01, in the form of NNNNN-consensus-NNNNN format. (Need to parse out by writing a script).

jeffseq is offline   Reply With Quote
Old 07-15-2013, 09:12 AM   #5
Location: Atlanta, GA

Join Date: Jun 2011
Posts: 12

I believe that your problem is the space between the "ch01:" and "77295611-77295961". Perhaps try it without the space.
kjlee is offline   Reply With Quote
Old 07-18-2013, 02:39 AM   #6
Senior Member
Location: UK

Join Date: Jul 2013
Posts: 131

which command is used to get the complete consensus from the bam file?
mmmm is offline   Reply With Quote
Old 10-28-2013, 01:43 AM   #7
Junior Member
Location: San Francisco

Join Date: Oct 2013
Posts: 1

Hi all,

I tried this and ran into the same problem. When I broke up the pipe, it looked like bcftools was giving the expected output, but vcfutils then tried to recreate the entire chromosome.

My solution was to make a small modification to vcfutils so that it does not add a "gap" at the beginning of the sequence.

Line 500 now reads:
if (($t[1] - $last_pos > 1) && ($last_pos > 0)) {

That seems to do it...
adamr is offline   Reply With Quote
Old 07-07-2014, 11:48 PM   #8
Junior Member
Location: RU

Join Date: Oct 2011
Posts: 8

to adamr:
Thanks a lot it works!
nazen is offline   Reply With Quote
Old 01-29-2015, 01:05 PM   #9
Location: Philadelphia

Join Date: Jan 2012
Posts: 58

Ive been trying to work with this pipe in order to generate short sequences.

samtools mpileup -uf ref.fa region.bam | bcftools view -cg - | vcf2fq > cns.fq
Ive even included adamr's changes to The above code works but only for a single region. If I try to give a bed file there are issues. The sequences are not named based on the BED format column.

Sample Bed:
1	1158620	1158631	"SDF4.1:1158631"
1	1650793	1650804	"CDK11A,CDK11B;CDK11B.1:1650797"
Also if just does not work for the multiple regions. I will get something like

At the very least I was hoping for a multiple sequence fq file. Does anyone have any insight?
bwubb is offline   Reply With Quote

bam, consensus, samtools

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 03:30 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO