Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • jeffseq
    Junior Member
    • Mar 2013
    • 9

    How to extract a regional consensus sequence of reads from BAM file

    Deal all,

    How do I get a regional consensus sequence, say ch01: 77295611-77295961, from a BAM file by Samtools?

    I extracted bam file for this region: ch01: 77295611-77295961 (region.bam) from all_aln.bam and used the recommanded command:

    samtools mpileup -uf ref.fa region.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

    However, the cns.fq seems contain sequence for the whole reference (took longer time), not just the region ch01: 77295611-77295961.

    How to fix this?

    Thanks.

    Jeff
  • jeffseq
    Junior Member
    • Mar 2013
    • 9

    #2
    anyone knows or do I need to write a script to parse it out?

    Comment

    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #3
      Looking at the command line options for mpileup, why didn't you use -r there to specify a region?

      Usage: samtools mpileup [options] in1.bam [in2.bam [...]]

      Input options:

      -6 assume the quality is in the Illumina-1.3+ encoding
      -A count anomalous read pairs
      -B disable BAQ computation
      -b FILE list of input BAM files [null]
      -C INT parameter for adjusting mapQ; 0 to disable [0]
      -d INT max per-BAM depth to avoid excessive memory usage [250]
      -E extended BAQ for higher sensitivity but lower specificity
      -f FILE faidx indexed reference sequence file [null]
      -G FILE exclude read groups listed in FILE [null]
      -l FILE list of positions (chr pos) or regions (BED) [null]
      -M INT cap mapping quality at INT [60]
      -r STR region in which pileup is generated [null]
      -R ignore RG tags
      -q INT skip alignments with mapQ smaller than INT [0]
      -Q INT skip bases with baseQ/BAQ smaller than INT [13]

      Comment

      • jeffseq
        Junior Member
        • Mar 2013
        • 9

        #4
        Thanks. Tried:

        samtools mpileup -uf ref.fa region.bam -r ch01: 77295611-77295961 | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

        the consensus is indeed calculated for region 77295611-77295961 on ch01, however, the cns.fq contains sequence sprend over all ch01, in the form of NNNNN-consensus-NNNNN format. (Need to parse out by writing a script).

        Jeff

        Comment

        • kjlee
          Member
          • Jun 2011
          • 12

          #5
          I believe that your problem is the space between the "ch01:" and "77295611-77295961". Perhaps try it without the space.

          Comment

          • mmmm
            Senior Member
            • Jul 2013
            • 131

            #6
            which command is used to get the complete consensus from the bam file?

            Comment

            • adamr
              Junior Member
              • Oct 2013
              • 1

              #7
              Hi all,

              I tried this and ran into the same problem. When I broke up the pipe, it looked like bcftools was giving the expected output, but vcfutils then tried to recreate the entire chromosome.

              My solution was to make a small modification to vcfutils so that it does not add a "gap" at the beginning of the sequence.

              Line 500 now reads:
              if (($t[1] - $last_pos > 1) && ($last_pos > 0)) {

              That seems to do it...

              Comment

              • nazen
                Junior Member
                • Oct 2011
                • 8

                #8
                to adamr:
                Thanks a lot it works!

                Comment

                • bwubb
                  Member
                  • Jan 2012
                  • 61

                  #9
                  Ive been trying to work with this pipe in order to generate short sequences.

                  Code:
                  samtools mpileup -uf ref.fa region.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
                  Ive even included adamr's changes to vcfutils.pl. The above code works but only for a single region. If I try to give a bed file there are issues. The sequences are not named based on the BED format column.

                  Sample Bed:
                  Code:
                  1	1158620	1158631	"SDF4.1:1158631"
                  1	1650793	1650804	"CDK11A,CDK11B;CDK11B.1:1650797"
                  Also if just does not work for the multiple regions. I will get something like

                  Code:
                  @1
                  CATTTTGCnnnnnnnnnnnnnnnnnnnnnnnnnnnn....
                  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!....
                  ~~~~~~~~~~~~~~
                  At the very least I was hoping for a multiple sequence fq file. Does anyone have any insight?

                  Comment

                  Latest Articles

                  Collapse

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by SEQadmin2, 06-09-2026, 11:58 AM
                  0 responses
                  30 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-05-2026, 10:09 AM
                  0 responses
                  38 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-04-2026, 08:59 AM
                  0 responses
                  42 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-02-2026, 12:03 PM
                  0 responses
                  64 views
                  0 reactions
                  Last Post SEQadmin2  
                  Working...