  • liu_xt005
    Member
    • Jun 2011
    • 24

    Pileup / extract information from BAM/SAM files

    I am aiming to extract the full information at each site: read depth, # ref reads, # variant reads (for all non-reference alleles), strand information, etc. An old thread started by nilmot13 suggested "genomeCoverageBed", but the output seems to contain only simple read depth.

    My problem started from the observation that SAMtools and GATK report very different REF/ALT read depths at some sites from the same BAM file, while BAMview and IGV both tend to support the GATK counts at those sites.
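One way to get per-site REF/ALT counts with strand is to parse the read-bases column of `samtools mpileup` text output. The sketch below is only an illustration under stated assumptions: the function name `count_pileup_bases` is hypothetical, it counts single-base observations only, and it skips indel strings and placeholders rather than tallying them.

```python
from collections import Counter


def count_pileup_bases(ref_base, bases):
    """Count (base, strand) observations in one mpileup read-bases string."""
    counts = Counter()
    ref_base = ref_base.upper()
    i, n = 0, len(bases)
    while i < n:
        c = bases[i]
        if c == "^":
            i += 2  # '^' is followed by one mapping-quality character
        elif c in "+-":
            # Indel: sign, decimal length, then that many inserted/deleted bases.
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        elif c == ".":
            counts[(ref_base, "+")] += 1  # reference match, forward strand
            i += 1
        elif c == ",":
            counts[(ref_base, "-")] += 1  # reference match, reverse strand
            i += 1
        elif c in "ACGTN":
            counts[(c, "+")] += 1  # mismatch, forward strand
            i += 1
        elif c in "acgtn":
            counts[(c.upper(), "-")] += 1  # mismatch, reverse strand
            i += 1
        else:
            i += 1  # '$' (read end), '*' and other placeholders are not counted
    return counts


counts = count_pileup_bases("A", "..,^I.T$t+2AG,*")
for (base, strand), n_obs in sorted(counts.items()):
    print(base, strand, n_obs)
```

Counts obtained this way still reflect whatever read and base filtering mpileup applied before writing the line, so they are only comparable to IGV once the filters match.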
  • clk
    Junior Member
    • Nov 2010
    • 4

    #2
    Hi,
    I came across your post while looking for an answer to the same question. Did you ever find one? The discrepancies I see between IGV and samtools mpileup are huge in my data, which is RNA-seq data.


    Thanks!


    • liu_xt005
      Member
      • Jun 2011
      • 24

      #3
      Sorry I did not find a good solution.
      But my study showed that SAMtools tends to keep longer tails for indels than GATK and others. For example, SAMtools gives TAAAA:TAAA (REF:ALT), while GATK gives TA:T.
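The two representations in that example describe the same one-base deletion; they differ only in how many shared trailing bases are kept. A minimal sketch of trimming the shared suffix is below. The function name `trim_shared_suffix` is hypothetical, and this is not the full normalization that variant tools perform, since left-alignment also needs the reference sequence.

```python
def trim_shared_suffix(ref, alt):
    """Drop trailing bases common to REF and ALT, keeping at least one base in each."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return ref, alt


print(trim_shared_suffix("TAAAA", "TAAA"))
```

Because only the suffix is trimmed, the start position of the record does not change, so trimmed and untrimmed calls can be compared at the same coordinate.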


      • clk
        Junior Member
        • Nov 2010
        • 4

        #4
        Thanks for responding. I finally found a solution to this, so I'll post it here in case it is useful for others.

        I found out that samtools filters reads before including them in the pileup; it reads the flag field in the BAM file and discards reads that
        a) are not paired,
        b) are not properly mapped,
        c) have an unmapped mate,
        d) are not primary alignments,
        e) fail the vendor's quality control, or
        f) are marked as PCR duplicates.

        If filters (a) and (c) are not desired, you can use the parameter -A.
        In addition, samtools performs realignment (BAQ) unless the parameter -B is used, and discards low-quality bases unless -Q0 is used. Finally, it stops at a maximum read depth per position unless a higher limit is set with the -d parameter.
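To see which of criteria (a) to (f) a given read would trip, the flag field can be decoded bit by bit. The sketch below mirrors the list as written in this post, not the samtools source, so check it against your samtools version. The bit values come from the SAM specification; the function name `drop_reasons`, the `count_orphans` parameter (standing in for -A), and reading (b) as "proper-pair bit not set" are assumptions.

```python
# SAM flag bits, as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400


def drop_reasons(flag, count_orphans=False):
    """List which of criteria (a)-(f) from the post a flag value matches.

    count_orphans=True stands in for -A and, following the post,
    switches off checks (a) and (c).
    """
    reasons = []
    if not count_orphans and not (flag & PAIRED):
        reasons.append("a: not paired")
    if not (flag & PROPER_PAIR):
        # Assumption: "not properly mapped" is read as the proper-pair bit being unset.
        reasons.append("b: not properly mapped")
    if not count_orphans and (flag & MATE_UNMAPPED):
        reasons.append("c: mate unmapped")
    if flag & SECONDARY:
        reasons.append("d: not primary")
    if flag & QC_FAIL:
        reasons.append("e: failed vendor QC")
    if flag & DUPLICATE:
        reasons.append("f: PCR duplicate")
    return reasons


for flag in (99, 73, 1123):
    print(flag, drop_reasons(flag), drop_reasons(flag, count_orphans=True))
```

Flag 99 (paired, proper pair, first in pair, mate on reverse strand) matches none of the criteria, while 1123 is the same read marked as a duplicate.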


        I really needed a good quantification of the reads at each position, so I needed to make sure I could trust the pileup (or vcf) files generated by samtools. So I wrote a small script that parses the bam file, reading the flag field, and removes specific reads from the alignment. In this way, I was finally able to produce an alignment that gave me the exact same counts with IGV and samtools pileup.
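The script itself is not shown in this thread, so the following is only a guess at its general shape: a filter over SAM text that keeps header lines and drops records whose flag has any excluded bit set. The name `filter_sam_lines` and the default mask (unmapped, secondary, QC fail, duplicate) are assumptions, not the original code.

```python
import sys

# Unmapped (0x4), secondary (0x100), QC fail (0x200), duplicate (0x400).
DEFAULT_EXCLUDE = 0x4 | 0x100 | 0x200 | 0x400


def filter_sam_lines(lines, exclude_mask=DEFAULT_EXCLUDE):
    """Yield header lines plus records with none of the excluded flag bits set."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if (flag & exclude_mask) == 0:
            yield line


def main():
    """Stream filter: SAM text on stdin, filtered SAM text on stdout."""
    sys.stdout.writelines(filter_sam_lines(sys.stdin))


demo = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n",
    "r2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n",  # duplicate
    "r3\t355\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n",  # secondary
]
kept = list(filter_sam_lines(demo))
print("".join(kept), end="")
```

Calling `main()` at the end of the file would let the script sit between `samtools view -h in.bam` and `samtools view -b -o out.bam -`. For this particular mask, `samtools view -b -F 1796 in.bam` should give the same result without a script.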

        It would be really great if all these little details were clearer in the documentation, but in the end, the filtering criteria used by samtools were adequate for my needs, except for the "anomalous read pairs" parameter (-A), whose default is not very appropriate for RNA-seq data.

        Hope that helps somebody!


        • zyxue
          Junior Member
          • May 2014
          • 3

          #5
          Hi clk, where did you find the information? Did you analyze the source code for samtools mpileup? Thanks!
