
**clk** · 09-28-2012, 06:45 AM

Hi,
While looking for an answer to the same question, I came across your post. Did you ever find an answer? The discrepancies I see between IGV and samtools mpileup are huge in my data, which is RNA-seq data.

Thanks!

**liu_xt005** · 10-01-2012, 11:56 AM

Sorry, I did not find a good solution.
But my study showed that SAMtools tends to keep longer tails for indels than GATK and others. For example, SAMtools gives TAAAA:TAAA (REF:ALT), while GATK gives TA:T.
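
To compare indel calls that differ only in how much shared tail they carry, one option is to trim the bases that REF and ALT have in common at the right end. The sketch below is only an illustration of that idea: the function name `trim_shared_suffix` is hypothetical, and it does not left-align or otherwise fully normalize variants.

```python
def trim_shared_suffix(ref, alt):
    """Drop trailing bases shared by REF and ALT, keeping at least one base in each."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return ref, alt


# The example from the post: the longer representation reduces to the shorter one.
ref, alt = trim_shared_suffix("TAAAA", "TAAA")
print(f"{ref}:{alt}")
```

After trimming, the two representations from the example above can be compared directly.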

**clk** · 10-01-2012, 12:07 PM

Thanks for responding. I finally found a solution to this, so I'll post it here in case it is useful for others.

I found out that samtools filters reads before including them in the pileup; it reads the flag field in the BAM file and discards reads that
a) are not paired
b) are not properly mapped
c) have an unmapped mate
d) are not primary alignments
e) fail vendor quality control
f) are marked as PCR duplicates.

If the filters (a) and (c) are not desired, you can use the parameter -A.
In addition, samtools performs probabilistic realignment (BAQ) unless the parameter -B is used, and discards low-quality bases unless -Q0 is used. Finally, it caps the number of reads it takes per position unless a higher limit is set with the -d parameter.
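
The flag-based filters listed above can be sketched as a small predicate on the SAM FLAG value. The bit values come from the SAM format specification. The mapping of bits to "skipped by default" is my reading of the list, not something taken from the samtools source, and the names `would_be_skipped` and `count_anomalous` are hypothetical. Behavior may differ between samtools versions.

```python
# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400

# Assumption: reads carrying any of these bits never enter the pileup.
ALWAYS_SKIPPED = UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE


def is_anomalous_pair(flag):
    """True for a paired read that is not flagged as part of a proper pair."""
    return bool(flag & PAIRED) and not (flag & PROPER_PAIR)


def would_be_skipped(flag, count_anomalous=False):
    """Guess whether a read with this FLAG is left out of the pileup.

    count_anomalous=True is meant to mimic the effect of -A (assumption).
    """
    if flag & ALWAYS_SKIPPED:
        return True
    if not count_anomalous and is_anomalous_pair(flag):
        return True
    return False


# FLAG 73 = paired + mate unmapped + first in pair: dropped unless anomalous pairs count.
print(would_be_skipped(73), would_be_skipped(73, count_anomalous=True))
```

Running this over the FLAG column of a few reads that IGV shows but the pileup does not count is a quick way to see which filter is responsible.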

I really needed a good quantification of the reads at each position, so I needed to make sure I could trust the pileup (or vcf) files generated by samtools. So I wrote a small script that parses the bam file, reading the flag field, and removes specific reads from the alignment. In this way, I was finally able to produce an alignment that gave me the exact same counts with IGV and samtools pileup.
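
For anyone who wants to try the same approach, here is a minimal sketch of that kind of flag-based filtering. It is not the script described above. It works on SAM text lines (for example the output of `samtools view -h`), and the function name `filter_sam` and the default mask are assumptions made for illustration.

```python
# Bits excluded by default in this sketch (assumption):
# unmapped (0x4), secondary (0x100), QC fail (0x200), duplicate (0x400).
EXCLUDE_DEFAULT = 0x4 | 0x100 | 0x200 | 0x400


def filter_sam(lines, exclude_mask=EXCLUDE_DEFAULT):
    """Yield header lines unchanged and only those reads with no excluded FLAG bit set."""
    for line in lines:
        if line.startswith("@"):
            # Header lines carry no FLAG field; pass them through.
            yield line
            continue
        fields = line.split("\t")
        flag = int(fields[1])  # FLAG is the second column of a SAM record
        if flag & exclude_mask == 0:
            yield line


# Tiny in-memory example with made-up reads.
example = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n",
    "read2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\n",
]
for kept in filter_sam(example):
    print(kept, end="")
```

In practice the lines could come from standard input and the result could be converted back to BAM with samtools view. Note that `samtools view -F` with the same mask does this kind of exclusion directly, so a custom script is mainly useful when the rule is more complicated than a single bit mask.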

It would be really great if all these little details were clearer in the documentation, but in the end, the filtering criteria used by samtools were adequate for my needs. The one exception is the "anomalous read pairs" filter (disabled with -A), which is not very appropriate for RNA-seq data.

Hope that helps somebody!

**zyxue** · 01-19-2015, 10:14 AM

Hi clk, where did you find the information? Did you analyze the source code for samtools mpileup? Thanks!


Pileup / extract information from BAM/SAM files
