**Richard Finney** · 07-21-2011, 10:27 AM

Show the actual command you use.

Check your bcftools flags. Do you want all locations? Or just variants?

**Hkins552** · 07-21-2011, 10:35 AM

$ samtools mpileup -uf /hg19.fasta input.sorted.rmdup.reordered.realigned.recalibrated.bam > input.variants.raw

I would like to call just variants, not all loci.

I do not pipe directly to bcftools as the manual does; instead, I later run these commands on the file from above:

$ bcftools view -bvcg input.variants.raw > input.variants.raw.bcf

$ bcftools view input.variants.raw.bcf | vcfutils.pl varFilter -d 3 -D 1000 -G 20 > input.variants.flt.vcf

**oiiio** · 07-21-2011, 11:17 AM

Have you already compared the lines of your smaller and larger files? Are there differences?

It may be worth trying the Unix 'comm' command, which compares two sorted files line by line.
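As a rough sketch of the 'comm' suggestion: the example below reduces two VCF files to sorted chrom:pos keys and lists the sites found only in the larger one. The file names and the tiny demo records are made up for illustration; they are not the poster's data. Note that 'comm' expects sorted input, hence the explicit sort.

```shell
#!/usr/bin/env bash
# Sketch: compare variant sites between two VCFs with comm.
# small.vcf / large.vcf and their contents are hypothetical demo data.
set -euo pipefail
workdir=$(mktemp -d)

# Two tiny stand-in VCFs (header lines plus a few records).
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t200\t.\tC\tT\n' > "$workdir/small.vcf"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t200\t.\tC\tT\nchr1\t300\t.\tG\tA\nchr2\t50\t.\tT\tC\n' > "$workdir/large.vcf"

# Drop header lines, keep "chrom:pos" keys, and sort them (comm needs sorted input).
sites() { grep -v '^#' "$1" | awk -F'\t' '{print $1 ":" $2}' | LC_ALL=C sort; }
sites "$workdir/small.vcf" > "$workdir/small.sites"
sites "$workdir/large.vcf" > "$workdir/large.sites"

# -13 suppresses columns 1 and 3, leaving sites unique to the second file.
only_in_large=$(LC_ALL=C comm -13 "$workdir/small.sites" "$workdir/large.sites")
# -12 suppresses columns 1 and 2, leaving sites shared by both files.
shared=$(LC_ALL=C comm -12 "$workdir/small.sites" "$workdir/large.sites")

echo "Sites only in the larger file:"
echo "$only_in_large"
echo "Shared sites:"
echo "$shared"
```

If the extra sites cluster in particular regions (for example, outside the capture targets), that points to where the additional calls come from.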

**swbarnes2** · 07-21-2011, 11:20 AM

I do exome capture, and I use bedtools to filter my .bams against the capture probe .bed file.

That might help. It should winnow out some false alignments.

**Michael.James.Clark** · 07-21-2011, 11:37 AM

How many lines are there in input.variants.raw.bcf? In input.variants.flt.vcf?

Using the GATK pipeline, my exome-seq VCF files are on the order of 50,000-80,000 variants (lines), with a size of around 10-20 MB (depending on the platform used for enrichment).
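A minimal sketch of how one might get those two numbers for a VCF: count the non-header lines and report the file size. The demo file below is a made-up stand-in. For the binary .bcf, the text would first have to be produced with `bcftools view`, as in the commands posted above, and piped into the same grep.

```shell
#!/usr/bin/env bash
# Sketch: count variant records and report file size for a VCF.
# demo.flt.vcf is hypothetical demo data, not a real call set.
set -euo pipefail
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t200\t.\tC\tT\nchr2\t50\t.\tT\tC\n' > "$workdir/demo.flt.vcf"

# Count data lines only; header lines start with '#'.
# grep -c exits non-zero when the count is 0, so guard it under set -e.
n_records=$(grep -vc '^#' "$workdir/demo.flt.vcf" || true)

# File size in bytes (tr strips the padding some wc implementations add).
size_bytes=$(wc -c < "$workdir/demo.flt.vcf" | tr -d ' ')

echo "records: $n_records, bytes: $size_bytes"
```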

Originally posted by swbarnes2:

> I do exome capture, and I use bedtools to filter my .bams against the capture probe .bed file.
>
> That might help. It should winnow out some false alignments.

If you do this, I advise you to use a modified capture probe .bed file where you've added 50 or 100 bases (or however much) to the end of each target region. You'll drop a lot of good data if you cut off right at the boundaries of the target intervals.
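The padding step can be sketched with plain awk. This is an illustration under assumptions: the 50-base pad, the file names, and the demo intervals are hypothetical, and the script only clamps the start at 0. It does not clamp at chromosome ends or merge targets that overlap after padding; `bedtools slop` (with a genome file) and `bedtools merge` exist for those cases.

```shell
#!/usr/bin/env bash
# Sketch: add a fixed pad to both ends of every interval in a BED file.
# targets.bed and its intervals are hypothetical demo data.
set -euo pipefail
workdir=$(mktemp -d)
pad=50

printf 'chr1\t30\t200\ttargetA\nchr1\t1000\t1200\ttargetB\n' > "$workdir/targets.bed"

# BED starts are 0-based, so a padded start must not go below 0.
# Chromosome ends are NOT checked here (that would need chromosome lengths).
awk -v pad="$pad" 'BEGIN { FS = OFS = "\t" }
{
    start = $2 - pad
    if (start < 0) start = 0
    $2 = start
    $3 = $3 + pad
    print
}' "$workdir/targets.bed" > "$workdir/targets.padded.bed"

cat "$workdir/targets.padded.bed"
```

The padded file can then be used for filtering or calling, while the unpadded file remains available for coverage statistics.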

**swbarnes2** · 07-21-2011, 12:03 PM

Originally posted by Michael.James.Clark:

> If you do this, I advise you to use a modified capture probe .bed file where you've added 50 or 100 bases (or however much) to the end of each target region. You'll drop a lot of good data if you cut off right at the boundaries of the target intervals.

I'm pretty sure BEDTools includes reads that hang off the edge of your target, so you can still call SNPs that are just off target. But yes, I usually align to padded targets, to be sure, though I generally count coverage against the unpadded targets.

We use Agilent capture, and from the non-random sizes of the targets, I think the targets must be padded as well, with respect to the exons.

**Michael.James.Clark** · 07-21-2011, 01:20 PM

Originally posted by swbarnes2:

> I'm pretty sure BEDTools includes reads that hang off the edge of your target, so you can still call SNPs that are just off target. But yes, I usually align to padded targets, to be sure, though I generally count coverage against the unpadded targets.

Yeah, I think that's fair for coverage to be sure, but for variant calling I typically run GATK with an intervalsList containing the targets +50bp on each end.

> We use Agilent capture, and from the non-random sizes of the targets, I think the targets must be padded as well, with respect to the exons.

With respect to the exons, that's typically true (with regard to RefSeq at least). The Agilent baits tend to extend outside the exons a bit. Still, as expected, you do end up pulling down a significant excess of reads in the flanking regions, so you gain even more. This is why I typically see a lot more variants than expected for the exome alone in one of these experiments.

Samtools mpileup creates extra large file after local realignment