Seqanswers Leaderboard Ad

**swbarnes2** · 02-10-2012, 11:32 AM

The mpileup command you used generates a flat text pileup that is not suitable for import into bcftools, though it should not be quick. Try looking at that pileup file, see if it looks like it should.

To make a vcf, try something like

samtools mpileup -ugf ref.fa file.bam | bcftools view -bcvg - > file.bcf
bcftools view file.bcf > file.vcf

Personally, I always add a -B to the mpileup command, because I've occasionally found that SNPs got missed if I didn't, and I'd much rather deal with false positives than false negatives.

**fc35802** · 02-13-2012, 01:02 AM

To day I used:

samtools mpileup -d8000 -B -ugf /GenoStorage/Genomas/hg19/hg19RefGenome.fa sample02187A_align_sorted.bam | bcftools view -bvcg - > varSample02187A.bcf
[mpileup] 1 samples in 1 input files
[bcfview] 100000 sites processed.
[afs] 0:73270.205 1:12061.094 2:14668.700
[bcfview] 200000 sites processed.
[afs] 0:93673.767 1:2809.202 2:3517.031
[bcfview] 300000 sites processed.
[afs] 0:62908.369 1:16718.513 2:20373.117
[bcfview] 400000 sites processed.
[afs] 0:62494.168 1:16735.723 2:20770.109
[bcfview] 500000 sites processed.
[afs] 0:67447.351 1:14587.030 2:17965.618
[bcfview] 600000 sites processed.
[afs] 0:67444.849 1:14410.756 2:18144.395
[bcfview] 700000 sites processed.
[afs] 0:69741.610 1:13586.786 2:16671.604
[bcfview] 800000 sites processed.
[afs] 0:66746.668 1:14996.990 2:18256.342
[afs] 0:7849.588 1:1177.825 2:1518.587

Only took me 9 minutes. It's really strange

**fc35802** · 02-13-2012, 02:59 AM

I started almost from the beginning.
I also marked the duplicates.

My mpileup as 14GB when I use the command -f, and he is smaller when I use -ugf

The odd thing is when I use -ugf I see some strange symbols.

In both cases the mpileup goes on for your if I only use the mpileup command, but if I save the output in a file or do the bcftools part, it takes only a few minutes

**swbarnes2** · 02-13-2012, 09:34 AM

I don't see how marking duplicates matters. I don't think mpileup pays attention to that.

Using -ug makes a file that is suitable for import into bcftools for SNP calling. Without it, mpileup makes a human readable pileup. The pileup is going to have one line for every base in your reference, so if your reference is large, it'll be a huge file. But you can eyeball it to see how mpileup is interpreting your .bam

**fc35802** · 02-14-2012, 01:39 AM

This was my command:

samtools mpileup -P ILLUMINA -d8000 -B -ugf /GenoStorage/Genomas/hg19/hg19RefGenome.fa sample02187A_align_sorted.bam | bcftools view -bcvg - > varSample02187A.bcf
[mpileup] 1 samples in 1 input files
[bcfview] 100000 sites processed.
[afs] 0:73270.205 1:12061.094 2:14668.700
[bcfview] 200000 sites processed.
[afs] 0:93673.767 1:2809.202 2:3517.031
[bcfview] 300000 sites processed.
[afs] 0:62908.369 1:16718.513 2:20373.117
[bcfview] 400000 sites processed.
[afs] 0:62494.168 1:16735.723 2:20770.109
[bcfview] 500000 sites processed.
[afs] 0:67447.351 1:14587.030 2:17965.618
[bcfview] 600000 sites processed.
[afs] 0:67444.849 1:14410.756 2:18144.395
[bcfview] 700000 sites processed.
[afs] 0:69741.610 1:13586.786 2:16671.604
[bcfview] 800000 sites processed.
[afs] 0:66746.668 1:14996.990 2:18256.342
[afs] 0:7849.588 1:1177.825 2:1518.587

It always stops in here, it took 9 minutes to do this and generated a .bcf file empty.
I really can't understand what is wrong with the command

**fc35802** · 02-15-2012, 07:54 AM

I really can't understand why mpileup is so quick just when I save the output for a file or piping with bcftools.
While doing the following command I saw

samtools mpileup -P ILLUMINA -d8000 -A -B -C50 -uf /GenoStorage/Genomas/hg19/hg19RefGenome.fa sample02187A_align_sorted.bam
[mpileup] 1 samples in 1 input files
\ufffdBC\ufffd\ufffd9\ufffdBCF\ufffdchr1chr2chr3chr4chr5chr6chr7chr8chr9chr10chr11chr12chr13chr14chr15chr16chr17chr18chr19chr20chr21chr22chrXchrYchrMsample%##samtoolsVersion=0.1.18 (r982:295)

It's normal?
Another thing that a thought was if I should need my memory swap to be the double of my ram memory. This is true?

I know I'm being really annoying. I'm really sorry.

**swbarnes2** · 02-15-2012, 09:36 AM

For the third time, why don't you look at the human readable pileup output of mpileup?

**fc35802** · 02-15-2012, 09:45 AM

It looks like this

chr1 10018 c 1 ^!. D
chr1 10019 t 1 . F
chr1 10020 a 1 . F
chr1 10021 a 1 . F
chr1 10022 c 1 . H
chr1 10023 c 1 . H
chr1 10024 c 1 . H
chr1 10025 t 1 . H
chr1 10026 a 1 . H
chr1 10027 a 2 .^!. IF
chr1 10028 c 2 .. JD
chr1 10029 c 2 .. J+
chr1 10030 c 2 .. J2
chr1 10031 t 2 .. J=
chr1 10032 a 2 .. JC
chr1 10033 a 2 .. CD
chr1 10034 c 2 .. HC
chr1 10035 c 2 .. GC
chr1 10036 c 2 .. G@
chr1 10037 t 2 .. JF
chr1 10038 a 2 .. GH
chr1 10039 a 2 .. GG
chr1 10040 c 2 .. IH
chr1 10041 c 2 .. CI
chr1 10042 c 2 .. ED
chr1 10043 t 2 .. HG
chr1 10044 a 2 .. GC
chr1 10045 a 3 ..^2c I@#
chr1 10046 c 3 .., JB#
chr1 10047 c 3 .., IBC
chr1 10048 c 4 ..,^+. I98F

**swbarnes2** · 02-15-2012, 10:37 AM

The good news is at least mpileup has figure out that the reference fasta you gave it is the one used to make the .bam file. So it has the reference all squared away.

This part has a coverage of 2, and 1 in places, but in isn't this region telomeric junk in the human genome? Is the coverage higher in a more representaitve region of the genome?

What % of your reads aligned? How many reads aligned total? Maybe this run was junk, and that's why nothing is working.

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Yesterday, 11:49 AM	0 responses 15 views 0 likes	Last Post by seqadmin Yesterday, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, 04-24-2024, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin 04-24-2024, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

A doubt about mpileup

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News