![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
error when running samtools mpileup and bcftools | linp5 | Bioinformatics | 5 | 08-22-2014 05:41 AM |
samtools bcftools no FQ value | StefanP | Bioinformatics | 2 | 06-20-2014 03:51 PM |
BCFTOOLS 0.2 Filter problems | simono101 | Bioinformatics | 0 | 05-27-2014 06:13 AM |
question about bcftools merge | giror | Genomic Resequencing | 2 | 02-27-2014 02:38 AM |
bcftools error | Morpse | Bioinformatics | 8 | 10-05-2012 12:24 PM |
![]() |
|
Thread Tools |
![]() |
#1 | ||
Senior Member
Location: USA Join Date: Nov 2013
Posts: 182
|
![]()
outPut.sam is my sam file from bowtie2.
I am trying to generate a consensus fq/fasta file from it. I ran: Quote:
samtools-1.1/./samtools mpileup -uf sequence.fasta sorted_bam.bam.bam | bcftools/./bcftools view -cg - | bcftools/./vcfutils.pl vcf2fq > cns.fq I get an error, Quote:
It is something with bcftools. ![]() References: http://seqanswers.com/forums/showthread.php?t=11536 https://www.biostars.org/p/63429/ Please guide. Last edited by bio_informatics; 10-15-2014 at 07:59 AM. |
||
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,081
|
![]()
See this recent thread: http://seqanswers.com/forums/showthread.php?t=47210 If the following does not work then we can try splitting the three steps as indicated in this thread.
Can you try running your command like this: Code:
$ samtools-1.1/samtools mpileup -uf sequence.fasta sorted_bam.bam.bam | bcftools/bcftools view -cg - | bcftools/vcfutils.pl vcf2fq > cns.fq |
![]() |
![]() |
![]() |
#3 | ||
Senior Member
Location: USA Join Date: Nov 2013
Posts: 182
|
![]()
Hi GenoMax,
Thanks for your response. I ran the command you provided. It didn't work, then I proceeded with individual. # Code:
samtools-1.1/./samtools mpileup -uf sequence.fasta sorted_bam.bam.bam > bcf_file.bcf Below one threw error Code:
bcftools/./bcftools view -cg bcf_file.bcf > bcftoolsout When I removed -gc, it ran successfully. Strange? Quote:
Quote:
|
||
![]() |
![]() |
![]() |
#4 | |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,081
|
![]()
It looks like the new bcftools (v.1.x) expects an integer to go with -c option. Try it with that.
Quote:
|
|
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: USA Join Date: Nov 2013
Posts: 182
|
![]() |
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: USA Join Date: Nov 2013
Posts: 182
|
![]()
Hi Genomax,
So, I tried few things with my file, Code:
bcftools/./bcftools view --min-ac 0 -g "^miss" bcf_file.bcf -g "^miss" If I understand this correctly, I am excluding all the calls with missing genotypes. Kindly throw light more on the genotype flag. |
![]() |
![]() |
![]() |
#7 |
Junior Member
Location: Oslo Join Date: Jun 2015
Posts: 6
|
![]()
I think I have the same problem here, and I may have a clue of where it comes from, but none about how to solve it...
I'm using Illumina paired end data, that I assembled using Soap De Novo which gave me a file of the scafolds in a fasta format. I then mapped this scaffold using bwa to a close reference, converted it to a .bam file using samtools (without forgetting to index the reference). I'm now trying to generate the consensus file with this command Code:
/usit/abel/u1/maxime/8_samtools/bin/samtools mpileup -uf chrysanthemum_indicum_chloroplast.fa 1_file_sorted.bam | /usit/abel/u1/maxime/8_samtools/bin/bcftools call -c - | /usit/abel/u1/maxime/8_samtools/bin/vcfutils.pl vcf2fq > ass_aln.fq Code:
[mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000 Use of uninitialized value in length at /usit/abel/u1/maxib/8_samtools/bin/vcfutils.pl line 565, <> line 114. Use of uninitialized value in length at /usit/abel/u1/maxib/8_samtools/bin/vcfutils.pl line 565, <> line 114. Note that bcftools view seems to have been replaced by bcftools call according to their page From my understanding, I may have this problem because at some point, I lost the base quality encoding (in the de novo assembly). Please find attached under this line the output of the bcftools call -c - command https://dl.dropboxusercontent.com/u/...all_output.txt |
![]() |
![]() |
![]() |
#8 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
I'm still using samtools 0.1.19, can someone just post the portion of vcfutils that is causing the problem? It's a perl script, you can open it in any text editor. I feel like I've seen that error in older versions, but I can't pinpoint it without know where to look.
My first suggestion would be to reindex the reference with samtools faidx, make sure that happens without errors, as mpileup tends not to warn you if that index isn't right. |
![]() |
![]() |
![]() |
#9 | |
Registered Vendor
Location: Eugene, OR Join Date: May 2013
Posts: 521
|
![]()
Can you switch to bcftools consensus?
https://samtools.github.io/bcftools/...html#consensus Quote:
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com |
|
![]() |
![]() |
![]() |
#10 |
Junior Member
Location: Oslo Join Date: Jun 2015
Posts: 6
|
![]()
Hi,
I'd like to try to use consensus. Till the faidx command, everything is fine, however, I'm now stuck at the mpileup stage. Here is my code : Code:
/usit/abel/u1/maxib/8_samtools/bin/samtools view -bS 1_align.sam | /usit/abel/u1/maxib/8_samtools/bin/samtools sort - file_sorted /usit/abel/u1/maxib/8_samtools/bin/samtools index file_sorted.bam /usit/abel/u1/maxib/8_samtools/bin/samtools faidx chrysanthemum_indicum_chloroplast.fasta /usit/abel/u1/maxib/8_samtools/bin/samtools mpileup -v -f chrysanthemum_indicum_chloroplast.fasta file_sorted.bam -o file.vcf.gz /usit/abel/u1/maxib/8_samtools/bin/bcftools consensus -i chrysanthemum_indicum_chloroplast.fasta file.vcf.gz > out.fa Code:
[mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000 ./sam.sh: line 4: 24661 Abandon /usit/abel/u1/maxib/8_samtools/bin/samtools mpileup -v -f chrysanthemum_indicum_chloroplast.fasta file_sorted.bam -o file.vcf.gz Last edited by MaximeOfOslo; 06-18-2015 at 06:01 AM. |
![]() |
![]() |
![]() |
#11 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,081
|
![]()
@Maxime: Are you using new versions of samtools and bcftools (v.1.x)? You can't mix and match old/new versions.
|
![]() |
![]() |
![]() |
#12 |
Junior Member
Location: Oslo Join Date: Jun 2015
Posts: 6
|
![]()
Thanks for your help @GenoMax, it's highly appreciated
![]() Yes, I'm using the latest versions of both samtools and bcftools Code:
-bash-4.1$ ls 8_samtools/install/ bcftools-1.2 htslib-1.2.1 samtools-1.2 Last edited by MaximeOfOslo; 06-18-2015 at 06:17 AM. |
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,081
|
![]()
Was the NCBI sequence used for generating the index and doing alignments? Is that the only error you are seeing?
|
![]() |
![]() |
![]() |
#14 |
Junior Member
Location: Oslo Join Date: Jun 2015
Posts: 6
|
![]()
Indeed it is. The alignement was performed with bwa using the fasta file mentioned above.
Here is the complete output of my script : Code:
-bash-4.1$ ./sam.sh [bam_sort_core] merging from 20 files... [mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000 ./sam.sh: line 4: 4949 Abandon /usit/abel/u1/maxib/8_samtools/bin/samtools mpileup -v -f chrysanthemum_indicum_chloroplast.fasta file_sorted.bam -o file.vcf.gz About: Create consensus sequence by applying VCF variants to a reference fasta file. Usage: bcftools consensus [OPTIONS] <file.vcf> Options: -f, --fasta-ref <file> reference sequence in fasta format -H, --haplotype <1|2> apply variants for the given haplotype -i, --iupac-codes output variants in the form of IUPAC ambiguity codes -m, --mask <file> replace regions with N -o, --output <file> write output to a file [standard output] -c, --chain <file> write a chain file for liftover -s, --sample <name> apply variants of the given sample Examples: # Get the consensus for one region. The fasta header lines are then expected # in the form ">chr:from-to". samtools faidx ref.fa 8:11870-11890 | bcftools consensus in.vcf.gz > out.fa |
![]() |
![]() |
![]() |
#15 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,081
|
![]()
Have you tried to manually run the mpileup command? Are all the files in the current directory? Does the sorted bam file look to be of about the same size?
|
![]() |
![]() |
![]() |
#16 |
Junior Member
Location: Oslo Join Date: Jun 2015
Posts: 6
|
![]()
The sorted and not-sorted bam files are the same size
Code:
-bash-4.1$ pwd /usit/abel/u1/maxib/1_data/1_project/1st_assembly_strategy -bash-4.1$ du -sh * 7,0G 1_align.sam 84K chrysanthemum_indicum_chloroplast.fasta 3,5K chrysanthemum_indicum_chloroplast.fasta.fai 15G contig.fa 1,8G file.bam 1,8G file_sorted.bam 6,5K file_sorted.bam.bai 0 file.vcf.gz 0 out.fa 512 sam.sh 6,3G scafseq.fa 1,5K test.vcf.gz 0 vcffile Code:
-bash-4.1$ /usit/abel/u1/maxib/8_samtools/bin/samtools mpileup -v -f chrysanthemum_indicum_chloroplast.fasta file_sorted.bam -o file.vcf.gz [mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000 Abandon |
![]() |
![]() |
![]() |
#17 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,081
|
![]()
By chance do you have extremely deep coverage (> 8000)? That is a small genome and the result is a large bam file.
|
![]() |
![]() |
![]() |
#18 |
Junior Member
Location: Oslo Join Date: Jun 2015
Posts: 6
|
![]()
Well, I'm working with WGS data to extract, for the moment, the chloroplast genome.
So I have 307 210 727 reads of mean length 151 which equals 46 696 030 504 base pairs. The chloroplast I've mapped them to is 86444 bp. So the coverage is around 540 188... Well, I think you found the problem ! Thanks for your help, I'll randomly subsample my fastq files before alignment by 1000 folds ! ps : to whom might be interested, here is a script to do it : Code:
# Written by Aaronquinlan # https://www.biostars.org/p/6544/ # Starting FASTQ files export FQ1=1.fq export FQ2=2.fq # The names of the random subsets you wish to create export FQ1SUBSET=1.rand.fq export FQ2SUBSET=2.rand.fq # How many random pairs do we want? export N=100 # paste the two FASTQ such that the # header, seqs, seps, and quals occur "next" to one another paste $FQ1 $FQ2 | \ # "linearize" the two mates into a single record. Add a random number to the front of each line awk 'BEGIN{srand()}; {OFS="\t"; \ getline seqs; getline sep; getline quals; \ print rand(),$0,seqs,sep,quals}' | \ # sort by the random number sort -k1,1 | \ # grab the first N records head -n $N | \ # Convert the stream back to 2 separate FASTQ files. awk '{OFS="\n"; \ print $2,$4,$6,$8 >> ENVIRON["FQ1SUBSET"]; \ print $3,$5,$7,$9 >> ENVIRON["FQ2SUBSET"]}' Last edited by MaximeOfOslo; 06-18-2015 at 08:15 AM. |
![]() |
![]() |
![]() |
#19 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,081
|
![]()
540,000x
![]() Reformat.sh from BBMap can also subsample directly from sam/bam. This can save you some alignment time. Code:
$ reformat.sh in=in.sam out=out.sam sample=some_number_here |
![]() |
![]() |
![]() |
#20 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
Picard tools can also randomly downsample a .bam file.
And the cheesy way to do it yourself would be to use awk or grep to only grab reads from a particular tile, one not on the edge would be preferable. |
![]() |
![]() |
![]() |
Tags |
bcftools, samtools |
Thread Tools | |
|
|