![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
samtools mpileup calls way too less SNPs | TuA | Bioinformatics | 17 | 03-01-2018 05:17 PM |
Samtools mpileup calls drastically more SNPs with -I | agel | Bioinformatics | 0 | 01-20-2012 02:20 PM |
MPileup not considering all reads when making calls? | Heisman | Bioinformatics | 11 | 01-05-2012 08:13 AM |
Range of quality of base calls at each position in my alignment of 454 reads | trasver | 454 Pyrosequencing | 1 | 03-07-2011 05:31 AM |
mpileup on haploid samples | agc | Bioinformatics | 1 | 11-24-2010 05:14 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Junior Member
Location: San Diego, CA Join Date: Aug 2011
Posts: 2
|
![]()
I'm generating FASTQ consensus file with the samtools mpileup | bcftools view | vcfutils.pl vcf2fq chain from BAM files generated with BWA. These BAM files are alignments of Illumina reads to a mitochondrial (haploid) reference sequence.
What I would like is a FASTQ file where the consensus base calls are made based on the most common (high quality) base in the reads at that site. That is, I don't want any ambiguity codes in my sequence. Alternatively, I'd like to be able to set a frequency threshold that a base has to exist at to be called, otherwise it is set to N. I have played with all of the parameters to samtools mpileup, bcftools view, and vcfutils.pl vcf2fq focusing on bcftools view's '-p', the variant probability threshold, but I can't get it to do what I want as it seems to be designed for SNP calling in diploid data and thus potentially valid heterozygous bases are called as such rather than being able to force a "homozygous" call of the most common base. As an example, I have a site that has 8 C's and 17 T's, all at high quality. I would like the consensus sequence of this site to have T rather than Y. In a perfect world, if I had a site with 8 C's and 8 T's, I'd like the option of making it either an N or calling it whatever base the reference has. If the mpileup -> bcftools -> vcf2fq chain won't do this, is there some other command line software that will do this from a BAM file? I need it to be command line driven because I'm doing this for 100s of BAM files as part of a batch pipeline I've written in R. Thanks in advance for any help. |
![]() |
![]() |
![]() |
#2 |
Member
Location: Austin, TX Join Date: Sep 2011
Posts: 20
|
![]()
I'm interested to know how to force haploid calls as well! Does anyone have any ideas how to do this?
Thank you! |
![]() |
![]() |
![]() |
#3 |
Member
Location: Edinburgh Join Date: Mar 2010
Posts: 16
|
![]()
Hi,
did you ever find a way to solve this? |
![]() |
![]() |
![]() |
#4 |
Junior Member
Location: Canada Join Date: Jun 2011
Posts: 1
|
![]()
Hello,
I'm also interested to know if you have found a solution for SNP calls on haploid genomes. Thank you |
![]() |
![]() |
![]() |
#5 |
Junior Member
Location: usa Join Date: Jul 2011
Posts: 3
|
![]()
You may try this recent program SNVer.
http://snver.sourceforge.net/ It models the number of haploids in its model so it is applicable to haplid genomes too. We have tried it on mitochondria for a collaborator. It works well. |
![]() |
![]() |
![]() |
#6 |
Junior Member
Location: India Join Date: Apr 2013
Posts: 1
|
![]()
I still want this answer can someone please tell me how to solve this???
Thank you |
![]() |
![]() |
![]() |
#7 |
Member
Location: Austin, TX Join Date: Sep 2011
Posts: 20
|
![]()
For my applications, forcing an ambiguous call ended up not being necessary. However, I think the Genome Analysis Toolkit does a very nice job of genotyping and if I recall correctly it has a feature where you can set it to call haploid calls. Maybe dig into the forums over at GATK to find out more?
Good luck! |
![]() |
![]() |
![]() |
Thread Tools | |
|
|