I have a dataset that was generated using the GAIIx.
I took the FASTQ, ran it through bowtie with --best and -k1, converted the output to SAM format, ran pileup on it, filtered with varFilter -Q30, and obtained 9309 SNPs.
I usually just start with bwa, but in this case I wanted to look at a coverage graph quickly, which is why I used bowtie first.
In any case, I then ran bwa with default parameters, used -q20 when creating the BAM file, ran pileup, and ran varFilter -Q30. I got 56k SNPs.
After falling off my chair and getting coffee, I started looking at the data.
One thing that struck me was that the samtools spec says base quality is Phred+33, but when I check the pileup files from my bwa run, the quality characters are in the Phred+64 range. Can this affect the consensus/SNP calling from pileup?
Should I convert the original fastq files from phred64 to phred33 (bwa doesn't use qualities so it doesn't matter for it)?
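For reference, here is a minimal sketch of what that conversion would look like, assuming plain four-line FASTQ records and Phred+64 (Illumina 1.3+) encoding. The function names are hypothetical, not part of any existing tool.

```python
def phred64_to_phred33(qual):
    """Shift a Phred+64 quality string to Phred+33 (subtract 31 from each character code)."""
    shifted = []
    for ch in qual:
        code = ord(ch) - 31  # 64 - 33 = 31
        if code < 33:
            # A character below '@' means the input was not Phred+64 to begin with.
            raise ValueError("quality character %r is below the Phred+64 range" % ch)
        shifted.append(chr(code))
    return "".join(shifted)


def convert_fastq(lines):
    """Yield FASTQ lines with every fourth line (the quality line) converted.

    Assumes strict four-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        yield phred64_to_phred33(text) if i % 4 == 3 else text
```

Used on an open file, something like `for out in convert_fastq(open("reads.fastq")): print(out)` would write the converted records (the file name is just a placeholder).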
Another thing: why would the SNP score for these be 1?
chrI 1599 T Y 1 1 37 10 ,,.c,.,c.^F. \RcL^dMHfe
chrI 3658 A M 1 1 37 23 .$....,........,cc,c,,,^F. ^fdccdeefeddff^GGbGY\L`
especially for the first one, it looks good to me...but what do I know :-)
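To make the offset question concrete, here is a small sketch that decodes the base-quality string of the first pileup line under both offsets. The helper name is hypothetical; it only does the arithmetic and says nothing about how pileup itself computes the SNP score.

```python
def decode_quals(qual, offset):
    """Return the numeric Phred scores for an ASCII quality string at the given offset."""
    return [ord(ch) - offset for ch in qual]


# Base-quality column of the chrI 1599 pileup line above.
quals = r"\RcL^dMHfe"

as_phred64 = decode_quals(quals, 64)  # how the reads were presumably encoded
as_phred33 = decode_quals(quals, 33)  # how a tool assuming Phred+33 would read them

print(as_phred64)
print(as_phred33)
```

Every score comes out 31 higher when the same characters are read as Phred+33.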
I am at the point of comparing the alignments of individual reads between the two aligners to see what's going on, but before doing this I was wondering if any of you had any clues.
BTW, I'm also trying the GATK IndelRealigner to see how it changes the SNP calling for bwa.