  • bowtie vs bwa + samtools == confusion

    I have a dataset that was generated using the GAIIx.

    I took the fastq, ran it through bowtie with --best and -k1, converted the output to SAM format, ran pileup on it, filtered it with varFilter -Q30 and obtained 9309 SNPs.

    I usually just start with bwa, but in this case I wanted to look at a coverage graph quickly, which is why I used bowtie first.

    In any case, I used the default parameters for bwa, used -q20 when creating the bam file, ran pileup, and ran varFilter -Q30. I got 56k SNPs.

    After falling from my chair and getting coffee, I started looking at the data.

    One thing that struck me was that the samtools spec says base quality is phred33, but when I check the pileup files from my bwa run, the characters are in the phred64 range. Can this affect the consensus/SNP calling from pileup?

    Should I convert the original fastq files from phred64 to phred33? (bwa doesn't use qualities, so it doesn't matter for the alignment itself.)
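
    One quick way to check which encoding a set of quality strings uses is to look at the lowest character present. The sketch below is only a heuristic and not part of samtools: `guess_phred_offset` is a hypothetical helper, and the cutoff at ASCII 59 assumes the data is either Sanger (offset 33) or Solexa/Illumina (offset 64). The two test strings are the quality columns of the pileup lines quoted in this post.

```python
def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) of a set of quality strings.

    Heuristic: offset-64 encodings never go below ';' (ASCII 59), so any
    lower character implies phred33. If nothing lower is seen, 64 is
    likely but not certain (high-quality phred33 data can look the same).
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        raise ValueError("no quality characters to inspect")
    return 33 if min(codes) < 59 else 64


# Quality columns copied from the two pileup lines quoted in this post.
pileup_quals = [r"\RcL^dMHfe", r"^fdccdeefeddff^GGbGY\L`"]
print(guess_phred_offset(pileup_quals))
```

    The more reads you feed it, the more trustworthy the guess; a handful of high-quality reads is not enough to rule out phred33.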

    Another thing: why would the SNP score for these be 1?
    chrI 1599 T Y 1 1 37 10 ,,.c,.,c.^F. \RcL^dMHfe
    chrI 3658 A M 1 1 37 23 .$....,........,cc,c,,,^F. ^fdccdeefeddff^GGbGY\L`

    The first one especially looks good to me... but what do I know :-)
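
    To see how many reads actually support the variant at a position, the read-bases column can be tallied. This is a rough sketch that assumes the 10-column layout of `samtools pileup -c` output (chromosome, position, reference, consensus, consensus quality, SNP quality, mapping quality, depth, read bases, base qualities); `count_read_bases` is a hypothetical helper, not a samtools function.

```python
from collections import Counter


def count_read_bases(read_bases):
    """Tally per-read calls in a pileup read-bases string."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":        # read start; the next char is a mapping quality
            i += 2
        elif ch == "$":      # read end marker
            i += 1
        elif ch in "+-":     # indel: sign, length digits, then that many bases
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
        elif ch in ".,":     # match to the reference on either strand
            counts["ref"] += 1
            i += 1
        else:                # mismatch; fold strand case together
            counts[ch.upper()] += 1
            i += 1
    return counts


# First pileup line from this post; column positions are an assumption.
line = r"chrI 1599 T Y 1 1 37 10 ,,.c,.,c.^F. \RcL^dMHfe"
fields = line.split()
depth, read_bases = int(fields[7]), fields[8]
counts = count_read_bases(read_bases)
print(depth, dict(counts))
```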

    I am at the point of comparing the alignment of individual reads between both to see what's going on, but before doing this I was wondering if any of you had any clues.

    BTW, I'm also trying the GATK IndelRealigner to see how it changes the SNP calling for bwa.
    Last edited by lletourn; 04-29-2010, 06:25 AM.

  • #2
    If somebody wants to see the pipeline script, here it is.
    The names of the files have been changed to preserve the identity of those files...ahem... ;-)

    $BWA aln -t 8 <GENOME> <FASTQ> > bwa.sai
    $BWA samse <GENOME> bwa.sai <FASTQ> > sam.sam

    $SAMTOOLS_HOME/samtools view -bS -q20 -o bam.bam sam.sam
    $SAMTOOLS_HOME/samtools sort bam.bam bam.sorted
    $SAMTOOLS_HOME/samtools index bam.sorted.bam

    mkdir -p wig
    java -Xmx15G -Xms15G -jar ~/fp4/FindPeaks.jar -input bam.sorted.bam -aligner sam -output wig/ -dist_type 3 -name bam

    $SAMTOOLS_HOME/samtools pileup -vcf <GENOME> bam.sorted.bam > bam.sorted.pileup
    $SAMTOOLS_HOME/samtools.pl varFilter -Q 30 bam.sorted.pileup > bam.sorted.coveraged.pileup


    • #3
      For bowtie it's the same except for the aligner call:
      $BOWTIE -t --chunkmbs 1024 -p 14 -k 1 --best -y --solexa1.3-quals -S <GENOME> <FASTQ> > sam.sam


      • #4
        bwa does not use base quality, but samtools cares. You should convert the qualities.


        • #5
          Hi lh3.

          I'm having the same problem.

          (I have aligned paired end reads from Illumina GAIIx with Bowtie and BWA. I called SNVs with VarScan, but I would like to call SNVs also with samtools varfilter so that I can compare the resulting SNVs).

          What is the best way to convert these qualities for samtools? Is there maybe some software already available?

          Thanks!


          • #6
            I used EMBOSS's seqret. lh3 was right, it solved all my problems.
            $EMBOSS_HOME/bin/seqret fastq-illumina::q64.fastq fastq::q33.fastq

            I use the CSHL fastx tool first to trim. It reads phred64 by default, but you can run it on phred33 data by using the (hidden) -Q33 parameter.
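
            For Illumina 1.3+ data the re-encoding itself is simple arithmetic: each quality character shifts down by 31 (64 - 33). Here is a minimal sketch of that idea, assuming 4-line FASTQ records with unwrapped sequences. The function names are illustrative only, and older Solexa-scaled qualities (before pipeline 1.3) would need a different transform.

```python
OFFSET_SHIFT = 64 - 33  # phred64 -> phred33


def phred64_to_phred33(quality):
    """Re-encode one quality string from offset 64 to offset 33."""
    if any(ord(c) < 64 for c in quality):
        # Characters below '@' cannot be phred64; the input may already
        # be phred33, or may use the older Solexa scale.
        raise ValueError("quality string is not valid phred64: %r" % quality)
    return "".join(chr(ord(c) - OFFSET_SHIFT) for c in quality)


def convert_fastq_lines(lines):
    """Yield FASTQ lines with every 4th line (the qualities) re-encoded."""
    for index, line in enumerate(lines):
        line = line.rstrip("\r\n")
        yield phred64_to_phred33(line) if index % 4 == 3 else line


# A made-up single record, just to show the shape of the input.
record = ["@read1", "ACGT", "+", "hhB@"]
print("\n".join(convert_fastq_lines(record)))
```

            The check for characters below '@' is there so that running the conversion twice by accident fails loudly instead of silently producing garbage qualities.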


            • #7
              Hi,

              I'm new to these analyses, so I don't really understand it yet.
              Do you use this seqret on the original fastq files? And only when aligning with BWA, or also when aligning with Bowtie?
              Do you trim before or after the seqret, and what do you trim for? Only when you use barcodes, or also for noise?

              Thanks a lot!


              • #8
                Hi

                I'm new to these analyses, so I don't understand everything yet.

                Do you use the seqret on original fastq files? Do you only have to convert the qualities when aligning with BWA, or also when aligning with Bowtie?
                Do you trim before or after seqret, and what do you trim for? (is it for barcodes, or also for noise?)

                Thanks a lot!


                • #9
                  You may use MAQ to convert Illumina fastq into Sanger fastq and proceed with BWA. When using Bowtie, you have to specify --solexa1.3-quals for pipeline 1.3+ reads.


                  • #10
                    The thing with maq is that you need a patch to make it work. Without it, maq only converts Solexa qualities from pipeline versions before 1.3, not from 1.3 onward.
                    The patch is here:


                    As seq_GA said, bowtie can use phred33 or phred64 formats.

                    BWA doesn't use quality to align so you don't have to convert the qualities, but if you use samtools afterwards, the qualities in your sam file *must* be phred33.

                    If you don't, samtools might call SNPs at bases that are low quality in phred64 but look high quality when interpreted as phred33.


                    Instead of patching and compiling I prefer to use emboss. Hence, my suggestion.
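
                    To make the quality inflation concrete, here is a small sketch that decodes the same characters under both offsets. The helper names are made up for illustration; the only facts used are the two offsets and the standard Phred definition Q = -10 log10(P).

```python
def phred_score(char, offset):
    """Decode one quality character with the given ASCII offset."""
    return ord(char) - offset


def error_probability(q):
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


for char in "BTh":
    true_q = phred_score(char, 64)     # what a phred64 file means
    misread_q = phred_score(char, 33)  # what a phred33 reader sees
    print(char, true_q, misread_q,
          error_probability(true_q), error_probability(misread_q))
```

                    For example, 'B' encodes Q2 in phred64 but decodes as Q33 when read as phred33, so every base looks 31 points better than it really is.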


                    • #11
                      Hi,

                      I used the Emboss seqret and I think it worked alright.
                      Thanks for the help!
