
  • bowtie vs bwa + samtools == confusion

    I have a dataset that was generated using the GAIIx.

    I took the fastq, ran it through bowtie with --best and -k1, converted it to SAM format, ran pileup, filtered with varFilter -Q30, and obtained 9309 SNPs.

    I usually just start with bwa, but in this case I wanted to look at a coverage graph quickly, which is why I used bowtie first.

    In any case, I then ran bwa with default params, used -q20 when creating the bam file, ran pileup, and ran varFilter -Q30. I got 56k SNPs.

    After falling from my chair and getting coffee, I started looking at the data.

    One thing that struck me was that the samtools spec says base quality is phred33, but when I check the pileup files from my bwa run, the characters are in the phred64 range. Can this affect the consensus/SNP calling from pileup?

    Should I convert the original fastq files from phred64 to phred33 (bwa doesn't use qualities so it doesn't matter for it)?
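
A quick way to check which offset a FASTQ file uses is to look at the range of its quality characters. The sketch below is only a heuristic, not part of samtools or bwa: the function names are made up for illustration, and it assumes plain 4-line FASTQ records from an Illumina run. Characters below ';' (ASCII 59) can only be phred33, while characters above 'J' (ASCII 74) point to a 64-based encoding.

```python
from itertools import islice


def sample_quality_lines(path, n_reads=1000):
    """Return the quality lines of the first n_reads records (assumes 4 lines per record)."""
    with open(path) as handle:
        # Line 4 of every record (0-based indices 3, 7, 11, ...) is the quality string.
        return [line.rstrip("\r\n") for line in islice(handle, 3, 4 * n_reads, 4)]


def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from quality strings; None if ambiguous."""
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters given")
    if min(codes) < 59:
        # Below ';' is impossible in 64-based (Illumina 1.3+ or Solexa) files.
        return 33
    if max(codes) > 74:
        # Above 'J' would mean Q > 41 under phred33, so assume a 64-based file.
        return 64
    return None  # every character fits both encodings


if __name__ == "__main__":
    # Quality column from the chrI 1599 pileup line above (backslash escaped for Python).
    print(guess_phred_offset(["\\RcL^dMHfe"]))
```

On a real file you would call something like guess_phred_offset(sample_quality_lines("reads.fastq")), where reads.fastq is a placeholder name.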

    Another thing: why would the SNP score for these be 1?
    chrI 1599 T Y 1 1 37 10 ,,.c,.,c.^F. \RcL^dMHfe
    chrI 3658 A M 1 1 37 23 .$....,........,cc,c,,,^F. ^fdccdeefeddff^GGbGY\L`

    especially for the first one, it looks good to me...but what do I know :-)

    I am at the point of comparing the alignment of individual reads between both to see what's going on, but before doing this I was wondering if any of you had any clues.

    BTW, I'm also trying the GATK IndelRealigner to see how it changes the SNP calling for bwa.
    Last edited by lletourn; 04-29-2010, 06:25 AM.

  • #2
    In case somebody wants to see the pipeline script:
    The names of the files have been changed to preserve the identity of those files...ahem... ;-)

    $BWA aln -t 8 <GENOME> <FASTQ> > bwa.sai
    $BWA samse <GENOME> bwa.sai <FASTQ> > sam.sam

    $SAMTOOLS_HOME/samtools view -bS -q20 -o bam.bam sam.sam
    $SAMTOOLS_HOME/samtools sort bam.bam bam.sorted
    $SAMTOOLS_HOME/samtools index bam.sorted.bam

    mkdir -p wig
    java -Xmx15G -Xms15G -jar ~/fp4/FindPeaks.jar -input bam.sorted.bam -aligner sam -output wig/ -dist_type 3 -name bam

    $SAMTOOLS_HOME/samtools pileup -vcf <GENOME> bam.sorted.bam > bam.sorted.pileup
    $SAMTOOLS_HOME/samtools.pl varFilter -Q 30 bam.sorted.pileup > bam.sorted.coveraged.pileup


    • #3
      For bowtie it's the same except for the aligner call:
      $BOWTIE -t --chunkmbs 1024 -p 14 -k 1 --best -y -S --solexa1.3-quals <GENOME> <FASTQ> > sam.sam


      • #4
        bwa does not use base quality, but samtools does. You should convert the qualities.


        • #5
          Hi lh3,

          I'm having the same problem.

          (I have aligned paired end reads from Illumina GAIIx with Bowtie and BWA. I called SNVs with VarScan, but I would like to call SNVs also with samtools varfilter so that I can compare the resulting SNVs).

          What is the best way to convert these qualities for samtools? Is there maybe some software already available?

          Thanks!


          • #6
            I used EMBOSS's seqret. lh3 was right, it solved all my problems.
            $EMBOSS_HOME/bin/seqret fastq-illumina::q64.fastq fastq::q33.fastq

            I use the CSHL FASTX-Toolkit first to trim. It reads phred64 by default, but you can run it on phred33 data by using the (hidden) -Q33 parameter.
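
For anyone curious what the conversion amounts to, here is a minimal sketch in Python. It is not how seqret is implemented, just the same arithmetic under stated assumptions: the input is Illumina 1.3+ (offset 64), records are exactly 4 lines each, and the function names are hypothetical. Each quality character is shifted down by 31 (64 minus 33).

```python
def phred64_to_phred33(quality):
    """Re-encode one offset-64 quality string as offset-33 (Sanger)."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            # A character below '@' cannot be phred64; the file may already be phred33.
            raise ValueError(f"{ch!r} is below the phred64 range")
        converted.append(chr(score + 33))
    return "".join(converted)


def convert_fastq(src_path, dst_path):
    """Copy a 4-line-per-record FASTQ file, re-encoding only the quality lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line_number, line in enumerate(src):
            if line_number % 4 == 3:  # fourth line of each record
                line = phred64_to_phred33(line.rstrip("\r\n")) + "\n"
            dst.write(line)


if __name__ == "__main__":
    print(phred64_to_phred33("hgfe"))  # Q40, Q39, Q38, Q37 re-encoded
```

With the file names from the seqret command above, the call would be convert_fastq("q64.fastq", "q33.fastq"). The ValueError is a deliberate guard against converting a file twice.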


            • #7
              Hi,

              I'm new to these analyses, so I don't really understand it yet.
              Do you use this seqret on the original fastq files? And only when aligning with BWA, or also when aligning with Bowtie?
              Do you trim before or after the seqret, and what do you trim for? Only when you use barcodes, or also for noise?

              Thanks a lot!


              • #8
                Hi

                I'm new to these analyses, so I don't understand everything yet.

                Do you use the seqret on original fastq files? Do you only have to convert the qualities when aligning with BWA, or also when aligning with Bowtie?
                Do you trim before or after seqret, and what do you trim for? (is it for barcodes, or also for noise?)

                Thanks a lot!


                • #9
                  You may use MAQ to convert Illumina fastq into Sanger fastq and then proceed with BWA. When using Bowtie, you have to specify --solexa1.3-quals for pipeline 1.3+ reads.


                  • #10
                    The thing with maq is that you need a patch to make it work. Out of the box it only converts Solexa qualities from pipelines older than 1.3, not 1.3+.
                    The patch is here:


                    As seq_GA said, bowtie can read either phred33 or phred64 qualities.

                    BWA doesn't use quality to align so you don't have to convert the qualities, but if you use samtools afterwards, the qualities in your sam file *must* be phred33.

                    If you don't, samtools may call SNPs at bases that are low quality in phred64 but look high quality when interpreted as phred33 (the same character decodes to a score 31 higher).
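
To make the effect concrete, the snippet below decodes the base-quality column of the chrI 1599 pileup line from the first post under both offsets. It assumes that line came from the unconverted bwa run; the decode helper is just illustrative.

```python
def decode(quality, offset):
    """Turn a quality string into a list of integer scores for the given ASCII offset."""
    return [ord(ch) - offset for ch in quality]


# Quality column of the chrI 1599 line (the leading backslash is escaped for Python).
pileup_quals = "\\RcL^dMHfe"

as_phred64 = decode(pileup_quals, 64)  # what the sequencer meant
as_phred33 = decode(pileup_quals, 33)  # what samtools assumes

for ch, q64, q33 in zip(pileup_quals, as_phred64, as_phred33):
    print(f"{ch!r:>5}  phred64 Q{q64:<3} read as phred33 Q{q33}")
```

The 'H' in that string, for instance, is Q8 in phred64 but reads as Q39 under phred33, so a base that should be nearly ignored counts as strong evidence.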


                    Instead of patching and compiling I prefer to use emboss. Hence, my suggestion.


                    • #11
                      Hi,

                      I used the Emboss seqret and I think it worked alright.
                      Thanks for the help!
