  • bwa sampe bam input

    I have paired-end data in two separate fastq files. I started out with converting and merging the two fastq files into a single unaligned bam using picard FastqToSam (using the FASTQ and FASTQ2 options). I used this bam file with bwa aln and successfully generated the sai file.
    Now, for the sampe step I am supposed to provide the sai file and the original sequence file. Since bwa aln is able to accept bam files as input there should be a way to provide the same bam file even for the sampe step.
    Any ideas?

  • #2
    According to the bwa manual page:


    -b Specify the input read sequence file is the BAM format. For paired-end data, two ends in a pair must be grouped together and options -1 or -2 are usually applied to specify which end should be mapped. Typical command lines for mapping pair-end data in the BAM format are:

    bwa aln ref.fa -b1 reads.bam > 1.sai
    bwa aln ref.fa -b2 reads.bam > 2.sai
    bwa sampe ref.fa 1.sai 2.sai reads.bam reads.bam > aln.sam
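A minimal sketch of the whole route from two FASTQ files to a SAM file via an unaligned BAM, combining the Picard step from the first post with the manual's commands above. The function only prints the commands so they can be inspected first. The function name, `picard.jar` location, `SAMPLE_NAME` value and file names are placeholders chosen for illustration, and older Picard releases shipped FastqToSam as a separate jar, so the invocation may need adjusting.

```shell
#!/bin/sh
# Print (do not run) the commands for:
#   two FASTQs -> unaligned paired BAM -> bwa aln (per end) -> bwa sampe
# Assumptions: picard.jar is in the current directory, SAMPLE_NAME is set to
# the output prefix, and file names contain no spaces.
plan_ubam_alignment() {
    ref=$1 fq1=$2 fq2=$3 prefix=$4
    bam=$prefix.unaligned.bam
    # Picard interleaves the two FASTQs into one BAM with mates grouped together.
    printf 'java -jar picard.jar FastqToSam FASTQ=%s FASTQ2=%s OUTPUT=%s SAMPLE_NAME=%s\n' \
        "$fq1" "$fq2" "$bam" "$prefix"
    # -b1 / -b2 select the first / second end of each pair from the same BAM.
    printf 'bwa aln %s -b1 %s > %s.1.sai\n' "$ref" "$bam" "$prefix"
    printf 'bwa aln %s -b2 %s > %s.2.sai\n' "$ref" "$bam" "$prefix"
    # sampe gets the same BAM twice, once per end.
    printf 'bwa sampe %s %s.1.sai %s.2.sai %s %s > %s.sam\n' \
        "$ref" "$prefix" "$prefix" "$bam" "$bam" "$prefix"
}

plan_ubam_alignment ref.fa fastq1.fq fastq2.fq aln
```

Once the printed commands look right, the output can be piped to `sh` to run them.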



    • #3
      Merging the fastqs wasn't right for use with bwa, and converting them to .bams is unnecessary. You need to align each one separately to the reference fasta.

      bwa aln ref.fa fastq1.fq > 1.sai
      bwa aln ref.fa fastq2.fq > 2.sai
      bwa sampe ref.fa 1.sai 2.sai fastq1.fq fastq2.fq > aln.sam

      Here's the tricky part. Yes, you do want .bams in the end. But BAM is a very flexible format. A .bam ends up containing all the original information of the fastq, plus the information about where each read aligned. When you converted the fastq to bam, you just made a bam file that contained no information about where the reads mapped, because you never mapped them.

      Run bwa aln to align the reads to the reference. Those reads can be in bam format or fastq format, but you definitely need read1 and read2 in separate files. Then sampe will take them both, recognize that they are pairs, work out the pair insert sizes, and write out all the alignment information (as SAM, which you then convert to .bam).
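As a rough sketch of the steps above including the final conversion: bwa sampe writes SAM to standard output (the command above redirects it to aln.sam), so a samtools step is what actually produces the sorted .bam. The `run` helper, the `DRY_RUN` switch and the file names are illustrative assumptions, and the samtools options shown are those of samtools 1.x; older releases use a different `sort` syntax.

```shell
#!/bin/sh
# DRY_RUN=1 (default here) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# fastq_pair_to_bam REF FASTQ1 FASTQ2 PREFIX
# Assumes file names without spaces and samtools 1.x on the PATH.
fastq_pair_to_bam() {
    ref=$1 fq1=$2 fq2=$3 prefix=$4
    # Align each end separately.
    run "bwa aln $ref $fq1 > $prefix.1.sai" &&
    run "bwa aln $ref $fq2 > $prefix.2.sai" &&
    # Pair the ends; convert the SAM stream to BAM on the fly.
    run "bwa sampe $ref $prefix.1.sai $prefix.2.sai $fq1 $fq2 | samtools view -b -o $prefix.unsorted.bam -" &&
    # Coordinate-sort and index for downstream tools.
    run "samtools sort -o $prefix.bam $prefix.unsorted.bam" &&
    run "samtools index $prefix.bam"
}

fastq_pair_to_bam ref.fa fastq1.fq fastq2.fq aln
```

Chaining with `&&` stops the pipeline at the first failing step when the commands are actually executed.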



      • #4
        Thank you very much lletourn and swbarnes2 for your answers.
        When I asked this question half a year ago I was trying out all the possibilities with bwa. But I soon realized, as swbarnes2 suggested, that using an unaligned bam does not provide any real advantage as far as the alignment is concerned, so I dropped that idea.
        However, I want to thank you both again for the inputs.



        • #5
          Re:

          DELETED - duplicate post
          Last edited by avinash; 10-25-2011, 09:52 AM. Reason: duplicate post



          • #6
            I read the first post too fast. At our center we use unaligned bams as inputs to bwa using the technique I mentioned above. This works with paired and single datasets.

            What I propose is fine for this, but to build paired bams you can just append fastqs as swbarnes2 mentions.

            sorry for the confusion.



            • #7
              Related to the topic of this thread:

              If we are re-aligning paired data from a previously aligned bam file (because I had a bam file returned to me from a vendor rather than fastq files), what is the syntax for the sampe step? Do we just list the BAM file twice? Example:

              bwa sampe -r "@RG\tID:bwa\tLB:bwa\tSM:bwa\tPL:ILLUMINA" human_g1k_v37_decoy bwa.1.sai bwa.2.sai in.bam in.bam > out.bwa.sam

              I started a run that way just now and it appears to be working but I wanted to make sure this is the way to do it if anyone can confirm. Thanks!
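One thing that can be checked before trusting such a run: the manual text quoted in #2 says that, for BAM input, the two ends of a pair must be grouped together. A previously aligned BAM may be coordinate-sorted, in which case mates are generally not adjacent. The sketch below is one rough way to test this; the function name is made up for illustration, and it assumes every read has exactly one primary record for each end.

```shell
#!/bin/sh
# Reads one read name (QNAME) per line on stdin and reports whether the lines
# come in adjacent pairs with identical names.
check_mates_grouped() {
    awk '
        NR % 2 == 1 { first = $1; next }      # remember the first name of a pair
        $1 != first { bad = 1; exit }         # second line must carry the same name
        END {
            if (bad || NR == 0 || NR % 2 == 1) print "not grouped"
            else print "grouped"
        }'
}

# Intended use (requires samtools; -F 0x900 drops secondary and
# supplementary records so each end is counted once):
#   samtools view -F 0x900 in.bam | cut -f1 | check_mates_grouped
# If the answer is "not grouped", name-sorting first should fix it:
#   samtools sort -n -o in.namesorted.bam in.bam
```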


              • #8
                One question: if I have read pairs, does it matter for sampe if I switch the order of the read files in the input?

                like:
                bwa sampe ref.fa 2.sai 1.sai read2.fq read1.fq > aln.sam

                instead of:
                bwa sampe ref.fa 1.sai 2.sai read1.fq read2.fq > aln.sam

                Or does it not matter for the sampe step which read was sequenced first? I would think the direction of the sequenced read matters, or does it not?
                seq:___-----------------------------------
                r1:____------------>__is___<---------- r2

                Maybe I will just try it and check whether the output changes. ;-)
                Last edited by Thorondor; 09-18-2012, 12:20 AM.
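A small sketch of the "try it and compare" idea, assuming the two sampe runs were written to two SAM files. Header lines (starting with `@`) are ignored because they may differ for reasons unrelated to the alignments, and records are sorted so that only content, not order, is compared. The function names are illustrative. One would expect at least the first-in-pair and second-in-pair flag bits to differ when the inputs are swapped, but that is an expectation to verify, not a result reported in this thread.

```shell
#!/bin/sh
# Print the alignment records of a SAM file (no header), in a stable order.
sam_records() {
    grep -v '^@' "$1" | LC_ALL=C sort
}

# compare_sam A.sam B.sam -> prints "identical" or "different"
compare_sam() {
    a=$(mktemp) && b=$(mktemp) || return 2
    sam_records "$1" > "$a"
    sam_records "$2" > "$b"
    if cmp -s "$a" "$b"; then
        echo "identical"
    else
        echo "different"
    fi
    rm -f "$a" "$b"
}

# Intended use, with the two outputs from the commands above:
#   compare_sam aln.sam aln_swapped.sam
```

If the result is "different", running `diff` on the two sorted record lists shows which fields changed.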



                • #9
                  Originally posted by Michael.James.Clark:
                  Related to the topic of this thread:

                  If we are re-aligning paired data from a previously aligned bam file (because I had a bam file returned to me from a vendor rather than fastq files), what is the syntax for the sampe step? Do we just list the BAM file twice? Example:

                  bwa sampe -r "@RG\tID:bwa\tLB:bwa\tSM:bwa\tPL:ILLUMINA" human_g1k_v37_decoy bwa.1.sai bwa.2.sai in.bam in.bam > out.bwa.sam

                  I started a run that way just now and it appears to be working but I wanted to make sure this is the way to do it if anyone can confirm. Thanks!

                  If it's really working, then fine, but I would be skeptical that it's really treating that one file, listed twice, as if it contains both reads of each pair. I'd check on that before concluding that it works. It's probably assuming that the first read in one input is paired with the first read in the other, and the second with the second, as is usually the case with two separate files.

                  What might work better is to split the .bam into a read 1 .bam and a read 2 .bam, and use those.
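A possible way to do that split with samtools, offered as a sketch and not a tested recipe. It name-sorts first, because in a coordinate-sorted BAM the first ends and second ends would come out in different orders and no longer line up. The function name, the `SAMTOOLS` override and the output names are assumptions for illustration; reads whose mate is missing from the input would still break the pairing and are not handled here.

```shell
#!/bin/sh
# Binary to call; override (e.g. SAMTOOLS=echo) to preview the commands.
SAMTOOLS=${SAMTOOLS:-samtools}

# split_bam_by_mate IN.bam PREFIX
# Writes PREFIX.namesorted.bam, PREFIX.1.bam (first ends), PREFIX.2.bam (second ends).
split_bam_by_mate() {
    bam=$1 prefix=$2
    # Group mates by name so both output files keep the same read order.
    "$SAMTOOLS" sort -n -o "$prefix.namesorted.bam" "$bam" &&
    # 0x40 = first in pair, 0x80 = second in pair;
    # -F 0x900 drops secondary and supplementary records.
    "$SAMTOOLS" view -b -f 0x40 -F 0x900 -o "$prefix.1.bam" "$prefix.namesorted.bam" &&
    "$SAMTOOLS" view -b -f 0x80 -F 0x900 -o "$prefix.2.bam" "$prefix.namesorted.bam"
}

# Afterwards, following the pattern from the manual quoted in #2:
#   bwa aln ref.fa -b1 PREFIX.1.bam > 1.sai
#   bwa aln ref.fa -b2 PREFIX.2.bam > 2.sai
#   bwa sampe ref.fa 1.sai 2.sai PREFIX.1.bam PREFIX.2.bam > aln.sam
```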
