  • BWA alignment of multiple read files, the order of processing matters?

    Hi,
    I have multiple FASTQ files from different sequencing lanes but the same tissue of origin, and I want to align them with BWA. I have tried two ways of doing this:

    a: Align each FASTQ file with BWA to get one BAM file per lane, then use samtools to merge the individual BAM files into one final BAM file.

    b: Merge the FASTQ files first, then align with BWA to end up with one BAM file.

    The results differ only slightly (7 reads out of 100+ million) when I count the number of mapped reads, but I thought they should be identical.

    Does anyone know why there is a difference between methods a and b?

    Thanks!

  • #2
    I haven't used BWA, but perhaps there is inherent randomness, depending on your settings, in how reads that don't map uniquely are placed. Regardless, if only 7 out of 100+ million reads differ, it probably won't be significant downstream. How did you determine that only 7 reads were different?

    Also, on a different topic: I prefer using Picard to merge .bam files if you have proper @RG headers in each one, as it will maintain those.



    • #3
      I used an in-house tool to count. I think the idea is to look at the bitwise FLAG field in the SAM file and determine whether a read is mapped or not.
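For anyone without such a tool, here is a minimal Python sketch of that idea. It is not the in-house tool mentioned above, just an illustration under a few assumptions: the input is SAM text (for example from `samtools view -h`), a record counts as mapped when FLAG bit 0x4 is unset, and secondary (0x100) and supplementary (0x800) records are skipped so that each read is counted once. The function name `mapped_reads` and the toy records are made up for the example. Returning a set rather than a count also makes it possible to see which reads differ between two runs, not only how many.

```python
# SAM FLAG bits used below (values from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def mapped_reads(sam_lines):
    """Return the set of (read name, mate number) keys for mapped primary records.

    Hypothetical helper: mate number is 1 or 2 for paired reads, 0 otherwise.
    """
    mapped = set()
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        mate = 1 if flag & FIRST_IN_PAIR else 2 if flag & SECOND_IN_PAIR else 0
        mapped.add((qname, mate))
    return mapped


# Toy records standing in for the two pipelines (made-up data).
run_a = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t16\tchr1\t200\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
]
run_b = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
]

mapped_a = mapped_reads(run_a)
mapped_b = mapped_reads(run_b)
# Reads that are mapped in one run but not in the other.
differing = sorted(mapped_a ^ mapped_b)
print(len(mapped_a), len(mapped_b), differing)
```

For 100+ million reads the sets would use a lot of memory, so in practice one might stream both files in the same read order instead; the sketch only shows the FLAG logic.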

      Do you mean samtools merge doesn't maintain those header lines?
      Last edited by gene_x; 07-12-2013, 11:35 AM.



      • #4
        In the header section I don't believe it does, although it may keep them in a field on each aligned read. This also only matters if you have the aligner include read group information when you align. If you do, and you now have two separate .bam files (with different header information), you can try samtools merge and Picard MergeSamFiles separately and see exactly what each one keeps with samtools view -h <merged.bam>. I don't remember exactly what they do differently, so I don't want to guess incorrectly.



        • #5
          I didn't realize I can look at the bam file...

          I just checked, since I have already run both: the header is there, and it is the same in the BAM file merged with samtools merge and in the BAM file produced by aligning the merged FASTQ files.



          • #6
            Right, I think the difference arises when you have different "@RG\tID:" tags for each file that you merge together. I just checked, and it doesn't affect the aligned reads themselves, just the header section. For example, if one .bam file has "RG:A1_ACAGT_L003" and another has "RG:A2_AGTGAC_L005", and those show up in the RG tag of each aligned read, that will be maintained in the merged file with either program. With samtools merge, however, the header will contain only one of the @RG lines, while with Picard it will contain both of them.

            If you just type "samtools merge", there's a note at the bottom that actually says this.
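To check this on your own files, a small Python sketch like the one below can list the @RG IDs in each header and show what a header that keeps every read group would contain. It is only an illustration, not how samtools or Picard implement merging. The function names are made up, the two IDs are the ones from the example above, and the SM and LB values are invented placeholders. The header lines are assumed to come from something like `samtools view -H`.

```python
def read_group_lines(header_lines):
    """Return {ID: full @RG line} for every @RG line in a SAM header."""
    groups = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        # Each field after the record type looks like TAG:value.
        fields = line.rstrip("\n").split("\t")[1:]
        tags = dict(field.split(":", 1) for field in fields)
        groups[tags["ID"]] = line.rstrip("\n")
    return groups


def union_read_groups(*headers):
    """Collect every distinct @RG ID across several headers (first line wins)."""
    merged = {}
    for header in headers:
        for rg_id, line in read_group_lines(header).items():
            merged.setdefault(rg_id, line)
    return merged


# IDs taken from the post above; SM and LB values are invented placeholders.
header_1 = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:A1_ACAGT_L003\tSM:sample1\tLB:lib1",
]
header_2 = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:A2_AGTGAC_L005\tSM:sample1\tLB:lib2",
]

merged = union_read_groups(header_1, header_2)
print(sorted(merged))
```

Running `read_group_lines` on the header of a merged file and comparing its keys with `merged` shows whether any read group was dropped from the header.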



            • #7
              I asked you in another post what RG LB IDs are...

              Do you mean that with Picard, the merged file will have both sets of header lines (whether they are the same or different) at the beginning of the alignment file?

              I ran samtools merge on a bunch of BAM files without specifying any options. All the BAM files are alignments of reads from the same sequencer (with reads from multiple lanes). The merged BAM file has the same header as the individual BAM files. I guess it just retains the header when the BAM files to be merged all have the same header.



              • #8
                Yeah, that sounds right. If these are all from the same library, just sequenced many times, it doesn't really matter. If these are all from the same sample but different libraries, and each library was sequenced one time, it still probably doesn't matter too much. If there are different libraries and some were sequenced more than once, then this becomes more important for properly removing duplicate reads.

                Try to read through what I linked in the other thread, and if things are still unclear, feel free to provide a more thorough description/flowchart of the overall experiment (detailing the number of samples, how many libraries were made from each sample, and how many times each library was sequenced) and we can try to figure out the best way to analyze the data.
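To illustrate why the library matters for duplicates, here is a rough Python sketch. It assumes that duplicate detection is done per library, which is my understanding of how duplicate-marking tools use the LB tag of each read group, and it simplifies a duplicate to "same library, chromosome, position, and strand". Real tools use more information than that (for example mate positions), so treat this as a toy model. All names and values below are made up.

```python
from collections import defaultdict


def count_duplicates(alignments, rg_to_library):
    """Count reads that share (library, chrom, pos, strand) with an earlier read.

    alignments: iterable of (read group ID, chrom, pos, strand) tuples.
    rg_to_library: maps each read group ID to its library (the LB tag).
    """
    seen = defaultdict(int)
    for rg_id, chrom, pos, strand in alignments:
        library = rg_to_library[rg_id]
        seen[(library, chrom, pos, strand)] += 1
    # In each group, every read beyond the first counts as a duplicate.
    return sum(n - 1 for n in seen.values())


# Three reads at the same position, each from a different read group.
alignments = [
    ("rg1", "chr1", 100, "+"),
    ("rg2", "chr1", 100, "+"),
    ("rg3", "chr1", 100, "+"),
]

# Case 1: one library sequenced three times.
same_library = {"rg1": "lib1", "rg2": "lib1", "rg3": "lib1"}
# Case 2: lib1 sequenced twice, lib2 sequenced once.
two_libraries = {"rg1": "lib1", "rg2": "lib1", "rg3": "lib2"}

print(count_duplicates(alignments, same_library))   # 2
print(count_duplicates(alignments, two_libraries))  # 1
```

If the read group to library mapping is lost or wrong in the merged header, the two cases can no longer be told apart, which is why keeping all the @RG lines matters in the mixed situation described above.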



                • #9
                  They are actually different libraries, but sequenced on the same machine. I'll read up on the SAM format a bit more. My feeling is that reads from the same sequencer in one run have the same header.
