  • BWA alignment of multiple read files: does the order of processing matter?

    Hi,
    I have multiple FASTQ files from different sequencing lanes, all of the same tissue origin, and I want to align them with BWA. I have tried two ways of doing this:

    a: Align each FASTQ file with BWA to get one BAM file per lane, then use samtools to merge the individual BAM files into one final BAM file.

    b: Merge the FASTQ files first, then align with BWA to end up with one BAM file.

    The results differ only slightly (7 reads out of 100+ million) when I count the number of mapped reads, but I thought they should be identical.

    Does anyone know why there is a difference between methods a and b?

    Thanks!

  • #2
    I haven't used BWA, but perhaps there is some inherent randomness, depending on your settings, in how it places reads that don't map uniquely. Regardless, if only 7 out of 100+ million reads are different, it probably won't be significant downstream. How did you determine that only 7 reads were different?

    Also, on a different topic, I prefer using Picard to merge .bam files if you have proper @RG headers in each one, as it will maintain those.
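
One way to pin down which reads differ between the two approaches, rather than only comparing totals, is to compare the sets of mapped reads directly. The following is a minimal sketch, not anyone's actual tool: it assumes the alignments are available as SAM text lines (for example from `samtools view -h`), and the function names and example records are made up for illustration.

```python
from typing import Iterable, Set, Tuple

# Bits of the SAM FLAG field, as defined in the SAM specification.
UNMAPPED = 0x4
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

ReadKey = Tuple[str, int]  # (read name, first/second-in-pair bits)


def mapped_keys(sam_lines: Iterable[str]) -> Set[ReadKey]:
    """Collect a key for every primary record that is flagged as mapped."""
    keys: Set[ReadKey] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # consider each read once, via its primary record
        if flag & UNMAPPED:
            continue
        keys.add((qname, flag & (READ1 | READ2)))
    return keys


def mapping_differences(sam_a: Iterable[str], sam_b: Iterable[str]):
    """Return (mapped only in A, mapped only in B)."""
    a, b = mapped_keys(sam_a), mapped_keys(sam_b)
    return a - b, b - a


# Hypothetical records: r2 is unmapped in one run and mapped in the other.
merged_bams = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
merged_fastq = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
only_a, only_b = mapping_differences(merged_bams, merged_fastq)
print(only_a, only_b)
```

For 100+ million reads the sets would be large, so in practice one would probably work from name-sorted files or restrict the comparison to a region; the sketch only shows the idea.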

    • #3
      Originally posted by Heisman
      I haven't used BWA but perhaps there is inherent randomness in regards to your settings with mapping reads that don't map uniquely. Regardless, if only 7 out of 100+ million reads are different, it probably won't be significant downstream. How did you determine only 7 reads were different?

      Also, different topic, but I prefer using Picard to merge .bam files if you have proper @RG headers in each one as it will maintain those.

      I used an in-house tool to count. As I understand it, the idea is to look at the bitwise FLAG field in the SAM file and determine whether each read is mapped or not.

      Do you mean that samtools merge doesn't maintain those header lines?
      Last edited by gene_x; 07-12-2013, 11:35 AM.
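
The counting approach described here (checking the bitwise FLAG field of each SAM record) can be sketched roughly as follows. This is not the in-house tool from the post, just an illustration under the assumption that the input is SAM text; `count_mapped` and the example records are hypothetical. Bit 0x4 of the FLAG marks a read as unmapped, and secondary and supplementary records are skipped here so that each read is counted once.

```python
from collections import Counter
from typing import Iterable

UNMAPPED = 0x4          # read is unmapped
SECONDARY = 0x100       # secondary alignment
SUPPLEMENTARY = 0x800   # supplementary alignment


def count_mapped(sam_lines: Iterable[str]) -> Counter:
    """Count mapped and unmapped reads using only primary records."""
    counts = Counter(mapped=0, unmapped=0)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        counts["unmapped" if flag & UNMAPPED else "mapped"] += 1
    return counts


# Hypothetical records: two mapped reads, one unmapped, one secondary hit.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t272\tchr1\t500\t0\t4M\t*\t0\t0\t*\t*",
]
counts = count_mapped(example)
print(counts["mapped"], counts["unmapped"])
```

A comparable count from the command line is `samtools view -c -F 4 file.bam`, although that also counts secondary records unless they are filtered out as well.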

      • #4
        I don't believe it does in the header section, although it may keep them in a field for each aligned read. This also only matters if you had the aligner include header information when you aligned. If you did, and you now have two separate .bam files with different header information, you can try samtools merge and Picard MergeSamFiles separately and see exactly what each one keeps with samtools view -h <merged.bam>. I don't remember exactly what they do differently, so I don't want to guess incorrectly.

        • #5
          I didn't realize I could look inside the bam file...

          I just checked, since I had already run both approaches: the header is there, and it is the same in the bam file merged with samtools merge and in the bam file produced by aligning the merged fastq files.

          • #6
            Right, I think the difference shows up if you have different "@RG\tID:" tags in each file that you merge together. I just checked, and it doesn't affect the aligned reads themselves, just the header section. For example, if one .bam file has "RG:A1_ACAGT_L003" and another has "RG:A2_AGTGAC_L005", those show up in the RG tag of each aligned read, and that will be maintained in the merged file with either program. With samtools merge, however, the header will only have one of the RG lines, while with Picard it will have both of them.

            If you just type "samtools merge", there's a note at the bottom of the usage message that actually says this.
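
To make the header difference concrete, here is a small sketch of what "keeping both RG lines" means: take the @RG lines from each input header and keep one line per distinct ID. This is only an illustration of the idea, not how Picard or samtools implement merging (for instance, it simply keeps the first line seen when two inputs share an ID). The function name and the SM/LB values are made up; the two IDs are the ones from the example above.

```python
from typing import Dict, Iterable, List


def merge_rg_headers(headers: Iterable[List[str]]) -> List[str]:
    """Union the @RG lines of several SAM headers, one line per read group ID."""
    merged: Dict[str, str] = {}
    for header in headers:
        for line in header:
            if not line.startswith("@RG"):
                continue
            line = line.rstrip("\n")
            # Turn "ID:x<TAB>SM:y<TAB>LB:z" into {"ID": "x", "SM": "y", "LB": "z"}.
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            merged.setdefault(tags["ID"], line)  # keep the first line seen per ID
    return list(merged.values())


# Hypothetical headers for two lanes; SM and LB values are placeholders.
header_1 = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:A1_ACAGT_L003\tSM:sample1\tLB:lib1",
]
header_2 = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:A2_AGTGAC_L005\tSM:sample1\tLB:lib2",
]

both = merge_rg_headers([header_1, header_2])
same = merge_rg_headers([header_1, header_1])
print(both)
print(same)
```

With two different read groups the result has both @RG lines; with identical headers it collapses to a single line, which is consistent with merged and unmerged files showing the same header when every input was given the same header.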

            • #7
              I asked you in another post what the RG and LB IDs are.

              Do you mean that with Picard, the merged file will have both sets of header lines (whether they are the same or different) at the beginning of the alignment file?

              I ran samtools merge on a bunch of bam files without specifying any options. All of the bam files are alignments of reads from the same sequencer (with reads from multiple lanes), and the merged bam file has the same header as the individual bam files. I guess it just retains the header when the to-be-merged bam files all have the same header.

              • #8
                Yeah, that sounds right. If these are all from the same library just sequenced many times it doesn't really matter. If these are all from the same sample but different libraries and each library was sequenced one time it still probably doesn't matter too much. If there are different libraries and some were sequenced more than once, then this will become more important for properly removing duplicate reads.

                Try to read through what I linked in the other thread and if there are still things unclear feel free to provide a more thorough description/flowchart of the overall experiment (detailing the number of samples, how many libraries were made of each sample, and how many times each library was sequenced) and we can try to figure out the best way to analyze the data.
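
As a rough illustration of why the library information matters: duplicate-marking tools such as Picard MarkDuplicates, as far as I know, use the LB value of each read group to decide which reads belong to the same library. The sketch below groups read group IDs by their LB tag from a set of @RG header lines, which is a quick way to see how many lanes each library was sequenced on. The function name and all IDs, sample and library names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def read_groups_by_library(header_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each library (LB tag) to the read group IDs that belong to it."""
    libraries: Dict[str, List[str]] = defaultdict(list)
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        tags = dict(field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:])
        # Read groups without an LB tag are collected under "unknown".
        libraries[tags.get("LB", "unknown")].append(tags["ID"])
    return dict(libraries)


# Hypothetical header: lib1 was sequenced on two lanes, lib2 on one.
header = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:lib1_L003\tSM:sample1\tLB:lib1",
    "@RG\tID:lib1_L005\tSM:sample1\tLB:lib1",
    "@RG\tID:lib2_L003\tSM:sample1\tLB:lib2",
]
libraries = read_groups_by_library(header)
print(libraries)
```

In this made-up layout, reads from the two lib1 lanes could be duplicates of each other, while reads from lib1 and lib2 would be treated as coming from different libraries.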

                • #9
                  Originally posted by Heisman
                  Yeah, that sounds right. If these are all from the same library just sequenced many times it doesn't really matter. If these are all from the same sample but different libraries and each library was sequenced one time it still probably doesn't matter too much. If there are different libraries and some were sequenced more than once, then this will become more important for properly removing duplicate reads.

                  Try to read through what I linked in the other thread and if there are still things unclear feel free to provide a more thorough description/flowchart of the overall experiment (detailing the number of samples, how many libraries were made of each sample, and how many times each library was sequenced) and we can try to figure out the best way to analyze the data.

                  They are actually different libraries, but sequenced on the same machine. I'll read up on the SAM format a bit more. My feeling is that reads from the same sequencer in one run have the same header.
