Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Handling alignments spanning chromosomes boundaries in BWA

    Hi All,

    When BWA aligns a read that spans over two chromosomes, it sets the flag as "unmapped" but leaves mapping quality (MAPQ) and CIGAR unchanged.

    Picard's validator tool ValidateSamFile throws a validation error for such reads because it expects an unmapped read to have MAPQ of zero, and CIGAR value of "*".

    I would like to understand the generally acceptable way of handling this type of reads. Should I modify these reads to have MAPQ 0 and CIGAR value of * ?

    I would appreciate any suggestions and best practices to follow for such reads.

    Thanks in advance.

  • #2
    Most people turn the stringency down on Picard to lenient. It'll yell at you about those reads, but it will go through. You'll want to do that with every Picard command.

    Comment


    • #3
      Hi swbarnes2,

      Yes, we can have picard continue through these errors by the lenient stringency. But I was wondering what would be the correct way to handle the reads themselves so that no other downstream tool (not necessarily picard) will choke.

      Comment


      • #4
        I guess what he wants to know is if there is anyone out there that is doing more than ignoring those cases. Maybe patching BWA or rewriting the fields in the BAM. I can see the big sequencing centers doing something like before submitting their data.
        -drd

        Comment


        • #5
          It would be very interesting to understand what proportion of these type of reads are generated by chimerism introduced in the library prep steps, versus biologically significant events.

          My guess is that the fraction caused by the library prep is fairly high, and people just haven't understood that yet because 1) the read lengths haven't been long enough to map both halves, and 2) most people are already overwhelmed by the reads they can map so throwing out a few % that map ambiguously is pretty common practice.

          Cool stuff.

          Comment


          • #6
            Spanning two chromosomes in BWA (short-read) comes from concatenating all chromosomes into one long sequence for use with the BWT and FM-index. So the case where a read is unmapped but has a chr/pos occurs when an alignment spans a chr/chr junction, where the two chromosomes are consecutive. This is different from the read spanning two random locations in the genome on different chromosomes (inter-chromosomal translocations etc.). The BWA (long-read) algorithm will actually produce the latter on longer reads (>100bp), giving two non-overlapping mappings for a read.

            Comment


            • #7
              Now that I think more about it, BWA is following the SAM spec. If the unmapped flag is set, it doesn't matter what you have in the other fields. Downstream tools should take that into account.

              So I'd suggest you tell picard to don't report those warnings, or just ignore them. I guess that is what other people do.
              -drd

              Comment


              • #8
                The maintainers of the spec and the Picard developers have nonetheless decided to call it invalid. Better to patch BWA.

                Comment


                • #9
                  I wrote a simple tool to handle this based on the API provided by Picard. Basically, it is an extension of Picard's CleanSam.jar with the change that if a read has unmapped flag, CIGAR is set to * and mapping quality to zero.

                  If anyone is interested in looking at it / using it, I can have it available in a public place.

                  Comment


                  • #10
                    Handling alignments spanning chromosomes boundaries in BWA

                    Hi nirav99,

                    Could you post or make your code available? I think that I am having this problem on the GATK GenomeAnalysisTK.jar -T IndelRealigner command, which doesn't allow the VALIDATION_STRINGENCY=LENIENT parameter.

                    Thanks,

                    Vince

                    Comment


                    • #11
                      Hi Vince,

                      Download and build the java code at https://github.com/nirav99/Illumina-.../java/FixCIGAR

                      This should help you.

                      Nirav

                      Comment


                      • #12
                        Hi Nirav
                        I would also like to fix my reads to have MAPQ 0 and CIGAR value of * using your FixCIGAR tool. however Java is so overwhelming for me could you please explain in more detail how to build your script and show an example of a command line?

                        i also tried filtering only the mapped reads using ViewSam.jar ALIGNMENT_STATUS=Aligned but it didn't work, don't these type of reads have a flag of 4?

                        Thank you!
                        Daphna

                        Comment


                        • #13
                          Hi Daphna,

                          I am assuming that you are running on Unix/ Linux and you have the latest java compiler. If so, you can follow these steps :

                          1) Download and untar / (unzip) latest Picard. We need two Jar files sam-*.jar and picard-*.jar.

                          2) Download the files from the FixCIGAR directory

                          3) Open GenerateJar.sh and edit the picardPath variable to the location where you have installed the picard library. You should point to the directory path where sam-*.jar and picard-*.jar are located.

                          4) Run GenerateJar.sh, it should build the tool for you.

                          5) It will create a .jar file FixCIGAR.jar. You can run it using the command

                          java -jar FixCIGAR.jar INPUT=bamToFix

                          or

                          java -jar FixCIGAR.jar INPUT=bamToFix OUTPUT=FixedBAM

                          Comment


                          • #14
                            Thank you so much for your help!

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Strategies for Sequencing Challenging Samples
                              by seqadmin


                              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                              03-22-2024, 06:39 AM
                            • seqadmin
                              Techniques and Challenges in Conservation Genomics
                              by seqadmin



                              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                              Avian Conservation
                              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                              03-08-2024, 10:41 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, Yesterday, 06:37 PM
                            0 responses
                            10 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, Yesterday, 06:07 PM
                            0 responses
                            9 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-22-2024, 10:03 AM
                            0 responses
                            51 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-21-2024, 07:32 AM
                            0 responses
                            67 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X