  • Bug in Picard's MarkDuplicates

    I am using Picard's MarkDuplicates, version 1.3. The BAM file was produced with maq2sam-long and then sorted with SortSam.


    When I run
    java -Xmx2g -jar ~/bin/MarkDuplicates.jar TMP_DIR=. I=mapset_withdup_0.bam O=aa.bam METRICS_FILE=cc.txt VALIDATION_STRINGENCY=SILENT

    I got the following error:

    INFO 2010-02-16 14:55:55 MarkDuplicates Start of doWork freeMemory: 8668240; totalMemory: 9109504; maxMemory: 1398145024
    INFO 2010-02-16 14:55:55 MarkDuplicates Reading input file and constructing read end information.
    INFO 2010-02-16 14:55:55 MarkDuplicates Will retain up to 6241718 data points before spilling to disk.
    [Tue Feb 16 14:55:55 GMT 2010] net.sf.picard.sam.MarkDuplicates done.
    Runtime.totalMemory()=108986368
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:324)
    .....


    If I run it without VALIDATION_STRINGENCY=SILENT:
    java -Xmx2g -jar ~/bin/MarkDuplicates.jar TMP_DIR=. I=mapset_withdup_0.bam O=aa.bam METRICS_FILE=cc.txt

    I get this error instead:
    Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 1, Read name GAII01:5:34:1106:456#0, Mapped mate should have mate reference name

    I checked the file: it is correctly sorted by coordinate, and I can merge it without problems. But I just can't get MarkDuplicates to work.

  • #2
    Originally posted by xiang
    I checked the file. It is well sorted by coordinate. I can merge the file correctly. But I just can't make markduplicates work.
    Can you post a smaller example of the BAM you are trying to use? I'd also suggest sending this to the Picard mailing list.
    -drd



    • #3
      Originally posted by xiang
      If I use
      java -Xmx2g -jar ~/bin/MarkDuplicates.jar TMP_DIR=. I=mapset_withdup_0.bam O=aa.bam METRICS_FILE=cc.txt

      I got an error as
      Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 1, Read name GAII01:5:34:1106:456#0, Mapped mate should have mate reference name

      I checked the file. It is well sorted by coordinate. I can merge the file correctly. But I just can't make markduplicates work.
      Do RNAME and MRNM (see the SAM spec) match the reference sequence names in the BAM header?
      -drd
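A minimal Python sketch of the check suggested above, assuming plain-text SAM input (for example from samtools view -h). The function name check_reference_names is hypothetical; it collects the SN: names from the @SQ header lines and reports any record whose RNAME (column 3) or MRNM (column 7) is not one of them. It splits on any whitespace because the records in this thread were pasted with spaces; real SAM files are tab-delimited.

```python
def check_reference_names(sam_text: str) -> list[str]:
    """Report records whose RNAME or MRNM is not an @SQ SN: name in the header."""
    ref_names: set[str] = set()
    problems: list[str] = []
    for lineno, line in enumerate(sam_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@"):
            # Collect reference names from @SQ lines, e.g. "@SQ SN:Chr1 LN:30427671"
            if line.startswith("@SQ"):
                for field in line.split()[1:]:
                    if field.startswith("SN:"):
                        ref_names.add(field[3:])
            continue
        cols = line.split()  # tabs in real SAM; the forum copy uses spaces
        qname, rname, mrnm = cols[0], cols[2], cols[6]
        if rname != "*" and rname not in ref_names:
            problems.append(f"line {lineno}: {qname}: RNAME {rname} not in header")
        # "*" means no mate reference is given; "=" means the same as RNAME
        if mrnm not in ("*", "=") and mrnm not in ref_names:
            problems.append(f"line {lineno}: {qname}: MRNM {mrnm} not in header")
    return problems
```

Note that this only catches name mismatches (such as chr1 vs Chr1); a record with MRNM set to * passes this check, so it would not by itself explain the "Mapped mate should have mate reference name" error.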



      • #4
        I created a very short BAM file that gives the same error with MarkDuplicates. Its content is as follows:


        GAII02:3:1:0:1074#0 99 Chr1 1556161 97 36M * 0 170 NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA !19987888899:88859:;999:88777999999: RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
        GAII02:3:1:0:1074#0 147 Chr1 1556295 97 36M * 0 -170 TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
        GAII02:3:1:0:1856#0 163 Chr3 13021517 97 36M * 0 189 AAGCAAATGTACCATATGGGCAAGTGAATGTACTTA @@@CCABBCABA>?B?BBBB@:@B?B==AB><BBB? RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
        GAII02:3:1:0:1856#0 83 Chr3 13021670 97 36M * 0 -189 GTAGCAATCAGCTCATCCTCTTCGTTCTTGACCATT ::::::::::8778:878688888878778688:/! RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
        GAII02:3:1:0:1184#0 163 chloroplast 87135 0 36M * 0 176 ATTATATGGATGATCCGATCCCCCAGGGCCCTGATT ?BB>B@<BC>CBBCCC@@@>################ RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:3 UQ:i:6 H0:i:0 H1:i:0
        GAII02:3:1:0:1184#0 83 chloroplast 87275 0 36M * 0 -176 ATGTTTGCTTTTCGTGAAAAAATACCAATTGAAGTT 9799997747576:::<<<<<9948699699:<;/! RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:1 UQ:i:0 H0:i:0 H1:i:2
        GAII02:3:1:0:1151#0 163 chloroplast 89820 0 36M * 0 176 ATTTTCCACAAAGTGGTGACGAAAGGTATAACTTGT BBBBCCCB@CBBB6@@=?B@@8=ABB8BB@B64??< RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:0 UQ:i:0 H0:i:2 H1:i:0
        GAII02:3:1:0:1151#0 83 chloroplast 89960 0 36M * 0 -176 AATTTTGAAAGAACGTATTGTCAAACTCTTTCAGAT 99993::<<<<<85::777;656;<7;8::9<:;/! RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:1 UQ:i:0 H0:i:0 H1:i:2
        GAII02:3:1:0:333#0 163 chloroplast 112427 59 36M * 0 146 TTTTGATGAATGCAACTTAGAAAAATTTGTTGAATA BCCCCB@=AB?BA?=@BBCBCBCCBBBBC@B:>B@? RG:Z:WTCHG MF:i:18 AM:i:29 SM:i:30 NM:i:0 UQ:i:0 H0:i:1 H1:i:1
        GAII02:3:1:0:333#0 83 chloroplast 112537 59 36M * 0 -146 TTTTGTTGCTGTCGGAAAAAGGAGAAGTCCAACTCT 78871850136315:5:;:89;;;:::9:9;996,! RG:Z:WTCHG MF:i:18 AM:i:29 SM:i:29 NM:i:1 UQ:i:0 H0:i:0 H1:i:1

        You can download the bam file directly from
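All the FLAG values in these records (99, 147, 163, 83) have the "paired" bit set and the "mate unmapped" bit (0x8) clear, so every record claims its mate is mapped, yet column 7 (MRNM) is * and column 8 (MPOS) is 0. A minimal Python sketch of that reasoning follows; the bit labels and the helper names decode_flag and missing_mate_reference are descriptive choices made here, not identifiers from Picard or the SAM spec, and the assumption that this is the condition behind Picard's message is an interpretation of the error text.

```python
# Descriptive labels for the SAM FLAG bits (labels chosen for this sketch).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
}


def decode_flag(flag: int) -> list[str]:
    """Return the labels of all FLAG bits set in `flag`, lowest bit first."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


def missing_mate_reference(fields: list[str]) -> bool:
    """True if the FLAG says the mate is mapped but MRNM (column 7) is '*'."""
    flag = int(fields[1])
    mate_mapped = bool(flag & 0x1) and not flag & 0x8
    return mate_mapped and fields[6] == "*"


record = ("GAII02:3:1:0:1074#0 99 Chr1 1556161 97 36M * 0 170 "
          "NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA !19987888899:88859:;999:88777999999:")
fields = record.split()  # forum copy uses spaces; real SAM files use tabs
print(decode_flag(int(fields[1])))
# ['paired', 'proper pair', 'mate reverse strand', 'first in pair']
print(missing_mate_reference(fields))
# True
```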



        • #5
          I created a very short bam file at



          • #6
            The header is:

            @HD VN:1.0 GO:none SO:coordinate
            @SQ SN:Chr1 LN:30427671
            @SQ SN:Chr2 LN:19698289
            @SQ SN:Chr3 LN:23459830
            @SQ SN:Chr4 LN:18585056
            @SQ SN:Chr5 LN:26975502
            @SQ SN:chloroplast LN:154478
            @SQ SN:mitochondria LN:366924
            @RG ID:WTCHG PL:SLX LB:WTCHG PI:200 DS:test_Genome SM:test
            @PG ID:maq VN:0.7.1-6

            Then the reads:
            GAII02:3:1:0:1074#0 99 Chr1 1556161 97 36M * 0 170 NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA !19987888899:88859:;999:88777999999: RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
            GAII02:3:1:0:1074#0 147 Chr1 1556295 97 36M * 0 -170 TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0



            • #7
              Originally posted by xiang
              The header is:

              @HD VN:1.0 GO:none SO:coordinate
              @SQ SN:Chr1 LN:30427671
              @SQ SN:Chr2 LN:19698289
              @SQ SN:Chr3 LN:23459830
              @SQ SN:Chr4 LN:18585056
              @SQ SN:Chr5 LN:26975502
              @SQ SN:chloroplast LN:154478
              @SQ SN:mitochondria LN:366924
              @RG ID:WTCHG PL:SLX LB:WTCHG PI:200 DS:test_Genome SM:test
              @PG ID:maq VN:0.7.1-6

              Then the reads:
              GAII02:3:1:0:1074#0 99 Chr1 1556161 97 36M * 0 170 NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA !19987888899:88859:;999:88777999999: RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
              GAII02:3:1:0:1074#0 147 Chr1 1556295 97 36M * 0 -170 TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
              You don't have MRNM and MPOS (mate reference name and mate position) set up properly for both mates.

              This works:

              Code:
              @HD     VN:1.0  GO:none SO:coordinate
              @SQ     SN:Chr1 LN:1000
              @RG     ID:WTCHG        PL:SLX  LB:WTCHG        PI:200  DS:test_Genome  SM:test
              @PG     ID:maq  VN:0.7.1-6
              GAII02:3:1:0:1074#0     99      Chr1    155     97      36M     Chr1    255     170     NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA  !19987888899:88859:;999:88777999999:    RG:Z:WTCHG      MF:i:18 AM:i:33 SM:i:33  NM:i:1       UQ:i:0  H0:i:0  H1:i:1
              GAII02:3:1:0:1074#0     147     Chr1    255     97      36M     Chr1    155     -170    TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA  BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB    RG:Z:WTCHG      MF:i:18 AM:i:33 SM:i:64  NM:i:0       UQ:i:0  H0:i:1  H1:i:0
              -drd



              • #8
                It works. Drio, thank you very much.



                • #9
                  What does "...have the NRNM and MPOS properly setup for both mates" mean, and how does one go about correcting the BAM file so that these fields are set up properly for both mates?



                  • #10
                    I have the same problem. How do I fix the MRNM and MPOS information in a SAM file?



                    • #11
                      Originally posted by av_d
                      I have the same problem, how to fix the MRNM and MPOS information in SAM file ???
                      Copy the chromosome (column 3) from mate1 to column 7 (MRNM) of mate2, and position (column 4) of mate1 to column 8 (MPOS) of mate2. And vice versa (copy the chromosome (column 3) from mate2 to column 7 of mate1, and position (column 4) of mate2 to column 8 of mate1).
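The copy described above can be sketched in Python as follows. This is a minimal illustration, assuming records have already been split into column lists and that each read name occurs exactly twice (one record per mate); the function name fix_mate_fields is hypothetical. It does not recompute ISIZE (column 9), set the mate-related FLAG bits, or handle unmapped mates and secondary alignments, which a dedicated tool such as samtools fixmate takes care of.

```python
from collections import defaultdict


def fix_mate_fields(records: list[list[str]]) -> list[list[str]]:
    """Copy each mate's RNAME/POS (cols 3/4) into the other's MRNM/MPOS (cols 7/8)."""
    fixed = [list(rec) for rec in records]  # work on copies; input is untouched
    by_name: dict[str, list[list[str]]] = defaultdict(list)
    for rec in fixed:
        by_name[rec[0]].append(rec)  # group by QNAME (column 1)
    for group in by_name.values():
        if len(group) != 2:
            continue  # only simple pairs; singletons or extra records are left as-is
        mate1, mate2 = group
        # 0-based indices: 2 = RNAME, 3 = POS, 6 = MRNM, 7 = MPOS
        mate1[6], mate1[7] = mate2[2], mate2[3]
        mate2[6], mate2[7] = mate1[2], mate1[3]
    return fixed


pair = [
    "GAII02:3:1:0:1074#0 99 Chr1 1556161 97 36M * 0 170 "
    "NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA !19987888899:88859:;999:88777999999:".split(),
    "GAII02:3:1:0:1074#0 147 Chr1 1556295 97 36M * 0 -170 "
    "TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB".split(),
]
for rec in fix_mate_fields(pair):
    print("\t".join(rec[:9]))
```

For this pair the first record ends up with Chr1 / 1556295 in columns 7 and 8, and the second with Chr1 / 1556161, which follows the same pattern as the working example posted earlier in the thread.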



                      • #12
                        samtools fixmate
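samtools fixmate expects a name-sorted (or name-collated) input so that mates sit next to each other, while MarkDuplicates needs coordinate-sorted input, so the BAM has to be re-sorted around the fixmate step. The Python sketch below only builds and prints the command sequence by default. It assumes samtools 1.3 or later (older 0.1.x releases used the form "samtools sort -n in.bam out_prefix" instead of -o); the file names and the function names fixmate_pipeline and run are hypothetical, and the MarkDuplicates jar path and arguments are copied from the commands earlier in this thread.

```python
import os
import shlex
import subprocess


def fixmate_pipeline(in_bam: str, prefix: str) -> list[list[str]]:
    """Build the commands: name sort -> fixmate -> coordinate sort -> MarkDuplicates."""
    namesorted = f"{prefix}.namesorted.bam"
    fixmated = f"{prefix}.fixmate.bam"
    coordsorted = f"{prefix}.sorted.bam"
    return [
        # fixmate expects mates next to each other, so sort by read name first
        ["samtools", "sort", "-n", "-o", namesorted, in_bam],
        ["samtools", "fixmate", namesorted, fixmated],
        # MarkDuplicates needs coordinate-sorted input again
        ["samtools", "sort", "-o", coordsorted, fixmated],
        # Same MarkDuplicates invocation style as earlier in this thread
        ["java", "-Xmx2g", "-jar", os.path.expanduser("~/bin/MarkDuplicates.jar"),
         "TMP_DIR=.", f"I={coordsorted}", f"O={prefix}.dedup.bam",
         f"METRICS_FILE={prefix}.metrics.txt"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(fixmate_pipeline("mapset_withdup_0.bam", "mapset_withdup_0"))
```

Keeping dry_run on by default lets you inspect the commands before anything touches your files.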



                        • #13
                          Or you can use Picard's AddOrReplaceReadGroups:



                          • #14
                            Note: I got the same error on a pair of reads that have no alignments at all (the unmapped flag bits are set), but setting VALIDATION_STRINGENCY=SILENT worked for me.
