
  • Bug in Picard's MarkDuplicates

    I am using Picard's MarkDuplicates, version 1.3. The BAM file was produced with maq2sam-long and then sorted with SortSam.


    When I run
    java -Xmx2g -jar ~/bin/MarkDuplicates.jar TMP_DIR=. I=mapset_withdup_0.bam O=aa.bam METRICS_FILE=cc.txt VALIDATION_STRINGENCY=SILENT

    I get this error:

    INFO 2010-02-16 14:55:55 MarkDuplicates Start of doWork freeMemory: 8668240; totalMemory: 9109504; maxMemory: 1398145024
    INFO 2010-02-16 14:55:55 MarkDuplicates Reading input file and constructing read end information.
    INFO 2010-02-16 14:55:55 MarkDuplicates Will retain up to 6241718 data points before spilling to disk.
    [Tue Feb 16 14:55:55 GMT 2010] net.sf.picard.sam.MarkDuplicates done.
    Runtime.totalMemory()=108986368
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:324)
    .....


    If I run it without VALIDATION_STRINGENCY=SILENT
    java -Xmx2g -jar ~/bin/MarkDuplicates.jar TMP_DIR=. I=mapset_withdup_0.bam O=aa.bam METRICS_FILE=cc.txt

    I get this error instead:
    Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 1, Read name GAII01:5:34:1106:456#0, Mapped mate should have mate reference name

    I checked the file: it is correctly sorted by coordinate, and I can merge it without problems, but I can't get MarkDuplicates to work.

  • #2
    Can you post a smaller representation of the BAM you are trying to use? I suggest you also send this to the picard mailing list.
    -drd



    • #3
      Do RNAME and MRNM (check the SAM spec) match the reference sequences named in the BAM header?
      -drd



      • #4
        I created a very short BAM file that gives the same error with MarkDuplicates. Its content is as follows:


        GAII02:3:1:0:1074#0 99 Chr1 1556161 97 36M * 0 170 NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA !19987888899:88859:;999:88777999999: RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
        GAII02:3:1:0:1074#0 147 Chr1 1556295 97 36M * 0 -170 TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
        GAII02:3:1:0:1856#0 163 Chr3 13021517 97 36M * 0 189 AAGCAAATGTACCATATGGGCAAGTGAATGTACTTA @@@CCABBCABA>?B?BBBB@:@B?B==AB><BBB? RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0
        GAII02:3:1:0:1856#0 83 Chr3 13021670 97 36M * 0 -189 GTAGCAATCAGCTCATCCTCTTCGTTCTTGACCATT ::::::::::8778:878688888878778688:/! RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
        GAII02:3:1:0:1184#0 163 chloroplast 87135 0 36M * 0 176 ATTATATGGATGATCCGATCCCCCAGGGCCCTGATT ?BB>B@<BC>CBBCCC@@@>################ RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:3 UQ:i:6 H0:i:0 H1:i:0
        GAII02:3:1:0:1184#0 83 chloroplast 87275 0 36M * 0 -176 ATGTTTGCTTTTCGTGAAAAAATACCAATTGAAGTT 9799997747576:::<<<<<9948699699:<;/! RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:1 UQ:i:0 H0:i:0 H1:i:2
        GAII02:3:1:0:1151#0 163 chloroplast 89820 0 36M * 0 176 ATTTTCCACAAAGTGGTGACGAAAGGTATAACTTGT BBBBCCCB@CBBB6@@=?B@@8=ABB8BB@B64??< RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:0 UQ:i:0 H0:i:2 H1:i:0
        GAII02:3:1:0:1151#0 83 chloroplast 89960 0 36M * 0 -176 AATTTTGAAAGAACGTATTGTCAAACTCTTTCAGAT 99993::<<<<<85::777;656;<7;8::9<:;/! RG:Z:WTCHG MF:i:18 AM:i:0 SM:i:0 NM:i:1 UQ:i:0 H0:i:0 H1:i:2
        GAII02:3:1:0:333#0 163 chloroplast 112427 59 36M * 0 146 TTTTGATGAATGCAACTTAGAAAAATTTGTTGAATA BCCCCB@=AB?BA?=@BBCBCBCCBBBBC@B:>B@? RG:Z:WTCHG MF:i:18 AM:i:29 SM:i:30 NM:i:0 UQ:i:0 H0:i:1 H1:i:1
        GAII02:3:1:0:333#0 83 chloroplast 112537 59 36M * 0 -146 TTTTGTTGCTGTCGGAAAAAGGAGAAGTCCAACTCT 78871850136315:5:;:89;;;:::9:9;996,! RG:Z:WTCHG MF:i:18 AM:i:29 SM:i:29 NM:i:1 UQ:i:0 H0:i:0 H1:i:1

        You can download the bam file directly from



        • #5
          I created a very short bam file at



          • #6
            The header is:

            @HD VN:1.0 GO:none SO:coordinate
            @SQ SN:Chr1 LN:30427671
            @SQ SN:Chr2 LN:19698289
            @SQ SN:Chr3 LN:23459830
            @SQ SN:Chr4 LN:18585056
            @SQ SN:Chr5 LN:26975502
            @SQ SN:chloroplast LN:154478
            @SQ SN:mitochondria LN:366924
            @RG ID:WTCHG PL:SLX LB:WTCHG PI:200 DS:test_Genome SM:test
            @PG ID:maq VN:0.7.1-6

            Then the reads:
            GAII02:3:1:0:1074#0 99 Chr1 1556161 97 36M * 0 170 NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA !19987888899:88859:;999:88777999999: RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:33 NM:i:1 UQ:i:0 H0:i:0 H1:i:1
            GAII02:3:1:0:1074#0 147 Chr1 1556295 97 36M * 0 -170 TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB RG:Z:WTCHG MF:i:18 AM:i:33 SM:i:64 NM:i:0 UQ:i:0 H0:i:1 H1:i:0



            • #7
              You don't have MRNM and MPOS set up properly for both mates: columns 7 and 8 of your records are * and 0, even though both mates are mapped.

              This works:

              Code:
              @HD     VN:1.0  GO:none SO:coordinate
              @SQ     SN:Chr1 LN:1000
              @RG     ID:WTCHG        PL:SLX  LB:WTCHG        PI:200  DS:test_Genome  SM:test
              @PG     ID:maq  VN:0.7.1-6
              GAII02:3:1:0:1074#0     99      Chr1    155     97      36M     Chr1    255     170     NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA  !19987888899:88859:;999:88777999999:    RG:Z:WTCHG      MF:i:18 AM:i:33 SM:i:33  NM:i:1       UQ:i:0  H0:i:0  H1:i:1
              GAII02:3:1:0:1074#0     147     Chr1    255     97      36M     Chr1    155     -170    TGAAGCATCTGGAGTTGCTGATACTAGAAAAGTGGA  BAAA>BAA@>@?B??@@@BAB@@AABBBBCBB?BBB    RG:Z:WTCHG      MF:i:18 AM:i:33 SM:i:64  NM:i:0       UQ:i:0  H0:i:1  H1:i:0
              -drd
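
The condition behind the "Mapped mate should have mate reference name" error can be checked by hand. Below is a minimal Python sketch, assuming tab-separated SAM text and the flag bits from the SAM specification (0x1 paired, 0x8 mate unmapped). `mate_field_problems` is a hypothetical helper written for illustration; it is not part of Picard or samtools, and it only approximates what Picard's validation does.

```python
# Flag bits as defined in the SAM specification.
PAIRED = 0x1
MATE_UNMAPPED = 0x8


def mate_field_problems(sam_line):
    """Return a list of problems found in the mate fields of one SAM record.

    Assumes a tab-separated alignment line (not a header line).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mrnm = fields[6]       # column 7: mate reference name
    mpos = int(fields[7])  # column 8: mate position
    problems = []
    # A paired read whose mate is mapped needs both mate fields filled in.
    if (flag & PAIRED) and not (flag & MATE_UNMAPPED):
        if mrnm == "*":
            problems.append("mapped mate but MRNM is '*'")
        if mpos == 0:
            problems.append("mapped mate but MPOS is 0")
    return problems


# First record from the thread (optional tags omitted), rebuilt with tabs.
record = "\t".join([
    "GAII02:3:1:0:1074#0", "99", "Chr1", "1556161", "97", "36M",
    "*", "0", "170",
    "NTTGAAGGATATCTGGATTCTGAGAAGGAAACCGCA",
    "!19987888899:88859:;999:88777999999:",
])
print(mate_field_problems(record))
```

Flag 99 has the paired bit set and the mate-unmapped bit clear, so the record from the original file is reported for both columns, while the corrected record (MRNM Chr1, MPOS 255) passes.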



              • #8
                It works. Drio, thank you very much.



                • #9
                  What does it mean to have MRNM and MPOS set up properly for both mates, and how does one go about correcting the BAM file so that they are?



                  • #10
                    I have the same problem. How do I fix the MRNM and MPOS information in a SAM file?



                    • #11
                      Copy the chromosome (column 3) from mate1 to column 7 (MRNM) of mate2, and position (column 4) of mate1 to column 8 (MPOS) of mate2. And vice versa (copy the chromosome (column 3) from mate2 to column 7 of mate1, and position (column 4) of mate2 to column 8 of mate1).
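
The copying described above can be sketched in Python. This is only an illustration under stated assumptions: the input is tab-separated SAM text, each read name occurs on exactly two alignment lines, and both mates are mapped. `fix_mate_fields` is a hypothetical helper, not an existing tool, and anything that does not fit those assumptions is left untouched.

```python
UNMAPPED = 0x4  # SAM flag bit: this read is unmapped


def fix_mate_fields(sam_lines):
    """Fill MRNM (column 7) and MPOS (column 8) from the mate's columns 3 and 4.

    Header lines and records that are not part of a two-record,
    fully mapped pair are passed through unchanged.
    """
    records = [line.rstrip("\n").split("\t") for line in sam_lines]

    # Group alignment records by read name (column 1).
    pairs = {}
    for rec in records:
        if rec[0].startswith("@") or len(rec) < 11:
            continue  # header line or malformed record
        pairs.setdefault(rec[0], []).append(rec)

    for mates in pairs.values():
        if len(mates) != 2:
            continue  # not a simple pair: leave as-is
        first, second = mates
        if int(first[1]) & UNMAPPED or int(second[1]) & UNMAPPED:
            continue  # unmapped reads are outside this sketch
        # Read both sets of coordinates before overwriting anything.
        first_chrom, first_pos = first[2], first[3]
        second_chrom, second_pos = second[2], second[3]
        first[6], first[7] = second_chrom, second_pos
        second[6], second[7] = first_chrom, first_pos

    return ["\t".join(rec) for rec in records]
```

The sketch writes the chromosome name explicitly, as in the working example earlier in the thread; the SAM specification also allows "=" in column 7 when both mates are on the same reference. For real data a maintained tool is the safer route, since it also handles unmapped mates and flag bits.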



                      • #12
                        samtools fixmate



                        • #13
                          Or you can use the GATK's AddOrReplaceReadGroups :



                          • #14
                            Note: I get the same error on a pair of reads that don't even have alignments at all (unaligned bits are set). But setting the VALIDATION_STRINGENCY=SILENT worked for me.
