  • slowsmile
    Member
    • May 2011
    • 22

    Error with MarkDuplicates in Picard

    Dear All
    I am still on the learning curve with GATK, and I encountered an error at the duplicate-marking step with Picard.

    The procedure I followed is:

    I generated BAM files for each sample using tophat 1.33 and sorted
    each BAM file (one file per sample) against the hg19 reference genome
    using Picard ReorderSam.jar.

    After that I added read group information using Picard
    AddOrReplaceReadGroups.jar.

    Then I tried to remove pair duplicates using MarkDuplicates.jar in
    Picard. However, I encountered an error at this step and failed to
    generate the duplicates-removed files.

    The errors I received look like the following:


    [Thu Feb 16 15:06:56 EST 2012] net.sf.picard.sam.MarkDuplicates
    INPUT=[/media/FreeAgent GoFlex
    Drive/RNAseq-coloncancer/LID46437/tophat_out/sorted_GP.bam]
    OUTPUT=/media/FreeAgent GoFlex
    Drive/RNAseq-coloncancer/LID46437/tophat_out/marked.bam
    METRICS_FILE=/media/FreeAgent GoFlex
    Drive/RNAseq-coloncancer/LID46437/tophat_out/metrics
    REMOVE_DUPLICATES=false ASSUME_SORTED=false
    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
    MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000
    SORTING_COLLECTION_SIZE_RATIO=0.25
    READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*
    OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false
    VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5
    MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Thu Feb 16 15:06:56 EST 2012] Executing as
    slowsmile@slowsmile-HP-xw8600-Workstation on Linux 3.0.0-15-generic
    amd64; OpenJDK 64-Bit Server VM 1.7.0_147-icedtea-b147; Picard
    version: 1.60(1086)
    INFO 2012-02-16 15:06:56 MarkDuplicates Start of doWork freeMemory:
    124147272; totalMemory: 125698048; maxMemory: 1866006528
    INFO 2012-02-16 15:06:56 MarkDuplicates Reading input file and
    constructing read end information.
    INFO 2012-02-16 15:06:56 MarkDuplicates Will retain up to 7404787 data
    points before spilling to disk.
    INFO 2012-02-16 15:07:14 MarkDuplicates Read 1000000 records. Tracking
    129157 as yet unmatched pairs. 6550 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:19 MarkDuplicates Read 2000000 records. Tracking
    136196 as yet unmatched pairs. 9506 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:24 MarkDuplicates Read 3000000 records. Tracking
    190648 as yet unmatched pairs. 61032 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:29 MarkDuplicates Read 4000000 records. Tracking
    144992 as yet unmatched pairs. 9135 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:34 MarkDuplicates Read 5000000 records. Tracking
    180193 as yet unmatched pairs. 36398 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:39 MarkDuplicates Read 6000000 records. Tracking
    186193 as yet unmatched pairs. 35242 records in RAM. Last sequence
    index: 0
    [Thu Feb 16 15:07:42 EST 2012] net.sf.picard.sam.MarkDuplicates done.
    Elapsed time: 0.78 minutes.
    Runtime.totalMemory()=1352466432
    Exception in thread "main" net.sf.picard.PicardException: Value was
    put into PairInfoMap more than once. 1:
    HT29.LANE1:HWI-ST978:1370AHMACXX:5:1207:8810:84360
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
    at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:343)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:122)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:106)

    I read the log carefully but cannot figure out the source of the error.
    What does "Value was put into PairInfoMap more than once" mean here?
    Can you help me resolve this problem?

    Thanks a lot
  • ginolhac
    Junior Member
    • Oct 2010
    • 4

    #2
    Hello,

    I had the same issue. Does anyone have any clues about this?

    Thanks,


    • ginolhac
      Junior Member
      • Oct 2010
      • 4

      #3
      Answering my own question:
      it was due to fake reads mapped by bwa, such as:
      (null) 73 chr21 48313514 25 0M = 48313514 0 * * XT:A:U NM:i:0 SM:i:25 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0
      (null) 73 chr21 48313514 25 0M = 48313514 0 * * XT:A:U NM:i:0 SM:i:25 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0
      (null) 65 chr21 48313514 25 0M chr18 18626503 0 * * XT:A:U NM:i:0 SM:i:25 AM:i:25 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0

      The read name (null) was found more than twice, so MarkDuplicates complained. These records have a mapping quality of 25 and are not properly paired, so we can get rid of them either by raising the minimum mapping quality to 26 or by using samtools view -f 0x2.
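The two filters mentioned above can be sketched on SAM text in plain Python. This is only an illustration of the filtering logic, assuming tab-separated SAM lines; `keep_record` and its parameters are hypothetical names, not part of samtools or Picard.

```python
def keep_record(sam_line, min_mapq=0, require_proper_pair=False):
    """Return True if a SAM text line passes the chosen filters."""
    if sam_line.startswith("@"):
        return True  # header lines are always kept
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if mapq < min_mapq:
        return False  # same idea as `samtools view -q`
    if require_proper_pair and not flag & 0x2:
        return False  # same idea as `samtools view -f 0x2`
    return True


# Tab-separated version of the first record quoted above (optional tags omitted).
record = "\t".join(
    ["(null)", "73", "chr21", "48313514", "25", "0M", "=", "48313514", "0", "*", "*"]
)
print(keep_record(record, min_mapq=26))            # False: MAPQ is 25
print(keep_record(record, require_proper_pair=True))  # False: flag 73 lacks 0x2
```

Either filter drops the quoted record, which matches the description in the post.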


      • shawpa
        Member
        • Aug 2011
        • 73

        #4
        I am running into the same error with Picard MarkDuplicates. My alignment was done with bowtie2. I have run this script before on different data sets and didn't see this error. Since you figured out what was wrong with your data, I was hoping you could let me know how you did that. Here's the error I get:

        Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: L3:MWR-PRG-0014:74:C0E94ACXX:3:1206:11809:158670


        • upendra_35
          Senior Member
          • Apr 2010
          • 102

          #5
          Hi ginolhac,

          I encountered the same problem as you when I tried to use the MarkDuplicates command. When I looked at the problematic read, I found that the mapping quality of those two reads was more than 25. How do we remove those reads in that case? Thanks in advance for your help.


          • ginolhac
            Junior Member
            • Oct 2010
            • 4

            #6
            Hi,

            actually the issue came from fastq files that were not in sync. Some reads were missing at the end of one of the files. That explained those reads with a (null) name.
            To remove those, I used:
            Code:
            samtools view -h file.bam | grep -v null | samtools view -bS - > file_clean.bam
            Hope this helps.
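To check for the out-of-sync condition described above before aligning, a minimal sketch could walk both FASTQ files in lock step and report the first record where the read names differ or one file runs out. It assumes plain 4-line FASTQ records and that mates share a name apart from an optional `/1` or `/2` suffix or a trailing comment; `read_names` and `first_desync` are hypothetical helper names.

```python
import io
from itertools import zip_longest


def read_names(fastq_handle):
    """Yield the read name from each 4-line FASTQ record."""
    for i, line in enumerate(fastq_handle):
        if i % 4 != 0:
            continue  # only the first line of each record holds the name
        parts = line[1:].split()  # drop '@' and any comment after whitespace
        if not parts:
            continue  # tolerate a blank trailing line
        name = parts[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_desync(fq1, fq2):
    """Return (record_index, name1, name2) at the first mismatch, or None."""
    pairs = zip_longest(read_names(fq1), read_names(fq2))
    for index, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return index, n1, n2
    return None


# Toy input: the second file is missing its last record.
r1 = io.StringIO("@a/1\nAC\n+\nII\n@b/1\nGT\n+\nII\n")
r2 = io.StringIO("@a/2\nAC\n+\nII\n")
print(first_desync(r1, r2))  # (1, 'b', None)
```

For real data the two handles would come from `open()` (or `gzip.open(..., "rt")` for compressed files). Note also that `grep -v null` drops every line containing the string null, not only records whose read name is (null).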


            • JezSupreme
              Junior Member
              • Mar 2013
              • 6

              #7
              I encountered the same error using picard tools MarkDuplicates and it was related to the alignment I had done using BWA (BWA MEM).

              I had failed to use the -M option when running the alignment which enables compatibility with picard-tools MarkDuplicates function. I went back and re-ran the alignment with that option and it fixed the error.

              From the BWA manual site:
              -M Mark shorter split hits as secondary (for Picard compatibility).
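To see which records the -M option affects, it can help to decode the SAM flag. In the SAM specification, bit 0x100 marks a secondary alignment and bit 0x800 a supplementary one; to my understanding, BWA-MEM reports shorter split hits as supplementary unless -M is given, in which case it marks them secondary. A small sketch, where `alignment_kind` is a hypothetical helper name:

```python
SECONDARY = 0x100      # "secondary alignment" bit in the SAM flag
SUPPLEMENTARY = 0x800  # "supplementary alignment" bit in the SAM flag


def alignment_kind(flag):
    """Classify a SAM flag as primary, secondary, or supplementary."""
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


# 65 = paired + first in pair; 321 adds 0x100; 2113 adds 0x800.
for flag in (65, 321, 2113):
    print(flag, alignment_kind(flag))
```

A tool that predates the 0x800 bit would see a supplementary record as just another primary alignment of the same read, which is one plausible reading of why the error appears without -M.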


              • bwubb
                Member
                • Jan 2012
                • 61

                #8
                I have been struggling with this issue. I have sample data merged from several Illumina PE runs. When I looked for other information and solutions, it was suggested to modify the read group ID to include lane or run identification and then re-merge.

                I have done that, but I still receive this error. Has anyone been able to resolve this issue? I could try to remove the offending read, but I'm concerned there will be many more after it.


                • Heisman
                  Senior Member
                  • Dec 2010
                  • 534

                  #9
                  What is the actual cause of your problem? Different causes were posted in this thread (i.e., fastq files with truncated lines, or using bwa without -M).


                  • bwubb
                    Member
                    • Jan 2012
                    • 61

                    #10
                    Ah I am having issues with:

                    Code:
                    Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  1: E0005-FGC0298:HWI-ST970:298:C0MUAACXX:4:1201:13786:41745
                    	at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
                    	at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
                    	at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
                    	at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:418)
                    	at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:161)
                    	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
                    	at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:145)
                    This is driving me crazy because it is a repeat analysis, just with yet another HiSeq run added to it. I use bwa-sw (bwa aln) for alignment. Is it recommended to use bwa-mem with the -M option instead?

                    EDIT:

                    There must be something greater at work here. I cannot even run ValidateSamFile without running into this error...
                    Last edited by bwubb; 07-10-2013, 09:02 AM.


                    • thedamian
                      Member
                      • Feb 2012
                      • 50

                      #11
                      Hi All,
                      I have the same problem with BWA mem. I used the -M option but I still get:
                      Code:
                      .PicardException: Value was put into PairInfoMap more than once.  1: null:M00840:39:000000000-A5TE9:1:2103:11538:25521
                      I have tried a trick with
                      Code:
                      samtools view -h before.bam | grep -v null | samtools view -bS - > cleaned.bam
                      but it didn't help me.

                      With BWA aln everything is OK, but it's not recommended for my data since the reads are ~251 bases long.

                      Did anyone solve this problem?
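One thing that may be worth checking here: in the other error messages quoted in this thread, the text before the first colon looks like a read group ID (`HT29.LANE1`, `E0005-FGC0298`), so the `null:` prefix could mean that the records carry no RG tag rather than that a read is literally named null. That is an inference from the messages, not something confirmed in the thread. A minimal sketch to count records without an `RG:Z:` tag in SAM text (for example the output of `samtools view`), with `count_missing_read_group` as a hypothetical helper name:

```python
def count_missing_read_group(sam_lines):
    """Return (records_without_RG, total_records) for an iterable of SAM lines."""
    missing = total = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        total += 1
        tags = line.rstrip("\n").split("\t")[11:]  # optional fields start at column 12
        if not any(tag.startswith("RG:Z:") for tag in tags):
            missing += 1
    return missing, total


# Toy records with made-up read names; only the first carries a read group.
core = ["99", "chr1", "100", "60", "4M", "=", "200", "104", "ACGT", "IIII"]
lines = [
    "\t".join(["readA"] + core + ["NM:i:0", "RG:Z:lane1"]),
    "\t".join(["readB"] + core + ["NM:i:0"]),
]
print(count_missing_read_group(lines))  # (1, 2)
```

If every record lacks a read group, filtering on the string null will not change anything, because the string never appears in the file itself.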


                      • Clown_Bassie
                        Junior Member
                        • Sep 2014
                        • 2

                        #12
                        This is exactly the same issue I'm running into! Does anyone have the answer already?


                        • AdrianP
                          Senior Member
                          • Apr 2011
                          • 130

                          #13
                          Originally posted by JezSupreme:
                          From the BWA manual site:
                          -M Mark shorter split hits as secondary (for Picard compatibility).
                          I am going to try this solution now as I have the same issue.


                          • zhkzhou
                            Junior Member
                            • Nov 2011
                            • 4

                            #14
                            JezSupreme and AdrianP are right!
                            The BWA-MEM algorithm performs local alignment. It may produce multiple primary alignments for different parts of a query sequence. This is a crucial feature for long sequences. However, some tools such as Picard's MarkDuplicates do not work with split alignments. One may consider using option -M to flag shorter split hits as secondary.
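To find reads affected by split alignments, a rough diagnostic is to count, per read name and mate, how many records are not flagged secondary (0x100). The sketch below deliberately treats supplementary records (0x800) as primary, on the assumption that a tool unaware of that bit would do the same. It is not Picard's actual logic, and `repeated_primary_ends` is a hypothetical helper name.

```python
from collections import Counter


def repeated_primary_ends(sam_lines):
    """Return {(read_name, mate): count} for mates seen more than once
    among records that are not flagged secondary (0x100)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x100:
            continue  # secondary alignments are ignored
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        counts[(name, mate)] += 1
    return {key: n for key, n in counts.items() if n > 1}


def sam(name, flag):
    """Build a toy SAM line with placeholder values in the other columns."""
    return "\t".join([name, str(flag), "chr1", "100", "60", "4M", "=", "200", "0", "ACGT", "IIII"])


# Split hit reported as supplementary (flag 2113): the read end shows up twice.
print(repeated_primary_ends([sam("r1", 65), sam("r1", 2113)]))  # {('r1', 1): 2}
# Same split hit marked secondary (flag 321), as with -M: nothing is reported.
print(repeated_primary_ends([sam("r1", 65), sam("r1", 321)]))   # {}
```

Running this over `samtools view` output would list the read names to inspect before deciding whether to realign with -M.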
