  • Error with MarkDuplicates in Picard

    Dear All,
    I am still on the learning curve with the GATK toolchain, but I have encountered an error at the duplicate-marking step with the Picard tool.

    The procedure I followed was:

    I generated BAM files for each sample using tophat 1.33, and I reordered
    each BAM file (one file per sample) against the hg19 reference genome
    using Picard ReorderSam.jar.

    After that I added read group information using Picard
    AddOrReplaceReadGroups.jar.

    Then I tried to remove paired duplicates using MarkDuplicates.jar in
    Picard. However, this step failed with an error, and no
    duplicate-removed files were generated.
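    For reference, the three steps were roughly of this shape (Picard 1.x
    jar-per-tool invocations; the file paths and read-group values shown
    here are simplified placeholders, not my exact ones):

```shell
# Hypothetical sketch of the three Picard 1.x steps described above;
# all file paths and read-group values are placeholders.
java -jar ReorderSam.jar \
    INPUT=accepted_hits.bam OUTPUT=sorted.bam REFERENCE=hg19.fa

java -jar AddOrReplaceReadGroups.jar \
    INPUT=sorted.bam OUTPUT=rg.bam \
    RGID=lane1 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=sample1

java -jar MarkDuplicates.jar \
    INPUT=rg.bam OUTPUT=marked.bam METRICS_FILE=metrics.txt
```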

    The error I received is the following:


    [Thu Feb 16 15:06:56 EST 2012] net.sf.picard.sam.MarkDuplicates
    INPUT=[/media/FreeAgent GoFlex
    Drive/RNAseq-coloncancer/LID46437/tophat_out/sorted_GP.bam]
    OUTPUT=/media/FreeAgent GoFlex
    Drive/RNAseq-coloncancer/LID46437/tophat_out/marked.bam
    METRICS_FILE=/media/FreeAgent GoFlex
    Drive/RNAseq-coloncancer/LID46437/tophat_out/metrics
    REMOVE_DUPLICATES=false ASSUME_SORTED=false
    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
    MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000
    SORTING_COLLECTION_SIZE_RATIO=0.25
    READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*
    OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false
    VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5
    MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Thu Feb 16 15:06:56 EST 2012] Executing as
    slowsmile@slowsmile-HP-xw8600-Workstation on Linux 3.0.0-15-generic
    amd64; OpenJDK 64-Bit Server VM 1.7.0_147-icedtea-b147; Picard
    version: 1.60(1086)
    INFO 2012-02-16 15:06:56 MarkDuplicates Start of doWork freeMemory:
    124147272; totalMemory: 125698048; maxMemory: 1866006528
    INFO 2012-02-16 15:06:56 MarkDuplicates Reading input file and
    constructing read end information.
    INFO 2012-02-16 15:06:56 MarkDuplicates Will retain up to 7404787 data
    points before spilling to disk.
    INFO 2012-02-16 15:07:14 MarkDuplicates Read 1000000 records. Tracking
    129157 as yet unmatched pairs. 6550 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:19 MarkDuplicates Read 2000000 records. Tracking
    136196 as yet unmatched pairs. 9506 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:24 MarkDuplicates Read 3000000 records. Tracking
    190648 as yet unmatched pairs. 61032 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:29 MarkDuplicates Read 4000000 records. Tracking
    144992 as yet unmatched pairs. 9135 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:34 MarkDuplicates Read 5000000 records. Tracking
    180193 as yet unmatched pairs. 36398 records in RAM. Last sequence
    index: 0
    INFO 2012-02-16 15:07:39 MarkDuplicates Read 6000000 records. Tracking
    186193 as yet unmatched pairs. 35242 records in RAM. Last sequence
    index: 0
    [Thu Feb 16 15:07:42 EST 2012] net.sf.picard.sam.MarkDuplicates done.
    Elapsed time: 0.78 minutes.
    Runtime.totalMemory()=1352466432
    Exception in thread "main" net.sf.picard.PicardException: Value was
    put into PairInfoMap more than once. 1:
    HT29.LANE1:HWI-ST978:1370AHMACXX:5:1207:8810:84360
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
    at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:343)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:122)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:106)

    I read the log carefully but cannot figure out the source of the error.
    What does "Value was put into PairInfoMap more than once" mean here?
    Can anyone help me resolve this problem?

    Thanks a lot

  • #2
    Hello,

    I have the same issue. Does anyone have any clues about this?

    Thanks,

    • #3
      I am answering myself: it was due to fake reads mapped by bwa, such as:
      (null) 73 chr21 48313514 25 0M = 48313514 0 * * XT:A:U NM:i:0 SM:i:25 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0
      (null) 73 chr21 48313514 25 0M = 48313514 0 * * XT:A:U NM:i:0 SM:i:25 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0
      (null) 65 chr21 48313514 25 0M chr18 18626503 0 * * XT:A:U NM:i:0 SM:i:25 AM:i:25 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0

      (null) was found more than twice, and MarkDuplicates complained. Raising the mapping-quality threshold to 26 (these reads have MAPQ 25) gets rid of them; so does samtools view -f 0x2, since they are not properly paired.
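      To make the two filters concrete, here is the same logic on a toy SAM
      fragment (made-up records modeled on the ones above; plain awk stands
      in for samtools so the MAPQ and flag-bit tests are visible):

```shell
# Two toy SAM records: a "(null)" read with MAPQ 25 and flag 73 (proper-pair
# bit 0x2 unset), and a normal properly paired read with MAPQ 60, flag 99.
printf '(null)\t73\tchr21\t48313514\t25\t0M\t=\t48313514\t0\t*\t*\n'  > toy.sam
printf 'readA\t99\tchr21\t48313600\t60\t50M\t=\t48313700\t150\t*\t*\n' >> toy.sam

# Equivalent of `samtools view -q 26`: keep records with MAPQ >= 26 (field 5).
awk -F'\t' '$5 >= 26' toy.sam > by_mapq.sam

# Equivalent of `samtools view -f 0x2`: keep records whose flag (field 2)
# has the proper-pair bit 0x2 set, tested arithmetically.
awk -F'\t' 'int($2/2)%2==1' toy.sam > by_flag.sam
```

      Either filter drops the (null) record and keeps readA.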

      • #4
        I am running into the same error with Picard MarkDuplicates. My alignment was done with Bowtie 2. I have run this script before on different data sets and didn't see this error. Since you figured out what was wrong with your data, I was hoping you could let me know how you did that. Here's the error I get.

        Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: L3:MWR-PRG-0014:74:C0E94ACXX:3:1206:11809:158670

        • #5
          Hi ginolhac,

          I encountered the same problem as you when I tried to use the MarkDuplicates command, but when I looked at the problematic reads I found that their mapping qualities were above 25. How do we remove those reads in that case? Thanks in advance for your help.

          • #6
            Hi,

            Actually the issue came from fastq files that were not in sync: some reads were missing from the end of one of the files, which explained those reads with a (null) name.
            To remove those, I used:
            Code:
            samtools view -h file.bam | grep -v null | samtools view -bS - > file_clean.bam
            hope this helps
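            One caveat with the grep above: it drops any line containing the
            substring "null" anywhere, including header lines or legitimate
            read names that happen to contain it. A field-based filter only
            removes records whose name is exactly (null). Here is a sketch
            on a made-up SAM fragment (plain awk, no samtools, so it can be
            checked by hand):

```shell
# Toy SAM stream: a header line plus two records, one of them with the
# "(null)" read name produced by out-of-sync fastq files.
{
  printf '@SQ\tSN:chr21\tLN:48129895\n'
  printf '(null)\t73\tchr21\t48313514\t25\t0M\t=\t48313514\t0\t*\t*\n'
  printf 'readB\t99\tchr21\t48313600\t60\t50M\t=\t48313700\t150\t*\t*\n'
} > dirty.sam

# Keep every header line (starting with @) and every record whose QNAME
# (field 1) is not "(null)", unlike a bare `grep -v null`.
awk -F'\t' '/^@/ || $1 != "(null)"' dirty.sam > clean.sam
```

            In a real pipeline the awk would sit between the two samtools
            view calls shown above, in place of the grep.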

            • #7
              I encountered the same error using Picard's MarkDuplicates, and it was related to the alignment I had done with BWA (BWA-MEM).

              I had failed to use the -M option when running the alignment, which enables compatibility with Picard's MarkDuplicates. I went back and re-ran the alignment with that option, and it fixed the error.

              From the BWA manual site:
              -M Mark shorter split hits as secondary (for Picard compatibility).
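              For anyone who wants the exact shape of the fix, it was just
              adding -M to the alignment step (file names below are
              placeholders, not my actual data):

```shell
# Re-run the alignment with -M so shorter split hits are flagged as
# secondary (0x100) instead of primary; file names are placeholders.
bwa mem -M hg19.fa reads_1.fastq reads_2.fastq > aln.sam
```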

              • #8
                I have been struggling with this issue. I have sample data merged from Illumina PE runs. While trying to find other information/solutions, it was suggested to modify the read group ID to include the lane or run identification and then re-merge.

                I have done that, but I still receive this error. Has anyone been able to resolve this issue? I could try to remove the offending read, but I'm concerned there will be many more after it.

                • #9
                  Originally posted by bwubb:
                  "I have been struggling with this issue. [...] Has anyone been able to resolve this issue?"

                  What is the actual cause of your problem? Several different causes have been posted in this thread (e.g., fastq files with truncated lines, or running bwa without -M).

                  • #10
                    Ah I am having issues with:

                    Code:
                    Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  1: E0005-FGC0298:HWI-ST970:298:C0MUAACXX:4:1201:13786:41745
                    	at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
                    	at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
                    	at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
                    	at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:418)
                    	at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:161)
                    	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
                    	at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:145)
                    Driving me crazy because this is a repeat analysis, but with yet another HiSeq run added to it. I use bwa-sw (bwa aln) for alignment. Is it recommended to use bwa mem with the -M option instead?

                    EDIT:

                    There must be something greater at work here. I cannot even run ValidateSamFile without running into this error...
                    Last edited by bwubb; 07-10-2013, 09:02 AM.

                    • #11
                      Hi All,
                      I have the same problem with BWA-MEM. I used the -M option, but I still get:
                      Code:
                      .PicardException: Value was put into PairInfoMap more than once.  1: null:M00840:39:000000000-A5TE9:1:2103:11538:25521
                      I have tried a trick with
                      Code:
                      samtools view -h before.bam | grep -v null | samtools view -bS - > cleaned.bam
                      but it didn't help me.

                      With BWA aln everything is OK, but it's not recommended for my data since the reads are ~251 bases long.

                      Did anyone solve this problem?

                      • #12
                        Originally posted by thedamian:
                        "I have the same problem with BWA mem. I used the -M option but still I get [the PairInfoMap error]. [...] Did anyone solve this problem?"

                        This is exactly the same issue I'm running into! Does anyone have the answer already?

                        • #13
                          Originally posted by JezSupreme:
                          "From the BWA manual site:
                          -M Mark shorter split hits as secondary (for Picard compatibility)."

                          I am going to try this solution now, as I have the same issue.

                          • #14
                            JezSupreme and AdrianP are right!
                            The BWA-MEM algorithm performs local alignment, so it may produce multiple primary alignments for different parts of a query sequence. This is a crucial feature for long sequences, but some tools, such as Picard's MarkDuplicates, do not work with split alignments. One may consider using the -M option to flag shorter split hits as secondary.
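                            If you want to check whether a BAM actually has this pattern before re-aligning, you can count primary (non-secondary) records per read name: any paired read with more than two is what MarkDuplicates trips over. A sketch on a made-up SAM fragment (plain awk, no samtools; only the 0x100 secondary bit is tested here, so 0x800 supplementary records would also be counted):

```shell
# Toy SAM: read "splitR" has three primary records (0x100 unset), mimicking
# BWA-MEM split alignments without -M; "okR" has the usual two mates.
{
  printf 'splitR\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*\n'
  printf 'splitR\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*\n'
  printf 'splitR\t65\tchr2\t500\t60\t50M\tchr1\t100\t0\t*\t*\n'
  printf 'okR\t99\tchr1\t900\t60\t50M\t=\t1100\t250\t*\t*\n'
  printf 'okR\t147\tchr1\t1100\t60\t50M\t=\t900\t-250\t*\t*\n'
} > aln.sam

# Count records per QNAME whose secondary bit 0x100 is unset, then report
# read names seen more than twice -- the ones MarkDuplicates would reject.
awk -F'\t' '!/^@/ && int($2/256)%2==0 { n[$1]++ }
            END { for (q in n) if (n[q] > 2) print q }' aln.sam > offenders.txt
```

                            On a real file you would feed it `samtools view file.bam` instead of the toy stream.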
