SEQanswers

Old 02-17-2012, 08:44 AM   #1
slowsmile
Member
 
Location: long island

Join Date: May 2011
Posts: 22
Default Error with MarkDuplicates in Picard

Dear All,
I am still on the learning curve with the GATK tools, but I have encountered an error at the duplicate-marking step with Picard.

The procedure I followed is this:

I generated BAM files for each sample using TopHat 1.3.3, and reordered
each BAM file (one file per sample) against the hg19 reference genome
using Picard's ReorderSam.jar.

After that I added read-group information using Picard's
AddOrReplaceReadGroups.jar.

Then I tried to remove pair duplicates using MarkDuplicates.jar in
Picard. However, I ran into an error at this step and failed to
generate the duplicate-removed files.
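In command form, the three steps were roughly the following (a sketch only; all file names and read-group values are placeholders, using the Picard 1.x one-jar-per-tool style):

```shell
# Sketch of the three steps above; file names are hypothetical.
# 1. Reorder the BAM to match the hg19 reference contig order
java -jar ReorderSam.jar INPUT=accepted_hits.bam OUTPUT=sorted_GP.bam \
    REFERENCE=hg19.fa
# 2. Add read-group information
java -jar AddOrReplaceReadGroups.jar INPUT=sorted_GP.bam OUTPUT=rg.bam \
    RGID=sample1 RGLB=lib1 RGPL=illumina RGPU=lane1 RGSM=sample1
# 3. Mark (or remove) duplicate pairs
java -jar MarkDuplicates.jar INPUT=rg.bam OUTPUT=marked.bam \
    METRICS_FILE=metrics REMOVE_DUPLICATES=true
```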

The error I received is the following:


Quote:
[Thu Feb 16 15:06:56 EST 2012] net.sf.picard.sam.MarkDuplicates
INPUT=[/media/FreeAgent GoFlex
Drive/RNAseq-coloncancer/LID46437/tophat_out/sorted_GP.bam]
OUTPUT=/media/FreeAgent GoFlex
Drive/RNAseq-coloncancer/LID46437/tophat_out/marked.bam
METRICS_FILE=/media/FreeAgent GoFlex
Drive/RNAseq-coloncancer/LID46437/tophat_out/metrics
REMOVE_DUPLICATES=false ASSUME_SORTED=false
MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000
SORTING_COLLECTION_SIZE_RATIO=0.25
READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*
OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false
VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5
MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Feb 16 15:06:56 EST 2012] Executing as
[email protected] on Linux 3.0.0-15-generic
amd64; OpenJDK 64-Bit Server VM 1.7.0_147-icedtea-b147; Picard
version: 1.60(1086)
INFO 2012-02-16 15:06:56 MarkDuplicates Start of doWork freeMemory:
124147272; totalMemory: 125698048; maxMemory: 1866006528
INFO 2012-02-16 15:06:56 MarkDuplicates Reading input file and
constructing read end information.
INFO 2012-02-16 15:06:56 MarkDuplicates Will retain up to 7404787 data
points before spilling to disk.
INFO 2012-02-16 15:07:14 MarkDuplicates Read 1000000 records. Tracking
129157 as yet unmatched pairs. 6550 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:19 MarkDuplicates Read 2000000 records. Tracking
136196 as yet unmatched pairs. 9506 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:24 MarkDuplicates Read 3000000 records. Tracking
190648 as yet unmatched pairs. 61032 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:29 MarkDuplicates Read 4000000 records. Tracking
144992 as yet unmatched pairs. 9135 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:34 MarkDuplicates Read 5000000 records. Tracking
180193 as yet unmatched pairs. 36398 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:39 MarkDuplicates Read 6000000 records. Tracking
186193 as yet unmatched pairs. 35242 records in RAM. Last sequence
index: 0
[Thu Feb 16 15:07:42 EST 2012] net.sf.picard.sam.MarkDuplicates done.
Elapsed time: 0.78 minutes.
Runtime.totalMemory()=1352466432
Exception in thread "main" net.sf.picard.PicardException: Value was
put into PairInfoMap more than once. 1:
HT29.LANE1:HWI-ST978:1370AHMACXX:5:1207:8810:84360
at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:343)
at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:122)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:106)

I read the log carefully but cannot figure out the source of the error.
What does "Value was put into PairInfoMap more than once" mean here?
Can anyone help me resolve this problem?

Thanks a lot
slowsmile is offline   Reply With Quote
Old 04-02-2012, 02:04 AM   #2
ginolhac
Junior Member
 
Location: Copenhagen

Join Date: Oct 2010
Posts: 4
Default

Hello,

I had the same issue; does anyone have any clues about this?

Thanks,
ginolhac is offline   Reply With Quote
Old 04-02-2012, 08:52 AM   #3
ginolhac
Junior Member
 
Location: Copenhagen

Join Date: Oct 2010
Posts: 4
Default

Answering my own question:
it was due to fake reads mapped with bwa, such as:
(null) 73 chr21 48313514 25 0M = 48313514 0 * * XT:A:U NM:i:0 SM:i:25 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0
(null) 73 chr21 48313514 25 0M = 48313514 0 * * XT:A:U NM:i:0 SM:i:25 AM:i:0 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0
(null) 65 chr21 48313514 25 0M chr18 18626503 0 * * XT:A:U NM:i:0 SM:i:25 AM:i:25 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:0

(null) was found more than twice, and MarkDuplicates complained. We can get rid of these reads by filtering on a mapping quality of at least 26, or with samtools view -f 0x2, since they are not properly paired.
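In command form, either filter looks like this (file names are hypothetical; -q keeps alignments at or above the given mapping quality, -f keeps only records with the given FLAG bits set):

```shell
# Hypothetical file names. Either filter removes the (null) records.
# Keep only alignments with mapping quality >= 26:
samtools view -b -q 26 file.bam > file_q26.bam
# Or keep only properly paired reads (SAM FLAG bit 0x2 set); the
# offending records above carry FLAGs 73 and 65, which lack 0x2:
samtools view -b -f 0x2 file.bam > file_paired.bam
```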
ginolhac is offline   Reply With Quote
Old 05-01-2012, 10:44 AM   #4
shawpa
Member
 
Location: Pittsburgh

Join Date: Aug 2011
Posts: 71
Default

I am running into the same error with Picard MarkDuplicates. My alignment was done with Bowtie 2. I have run this script before on different data sets and didn't see this error. Since you figured out what was wrong with your data, I was hoping you could let me know how you did that. Here's the error I get:

Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: L3:MWR-PRG-0014:74:C0E94ACXX:3:1206:11809:158670
shawpa is offline   Reply With Quote
Old 07-09-2012, 03:00 PM   #5
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102
Default

Hi ginolhac,

I encountered the same problem as you when I tried to use the MarkDuplicates command, and when I looked at the problematic reads I found that the mapping quality of those two reads was more than 25. How do we remove those reads then? Thanks in advance for your help.
upendra_35 is offline   Reply With Quote
Old 07-10-2012, 12:00 AM   #6
ginolhac
Junior Member
 
Location: Copenhagen

Join Date: Oct 2010
Posts: 4
Default

Hi,

Actually the issue came from FASTQ files that were not in sync: some reads were missing at the end of one of the files. That explains the reads with a (null) name.
To remove those, I used:
Code:
samtools view -h file.bam | grep -v null | samtools view -bS - > file_clean.bam
hope this helps
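One caveat about the one-liner above: grep -v null also drops any header line or alignment that merely contains the string "null" somewhere (for example in a reference name or tag). A more targeted variant (hypothetical file names) keeps all header lines and drops only records whose QNAME, the first tab-separated column, is the literal (null):

```shell
# Hypothetical file names. Keep header lines (starting with @) and
# drop only alignment records whose read name is exactly "(null)".
samtools view -h file.bam \
    | awk -F '\t' '$0 ~ /^@/ || $1 != "(null)"' \
    | samtools view -bS - > file_clean.bam
```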
ginolhac is offline   Reply With Quote
Old 04-17-2013, 08:23 PM   #7
JezSupreme
Junior Member
 
Location: Perth, Western Australia

Join Date: Mar 2013
Posts: 6
Default

I encountered the same error using Picard's MarkDuplicates, and it was related to the alignment I had done with BWA-MEM.

I had failed to use the -M option when running the alignment, which enables compatibility with Picard's MarkDuplicates. I went back and re-ran the alignment with that option, and it fixed the error.

From the BWA manual site:
-M Mark shorter split hits as secondary (for Picard compatibility).
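Re-running the alignment with that flag looks roughly like this (file names are hypothetical):

```shell
# Hypothetical file names. With -M, shorter split hits are flagged as
# secondary (FLAG 0x100) instead of appearing as extra primary records
# for the same read name, which is what MarkDuplicates trips over.
bwa mem -M hg19.fa reads_1.fastq reads_2.fastq > aln.sam
```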
JezSupreme is offline   Reply With Quote
Old 07-10-2013, 08:21 AM   #8
bwubb
Member
 
Location: Philadelphia

Join Date: Jan 2012
Posts: 55
Default

I have been struggling with this issue. I have sample data merged from several Illumina PE runs. When looking for other information/solutions, it was suggested to modify the read-group ID to include lane or run identification and then re-merge.

I have done that, but I still receive this error. Has anyone been able to resolve this issue? I could try to remove the offending read, but I'm concerned there will be many more after it.
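For reference, the suggestion amounts to something like this (a hypothetical sketch; all file names and read-group values are made up): give each lane's BAM its own RGID before re-merging, so the same read name from two lanes is no longer ambiguous.

```shell
# Hypothetical sketch: assign each lane a distinct read-group ID
# before merging the per-lane BAMs.
java -jar AddOrReplaceReadGroups.jar INPUT=sample_lane1.bam \
    OUTPUT=sample_lane1.rg.bam RGID=sample.run1.lane1 \
    RGLB=lib1 RGPL=illumina RGPU=run1.lane1 RGSM=sample
java -jar MergeSamFiles.jar INPUT=sample_lane1.rg.bam \
    INPUT=sample_lane2.rg.bam OUTPUT=sample_merged.bam
```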
bwubb is offline   Reply With Quote
Old 07-10-2013, 08:37 AM   #9
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535
Default

Quote:
Originally Posted by bwubb View Post
I have been struggling with this issue. I have sample data merged from several Illumina PE runs. When looking for other information/solutions, it was suggested to modify the read-group ID to include lane or run identification and then re-merge.

I have done that, but I still receive this error. Has anyone been able to resolve this issue? I could try to remove the offending read, but I'm concerned there will be many more after it.
What is the actual cause of your problem? In this thread different causes were posted (e.g., FASTQ files with truncated lines, or using bwa without -M).
Heisman is offline   Reply With Quote
Old 07-10-2013, 08:50 AM   #10
bwubb
Member
 
Location: Philadelphia

Join Date: Jan 2012
Posts: 55
Default

Ah I am having issues with:

Code:
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  1: E0005-FGC0298:HWI-ST970:298:C0MUAACXX:4:1201:13786:41745
	at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
	at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
	at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
	at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:418)
	at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:161)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
	at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:145)
This is driving me crazy because it is a repeat analysis, just adding yet another HiSeq run to it. I use bwa-sw (bwa aln) for the alignment. Is it recommended to use bwa mem with the -M option instead?

EDIT:

There must be something greater at work here. I cannot even run ValidateSamFile without running into this error...

Last edited by bwubb; 07-10-2013 at 10:02 AM.
bwubb is offline   Reply With Quote
Old 09-13-2013, 02:27 AM   #11
thedamian
Member
 
Location: Barcelona

Join Date: Feb 2012
Posts: 49
Default

Hi All,
I have the same problem with BWA-MEM. I used the -M option but I still get:
Code:
.PicardException: Value was put into PairInfoMap more than once.  1: null:M00840:39:000000000-A5TE9:1:2103:11538:25521
I have tried a trick with
Code:
samtools view -h before.bam | grep -v null | samtools view -bS - > cleaned.bam
but it didn't help me.

With bwa aln everything is OK, but it's not recommended for my data since the reads are ~251 bases long.

Did anyone solve this problem?
thedamian is offline   Reply With Quote
Old 09-01-2014, 02:00 AM   #12
Clown_Bassie
Junior Member
 
Location: Groningen

Join Date: Sep 2014
Posts: 2
Default

Quote:
Originally Posted by thedamian View Post
Hi All,
I have the same problem with BWA-MEM. I used the -M option but I still get:
Code:
.PicardException: Value was put into PairInfoMap more than once.  1: null:M00840:39:000000000-A5TE9:1:2103:11538:25521
I have tried a trick with
Code:
samtools view -h before.bam | grep -v null | samtools view -bS - > cleaned.bam
but it didn't help me.

With bwa aln everything is OK, but it's not recommended for my data since the reads are ~251 bases long.

Did anyone solve this problem?
This is exactly the same issue I'm running into! Does anyone have the answer yet?
Clown_Bassie is offline   Reply With Quote
Old 09-03-2014, 09:32 AM   #13
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130
Default

Quote:
Originally Posted by JezSupreme View Post
From the BWA manual site:
-M Mark shorter split hits as secondary (for Picard compatibility).
I am going to try this solution now as I have the same issue.
AdrianP is offline   Reply With Quote
Old 11-01-2015, 04:16 AM   #14
zhkzhou
Junior Member
 
Location: Beijing

Join Date: Nov 2011
Posts: 4
Default

JezSupreme and AdrianP are right!
The BWA-MEM algorithm performs local alignment, so it may produce multiple primary alignments for different parts of a query sequence. This is a crucial feature for long sequences. However, some tools, such as Picard's MarkDuplicates, do not work with split alignments. Consider using the -M option to flag the shorter split hits as secondary.
zhkzhou is offline   Reply With Quote