#1
Member
Location: USA
Join Date: Oct 2009
Posts: 41
I just tried Picard to remove PCR duplicates, using test_sorted.bam (produced by samtools sort) as the input file. The following command
Code:
java -jar MarkDuplicates.jar test_sorted.bam test_rmdup.bam
gave me this error:
Code:
ERROR: Invalid argument 'test_sorted.bam'.
Does anybody know what I did wrong? Thanks in advance for all your help.

#2
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285
Quote:

#3
Member
Location: USA
Join Date: Oct 2009
Posts: 41
I tried again:
Code:
java -Xmx2g -jar ~/picard-tools-1.21/MarkDuplicates.jar INPUT=test_sorted.bam OUTPUT=test_rmdup.bam METRICS_FILE=PCR_duplicates REMOVE_DUPLICATES=true
And I got this error:
Code:
[Sat Jun 12 22:11:22 EDT 2010] net.sf.picard.sam.MarkDuplicates INPUT=test_sorted.bam OUTPUT=test_rmdup.bam METRICS_FILE=PCR_duplicates REMOVE_DUPLICATES=true ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]
INFO 2010-06-12 22:11:22 MarkDuplicates Start of doWork freeMemory: 31062256; totalMemory: 31588352; maxMemory: 1908932608
INFO 2010-06-12 22:11:22 MarkDuplicates Reading input file and constructing read end information.
INFO 2010-06-12 22:11:22 MarkDuplicates Will retain up to 7575129 data points before spilling to disk.
[Sat Jun 12 22:11:23 EDT 2010] net.sf.picard.sam.MarkDuplicates done.
Runtime.totalMemory()=152829952
Exception in thread "main" net.sf.picard.PicardException: test_sorted.bam is not coordinate sorted.
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:250)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:112)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:96)
It says "test_sorted.bam is not coordinate sorted.", but I got test_sorted.bam from "samtools sort". What did I do wrong?

#4
Senior Member
Location: USA, Midwest
Join Date: May 2008
Posts: 1,178
You can view the header information for your BAM file with the command
Code:
samtools view -H test_sorted.bam
Before you can use Picard to remove duplicates you will have to fix the SO tag. Fortunately Picard has a command to do this, ReplaceSamHeader. Alternatively you could use Picard SortSam instead of samtools sort. (For the record, I don't know for sure whether Picard SortSam properly updates the SO tag.)
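For anyone who wants to script this check, here is a minimal Python sketch. It is not part of Picard or samtools; the function name and the example header are made up for illustration. It assumes you have captured the text printed by samtools view -H and simply looks for the SO field on the @HD line, which the SAM specification defines as one of unknown, unsorted, queryname or coordinate.

```python
def sort_order(header_text: str) -> str:
    """Return the SO value from the @HD header line.

    Falls back to 'unknown' when there is no @HD line or no SO field,
    which is the default the SAM specification gives for a missing tag.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


# Hypothetical header text, as it might be printed by `samtools view -H`.
example_header = "@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\n"
print(sort_order(example_header))  # unsorted
```

If this reports anything other than coordinate, the header is what needs fixing, whatever order the records themselves are in.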

#5
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285
You can also add the "AS=true" option to assume that the input is sorted.

#6
Member
Location: Antwerp, BE or Cambridge, UK
Join Date: Nov 2008
Posts: 12
Thanks. I got exactly the same problem...

#7
Member
Location: Huntsville AL
Join Date: Jul 2008
Posts: 13
Greetings,
I'm having the same problem. I used the command-line argument to assume the input was sorted, but I'm getting screwy results. When MarkDuplicates says it wants 'coordinate sorted' data, is it referring to tile-x-y or to a genomic alignment? It seems one could find duplicates without reference to a genome. If it's tile-x-y, is the ordering lexical or numeric?
Thanks,
Mike

#8
Member
Location: Ann Arbor, MI
Join Date: Oct 2008
Posts: 57
The simple solution is to sort the file with samtools first. I've been using the Picard tool MergeSamFiles.jar to both merge and sort, because I typically have multiple lanes of data for each sample.
Mike, I don't think it will work on unaligned reads, because I believe Picard works by looking at the mappings.

#9
Senior Member
Location: USA, Midwest
Join Date: May 2008
Posts: 1,178
Yes, you could find duplicates without reference to a genome. You would have to perform an all vs. all search, which would require a huge amount of time and RAM when you are talking about tens or hundreds of millions of reads.
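To illustrate why aligned, coordinate-sorted input makes this so much cheaper than all vs. all, here is a rough Python sketch. It is deliberately simplified and is not Picard's actual algorithm: it treats two reads as duplicates when they share reference name, start position and strand, and keeps the first one seen. The function name and the toy reads are made up for illustration. Once the reads are sorted by coordinate, candidate duplicates sit next to each other, so one pass over the sorted list is enough.

```python
from itertools import groupby


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    reads: iterable of (name, chrom, pos, strand) tuples for mapped reads.
    Simplifying assumption: same chrom, pos and strand means duplicate.
    """
    def key(read):
        return (read[1], read[2], read[3])

    duplicates = set()
    # After sorting by coordinate, identical keys are adjacent,
    # so groupby collects each set of candidate duplicates in one pass.
    for _, group in groupby(sorted(reads, key=key), key=key):
        group = list(group)
        for read in group[1:]:  # keep the first read, flag the rest
            duplicates.add(read[0])
    return duplicates


# Toy input: r1 and r3 map to the same position and strand.
reads = [
    ("r1", "chr1", 100, "+"),
    ("r2", "chr1", 250, "-"),
    ("r3", "chr1", 100, "+"),
    ("r4", "chr2", 100, "+"),
    ("r5", "chr1", 100, "-"),
]
print(sorted(mark_duplicates(reads)))  # ['r3']
```

The sort costs on the order of n log n comparisons, whereas comparing every read sequence against every other grows with n squared.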

#10
Member
Location: Wageningen
Join Date: Jan 2009
Posts: 11
I would also like to use Picard for duplicate removal. However, I ran into some trouble with a SAM file output by CLC-Bio Genomics Workbench. Does anyone have an idea how to fix this issue?
Code:
root@thomasg-desktop:/home/thomasg/Downloads/\tmp/picard-tools-1.27# java -jar MergeSamFiles.jar I=/home/thomasg/RF_7.fastq\ trimmed\ \(paired\)\ mapping\ \(11205\ references\).sam SO=coordinate AS=false O=/home/thomasg/out.sam
[Thu Aug 12 14:30:53 CEST 2010] net.sf.picard.sam.MergeSamFiles OUTPUT=/home/thomasg/out.sam SORT_ORDER=coordinate ASSUME_SORTED=false MERGE_SEQUENCE_DICTIONARIES=false USE_THREADING=false TMP_DIR=/tmp/root VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000
INFO 2010-08-12 14:30:53 MergeSamFiles Sorting input files using temp directory /tmp/root
[Thu Aug 12 14:30:53 CEST 2010] net.sf.picard.sam.MergeSamFiles done.
Runtime.totalMemory()=379322368
Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Paired read should be marked as first of pair or second of pair.; File /home/thomasg/RF_7.fastq trimmed (paired) mapping (11205 references).sam; Line 11208
Line: RF_43280 25 Contig_1 1 60 50M * 0 0 ACAGCGACTCAACCAAAGGAATCCTATATAGAAATGCTATTAGGAATCCC HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH NH:i:1
    at net.sf.samtools.SAMTextReader.reportErrorParsingLine(SAMTextReader.java:220)
    at net.sf.samtools.SAMTextReader.access$500(SAMTextReader.java:40)
    at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:424)
    at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:268)
    at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:240)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:609)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:587)
    at net.sf.picard.util.PeekableIterator.advance(PeekableIterator.java:71)
    at net.sf.picard.util.PeekableIterator.<init>(PeekableIterator.java:41)
    at net.sf.picard.sam.ComparableSamRecordIterator.<init>(ComparableSamRecordIterator.java:51)
    at net.sf.picard.sam.MergingSamRecordIterator.addIterator(MergingSamRecordIterator.java:93)
    at net.sf.picard.sam.MergingSamRecordIterator.startIterationIfRequired(MergingSamRecordIterator.java:102)
    at net.sf.picard.sam.MergingSamRecordIterator.hasNext(MergingSamRecordIterator.java:117)
    at net.sf.picard.sam.MergeSamFiles.doWork(MergeSamFiles.java:190)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
    at net.sf.picard.sam.MergeSamFiles.main(MergeSamFiles.java:83)

#11
Member
Location: Huntsville AL
Join Date: Jul 2008
Posts: 13
I had a similar problem with sam files derived from Illumina output. The problem was the mate IDs that Illumina uses, i.e., index:pairN:filterFlag. I believe the tools expect pair IDs in the form /1 and /2. Check the output from the workbench to see how they identify pairs.
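A quick way to see what the parser is objecting to is to decode the FLAG field of the record quoted in the MergeSamFiles error earlier in the thread (the second column, 25). Below is a small Python sketch, assuming the standard bit meanings from the SAM specification; the table and function names are my own, not from any library.

```python
# FLAG bits as defined in the SAM specification (labels are my own wording).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(25))  # ['paired', 'mate_unmapped', 'reverse_strand']
```

So 25 = 1 + 8 + 16: the read is flagged as paired, but neither 0x40 (first in pair) nor 0x80 (second in pair) is set, which matches the wording of the error message.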

#12
Junior Member
Location: The Netherlands
Join Date: Jan 2010
Posts: 3
Dear all,
For my sequencing project I would also like to remove duplicates. Did any of you already work with the CLC Assembly Cell to remove them? I have no idea where to start.
__________________
Time is a great teacher. Unfortunately, it kills all its pupils.

#13
Junior Member
Location: Beijing, China
Join Date: Jan 2015
Posts: 1
Code:
java -jar $softwave/SortSam.jar I=$O/HFHm001_1_Tri.fastq_bismark_bt2_pe.bam O=$O/HFHm001_1_Tri.fastq_bismark_bt2_pe.sorted.bam sort_order=coordinate