#1 |
Member
Location: Liverpool, UK Join Date: Feb 2011
Posts: 30
|
I am trying to use Picard MarkDuplicates to remove PCR duplicates from a human RNA-seq BAM file. The run was paired-end, but only about 30% of the reads are properly paired (that is another story).
My command for picard was this: PHP Code:
Original data: PHP Code:
PHP Code:
PHP Code:
Am I doing something wrong? Please help!
Thanks
Helen
#2 | |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]() Quote:
|
|
#3 |
Member
Location: Liverpool, UK Join Date: Feb 2011
Posts: 30
|
Thank you for the reply, I will try your suggestion of changing the read names.
I have tried samtools rmdup, but it removes >90% of my reads (as duplicates).
Helen
#4 |
Member
Location: Liverpool, UK Join Date: Feb 2011
Posts: 30
|
I have changed the read names as suggested and re-run Picard, but I am still not convinced that it has worked properly.
I also added an extra option to the command line, READ_NAME_REGEX="[0-9]_[0-9]_[0-9]". An example of the output can be seen below: PHP Code:
I ran samtools rmdup on this Picard output and that removed a further 75% as duplicates! But why is Picard not picking them up? My full Picard command is:
java -Xmx10g -jar /path/to/MarkDuplicates.jar INPUT=filename.bam OUTPUT=output.bam METRICS_FILE=output.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT READ_NAME_REGEX="[0-9]_[0-9]_[0-9]"
Helen
#5 |
Member
Location: Ireland Join Date: Jan 2011
Posts: 13
|
This might be helpful. It suggests that the problem is caused by the underscores in the read names, so the underscore has to be part of the regex.
http://biostar.stackexchange.com/que...markduplicates
PK
#6 |
Member
Location: New York, NY Join Date: Oct 2011
Posts: 41
|
Your regular expression won't match your read names as you have them. The problem is that you need to (1) include capturing parentheses and (2) use + (to match one or more characters).
To match identifiers like yours (e.g., 1240_122_334), use: READ_NAME_REGEX="([0-9]+)_([0-9]+)_([0-9]+)"
Also be sure that the three numbers in your read names are (in this order) tile/region, x coordinate, and y coordinate, per the Picard tools documentation.
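To see why the original pattern fails, here is a rough sketch in Python rather than Java. It assumes Picard requires the regex to match the whole read name and to expose exactly three capture groups; `parse_read_name` is a made-up helper for illustration, not part of Picard.

```python
import re

# Pattern from the original command: single digits only, no capture groups.
OLD_REGEX = r"[0-9]_[0-9]_[0-9]"
# Suggested pattern: one or more digits per field, each field captured.
NEW_REGEX = r"([0-9]+)_([0-9]+)_([0-9]+)"


def parse_read_name(pattern, read_name):
    """Return (tile, x, y) as ints, or None if the name cannot be parsed.

    Hypothetical helper: assumes the pattern must match the entire read
    name and must define exactly three capture groups.
    """
    match = re.fullmatch(pattern, read_name)
    if match is None or len(match.groups()) != 3:
        return None
    return tuple(int(group) for group in match.groups())


read_name = "1240_122_334"
print(parse_read_name(OLD_REGEX, read_name))  # None
print(parse_read_name(NEW_REGEX, read_name))  # (1240, 122, 334)
```

With the old pattern nothing is extracted, so no tile or coordinates would be available for optical duplicate detection.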
#7 |
Member
Location: New York, NY Join Date: Oct 2011
Posts: 41
|
One final thought: check the metrics file to determine how many of your reads are being detected as optical and PCR duplicates. If your regex is working, there should be > 0 optical duplicates.
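A quick way to run that check is to pull the optical duplicate count out of the metrics file. The sketch below assumes the usual layout (a `## METRICS CLASS` line, then a tab-separated header row, then one row per library) and a `READ_PAIR_OPTICAL_DUPLICATES` column; the sample values are invented for illustration, and `optical_duplicates` is a hypothetical helper, so check the column names against your own output.txt.

```python
HEADER = ["LIBRARY", "READ_PAIRS_EXAMINED", "READ_PAIR_DUPLICATES",
          "READ_PAIR_OPTICAL_DUPLICATES"]

# Invented example content, NOT real output from the run in this thread.
SAMPLE_METRICS = "\n".join([
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics",
    "\t".join(HEADER),
    "\t".join(["lib1", "1000", "50", "0"]),
    "",
    "## HISTOGRAM\tjava.lang.Double",
])


def read_metrics_rows(text):
    """Return the metrics rows as a list of dicts keyed by column name."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip():  # a blank line ends the metrics table
                    break
                rows.append(dict(zip(header, row.split("\t"))))
            return rows
    return []


def optical_duplicates(text):
    """Sum READ_PAIR_OPTICAL_DUPLICATES over all libraries in the file."""
    return sum(int(row["READ_PAIR_OPTICAL_DUPLICATES"])
               for row in read_metrics_rows(text))


if optical_duplicates(SAMPLE_METRICS) == 0:
    print("No optical duplicates found; the read name regex may not be matching.")
```

For a real run, pass the contents of the METRICS_FILE (for example `open("output.txt").read()`) instead of the sample string.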
#8 |
Member
Location: Philadelphia Join Date: Jul 2009
Posts: 16
|
Helen,
It seems like you are using an aligner that reports multiple alignments, right? One thing I noticed is that all of these remaining reads are flagged "not primary alignment" (I used http://picard.sourceforge.net/explain-flags.html to interpret the flag in col 2). Picard might skip these reads during duplicate detection. I've used samtools view -F 0x100 xx.bam to remove the non-primary alignments.
Lifeng
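For anyone who wants to decode the flag column without the web page, here is a small sketch based on the bit definitions in the SAM specification. The function names are made up for illustration, and the flag value 272 is just an example, not one taken from Helen's file.

```python
# Bit meanings of the SAM FLAG field (column 2), per the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def explain_flag(flag):
    """List the names of all bits set in a SAM flag value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def passes_exclude_filter(flag, exclude_mask):
    """Mimic `samtools view -F MASK`: keep a read only if none of the mask bits are set."""
    return (flag & exclude_mask) == 0


print(explain_flag(272))                 # ['reverse strand', 'not primary alignment']
print(passes_exclude_filter(272, 0x100)) # False: dropped by -F 0x100
print(passes_exclude_filter(16, 0x100))  # True: primary alignment is kept
```

Reads that Picard has marked as duplicates carry the 0x400 bit, so the same check can be used to count them in a marked (not removed) BAM.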
Tags |
pcr duplicates, picard |