
**whataBamBam** · 05-28-2013, 06:30 AM

Hey tdoniger

Did you ever find the solution to your problem? I have the same problem, and my metrics file only reports one percent duplication!

I'm currently running 3 different solutions to the problem and I'm waiting for the batches to finish:

- use samtools rmdup instead
- add VALIDATION_STRINGENCY=LENIENT to my command string for MarkDuplicates
- don't bother marking duplicates at all (I only did this because GATK requires it; my other pipeline runs happily without this step) and try to trick GATK into accepting my file by adding an @PG line to the header saying I ran MarkDuplicates. That is, use samtools reheader to take the header from the file in which MarkDuplicates marked every read as a duplicate and put it onto the file I would have used as input. (Yeah, I know this is probably not recommended, I just thought I'd see what happened.)

My command was:

java -Xmx2G -jar MarkDuplicates.jar INPUT=infile.sorted.bam OUTPUT=outfile.sorted.dedupe.bam METRICS_FILE=myMetricsFile

I'd also previously sorted the BAM using samtools.

**tdoniger** · 07-09-2013, 01:12 AM

I thought that, because every line contained PG:Z:MarkDuplicates, these were all reads marked as duplicates. This is not the case. It is the flag in the second column that indicates whether a read is a duplicate or not.

Try: samtools flagstat library_no_dups.bam
You can look up the flags that represent duplicates here: http://picard.sourceforge.net/explain-flags.html
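
For anyone who wants to check a flag by hand, here is a minimal sketch in Python. It assumes only that the duplicate bit is 0x400 (1024), as defined in the SAM specification; the function name and the example flag values are made up for illustration.

```python
# Bit 0x400 (decimal 1024) in the SAM FLAG field means
# "PCR or optical duplicate" according to the SAM specification.
DUPLICATE_BIT = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_BIT)


if __name__ == "__main__":
    # Illustrative flag values: 99 and 147 are a typical properly paired
    # read and its mate; 1123 and 1171 are the same two flags plus 1024.
    for flag in (99, 147, 1123, 1171):
        print(flag, is_duplicate(flag))
```

So a read with flag 1123 is marked as a duplicate, while the same read with flag 99 is not, regardless of any PG:Z:MarkDuplicates tag on the line.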

**whataBamBam** · 07-09-2013, 11:24 AM

I solved this and now I can't remember how. Really slack of me not to come back and post the solution, but tomorrow I'll check my pipelines and have a look.

**tdoniger** · 07-10-2013, 04:57 AM

Hi,

Thanks! But I was trying to explain that I managed to solve it. I had thought that every line marked by "PG:Z:MarkDuplicates" was a duplicate, but really this was not the case. The duplicates are marked in the flag in the second column of the same file.
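
To make the difference concrete, here is a rough sketch that counts duplicate-flagged records in SAM-formatted text. The three records are invented for illustration (they are not from my file): all of them carry PG:Z:MarkDuplicates, but only one has the duplicate bit (0x400) set in the second column. The function name is hypothetical too.

```python
def count_duplicates(sam_lines):
    """Return (total_records, duplicate_records) for SAM-formatted lines."""
    total = 0
    dups = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines such as @HD, @SQ, @PG
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        total += 1
        if int(fields[1]) & 0x400:  # FLAG is the second column
            dups += 1
    return total, dups


# Invented example records; every one has the PG tag, only one is a duplicate.
example = [
    "@PG\tID:MarkDuplicates\tPN:MarkDuplicates",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\tPG:Z:MarkDuplicates",
    "r2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\tPG:Z:MarkDuplicates",
    "r3\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII\tPG:Z:MarkDuplicates",
]

if __name__ == "__main__":
    total, dups = count_duplicates(example)
    print(f"{dups} of {total} records are flagged as duplicates")
```

Counting lines that contain PG:Z:MarkDuplicates would give 3 of 3 here, which is exactly the mistake I made.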

Best,
Tirza

**TonyBrooks** · 07-10-2013, 05:06 AM

Going off piste here, but 43% duplication is pretty high.
How much PCR did you do?

**tdoniger** · 07-10-2013, 05:19 AM

Quite a bit. Very little starting material.

Picard MarkDuplicates - whole bam file marked as duplicates
