**Bukowski** · 02-16-2013, 11:29 AM

Why are you marking duplicates in an amplicon-based assay? I'm just curious.

**fongchun** · 02-16-2013, 11:32 AM

> Originally posted by Bukowski:
>
> Why are you marking duplicates in an amplicon-based assay? I'm just curious.

It's just part of the standard pipeline that we use to analyse all sequencing libraries. As I mentioned, we could build a separate analysis pipeline for this, but it seems odd that there isn't simply a parameter to ignore PCR duplicates.

**Bukowski** · 02-16-2013, 02:54 PM

> Originally posted by fongchun:
>
> It's just part of the standard pipeline that we use to analyse all sequencing libraries. As I mentioned, we could build a separate analysis pipeline for this, but it seems odd that there isn't simply a parameter to ignore PCR duplicates.

Marking duplicates when there is PCR involved seems counter-intuitive. Won't that lead to lots of reads with the same start and stop position, which will appear to be duplicated? If that's part of the design, why remove them?

Edit:

I should clarify this. I do a lot of in-solution capture analysis, and I de-duplicate the data if I'm using (for instance) SureSelect. But if the experiment is HaloPlex I don't - because de-duplicating the data removes data that is there because of the design - it's unavoidable to have data that matches the characteristics of 'duplicates'.

**fongchun** · 02-16-2013, 03:25 PM

I probably wasn't clear about what I actually want to do. I agree with you that we expect a lot of PCR duplicates, and yes, it is counter-intuitive to remove them. I am not suggesting that we remove them. I am just saying that, as part of an already established pipeline, every library we align automatically has its duplicates marked. I was wondering whether CollectTargetedPcrMetrics has a parameter to calculate statistics on a library while ignoring the fact that reads are marked as duplicates. That would be an alternative to developing a branch of the pipeline that doesn't run mark duplicates. Either solution is fine, and we can easily develop a branch. I would just like to know whether there are other options available.

I intend to use all the reads whether they are duplicates or not in our future analyses.

Hope that clarifies the confusion.
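As a rough illustration of the "ignore the marks" route described above: duplicate status is stored in bit 0x400 of each record's SAM FLAG field, so one possible workaround is to clear that bit before the metrics step. The sketch below is only an assumption about how that could be done, not a Picard option; the function names `clear_duplicate_flag` and `clear_duplicates` are hypothetical, and it operates on SAM text lines rather than BAM.

```python
from typing import Iterable, Iterator

# SAM specification: FLAG bit 0x400 means "PCR or optical duplicate".
DUPLICATE_FLAG = 0x400


def clear_duplicate_flag(sam_line: str) -> str:
    """Return one SAM line with the duplicate bit unset.

    Header lines (starting with '@') and blank lines pass through unchanged.
    """
    if not sam_line.strip() or sam_line.startswith("@"):
        return sam_line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    # Column 2 (index 1) is the integer FLAG field.
    fields[1] = str(int(fields[1]) & ~DUPLICATE_FLAG)
    return "\t".join(fields) + newline


def clear_duplicates(lines: Iterable[str]) -> Iterator[str]:
    """Apply clear_duplicate_flag to every line of a SAM stream."""
    for line in lines:
        yield clear_duplicate_flag(line)


# Example: FLAG 1123 (= 1024 + 99) becomes 99 once the duplicate bit is cleared.
record = "read1\t1123\tchr1\t100\t60\t50M\t=\t200\t150\tACGT\tFFFF\n"
cleaned = clear_duplicate_flag(record)
```

In practice the SAM text could come from something like `samtools view -h`, with the result converted back to BAM afterwards. Whether this is preferable to simply skipping duplicate marking depends on the pipeline.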

> Originally posted by Bukowski:
>
> Marking duplicates when there is PCR involved seems counter-intuitive. Won't that lead to lots of reads with the same start and stop position, which will appear to be duplicated? If that's part of the design, why remove them?
>
> Edit:
>
> I should clarify this. I do a lot of in-solution capture analysis, and I de-duplicate the data if I'm using (for instance) SureSelect. But if the experiment is HaloPlex I don't - because de-duplicating the data removes data that is there because of the design - it's unavoidable to have data that matches the characteristics of 'duplicates'.

**Bukowski** · 02-16-2013, 03:51 PM

I guess it's no surprise that in my pipelines I have a switch that says 'don't de-dup the data' for when I need it. Pipelines are not immobile, immovable things, and they're never suitable for every situation.

It seems to me that the problem isn't with Picard, which is behaving exactly as it should. The fact is that the pipeline shouldn't be marking duplicates for this assay. I guess that answers your question though!
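A minimal sketch of the kind of "don't de-dup" switch mentioned above, assuming a pipeline that is described as an ordered list of step names. Everything here is hypothetical (`PipelineConfig`, `config_for`, `plan_steps`, and the step names); it only illustrates making duplicate marking conditional on assay type, following the SureSelect versus HaloPlex distinction drawn earlier in the thread.

```python
from dataclasses import dataclass
from typing import List

# Assay types where reads share start/stop positions by design, so
# duplicate marking is skipped (assumed list, for illustration only).
AMPLICON_ASSAYS = {"amplicon", "haloplex"}


@dataclass
class PipelineConfig:
    assay: str
    mark_duplicates: bool = True


def config_for(assay: str) -> PipelineConfig:
    """Choose the de-dup setting from the assay type."""
    skip = assay.lower() in AMPLICON_ASSAYS
    return PipelineConfig(assay=assay, mark_duplicates=not skip)


def plan_steps(cfg: PipelineConfig) -> List[str]:
    """Return the ordered step names for this configuration."""
    steps = ["align", "sort"]
    if cfg.mark_duplicates:
        steps.append("mark_duplicates")
    steps.append("collect_metrics")
    return steps


capture_steps = plan_steps(config_for("SureSelect"))
amplicon_steps = plan_steps(config_for("HaloPlex"))
```

With this layout, a capture library still goes through duplicate marking, while an amplicon library goes straight from sorting to metrics collection.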

Ignore PCR Duplicates in CollectTargetedPcrMetrics
