Seqanswers Leaderboard Ad

**swbarnes2** · 11-29-2011, 09:52 AM

That sounds horribly memory-intensive; that's probably why almost no one does it that way.

**HESmith** · 11-29-2011, 10:59 AM

I agree, but without a reference genome for alignment it seems like the only option. A simplistic approach would be to build a hash table keyed on the first 10 nucleotides of read 1 and read 2, and keep only one read pair per key. It doesn't account for sequencing errors, but it would probably be good enough for my purposes (or at least give me a sense of how much duplication is present). Alternatively, I suppose I could build an assembly from the whole data set, then align to that assembly to identify duplicates.
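A minimal Python sketch of the prefix-key idea, assuming the pairs come from two synchronized 4-line FASTQ files (no wrapped sequence lines). The function names `read_fastq_pairs`, `dedup_by_prefix` and `duplication_rate` are hypothetical. As in the proposal, sequencing errors are ignored: one mismatch in either 10-nt prefix makes a pair look unique, and the first pair seen for a key is kept regardless of quality.

```python
from typing import Iterable, Iterator, TextIO, Tuple

# Hypothetical record shape: (read_id, read1_sequence, read2_sequence)
ReadPair = Tuple[str, str, str]


def read_fastq_pairs(handle1: TextIO, handle2: TextIO) -> Iterator[ReadPair]:
    """Yield (id, seq1, seq2) from two synchronized 4-line FASTQ streams."""
    while True:
        rec1 = [handle1.readline().rstrip("\n") for _ in range(4)]
        rec2 = [handle2.readline().rstrip("\n") for _ in range(4)]
        if not rec1[0] or not rec2[0]:
            return  # end of either file
        yield rec1[0], rec1[1], rec2[1]


def dedup_by_prefix(pairs: Iterable[ReadPair], k: int = 10) -> Iterator[ReadPair]:
    """Keep only the first pair seen for each (read1[:k], read2[:k]) key."""
    seen = set()
    for read_id, seq1, seq2 in pairs:
        key = (seq1[:k], seq2[:k])
        if key in seen:
            continue  # treated as a duplicate of an earlier pair
        seen.add(key)
        yield read_id, seq1, seq2


def duplication_rate(pairs: Iterable[ReadPair], k: int = 10) -> float:
    """Fraction of pairs discarded as prefix-key duplicates."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    unique = sum(1 for _ in dedup_by_prefix(pairs, k))
    return 1 - unique / len(pairs)
```

Only the keys are held in memory, so usage grows with the number of distinct 10 + 10 nt keys rather than with read length; for A/C/G/T-only prefixes each key could be packed into 40 bits (2 bits per base) if that became the bottleneck.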

Any advice/recommendations/alternative approaches would be welcome.

**stuka** · 11-29-2011, 11:03 AM

I've developed a naive tool that does some basic duplicate removal by brute-force comparison, using Hadoop:

GitHub - oklasoft/b-tangs: Binning Trimmer of Artifacts in Next Gen Sequence - Clean out possible PCR artifacts by searching for like sequence reads via some map reduce

https://github.com/oklasoft/b-tangs

Binning Trimmer of Artifacts in Next Gen Sequence - Clean out possible PCR artifacts by searching for like sequence reads via some map reduce - oklasoft/b-tangs
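The binning-plus-brute-force idea can be illustrated without Hadoop as three plain Python phases: map each read to a bin key, group by key (the shuffle), then compare reads pairwise within each bin and drop any read within `max_mismatches` of one already kept. This is a hypothetical sketch, not the b-tangs code; the prefix bin key, `bin_len`, the Hamming-distance test and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Read = Tuple[str, str]  # hypothetical record shape: (read_id, sequence)


def map_phase(reads: Iterable[Read], bin_len: int = 8) -> Iterator[Tuple[str, Read]]:
    """Emit (bin_key, read); here the bin key is simply a sequence prefix."""
    for read in reads:
        yield read[1][:bin_len], read


def shuffle(pairs: Iterable[Tuple[str, Read]]) -> Dict[str, List[Read]]:
    """Group reads by bin key, as the framework's shuffle step would."""
    groups: Dict[str, List[Read]] = defaultdict(list)
    for key, read in pairs:
        groups[key].append(read)
    return groups


def hamming_within(a: str, b: str, max_mismatches: int) -> bool:
    """True if equal-length sequences differ at no more than max_mismatches sites."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def reduce_phase(bin_reads: List[Read], max_mismatches: int = 2) -> List[Read]:
    """Brute-force O(n^2) comparison inside one bin; keep the first of each near-duplicate set."""
    kept: List[Read] = []
    for read in bin_reads:
        if not any(hamming_within(read[1], rep[1], max_mismatches) for rep in kept):
            kept.append(read)
    return kept


def remove_near_duplicates(reads: Iterable[Read], bin_len: int = 8,
                           max_mismatches: int = 2) -> List[Read]:
    groups = shuffle(map_phase(reads, bin_len))
    kept: List[Read] = []
    for key in sorted(groups):  # on a cluster, bins would go to separate reducers
        kept.extend(reduce_phase(groups[key], max_mismatches))
    return kept
```

Binning confines the quadratic comparison to each bin, which is what lets the work spread across reducers; the trade-off is that a read with an error inside the bin-key prefix lands in a different bin and is never compared with its true duplicate.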

**rudi283** · 11-30-2011, 03:13 AM

In Genomics Workbench from CLC Bio, you can remove PCR duplicates before alignment.


how to filter unaligned duplicate reads