SEQanswers > Bioinformatics

07-31-2012, 11:31 AM #1
Senior Member
Location: Cambridge, MA

Join Date: Mar 2009
Posts: 141
Removing duplicate PE reads from unmappable data

This topic has been discussed a fair bit on SEQanswers, but I haven't found an answer to this exact question, so I'm throwing it out there again. I have some 100 bp paired-end (PE) metagenomic data sets from rather low-complexity samples. Analyzing duplication levels in read 1 and read 2 separately shows rather high duplication (25-50%). I'm interested in identifying true PCR duplicates by analyzing read 1 and read 2 together, i.e. true duplicates should be identical at both ends of the molecule. These data come from a community without reference genomes, so mapping and then using samtools rmdup is not an option. Is there an existing tool for mapping-free duplicate removal of PE reads, or do I need to cobble something together? I think something like the following could work:

1. Extract the first xx bp of read 1 and read 2, then join them into one sequence per pair using fastq-joiner in Galaxy or similar.
2. Remove exact duplicates using fastx-collapser from the FASTX-Toolkit in Galaxy.
3. Extract the list of non-duplicated reads from the fastx-collapser output.
4. Pull these reads out of the original read 1 and read 2 FASTQ files.
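The four steps above can be folded into a single pass over both files, which avoids having to map collapsed sequences back to read IDs. Below is a minimal Python sketch, assuming uncompressed FASTQ with exactly 4 lines per record and read pairs in the same order in the R1 and R2 files. The function names and the default `prefix_len=50` (standing in for the "xx bp" above) are hypothetical choices, not taken from any existing tool.

```python
from typing import Iterator, TextIO, Tuple

# One FASTQ record: header, sequence, '+' line, quality string.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def dedup_pairs(r1_in: TextIO, r2_in: TextIO,
                r1_out: TextIO, r2_out: TextIO,
                prefix_len: int = 50) -> Tuple[int, int]:
    """Write the first pair seen for each (R1 prefix, R2 prefix) key.

    Two pairs count as duplicates only if the first `prefix_len` bases of
    read 1 AND of read 2 match exactly. Returns (total_pairs, kept_pairs).
    Note: zip() stops silently at the end of the shorter file.
    """
    seen = set()
    total = kept = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        total += 1
        # Tuple key, not a concatenated string, so trimmed reads shorter
        # than prefix_len cannot produce false matches across the boundary.
        key = (rec1[1][:prefix_len], rec2[1][:prefix_len])
        if key in seen:
            continue  # duplicate pair: drop both mates
        seen.add(key)
        kept += 1
        r1_out.write("\n".join(rec1) + "\n")
        r2_out.write("\n".join(rec2) + "\n")
    return total, kept
```

To run it on real data, open the two input FASTQ files for reading and two output files for writing with `open()`, pass all four handles to `dedup_pairs`, and compare the returned counts to get the pair-level duplication rate. Like fastx-collapser, this catches exact matches only, so pairs differing by a single sequencing error within the prefix are kept. It also does not treat a pair with R1 and R2 swapped as a duplicate. Memory grows with the number of distinct pairs, since every key is held in a set.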
Posted by greigite
