01-12-2015, 06:59 AM   #1
Location: Netherlands

Join Date: Sep 2014
Posts: 24
remove duplicate reads 100% sequence identity and genomic coordinates

Hi everyone. I have to analyze some paired-end reads from an Illumina MiSeq experiment. I want to remove duplicate reads that not only have the same start and end coordinates but also have 100% sequence identity. Is there a tool that can help me do that? I want to work with BAM files, not FastQ files. Thanks!
thiNGS
01-12-2015, 09:53 AM   #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

If sequences have 100% identity then they should have the same mapping coordinates, so there's no reason to work with bam files in this case. I wrote a program that can do this for fastq, but not for bam: in=reads.fq out=deduped.fq ac=f t=1
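For illustration only, here is a minimal Python sketch of exact-sequence deduplication on FASTQ. It is not the program mentioned above and makes no claim about how that program works; the function names `read_fastq` and `dedupe_exact` are made up for this example. It assumes well-formed 4-line records in a single file, and it does not consider read pairing or reverse complements.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator line, qualities)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; assumes complete, unwrapped records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        yield header, next(it), next(it), next(it)


def dedupe_exact(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Keep only the first record seen for each distinct sequence string."""
    seen = set()
    for rec in records:
        seq = rec[1].upper()
        if seq not in seen:
            seen.add(seq)
            yield rec
```

To try it on a file, something like `dedupe_exact(read_fastq(open("reads.fq")))` should work, writing each kept record back out as four lines.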

There should be tools that can do so on bam files by sorting by sequence, but I don't know what they are offhand.
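As a rough sketch of what such a filter could look like (not an existing tool), the Python below works on SAM text, for example the output of `samtools view -h`. The function name `dedupe_sam_pairs` is hypothetical. It assumes the data fit in memory, groups records by read name, and drops any pair whose primary alignments match an earlier pair in reference, position, strand, CIGAR, and sequence. Output is grouped by read name, so it would need re-sorting before coordinate-based tools are used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def dedupe_sam_pairs(lines: Iterable[str]) -> List[str]:
    """Drop pairs that repeat an earlier pair's coordinates and sequences."""
    header: List[str] = []
    by_name: Dict[str, List[List[str]]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)  # SAM header lines pass through unchanged
            continue
        fields = line.split("\t")
        by_name[fields[0]].append(fields)  # group mates by QNAME

    seen = set()
    kept: List[str] = []
    for records in by_name.values():
        parts = []
        for f in records:
            flag = int(f[1])
            if flag & 0x900:
                continue  # ignore secondary/supplementary alignments in the key
            # RNAME, POS, reverse-strand bit, CIGAR, SEQ
            parts.append((f[2], int(f[3]), bool(flag & 0x10), f[5], f[9].upper()))
        key = tuple(sorted(parts))
        if parts and key in seen:
            continue  # same coordinates and 100% identical sequence: duplicate
        seen.add(key)
        kept.extend("\t".join(f) for f in records)
    return header + kept
```

Keying on the whole pair rather than on single reads avoids leaving orphaned mates when only one read of a pair matches an earlier read.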
All times are GMT -8.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2022, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO