Thread | Thread Starter | Forum | Replies | Last Post |
How to sequence and quantify a 100 bp genomic region using a MiSeq? | laeb | Sample Prep / Library Generation | 1 | 08-16-2013 08:03 AM |
Tools to remove duplicate reads | fanx | Bioinformatics | 3 | 01-29-2013 12:36 PM |
Obtaining UCSC Genomic sequence Given Genomic Coordinates | modi2020 | Bioinformatics | 0 | 12-03-2012 08:45 PM |
exome CNV: remove duplicate reads? | mrfox | Bioinformatics | 2 | 10-22-2012 07:27 AM |
CREST remove duplicate reads | tujchl | Bioinformatics | 0 | 04-26-2012 07:39 PM |
#1
Member
Location: Netherlands Join Date: Sep 2014
Posts: 24
Hi everyone. I have to analyze some paired-end reads from an Illumina MiSeq experiment. I want to remove duplicate reads that not only have the same start and end coordinates but also have 100% sequence identity. Is there a tool that can do that? I want to work with BAM files, not FASTQ files. Thanks!
#2
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
If sequences have 100% identity then they should have the same mapping coordinates, so there's no reason to work with BAM files in this case. I wrote a program that can do this for FASTQ, but not for BAM:

dedupe.sh in=reads.fq out=deduped.fq ac=f t=1

There should be tools that can do so on BAM files by sorting by sequence, but I don't know what they are offhand.
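For anyone who does need to stay with alignments, here is a rough sketch of the "same coordinates and identical sequence" idea in plain Python. It is not an existing tool: the function name `dedupe_sam_lines` is made up for illustration, and it assumes the input is SAM text (for example, the output of `samtools view -h`) rather than binary BAM. It treats a read pair as a duplicate only when both mates match an earlier pair in reference name, start position, strand, and sequence, and it ignores secondary and supplementary alignments when building that comparison key.

```python
def dedupe_sam_lines(lines):
    """Return SAM lines with exact-duplicate templates removed.

    A template (a single read or a read pair, grouped by QNAME) counts as a
    duplicate when all of its primary records match an earlier template in
    reference name, 1-based start position, strand, and sequence.
    Hypothetical helper for illustration; not part of any existing package.
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]

    # First pass: collect a comparison key per template from primary records.
    templates = {}
    for ln in lines:
        if ln.startswith("@"):  # header line
            continue
        fields = ln.split("\t")
        flag = int(fields[1])
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
            continue
        record = (fields[2], int(fields[3]), bool(flag & 0x10), fields[9])
        templates.setdefault(fields[0], []).append(record)

    # Keep the first template seen for each distinct key.
    seen = set()
    keep = set()
    for qname, records in templates.items():
        key = tuple(sorted(records))
        if key not in seen:
            seen.add(key)
            keep.add(qname)

    # Second pass: emit headers and kept templates in their original order.
    return [
        ln for ln in lines
        if ln.startswith("@") or ln.split("\t", 1)[0] in keep
    ]
```

Because the original line order is preserved, a coordinate-sorted input stays coordinate-sorted. Everything is held in memory, so this only suits modest inputs such as a single MiSeq run; templates with no primary record are dropped.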