SEQanswers

03-11-2012, 06:35 AM #1
ct586
rmdup does not remove duplicates on the forward and reverse strands for single-end reads

Hi, I have a SAM file produced by BWA from single-end reads:

Quote:
bwa samse database.fasta aln_sa.sai short_read.fastq >aln.sam
Some of the alignments look like this (two reads with the same sequence and position, one with FLAG 16 and one with FLAG 0):

Quote:
SRR015141.1022459 16 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIII1.IIII9IIIIIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
SRR015141.1621515 0 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIIIIIIIIIIGIIBIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
When I run samtools rmdup on the sorted BAM file, these reads are not recognized as duplicates and are kept.

Quote:
samtools rmdup -s input.SORT.bam input.SORT.rmdup.s.bam
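To see why `rmdup -s` leaves this pair alone, it helps to look at the key that single-end duplicate removal groups reads by. The sketch below is a minimal Python illustration, assuming duplicates are keyed on reference name, 5' end, and strand (for a reverse-strand read the 5' end is taken to be the rightmost aligned base). It is not samtools' source, `dup_key` and `remove_se_duplicates` are hypothetical names, and keeping the first read per key is a simplification.

```python
import re

# Reference-consuming CIGAR operations (SAM spec): M, D, N, =, X.
REF_OPS = set("MDN=X")

def ref_span(cigar):
    """Number of reference bases covered by an alignment's CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in REF_OPS)

def dup_key(fields):
    """Assumed single-end duplicate key: (reference, 5' position, is_reverse)."""
    flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    is_reverse = bool(flag & 0x10)
    # For reverse-strand reads the 5' end is the rightmost aligned reference base.
    five_prime = pos + ref_span(cigar) - 1 if is_reverse else pos
    return (rname, five_prime, is_reverse)

def remove_se_duplicates(sam_lines):
    """Keep the first alignment seen for each key (a simplification)."""
    seen, kept = set(), []
    for line in sam_lines:
        key = dup_key(line.rstrip("\n").split("\t"))
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

# The two alignments from the post, with their optional tags omitted.
reads = [
    "SRR015141.1022459\t16\tchr17\t33965188\t37\t26M\t*\t0\t0\t"
    "AAAACCCAACCTCCCCCCATTATTAA\tIIIII1.IIII9IIIIIIIIIIIIII",
    "SRR015141.1621515\t0\tchr17\t33965188\t37\t26M\t*\t0\t0\t"
    "AAAACCCAACCTCCCCCCATTATTAA\tIIIIIIIIIIIIGIIBIIIIIIIIII",
]

for r in reads:
    print(dup_key(r.split("\t")))
print(len(remove_se_duplicates(reads)))
```

Because the two keys differ in strand (and, under this assumption, in 5' position as well), both reads survive.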
I have two questions:

First, should these duplicates be kept?

Second, is it possible to tell the forward strand from the reverse strand with single-end sequencing, as FLAG values 16 and 0 seem to show?
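In SAM, strand is recorded per alignment in the FLAG field, for single-end reads as well: bit 0x10 (decimal 16) means the read aligned to the reverse strand and its SEQ is stored reverse-complemented, while 0 means a forward-strand alignment with no other bits set. A small Python sketch decoding the two FLAG values from the post; `decode_flag`, `strand`, and the short bit names are illustrative, and only the bit values come from the SAM specification.

```python
# FLAG bits as defined in the SAM specification (names abbreviated here).
SAM_FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAPPED",
    0x8: "MATE_UNMAPPED",
    0x10: "REVERSE",
    0x20: "MATE_REVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QC_FAIL",
    0x400: "DUPLICATE",
    0x800: "SUPPLEMENTARY",
}

def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]

def strand(flag):
    """'-' if the read aligned to the reverse strand (bit 0x10), else '+'."""
    return "-" if flag & 0x10 else "+"

for flag in (16, 0):
    print(flag, strand(flag), decode_flag(flag))
```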

Also, if I want to remove this type of duplicate, which parameters should I use? I have written a Python script that does this, but it would be better if a standard tool had this function.
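For comparison, a script that treats such pairs as duplicates would simply leave the strand out of the key. The sketch below is hypothetical: it is not ct586's script, and `strand_agnostic_key` and `dedup_ignoring_strand` are illustrative names. It keys each alignment on reference name plus leftmost and rightmost aligned position, so the two reads above collapse into one; whether such pairs should be collapsed at all is exactly what the replies in this thread discuss.

```python
import re

def ref_end(pos, cigar):
    """Rightmost 1-based reference position covered by the alignment."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return pos + sum(int(n) for n, op in ops if op in "MDN=X") - 1

def strand_agnostic_key(fields):
    """Hypothetical key that ignores FLAG bit 0x10 (strand) entirely."""
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    return (rname, pos, ref_end(pos, cigar))

def dedup_ignoring_strand(sam_lines):
    """Keep the first alignment seen for each strand-agnostic key."""
    seen, kept = set(), []
    for line in sam_lines:
        key = strand_agnostic_key(line.rstrip("\n").split("\t"))
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

# The two alignments from the post, with their optional tags omitted.
reads = [
    "SRR015141.1022459\t16\tchr17\t33965188\t37\t26M\t*\t0\t0\t"
    "AAAACCCAACCTCCCCCCATTATTAA\tIIIII1.IIII9IIIIIIIIIIIIII",
    "SRR015141.1621515\t0\tchr17\t33965188\t37\t26M\t*\t0\t0\t"
    "AAAACCCAACCTCCCCCCATTATTAA\tIIIIIIIIIIIIGIIBIIIIIIIIII",
]

kept = dedup_ignoring_strand(reads)
print(len(kept))
```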

Last edited by ct586; 03-11-2012 at 06:41 AM. Reason: to make the wording more accurate
03-11-2012, 07:39 AM #2
nilshomer
Nils Homer
 

I don't think PCR duplicates would be sequenced on opposite strands, so the rmdup behavior is correct.
03-11-2012, 10:25 AM #3
swbarnes2

PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adaptors were put on in different ways.
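One way to see the point about adapters: SAM stores SEQ on the forward reference strand, so for the FLAG 16 alignment the read as it actually came off the sequencer is the reverse complement of the SEQ shown. A short Python sketch recovering the sequenced reads from the two records in the first post (`reverse_complement` and `as_sequenced` are illustrative helpers) shows the two reads are different strings, read in opposite directions across the same 26 bp stretch.

```python
# Translation table for DNA complement (upper and lower case, N kept as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]

def as_sequenced(flag, seq):
    """Undo the reverse-complementing SAM applies to reverse-strand (0x10) reads."""
    return reverse_complement(seq) if flag & 0x10 else seq

seq = "AAAACCCAACCTCCCCCCATTATTAA"  # SEQ field of both SAM records
read_a = as_sequenced(16, seq)      # SRR015141.1022459
read_b = as_sequenced(0, seq)       # SRR015141.1621515
print(read_a)
print(read_b)
print(read_a == read_b)
```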

rmdup with single-end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
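The 52x figure follows from how single-end duplicate removal behaves: afterwards at most one read can start at each position on each strand, and a given base can only be covered by reads whose 5' end lies within one read length of it. That allows at most `read_len` reads per strand, or `2 * read_len` in total. A brute-force Python sketch under that assumption (one surviving read per 5' position per strand); `max_coverage_after_se_rmdup` is a hypothetical helper and the toy genome length is arbitrary.

```python
def max_coverage_after_se_rmdup(read_len, genome_len=1000):
    """Peak depth when every 5' position on both strands keeps exactly one read."""
    cov = [0] * genome_len
    # Forward reads: one per leftmost (5') start, covering start .. start+read_len-1.
    for start in range(genome_len - read_len + 1):
        for i in range(start, start + read_len):
            cov[i] += 1
    # Reverse reads: one per rightmost (5') end, covering end-read_len+1 .. end.
    for end in range(read_len - 1, genome_len):
        for i in range(end - read_len + 1, end + 1):
            cov[i] += 1
    return max(cov)

print(max_coverage_after_se_rmdup(26))
```

Any interior base ends up covered by `read_len` forward and `read_len` reverse reads, so with 26-mers the ceiling is 52x regardless of how deeply the library was sequenced.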
03-11-2012, 04:52 PM #4
ct586

Quote:
Originally Posted by nilshomer View Post
I don't think duplicates will sequence on opposite strands. Therefore the rmdup behavior is correct.
Thank you! I get it.
03-11-2012, 05:01 PM #5
ct586

Quote:
Originally Posted by swbarnes2 View Post
PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adaptors were put on in different ways.

rmdup with single end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
Thank you for the explanation of duplicates!

I do not understand why rmdup is iffy for single-end data. Could you explain it in more detail, if it is not too much trouble?