Hi, I have a SAM file produced by BWA for single-end reads.
One type of result looks like the records shown below:
When I use samtools rmdup to remove duplicates from the sorted BAM file, these duplicates are not recognized and remain in the output.
I have two questions:
First, should these duplicates be kept?
Second, is it possible to tell the forward strand from the reverse strand in single-end sequencing, as flags 16 and 0 suggest?
Also, if I want to remove this type of duplicate, what parameters should I use? I have written a Python script that can do this, but it would be better if a standard tool offered this function.
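For reference, here is a minimal sketch of the kind of script I mean. It is not my exact script, and it rests on some assumptions: the input is tab-separated SAM text, base qualities are Phred+33, a "duplicate" is any mapped read sharing the same reference name and leftmost position regardless of strand, and the read with the highest summed base quality is the one to keep. The function names are made up for illustration.

```python
def base_quality_sum(qual, offset=33):
    """Sum of Phred scores, assuming Sanger (Phred+33) encoding."""
    return sum(ord(ch) - offset for ch in qual)


def dedup_ignoring_strand(sam_lines):
    """Keep one mapped read per (reference, leftmost position), whatever its strand.

    Assumption: the read with the highest summed base quality is kept.
    Header lines and unmapped reads are passed through unchanged.
    """
    header, unmapped = [], []
    best = {}  # (rname, pos) -> (quality sum, SAM line)
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):  # SAM header line
            header.append(line)
            continue
        fields = line.split("\t")
        flag, rname, pos, qual = int(fields[1]), fields[2], int(fields[3]), fields[10]
        if flag & 0x4:  # bit 0x4: read is unmapped
            unmapped.append(line)
            continue
        key = (rname, pos)  # strand (bit 0x10) is deliberately ignored
        score = base_quality_sum(qual)
        if key not in best or score > best[key][0]:
            best[key] = (score, line)
    return header + [line for _, line in best.values()] + unmapped
```

The function takes any iterable of SAM lines, so it could be fed an open file handle on aln.sam. Keying on the leftmost position is a simplification: it ignores soft clipping and the fact that a reverse-strand read has its 5' end at the other end of the alignment.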
bwa samse database.fasta aln_sa.sai short_read.fastq >aln.sam
SRR015141.1022459 16 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIII1.IIII9IIIIIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
SRR015141.1621515 0 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIIIIIIIIIIGIIBIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
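To make the strand question concrete: in SAM, FLAG bit 0x10 means the read was aligned to the reverse strand and SEQ is stored reverse-complemented, which is why the two records show the same sequence. A small sketch of decoding this, with hypothetical helper names, also computes the 5' end of each read. The two records share a leftmost position but their 5' ends differ, which may be why a strand-aware duplicate remover does not group them.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = "MDN=X"


def reference_length(cigar):
    """Number of reference bases covered by the alignment."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in REF_CONSUMING)


def strand_and_five_prime(flag, pos, cigar):
    """Return (strand, 5' reference position), or None for an unmapped read."""
    if flag & 0x4:  # bit 0x4: read is unmapped
        return None
    if flag & 0x10:  # bit 0x10: aligned to the reverse strand
        return "-", pos + reference_length(cigar) - 1
    return "+", pos


print(strand_and_five_prime(16, 33965188, "26M"))  # ('-', 33965213)
print(strand_and_five_prime(0, 33965188, "26M"))   # ('+', 33965188)
```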
samtools rmdup -s input.SORT.bam input.SORT.rmdup.s.bam