  • ct586
    Junior Member
    • Mar 2012
    • 7

    rmdup cannot remove duplicates on forward and reverse strands for single-end reads

    Hi, I have a SAM file produced by BWA for single-end reads.

    bwa samse database.fasta aln_sa.sai short_read.fastq >aln.sam
    Some of the results look like the following:

    SRR015141.1022459 16 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIII1.IIII9IIIIIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
    SRR015141.1621515 0 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIIIIIIIIIIGIIBIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26

    When I use samtools rmdup to remove duplicates from the sorted BAM file, these two reads are not recognized as duplicates and both are kept.

    samtools rmdup -s input.SORT.bam input.SORT.rmdup.s.bam
    I have two questions:

    First, should those duplicates be kept?

    Second, is it possible to tell the forward strand from the reverse strand in single-read sequencing, as the flags 16 and 0 suggest?

    Also, if I want to remove this type of duplicate, what parameters should I use? I have written a Python script that can do this, but it would be better if standard tools had this function.
    Last edited by ct586; 03-11-2012, 06:41 AM. Reason: To make the words more accurately
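
A strand-agnostic filter of the kind described in the question could be sketched as below. This is a minimal sketch, not the poster's script and not a samtools feature: the function name `dedup_ignoring_strand` and the choice of key (reference name, leftmost POS, CIGAR) are assumptions made for illustration, and ties in MAPQ are broken by keeping the first record seen. It works on tab-separated SAM text lines, and the two records are the ones quoted in the question.

```python
def dedup_ignoring_strand(sam_lines):
    """Keep one mapped alignment per (RNAME, POS, CIGAR), ignoring the strand flag.

    Hypothetical helper: the key and the tie-breaking rule are assumptions.
    Header lines and unmapped reads are passed through unchanged.
    """
    headers, kept = [], {}
    for index, line in enumerate(sam_lines):
        line = line.rstrip("\n")
        if line.startswith("@"):
            headers.append(line)
            continue
        fields = line.split("\t")
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        mapq, cigar = int(fields[4]), fields[5]
        if flag & 0x4:
            key = ("unmapped", index)   # unique key, so unmapped reads are never merged
        else:
            key = (rname, pos, cigar)   # SAM POS is the leftmost base on either strand
        # Keep the record with the highest MAPQ; the first one seen wins ties.
        if key not in kept or mapq > kept[key][0]:
            kept[key] = (mapq, line)
    return headers + [line for _, line in kept.values()]


tags = ["XT:A:U", "NM:i:0", "X0:i:1", "X1:i:0", "XM:i:0", "XO:i:0", "XG:i:0", "MD:Z:26"]
seq = "AAAACCCAACCTCCCCCCATTATTAA"
records = [
    "\t".join(["SRR015141.1022459", "16", "chr17", "33965188", "37", "26M",
               "*", "0", "0", seq, "IIIII1.IIII9IIIIIIIIIIIIII"] + tags),
    "\t".join(["SRR015141.1621515", "0", "chr17", "33965188", "37", "26M",
               "*", "0", "0", seq, "IIIIIIIIIIIIGIIBIIIIIIIIII"] + tags),
]
result = dedup_ignoring_strand(records)
print(len(result))
```

On the two records from the question, only the first one survives, because both share chr17, position 33965188 and CIGAR 26M. Whether discarding such reads is desirable is a separate matter, which the replies in this thread address.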
  • nilshomer
    Nils Homer
    • Nov 2008
    • 1283

    #2
    I don't think duplicates will sequence on opposite strands. Therefore the rmdup behavior is correct.


    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #3
      PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adaptors were put on in different ways.

      rmdup with single end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
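
The 52x figure can be reproduced with a short calculation. This is a sketch under stated assumptions: that single-end duplicate removal keeps at most one read per start coordinate per strand, that all reads are ungapped and of the same length, and that the function name `depth_at` and the 200-position toy region are purely illustrative.

```python
def depth_at(position, read_length, kept_starts, strands=2):
    """Depth at `position` if one read per strand survives at each start in `kept_starts`."""
    # A read starting at s covers positions s .. s + read_length - 1.
    covering_starts = [s for s in kept_starts if s <= position < s + read_length]
    return len(covering_starts) * strands


read_length = 26
kept_starts = range(0, 200)   # best case: a surviving read at every start position
interior_depth = depth_at(100, read_length, kept_starts)
print(interior_depth)         # 26 start positions x 2 strands = 52
```

Under these assumptions no base can be covered more than 2 x 26 = 52 times after duplicate removal, however deeply the library was sequenced, so any true coverage above that is discarded as apparent duplication.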


      • ct586
        Junior Member
        • Mar 2012
        • 7

        #4
        Originally posted by nilshomer
        I don't think duplicates will sequence on opposite strands. Therefore the rmdup behavior is correct.
        Thank you! I get it.


        • ct586
          Junior Member
          • Mar 2012
          • 7

          #5
          Originally posted by swbarnes2
          PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adaptors were put on in different ways.

          rmdup with single end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
          Thank you for the explanation of duplicates!

          I do not understand why rmdup is iffy for single-end data. Could you explain that in more detail, if it is not too much trouble?
