  • rmdup cannot remove duplicates on forward and reverse strands for single-end reads

    Hi, I have a SAM file produced by BWA for single-end reads.

    bwa samse database.fasta aln_sa.sai short_read.fastq >aln.sam
    Some of the alignments look like the following pair:

    SRR015141.1022459 16 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIII1.IIII9IIIIIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
    SRR015141.1621515 0 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIIIIIIIIIIGIIBIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
    When I use samtools rmdup to remove duplicates from the sorted BAM file, these reads are not recognized as duplicates and both are kept.

    samtools rmdup -s input.SORT.bam input.SORT.rmdup.s.bam
    I have two questions:

    First, should these duplicates be kept?

    Second, is it possible to tell the forward strand from the reverse strand in single-read sequencing, as the flags 16 and 0 seem to show?

    Also, if I want to remove this type of duplicate, what parameters should I use? I have written a Python script that can do this, but it would be better if standard tools offered this function.
    Last edited by ct586; 03-11-2012, 06:41 AM. Reason: To make the words more accurately
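The role of flags 16 and 0 in the two records above can be illustrated with a minimal Python sketch. In the SAM format, bit 0x10 of the FLAG field marks a read aligned to the reverse strand, so flag 16 is a reverse-strand alignment and flag 0 a forward-strand one. The helper names `parse_sam_line` and `dedup` are hypothetical, and the duplicate key used here (chromosome, leftmost position, strand) is a simplifying assumption, not a description of what samtools rmdup does internally.

```python
FLAG_REVERSE = 0x10  # SAM FLAG bit: read is aligned to the reverse strand

def parse_sam_line(line):
    """Return (name, chrom, pos, strand) from one SAM alignment line."""
    fields = line.split()
    name, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
    strand = "-" if flag & FLAG_REVERSE else "+"
    return name, chrom, pos, strand

def dedup(lines, use_strand=True):
    """Keep the first read seen for each key (hypothetical helper).

    With use_strand=True the key includes the strand, so reads at the
    same position on opposite strands are both kept. With
    use_strand=False they collapse into one.
    """
    seen = set()
    kept = []
    for line in lines:
        name, chrom, pos, strand = parse_sam_line(line)
        key = (chrom, pos, strand) if use_strand else (chrom, pos)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept

# The two records from the post, optional tags omitted.
records = [
    "SRR015141.1022459 16 chr17 33965188 37 26M * 0 0 "
    "AAAACCCAACCTCCCCCCATTATTAA IIIII1.IIII9IIIIIIIIIIIIII",
    "SRR015141.1621515 0 chr17 33965188 37 26M * 0 0 "
    "AAAACCCAACCTCCCCCCATTATTAA IIIIIIIIIIIIGIIBIIIIIIIIII",
]

strand_aware = dedup(records)                      # both reads survive
strand_blind = dedup(records, use_strand=False)    # only the first survives
```

A strand-aware key keeps both reads, which matches the behavior reported in the post; a strand-blind key is roughly what a custom script would need to collapse them.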

  • #2
    I don't think duplicates will sequence on opposite strands. Therefore the rmdup behavior is correct.


    • #3
      PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adapters were attached in opposite orientations.

      rmdup with single end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
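The 52x figure can be reproduced with a short calculation. This is a sketch under stated assumptions: every read is exactly 26 bases with no clipping, and deduplication keeps at most one read per (start position, strand) pair. A given base can then be covered only by reads starting at one of 26 positions on each of 2 strands. The function names are hypothetical.

```python
def max_depth_after_dedup(read_length, strands=2):
    """Upper bound on per-base depth when at most one read is kept
    per (start position, strand) and all reads have the same length."""
    return read_length * strands

def brute_force_max_depth(read_length, site=1000):
    """Count the distinct (start, strand) alignments of fixed length
    that cover a single base at coordinate `site`."""
    covering = {
        (start, strand)
        for strand in "+-"
        for start in range(site - read_length + 1, site + 1)
    }
    return len(covering)

cap = max_depth_after_dedup(26)
check = brute_force_max_depth(26)
print(cap, check)  # 52 52
```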


      • #4
        Originally posted by nilshomer
        I don't think duplicates will sequence on opposite strands. Therefore the rmdup behavior is correct.
        Thank you! I get it.


        • #5
          Originally posted by swbarnes2
          PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adapters were attached in opposite orientations.

          rmdup with single end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
          Thank you for the explanation of duplicates!

          I do not understand why rmdup is iffy for single-end data. Could you explain in more detail, if it is not too much trouble?
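One way to see the concern raised in #3 is a toy simulation. It is only a sketch under simplifying assumptions: read start positions are drawn uniformly at random over a small region, every simulated read is an independent fragment (so there are no true PCR duplicates by construction), and a read is flagged as a duplicate whenever its (start, strand) pair has already been seen. The function name `count_flagged` and the parameters are hypothetical.

```python
import random

def count_flagged(num_reads, region_length, seed=0):
    """Simulate independent single-end reads and count how many a
    position-plus-strand rule would flag as duplicates."""
    rng = random.Random(seed)
    seen = set()
    flagged = 0
    for _ in range(num_reads):
        key = (rng.randrange(region_length), rng.choice("+-"))
        if key in seen:
            flagged += 1   # same start and strand as an earlier read
        else:
            seen.add(key)
    return flagged

# 10,000 independent reads over a 1,000-base region: only 2,000
# distinct (start, strand) keys exist, so at least 8,000 reads are
# flagged even though none is a PCR duplicate.
flagged = count_flagged(10_000, 1_000)
```

With paired-end data the key also includes the mate position, so far more distinct keys exist and independent fragments collide much less often.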
