  • rmdup cannot remove duplicates on the forward and reverse strands for single-end reads

    Hi, I have a SAM file produced by BWA for single-end reads.

    bwa samse database.fasta aln_sa.sai short_read.fastq >aln.sam
    Some of the results look like this:

    SRR015141.1022459 16 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIII1.IIII9IIIIIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
    SRR015141.1621515 0 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIIIIIIIIIIGIIBIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
    When I use samtools rmdup to remove duplicates from the sorted BAM file, these duplicates are not recognized and both reads are kept.

    samtools rmdup -s input.SORT.bam input.SORT.rmdup.s.bam
    I have two questions:

    First, should these duplicates be kept?

    Second, is it possible to tell the forward strand from the reverse strand in single-read sequencing, as the flags 16 and 0 suggest?

    Also, if I want to remove this type of duplicate, what parameters should I use? I have written a Python script that can do this, but it would be better if a standard tool provided this function.
    Last edited by ct586; 03-11-2012, 06:41 AM. Reason: To make the words more accurately
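
For readers following the question about flags 16 and 0: in the SAM format, bit 0x10 of the FLAG field marks a read aligned to the reverse strand. The sketch below decodes that bit and builds a strand-aware key for the two records quoted above. The `duplicate_key` helper and its (reference, position, strand) tuple are assumptions made for illustration; they are not a description of how samtools rmdup is implemented.

```python
# Minimal sketch: read the strand from the SAM FLAG field.
# Bit 0x10 (decimal 16) is defined by the SAM specification as
# "read is reverse-complemented"; everything else here is illustrative.

REVERSE_STRAND_BIT = 0x10


def strand_from_flag(flag: int) -> str:
    """Return '-' if the 0x10 bit is set, otherwise '+'."""
    return "-" if flag & REVERSE_STRAND_BIT else "+"


def duplicate_key(sam_line: str) -> tuple:
    """Hypothetical strand-aware key: (reference, position, strand)."""
    fields = sam_line.split()
    flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
    return (rname, pos, strand_from_flag(flag))


# The two records from the post, truncated to the columns used here.
read_a = "SRR015141.1022459 16 chr17 33965188 37 26M"
read_b = "SRR015141.1621515 0 chr17 33965188 37 26M"

print(duplicate_key(read_a))  # ('chr17', 33965188, '-')
print(duplicate_key(read_b))  # ('chr17', 33965188, '+')
print(duplicate_key(read_a) == duplicate_key(read_b))  # False
```

Under a key like this, the two reads share a reference and position but differ in strand, so a strand-aware comparison treats them as distinct.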

  • #2
    I don't think duplicates will sequence on opposite strands. Therefore the rmdup behavior is correct.



    • #3
      PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adaptors were put on in different ways.

      rmdup with single end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
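
The 52x figure can be reproduced with a little arithmetic. This is a sketch under one stated assumption: after single-end duplicate removal, at most one read survives per start position per strand. A base is then covered only by reads whose start lies within one read length of it, which gives 26 possible starts on each of the two strands. The function name below is made up for this example.

```python
def max_depth_after_single_end_dedup(read_length: int, strands: int = 2) -> int:
    """Upper bound on per-base depth, assuming at most one surviving read
    per (start position, strand) after duplicate removal."""
    # A base is covered by reads starting at any of `read_length`
    # consecutive positions, and this holds independently for each strand.
    distinct_starts_per_strand = read_length
    return distinct_starts_per_strand * strands


print(max_depth_after_single_end_dedup(26))  # 52
```

Longer reads raise the bound proportionally, so the cap matters most for short reads sequenced at high depth.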



      • #4
        Originally posted by nilshomer
        I don't think duplicates will sequence on opposite strands. Therefore the rmdup behavior is correct.
        Thank you! I get it.



        • #5
          Originally posted by swbarnes2
          PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adaptors were put on in different ways.

          rmdup with single end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
          Thank you for the explanation of duplicates!

          I do not understand why rmdup is iffy for single-end data. Could you explain it in more detail, if it is not too much trouble?

