  • rmdup cannot remove duplicates on forward and reverse strands for single-end reads

    Hi, I have a SAM file produced by BWA for single-end reads.

    bwa samse database.fasta aln_sa.sai short_read.fastq >aln.sam
    Some of the results look like this:

    SRR015141.1022459 16 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIII1.IIII9IIIIIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
    SRR015141.1621515 0 chr17 33965188 37 26M * 0 0 AAAACCCAACCTCCCCCCATTATTAA IIIIIIIIIIIIGIIBIIIIIIIIII XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:26
    When I use samtools rmdup to remove duplicates from the sorted BAM file, these duplicates are not recognized and remain in the output.

    samtools rmdup -s input.SORT.bam input.SORT.rmdup.s.bam
    I have two questions:

    First, should these duplicates be kept?

    Second, is it possible to tell the forward strand from the reverse strand in single-read sequencing, as the flags 16 and 0 suggest?

    Also, if I want to remove this type of duplicate, what parameters should I use? I have written a Python script that can do this, but it would be better if the standard tools offered this function.
    Last edited by ct586; 03-11-2012, 06:41 AM. Reason: To make the words more accurately
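
On the flag question: in the SAM format, bit 0x10 of the FLAG field marks a read that aligned to the reverse strand, and the SEQ column is then stored reverse-complemented so that it reads along the reference. That is why the two records above show the same sequence but carry flags 16 and 0. A minimal sketch of decoding the strand from those two records follows; the helper name `strand_of` is hypothetical, and the records are trimmed to their first eleven fields.

```python
def strand_of(flag: int) -> str:
    """Return '-' if SAM FLAG bit 0x10 (reverse strand) is set, else '+'."""
    return "-" if flag & 0x10 else "+"


# The two records from the post, tab-separated, optional tags omitted.
records = [
    "SRR015141.1022459\t16\tchr17\t33965188\t37\t26M\t*\t0\t0\t"
    "AAAACCCAACCTCCCCCCATTATTAA\tIIIII1.IIII9IIIIIIIIIIIIII",
    "SRR015141.1621515\t0\tchr17\t33965188\t37\t26M\t*\t0\t0\t"
    "AAAACCCAACCTCCCCCCATTATTAA\tIIIIIIIIIIIIGIIBIIIIIIIIII",
]

for line in records:
    fields = line.split("\t")
    read_name, flag = fields[0], int(fields[1])
    print(read_name, strand_of(flag))
```

The first read is reported on the reverse strand and the second on the forward strand.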

  • #2
    I don't think duplicates will sequence on opposite strands. Therefore the rmdup behavior is correct.
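
To illustrate why the two reads are not treated as duplicates, here is a rough sketch of a strand-aware duplicate key for single-end reads. It assumes duplicates are defined by reference name, strand, and the 5' coordinate of the read, and it ignores soft clipping; the function names `ref_length` and `dup_key` are hypothetical and this is not samtools' actual code.

```python
import re


def ref_length(cigar: str) -> int:
    """Count reference bases consumed by a CIGAR string (ops M, D, N, =, X)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")


def dup_key(flag: int, rname: str, pos: int, cigar: str) -> tuple:
    """Build an assumed duplicate key: (reference, strand, 5' coordinate).

    For a reverse-strand read the 5' end is the rightmost aligned base,
    so the key uses the alignment end instead of POS.
    """
    reverse = bool(flag & 0x10)
    five_prime = pos + ref_length(cigar) - 1 if reverse else pos
    return (rname, "-" if reverse else "+", five_prime)


key_reverse = dup_key(16, "chr17", 33965188, "26M")
key_forward = dup_key(0, "chr17", 33965188, "26M")
print(key_reverse)
print(key_forward)
print(key_reverse == key_forward)
```

Under these assumptions the two keys differ in both strand and 5' coordinate, so the reads are kept as separate fragments.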



    • #3
      PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adapters were put on in different ways.

      rmdup with single-end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
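
The 52x figure can be checked with a small sketch. It assumes that after single-end deduplication at most one read survives per start position per strand, so a base can be covered by at most 26 start positions on each of the 2 strands. The function `max_depth` and the `region` size are hypothetical, chosen only for illustration.

```python
def max_depth(read_length: int, region: int = 200) -> int:
    """Highest per-base depth when exactly one read starts at every
    position on each strand (the best case after single-end dedup)."""
    depth = [0] * region
    for _strand in "+-":
        for start in range(region - read_length + 1):
            for i in range(start, start + read_length):
                depth[i] += 1
    return max(depth)


print(max_depth(26))  # prints 52
```

Any real coverage above that cap would be removed as apparent duplicates, even if the reads came from independent fragments.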



      • #4
        Originally posted by nilshomer
        I don't think duplicates will sequence on opposite strands. Therefore the rmdup behavior is correct.
        Thank you! I get it.



        • #5
          Originally posted by swbarnes2
          PCR duplicates happen after the adapters have been added. If you have the exact same sequence in both directions, that means they aren't PCR duplicates, because the adapters were put on in different ways.

          rmdup with single-end data is iffy. You will greatly overestimate the true number of PCR duplicates. If you really only have 26-mers, you are imposing a 52x cap on your data, which might be kind of low, depending on how much coverage you really have.
          Thank you for the explanation of duplicates!

          I do not understand why rmdup is iffy for single-end data. Could you explain it in more detail, if it is not too much trouble?
