
  • Picard MarkDuplicates per chromosome

    Is it OK to split an aligned BAM by chromosome and then run MarkDuplicates on each of the resulting files?

  • #2
    No, in the sense that you'll miss duplicates among paired-end (PE) reads whose two mates map to different chromosomes. Handling these pairs is a key advantage of MarkDuplicates over the samtools method.
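
To gauge how many reads this would affect in a given dataset, one can count the paired reads whose mate is aligned to a different reference. Below is a minimal sketch, assuming plain-text SAM records (for example, what `samtools view` prints) and relying only on the standard SAM columns FLAG, RNAME, and RNEXT. The function name and the toy records are hypothetical and made up for illustration; they are not data from this thread.

```python
def count_cross_chromosome_reads(sam_lines):
    """Return (cross, total): mapped paired reads whose mate is on another
    reference, and all mapped paired reads with a mapped mate."""
    cross = 0
    total = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        # Keep only paired reads (0x1) where neither the read (0x4)
        # nor its mate (0x8) is unmapped.
        if not flag & 0x1 or flag & 0x4 or flag & 0x8:
            continue
        total += 1
        # RNEXT is "=" when the mate is on the same reference as the read.
        if rnext not in ("=", "*", rname):
            cross += 1
    return cross, total


# Hypothetical toy records, invented for this example.
EXAMPLE_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r2\t65\tchr1\t500\t60\t50M\tchr2\t900\t0\t*\t*",
    "r2\t129\tchr2\t900\t60\t50M\tchr1\t500\t0\t*\t*",
    "r3\t73\tchr1\t700\t60\t50M\t=\t700\t0\t*\t*",  # mate unmapped, skipped
]

print(count_cross_chromosome_reads(EXAMPLE_SAM))
```

If the first number is a meaningful fraction of the second, a per-chromosome split would separate the mates of that many reads into different files.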

    • #3
      I wouldn't worry about splitting by chromosome and running MarkDuplicates; in fact, it seems quite astute to me. Do you expect gene duplicates, and is that why you are doing it? Or is it due to limited computational resources, or another issue?

      I have not tried this, but I would be interested to see whether differences occur between the split-by-chromosome and full BAMs. Can you report back if you do that?

      • #4
        Originally posted by Heisman
        No in the sense you'll miss PE reads that have each read map to different chromosomes. This is a key advantage of using MarkDuplicates over the samtools method.
        What do these reads tell us? That there is a gene duplication? Would you retain them? And are they common? I have not found any in my own data, and I would certainly remove them. I have only dealt with 100 bp PE data, though.

        • #5
          I've been asked to make a pipeline run faster on a distributed system. We can parallelize most of the steps (alignment, some of the GATK steps, etc.), but MarkDuplicates is quite time-consuming. If I split by chromosome and then run on multiple machines, it finishes much faster, but I don't want it to affect the results.

          I am planning tests soon to compare the deduplicated BAMs that are produced. I will post the results.
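
One way such a comparison could be set up is to collect the reads that carry the SAM duplicate flag (0x400) in each output and diff the two sets. Below is a minimal sketch, assuming both outputs are available as plain-text SAM records (for example, streamed from `samtools view`). The function names and the toy records are hypothetical; the toy data only illustrates the scenario raised above, where a cross-chromosome pair is flagged in the full run but not in the split run, and it is not a measured result.

```python
def duplicate_read_ids(sam_lines):
    """Return the set of (read name, mate number) flagged as duplicates."""
    dups = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Ignore secondary (0x100) and supplementary (0x800) alignments.
        if flag & 0x900:
            continue
        if flag & 0x400:  # duplicate flag
            mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
            dups.add((fields[0], mate))
    return dups


def compare_duplicate_calls(full_lines, split_lines):
    """Compare duplicate calls from a whole-BAM run and a per-chromosome run."""
    full = duplicate_read_ids(full_lines)
    split = duplicate_read_ids(split_lines)
    return {
        "only_full": full - split,
        "only_split": split - full,
        "both": full & split,
    }


# Hypothetical toy records, invented for this example.
FULL_RUN = [
    "r1\t65\tchr1\t500\t60\t50M\tchr2\t900\t0\t*\t*",
    "r1\t129\tchr2\t900\t60\t50M\tchr1\t500\t0\t*\t*",
    "r2\t1089\tchr1\t500\t60\t50M\tchr2\t900\t0\t*\t*",
    "r2\t1153\tchr2\t900\t60\t50M\tchr1\t500\t0\t*\t*",
    "r3\t1123\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r3\t1171\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
]
SPLIT_RUN = [
    "r1\t65\tchr1\t500\t60\t50M\tchr2\t900\t0\t*\t*",
    "r1\t129\tchr2\t900\t60\t50M\tchr1\t500\t0\t*\t*",
    "r2\t65\tchr1\t500\t60\t50M\tchr2\t900\t0\t*\t*",
    "r2\t129\tchr2\t900\t60\t50M\tchr1\t500\t0\t*\t*",
    "r3\t1123\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r3\t1171\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
]

result = compare_duplicate_calls(FULL_RUN, SPLIT_RUN)
print(sorted(result["only_full"]))
```

Keying on read name plus mate number keeps the two mates of a pair distinct. Reads that land in `only_full` or `only_split` are the ones worth inspecting by hand.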

          • #6
            Read through this FAQ for possible tips/explanations to make it faster: http://sourceforge.net/apps/mediawik...=Main_Page#FAQ

            bruce01, I have no idea how common they are, but if you want to remove those duplicates, I don't think you can split the BAM file up by chromosome (I could in theory be wrong; I've never considered doing this).

            • #7
              Originally posted by smleighton
              I don't want it to affect the results.
              You will not get exactly the same results. The likelihood that this will affect your conclusions is pretty low, but it also depends on the question you are asking.
