  • Running Picard MarkDuplicates on multiple bam files

    Hi everyone,

    I've had a search around and can't find the answer I'm looking for.

    Can anyone tell me whether it's possible to run Picard MarkDuplicates on a batch of BAM files at once? I have 13 BAM files in a single directory and would like to remove duplicates from each file, producing 13 new BAM files with the duplicates removed.

    Since this is a Java-based program, I wasn't sure whether the same rules apply as when running, for example, samtools on a batch of files from the command line.

  • #2
    Are you looking to mark duplicates across all 13 files considered together, or only within each individual file?



    • #3
      Thanks for your reply. I want to remove the duplicates from each of the 13 files individually and produce 13 new files with the duplicates removed, if that makes sense.



      • #4
        It would work best if you have access to a cluster, where you can start 13 parallel MarkDuplicates jobs. The same procedure (a shell script?) also works for running the jobs from the command line. Note that you should not need to remove those duplicates (http://gatkforums.broadinstitute.org...ove-duplicates).



        • #5
          Ah. I don't have access to a cluster right now, so I guess I'll have to do them one by one. So it's not possible to run Java programs on a batch of files at once?

          Note that I have to remove duplicates from these ChIP-seq files so that they stay comparable with previous analyses.



          • #6
            It is certainly possible to run several at once. You could start all of them, but they would then compete for hardware resources on your local machine and get in each other's way.

            If you have multiple cores available (and a fast disk, e.g. an SSD), try starting 3-4 in parallel and see whether they all proceed well. Watch CPU and disk usage.
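
A rough sketch of the "start 3-4 in parallel" idea, using `xargs -P` to cap the number of simultaneous jobs. The jar location (`picard.jar`), the `I=`/`O=`/`M=` option style, the output naming, and the default of 3 jobs are all assumptions for illustration; adjust them to your Picard install, and give each JVM an explicit memory limit if RAM is tight.

```shell
# ASSUMPTION: a single picard.jar; override with PICARD_JAR=/path/to/picard.jar
export PICARD_JAR="${PICARD_JAR:-picard.jar}"
# ASSUMPTION: 3 simultaneous jobs; override with JOBS=4 etc.
JOBS="${JOBS:-3}"

# dedup_parallel N file1.bam file2.bam ...
# Runs MarkDuplicates on each file, at most N at a time.
dedup_parallel() {
    njobs="$1"; shift
    [ "$#" -gt 0 ] || return 0
    # NUL-separated names so filenames with spaces survive the pipe
    printf '%s\0' "$@" |
        xargs -0 -n 1 -P "$njobs" sh -c '
            bam="$1"
            # skip files that are already outputs of a previous run
            case "$bam" in *.dedup.bam) exit 0 ;; esac
            java -jar "$PICARD_JAR" MarkDuplicates \
                I="$bam" \
                O="${bam%.bam}.dedup.bam" \
                M="${bam%.bam}.dup_metrics.txt" \
                REMOVE_DUPLICATES=true
        ' _
}

# Collect the BAM files in the current directory; do nothing if there are none
set -- *.bam
if [ -e "$1" ]; then
    dedup_parallel "$JOBS" "$@"
fi
```

If the jobs slow each other down, lower `JOBS`; the rest of the script stays the same.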



            • #7
              Ok. Thank you.

              Let's put my Mac to the test.



              • #8
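                If you don't need to run them in parallel, you can remove duplicates sequentially with a shell script like:

                Code:
                ## Save as script.sh and run as "sh script.sh", at least on Linux with bash
                ## PicardCommandWhatever is a placeholder for the actual Picard jar and options

                # Loop over the glob directly; parsing `ls` output breaks on names with spaces
                for i in *.bam
                do
                        # Quote "$i" so each filename is passed as a single argument
                        java -jar PicardCommandWhatever input="$i"
                done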
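
To make the loop above a bit more concrete, here is a sketch that also names an output file and a metrics file per input. It assumes a single `picard.jar` with the `I=`/`O=`/`M=` option style and inputs that are already coordinate-sorted. The helper names `dedup_name` and `dedup_one` and the `.dedup.bam` suffix are made up for this example, so check the options against your own Picard version.

```shell
# ASSUMPTION: a single picard.jar; override with PICARD_JAR=/path/to/picard.jar
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# Derive the output name: sample.bam -> sample.dedup.bam
dedup_name() {
    printf '%s.dedup.bam\n' "${1%.bam}"
}

# Run MarkDuplicates on one file and drop the duplicates from the output
dedup_one() {
    java -jar "$PICARD_JAR" MarkDuplicates \
        I="$1" \
        O="$(dedup_name "$1")" \
        M="${1%.bam}.dup_metrics.txt" \
        REMOVE_DUPLICATES=true
}

for bam in *.bam
do
    # If nothing matches, the glob stays literal; skip it
    [ -e "$bam" ] || continue
    # Don't reprocess outputs from an earlier run
    case "$bam" in *.dedup.bam) continue ;; esac
    # Report a failure but keep going with the remaining files
    dedup_one "$bam" || echo "MarkDuplicates failed for $bam" >&2
done
```

Each input is left untouched, so a failed run can simply be repeated.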
