  • Running Picard MarkDuplicates on multiple bam files

    Hi everyone,

    I've had a search around and can't find the answer I'm looking for.

    Can anyone tell me whether it's possible to run Picard MarkDuplicates on a batch of BAM files at once? I have 13 BAM files in a single directory and would like to remove duplicates from each of them, writing 13 new BAM files with the duplicates removed.

    Since this is a Java-based program, I wasn't sure whether the same rules apply as when running samtools, for example, on a batch of files from the command line.

  • #2
    Are you looking to mark duplicates from all 13 files considered together or just within individual files?



    • #3
      Thanks for your reply. I want to remove the duplicates from each of the 13 files individually and produce 13 new files with the duplicates removed, if that makes sense.



      • #4
        It would work best if you have access to a cluster, where you could start 13 parallel MarkDuplicates jobs. Without one, you can use the same procedure (a shell script?) to run the jobs from the command line. Note that you should not need to remove the duplicates; marking them is usually enough (http://gatkforums.broadinstitute.org...ove-duplicates).



        • #5
          Ah. I don't have access to a cluster right now, so I guess I'll have to do them one by one. So it's not possible to run Java programs on a batch of files at once?

          Note that I have to remove duplicates from these ChIP-seq files to match previous analyses, so that the results are comparable.



          • #6
            It is certainly possible to run several at once. You could start all of them, but they would then compete for hardware resources on your local machine and get in each other's way.

            If you have multiple cores available (and a fast disk, ideally an SSD), try starting 3-4 in parallel and see whether they all proceed well. Watch CPU and disk usage.
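One way to try the "3-4 in parallel" suggestion is to let `xargs` cap the number of concurrent jobs. This is only a sketch under assumptions that are not from the thread: the jar is called `picard.jar` and sits in the working directory, outputs are named `*.dedup.bam` with a `*.dup_metrics.txt` file alongside, and your `xargs` supports `-0` and `-P` (GNU and BSD/macOS versions do, but they are not POSIX). The `I=`/`O=`/`M=` form is Picard's older argument syntax; newer releases may warn about it.

```shell
#!/bin/sh
# Sketch: run Picard MarkDuplicates on each BAM, at most $JOBS at a time.
# Assumed, not from the thread: picard.jar location and output file naming.

PICARD=${PICARD:-"java -jar picard.jar"}   # assumed Picard invocation
JOBS=${JOBS:-3}                            # number of concurrent jobs
export PICARD                              # the child shells below need it

dedup_parallel() {
    dir=${1:-.}
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue                      # glob matched nothing
        case "$bam" in *.dedup.bam) continue ;; esac   # skip earlier outputs
        printf '%s\0' "$bam"                           # NUL-separated names
    done | xargs -0 -n 1 -P "$JOBS" sh -c '
        bam=$1
        [ -n "$bam" ] || exit 0
        # $PICARD is left unquoted on purpose so "java -jar ..." splits into words
        $PICARD MarkDuplicates I="$bam" O="${bam%.bam}.dedup.bam" \
            M="${bam%.bam}.dup_metrics.txt" REMOVE_DUPLICATES=true
    ' sh
}
```

Call it as `dedup_parallel /path/to/bams` and keep an eye on memory as well as CPU and disk, since each job starts its own JVM. Lower `JOBS` if the machine starts swapping.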



            • #7
              OK, thank you.

              Let's put my Mac to the test.



              • #8
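                If you don't need to run them in parallel, you can remove duplicates sequentially with a shell script like:

                Code:
                ## Save as script.sh and run as "sh script.sh" (at least on Linux with bash)
                ## PicardCommandWhatever is a placeholder for the real Picard jar and options

                # Loop over the glob directly; parsing `ls` output breaks on odd filenames
                for i in *.bam
                do
                        java -jar PicardCommandWhatever input="$i"
                done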
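For anyone wanting the placeholder filled in, the loop might look roughly like the sketch below. The jar path (`picard.jar`), the `*.dedup.bam` output naming, and the metrics file name are assumptions made for illustration, not something specified in this thread. `I=`, `O=`, `M=` and `REMOVE_DUPLICATES=true` are Picard's older-style MarkDuplicates arguments; check the usage message of your Picard version before relying on them. Picard also expects sorted input.

```shell
#!/bin/sh
# Sketch: remove duplicates from every BAM in a directory, one file at a time.
# Assumed, not from the thread: picard.jar location and output file naming.

PICARD=${PICARD:-"java -jar picard.jar"}   # assumed Picard invocation

dedup_all() {
    dir=${1:-.}
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue                      # glob matched nothing
        case "$bam" in *.dedup.bam) continue ;; esac   # skip earlier outputs
        out="${bam%.bam}.dedup.bam"
        metrics="${bam%.bam}.dup_metrics.txt"
        # $PICARD is left unquoted on purpose so "java -jar ..." splits into words
        $PICARD MarkDuplicates \
            I="$bam" O="$out" M="$metrics" \
            REMOVE_DUPLICATES=true || return 1         # stop at the first failure
    done
}
```

Run it as `dedup_all /path/to/bams`. Each input `sample.bam` should yield `sample.dedup.bam` next to it, and the originals are left untouched.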
