  • multiple bam files one command

    I have multiple .bam files in a directory that I would like to run the following command on:

    Code:
    samtools view -H Input.bam | sed '/^@PG/d' | samtools reheader - Input.bam > Input_newheader.bam
    The command works great for one file, but I would like to run it on all of the .bam files in a directory (/home/cmccabe/Desktop/NGS).

    Would something like the following work for all of them, or is there a better way? Thank you.

    Code:
    find . -maxdepth 1 -name '*.bam' | parallel "samtools view -H {} | sed '/^@PG/d' | samtools reheader - {} > {.}_newheader.bam"
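The `sed '/^@PG/d'` step in the command above deletes every header line that starts with `@PG` (the program records) before `samtools reheader` applies the edited header. Here is a minimal sketch of just that filter. It runs on a small made-up header, so the header lines are hypothetical and only for illustration, and no BAM file or samtools is needed:

```bash
# Hypothetical three-line SAM header; only the @PG line should be removed.
header=$'@HD\tVN:1.5\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@PG\tID:bwa\tPN:bwa'

# Same filter as in the command above: delete lines beginning with @PG.
filtered=$(printf '%s\n' "$header" | sed '/^@PG/d')

# Prints only the @HD and @SQ lines.
printf '%s\n' "$filtered"
```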

  • #2
    Code:
    for f in *.bam
    do
        prefix=${f%%.bam}
        samtools view -H "$f" | sed '/^@PG/d' | samtools reheader - "$f" > "${prefix}_newheader.bam"
    done
    This is going to quickly become IO bound, so you're unlikely to see much benefit from parallel. BTW, to do this with parallel, the simplest method is to just write a shell script that takes a single file as input and use that with parallel.
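Following the suggestion above, here is a minimal sketch of the per-file approach. The loop body is wrapped in a shell function that handles exactly one BAM file, so it can be called from a plain loop or handed to GNU parallel. The function name `reheader_one` is hypothetical, and the `_newheader.bam` output name simply mirrors the loop above.

```bash
# Hypothetical helper: strip @PG lines from the header of one BAM file and
# write <name>_newheader.bam next to the input.
reheader_one() {
    local f=$1
    local prefix=${f%%.bam}   # drop the trailing .bam extension
    samtools view -H "$f" \
        | sed '/^@PG/d' \
        | samtools reheader - "$f" > "${prefix}_newheader.bam"
}
```

Run it sequentially with `for f in *.bam; do reheader_one "$f"; done`. Assuming GNU parallel and bash, you could also use `export -f reheader_one` followed by `parallel reheader_one ::: *.bam`. As noted above, the job is likely IO bound, so running several files at once may not help much.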



    • #3
      If all the bam files are stored on a separate drive (/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215) and the output should be redirected to /home/cmccabe/Desktop/NGS/pool_I_090215, will the command below work?

      Code:
      cd "/home/cmccabe/Desktop/NGS"   # path to samtools
      
      for f in *.bam
      do
      prefix=${f%%.bam}
      samtools view -H /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/$f | sed '/^@PG/d' | samtools reheader - /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/$f > /home/cmccabe/Desktop/NGS/pool_I_090215/${prefix}_newheader.bam
      done
      Thank you



      • #4
        "*.bam" is looking in the current working directory, so no, that won't work. If you said "/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/*.bam" then note that you would need to do something like:
        Code:
        bname=`basename $f`
        pref=${bname%%.bam}
        That would strip the path to the file appropriately. You would also then just use $f instead of /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/$f. For why that's the case, run:

        Code:
        for f in /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/*.bam
        do
        echo $f
        done
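The loop above prints full paths, so the directory has to be stripped before the output name is built. Here is a small sketch that compares the two expansions on one of the file names from this thread. The path is only a string here, so no file needs to exist.

```bash
# One glob result from the loop above, written out as a literal string.
f=/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902.bam

with_dir=${f%%.bam}       # /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902
bname=$(basename "$f")    # IonXpress_008_150902.bam
pref=${bname%%.bam}       # IonXpress_008_150902

echo "$with_dir"
echo "$pref"
```

For a plain path like this, `${f##*/}` gives the same result as `basename "$f"` without starting a separate process.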



        • #5
          Makes sense, so I tried the command below. The errors it printed are shown underneath it; why is it looking for those files? Thank you for your help.

          Code:
          cmccabe@HPZ640:~/Desktop/NGS$ for f in /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/*.bam; do prefix=${f%%.bam}; samtools view -H $f | sed '/^@PG/d' | samtools reheader - $f > /home/cmccabe/Desktop/NGS/pool_I_090215${prefix}_newheader.bam; done
          bash: /home/cmccabe/Desktop/NGS/pool_I_090215/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902_newheader.bam: No such file or directory
          bash: /home/cmccabe/Desktop/NGS/pool_I_090215/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_015_rawlib_newheader.bam: No such file or directory
          bash: /home/cmccabe/Desktop/NGS/pool_I_090215/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_016_150902_newheader.bam: No such file or directory



          • #6
            You forgot the basename line.
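To see what went wrong in the earlier attempt, you can build the output path both ways as plain strings. This sketch reuses the directories and one file name from the thread and only assigns and echoes the results. The "without basename" string matches the path in the error messages, with a slash between the two directories as in the later attempt.

```bash
in_dir=/media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215
out_dir=/home/cmccabe/Desktop/NGS/pool_I_090215
f=$in_dir/IonXpress_015_rawlib.bam

# Without basename: the prefix still carries the input directory,
# so the whole /media/... path is appended to the output directory.
prefix=${f%%.bam}
bad=$out_dir/${prefix}_newheader.bam

# With basename: only the file name is kept before adding the suffix.
bname=$(basename "$f")
pref=${bname%%.bam}
good=$out_dir/${pref}_newheader.bam

echo "$bad"
echo "$good"
```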



            • #7
              I thought I got it but I am a bit confused:

              Code:
              bname=`basename $f`
              pref=${bname%%.bam}
              for f in /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/*.bam
              do
              prefix=${f%%.bam}
              samtools view -H $f | sed '/^@PG/d' | samtools reheader - $f > /home/cmccabe/Desktop/NGS/pool_I_090215/${prefix}_newheader.bam
              done
              This gives the error below:

              Code:
              cmccabe@HPZ640:~$ cd "/home/cmccabe/Desktop/NGS"
              cmccabe@HPZ640:~/Desktop/NGS$ bname=`basename $f`
              basename: missing operand
              Try 'basename --help' for more information.
              cmccabe@HPZ640:~/Desktop/NGS$ pref=${bname%%.bam}
              cmccabe@HPZ640:~/Desktop/NGS$ for f in /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/*.bam
              > do
              > prefix=${f%%.bam}
              > samtools view -H $f | sed '/^@PG/d' | samtools reheader - $f > /home/cmccabe/Desktop/NGS/pool_I_090215/${prefix}_newheader.bam
              > done
              bash: /home/cmccabe/Desktop/NGS/pool_I_090215//media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_008_150902_newheader.bam: No such file or directory
              bash: /home/cmccabe/Desktop/NGS/pool_I_090215//media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_015_rawlib_newheader.bam: No such file or directory
              bash: /home/cmccabe/Desktop/NGS/pool_I_090215//media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/IonXpress_016_150902_newheader.bam: No such file or directory
              Thank you.



              • #8
                When you run basename, $f hasn't yet been defined...

                I'm actually not going to tell you the solution explicitly; you should be able to figure it out with a bit of playing around and by noting the error message.
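You can reproduce the ordering issue without samtools. `$f` only holds a file name once the `for` loop has assigned it, so anything derived from it must be recomputed inside the loop body. Here is a sketch with two hypothetical paths (the `/data/run1` directory is made up) standing in for the `/media/.../*.bam` glob:

```bash
# Hypothetical paths standing in for the /media/.../*.bam glob.
files=(/data/run1/IonXpress_008_150902.bam /data/run1/IonXpress_016_150902.bam)

outputs=()
for f in "${files[@]}"
do
    # f is set here, once per iteration, so basename sees a real operand.
    bname=$(basename "$f")
    pref=${bname%%.bam}
    outputs+=("${pref}_newheader.bam")
done

printf '%s\n' "${outputs[@]}"
```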



                • #9
                  So here is the command:

                  Code:
                  for f in /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215/*.bam; do bname=`basename $f`; pref=${bname%%.bam}; samtools view -H $f | sed '/^@PG/d' | samtools reheader - $f > /home/cmccabe/Desktop/NGS/pool_I_090215/${pref}_newheader.bam; done
                  This seems to work great. Thank you for your help.
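For reference, here is a slightly more defensive sketch of the same loop. It is written as a function that takes the input and output directories as arguments, and the name `reheader_dir` is hypothetical. The extra pieces are assumptions added for robustness and are not part of the posted command: quoting, `mkdir -p` for the output directory, and `nullglob`. With `nullglob`, an empty directory runs zero iterations instead of passing the literal `*.bam` pattern to samtools.

```bash
# Hypothetical wrapper around the loop from the post above.
reheader_dir() {
    local in_dir=$1 out_dir=$2 f bname pref
    mkdir -p "$out_dir"
    shopt -s nullglob              # no matches -> loop body never runs
    for f in "$in_dir"/*.bam
    do
        bname=$(basename "$f")     # strip the input directory
        pref=${bname%%.bam}        # strip the .bam extension
        samtools view -H "$f" \
            | sed '/^@PG/d' \
            | samtools reheader - "$f" > "$out_dir/${pref}_newheader.bam"
    done
    shopt -u nullglob              # turn nullglob back off (assumes it was off before)
}
```

Usage with the directories from this thread: `reheader_dir /media/cmccabe/C2F8EFBFF8EFAFB9/pool_I_090215 /home/cmccabe/Desktop/NGS/pool_I_090215`.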

