  • Remove a part of a filename in a Bash loop

    I have many files named like this:

    lib01.GFBAG_UHAU.fastq.sam.bam
    lib02.ABABAB_ZU.fastq.sam.bam
    lib03.ZGAZG_IAUDH.fastq.sam.bam

    Many parts of the filenames are thus variable in length, although they are connected through the same type of punctuation (. or _).
    What I want to achieve is to remove the part .fastq.sam.bam from a filename when I loop through these files in Bash. How do I achieve this?

  • #2
    You want to split the string on a "." delimiter and then keep the first two parts. Or use ".fastq.sam.bam" as a delimiter, I suppose!

    To split a string in Bash on a single-character delimiter (or a set of single-character delimiters), set IFS (the Internal Field Separator) to the delimiter(s) and read the string into an array. To split on a multi-character delimiter, use parameter expansion.
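A minimal sketch of both ideas, using one of the filenames from the question. It is only an illustration: the variable names (name, stem_ifs, stem_pe) are made up here, and the IFS split uses positional parameters instead of an array so that it also runs in a plain POSIX sh.

```shell
name="lib01.GFBAG_UHAU.fastq.sam.bam"

# Option 1: split on "." via IFS and keep the first two fields.
# The subshell keeps the IFS and globbing changes from leaking out.
stem_ifs=$(
    IFS=.         # split on "." only inside this subshell
    set -f        # disable globbing while the variable is expanded unquoted
    set -- $name  # unquoted on purpose: the fields become $1, $2, ...
    printf '%s.%s\n' "$1" "$2"
)

# Option 2: treat ".fastq.sam.bam" as the delimiter and strip it from the end.
stem_pe="${name%.fastq.sam.bam}"

echo "$stem_ifs"
echo "$stem_pe"
```

Option 1 only works when the part to keep always has exactly two dot-separated fields; Option 2 makes no assumption about what comes before the suffix.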
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

    • #3
      Refer to https://unix.stackexchange.com/quest...ck-of-variable

      An example for changing extension from fastq.sam.bam to txt.

      for file in *.fastq.sam.bam
      do
      mv ${file%.fastq.sam.bam} ${file%.fastq.sam.bam}.txt
      done

      • #4
        Originally posted by ungsik View Post
        Refer to https://unix.stackexchange.com/quest...ck-of-variable

        An example for changing extension from fastq.sam.bam to txt.

        for file in *.fastq.sam.bam
        do
        mv ${file%.fastq.sam.bam} ${file%.fastq.sam.bam}.txt
        done
        Don't you mean:

        for file in *.fastq.sam.bam
        do
        mv "$file" "${file%.fastq.sam.bam}.txt"
        done

        --
        Phillip

        • #5
          Originally posted by Marius View Post
          I have many files named like this:

          lib01.GFBAG_UHAU.fastq.sam.bam
          lib02.ABABAB_ZU.fastq.sam.bam
          lib03.ZGAZG_IAUDH.fastq.sam.bam

          Many parts of the filenames are thus variable in length, although they are connected through the same type of punctuation (. or _).
          What I want to achieve is to remove the part .fastq.sam.bam from a filename when I loop through these files in Bash. How do I achieve this?
          Using BASH parameter expansion:

          Code:
          for i in *.fastq.sam.bam; do mv "$i" "${i%.fastq.sam.bam}"; done
          Which is pretty fun: "%" more or less means "clip the pattern that follows from the very end of the value stored in variable $i." "#" does the analogous thing, but clips from the very front.

          But "%%" does a "greedy" removal: it removes the longest suffix that matches the pattern, where "%" removes the shortest. So:

          Code:
          i=lib01.GFBAG_UHAU.fastq.sam.bam.fastq.sam.bam.fastq.sam.bam
          echo ${i%.fastq.sam.bam*}
          will produce:
          Code:
          lib01.GFBAG_UHAU.fastq.sam.bam.fastq.sam.bam
          whereas:

          Code:
          i=lib01.GFBAG_UHAU.fastq.sam.bam.fastq.sam.bam.fastq.sam.bam
          echo ${i%%.fastq.sam.bam*}
          will produce:
          Code:
          lib01.GFBAG_UHAU
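For comparison, here is a small sketch of all four operators on one of the original filenames, using the generic pattern "*." or ".*" instead of the full suffix. The variable names are only for illustration.

```shell
i=lib01.GFBAG_UHAU.fastq.sam.bam

front_short=${i#*.}   # shortest prefix matching "*." removed
front_long=${i##*.}   # longest prefix matching "*." removed
back_short=${i%.*}    # shortest suffix matching ".*" removed
back_long=${i%%.*}    # longest suffix matching ".*" removed

echo "$front_short"   # GFBAG_UHAU.fastq.sam.bam
echo "$front_long"    # bam
echo "$back_short"    # lib01.GFBAG_UHAU.fastq.sam
echo "$back_long"     # lib01
```

Because the wanted part itself contains a ".", none of these generic patterns gives lib01.GFBAG_UHAU directly, which is why spelling out the whole suffix in ${i%.fastq.sam.bam} is the simpler choice here.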
          If you can run Perl, then finding the "rename.pl" script might be less arcane than deploying your Bash powers. Note that the dots are escaped so that they match literal dots only:

          rename.pl 's/\.fastq\.sam\.bam$//' *.fastq.sam.bam

          Find rename.pl here:


          --
          Phillip

            • #7
              A range of options exists for munging the pathnames.

              The approach I would use might well depend on what else I am going to do in the loop.

              FWIW:

              [basename](https://linux.die.net/man/1/basename) can be used to remove a suffix of a filename.

              [Shell Parameter Expansion](https://www.gnu.org/software/bash/ma...Expansion.html) can be used to strip or replace either suffixes or prefixes of pathnames stored in variables.

              [GNU parallel](https://www.gnu.org/software/parallel/) can in effect replace your bash looping construct, and has a simple syntax to refer to the basename of a file or directory, including `{=perl expression=}` to munge the pathname any way you like. It has MANY great features and is well worth exploring and keeping in your toolbelt.

              [rename](https://www.computerhope.com/unix/rename.htm) is very useful for batch renaming of files using regular expressions (if that is all you need to do).
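As a rough sketch of the basename option: given a suffix as its second argument, basename strips both the leading directories and that suffix. The /data/run1 paths below are hypothetical and do not need to exist, since basename only works on the string.

```shell
# Hypothetical paths; basename operates on the text, not on the filesystem.
for path in /data/run1/lib01.GFBAG_UHAU.fastq.sam.bam \
            /data/run1/lib02.ABABAB_ZU.fastq.sam.bam
do
    # Drop the directory part and the trailing ".fastq.sam.bam".
    stem=$(basename "$path" .fastq.sam.bam)
    echo "$stem"
done
```

This prints lib01.GFBAG_UHAU and lib02.ABABAB_ZU. Unlike ${path%.fastq.sam.bam}, it also removes the directory, which may or may not be what you want inside the loop.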
