  • thermophile
    Senior Member
    • Apr 2015
    • 243

    help with basemount copy script

    I have a script that I've cobbled together to copy the fastq files from each sample within a project into one folder on my computer.

    Code:
            for f in ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/*/Files/*.gz;
            do cp $f PROJECTNAME"/fastq/"${f##*Files/};
            done
    This works, except when I have to resequence a particular sample. Only the original sample is copied, because basemount puts " (2)" on the sample folder name for the second run. How can I tweak this so it also copies the fastq files from the second run? (The fastq files are unique because they all get the run info as part of the name.) I think I need to change the second part of the cp command so that it ignores the " (2)", but I can't figure out how. My first thought was to remove the *, but that made it fail for all samples, not just the duplicates.

    Rereading the bash documentation isn't helping http://tldp.org/LDP/abs/html/string-manipulation.html
    Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.
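A quick way to see what the `${f##*Files/}` part of the script produces is to run the expansion on a single path. The sketch below uses one of the paths from the listing further down this thread as a plain string (no BaseMount mount is needed), and the variable `name` is just an illustrative helper.

```shell
# Path copied from the listing in this thread, used here as a plain string
f='../basemountpoint/basespace/Projects/PROJECTNAME/Samples/ADB2017Dec13SI1 (2)/Files/ADB2017Dec13SI1_S150_L001_R1_001.fastq.gz'

# "##" strips the longest prefix matching the pattern "*Files/",
# so everything up to and including "Files/" is removed
name=${f##*Files/}

echo "$name"
```

The result is only the file name, so the " (2)" in the folder name never reaches the destination side of the cp command.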
  • neavemj
    Member
    • Feb 2014
    • 58

    #2
    Hi thermophile,

    Can you give us the complete name of the old and new fastq files?

    The first line in that script is where the variable 'f' is assigned to each of the file names, so it might be at this point where the files with (2) are being missed.

    Cheers,

    Matt.


    • Bukowski
      Senior Member
      • Jan 2010
      • 388

      #3
      Use rsync?


      • thermophile
        Senior Member
        • Apr 2015
        • 243

        #4
        I don't want to just rsync, because I need all the fastq files in a single folder for downstream processing.

        Here I've echo'd the cp line and added a comma for readability:

        Code:
        for f in ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/*/Files/*.gz;
        do echo $f "," PROJECTNAME"/fastq/"${f##*Files/};
        done


        Code:
        ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/ADB2017Dec13SI1 (2),/Files/ADB2017Dec13SI1_S150_L001_R1_001.fastq.gz PROJECTNAME/fastq/ADB2017Dec13SI1_S150_L001_R1_001.fastq.gz
        ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/ADB2017Dec13SI1 (2),/Files/ADB2017Dec13SI1_S150_L001_R2_001.fastq.gz PROJECTNAME/fastq/ADB2017Dec13SI1_S150_L001_R2_001.fastq.gz
        ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/ADB2017Dec13SI1,/Files/ADB2017Dec13SI1_S70_L001_R1_001.fastq.gz PROJECTNAME/fastq/ADB2017Dec13SI1_S70_L001_R1_001.fastq.gz
        ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/ADB2017Dec13SI1,/Files/ADB2017Dec13SI1_S70_L001_R2_001.fastq.gz PROJECTNAME/fastq/ADB2017Dec13SI1_S70_L001_R2_001.fastq.gz
        Last edited by thermophile; 03-23-2018, 08:56 AM.


        • neavemj
          Member
          • Feb 2014
          • 58

          #5
          Hi thermophile,

          I'd say the problem with the new file names is the extra space and the parentheses. The extra space makes it difficult for the 'cp' command to know which part is the file to copy and which is the destination. Also, parentheses need to be 'escaped' if you want to type them in a filename on the command line. Something like the below (note the extra backslashes):

          ls ADB2017Dec13SI1\ \(2\)/

          This makes the whole thing pretty complicated, but I think if you put some quotes around the file names it will treat them as a whole rather than their parts:

          Code:
          for f in ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/*/Files/*.gz;
                  do cp "$f" PROJECTNAME"/fastq/"${f##*Files/};
                  done
          The only thing I changed was to put the $f in quotes. I'm not entirely sure if this will work without actually trying it out. There are some other quotes in the 'destination' bit and some other things going on that might mess it up.

          Let me know if it works!

          Cheers,

          Matt.
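For anyone who wants to try the quoting fix from post #5 without a BaseMount mount, here is a minimal sketch that builds a throwaway directory tree and runs the same loop against it. The sample and file names (`SampleA`, `SampleA (2)`) and the variables `work`, `src`, `dest` and `copied` are made up for illustration, and the destination is quoted as well, which goes slightly beyond the change described above.

```shell
# Build a temporary tree that mimics the Samples/<name>/Files layout (names are hypothetical)
work=$(mktemp -d)
src="$work/Projects/PROJECTNAME/Samples"
dest="$work/PROJECTNAME/fastq"
mkdir -p "$src/SampleA/Files" "$src/SampleA (2)/Files" "$dest"
touch "$src/SampleA/Files/SampleA_S70_L001_R1_001.fastq.gz"
touch "$src/SampleA (2)/Files/SampleA_S150_L001_R1_001.fastq.gz"

# The glob already matches "SampleA (2)"; quoting "$f" keeps the path
# as one argument so cp does not see it split at the space
for f in "$src"/*/Files/*.gz; do
    cp "$f" "$dest/${f##*Files/}"
done

# Count what ended up in the single destination folder
copied=$(ls "$dest" | wc -l | tr -d ' ')
echo "copied $copied files"
```

Both files should land in the one fastq folder, which suggests the original loop only needs the quotes rather than a different glob.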


          • neavemj
            Member
            • Feb 2014
            • 58

            #6
            P.S. I guess the best idea would be to change your workflow so that spaces and parentheses are not introduced into the file names. If you use some Linux programs for trimming or other processing, these will probably also fail with these file names.

            Good luck!

            Matt.
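If the names cannot be changed at the source, one possible workaround is to derive a "safe" name before passing anything to downstream tools. The sketch below is only an illustration of that idea: the set of allowed characters and the variables `name` and `safe` are assumptions, not something BaseMount provides.

```shell
# Folder name of the kind described in this thread for a resequenced sample
name='ADB2017Dec13SI1 (2)'

# Replace every character outside a conservative safe set with an underscore
safe=$(printf '%s\n' "$name" | sed 's/[^A-Za-z0-9._-]/_/g')

echo "$safe"
```

The space and both parentheses each become an underscore, so the result can be used unquoted, although quoting variables is still the more robust habit.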


            • fmd
              Junior Member
              • Nov 2018
              • 1

              #7
              For anyone that comes across this thread looking for help with BaseMount, I've made a Python script that might be useful to you. It doesn't exactly do what thermophile asked for, but should make retrieving the fastq files straightforward. Given a BaseMount project directory, it will extract all of the runs and simulate the folder structure you'd expect from a local MiSeq run for each. In addition to the reads, it will grab the sample sheet, InterOp directory contents, and log files.

              Here it is:
