SEQanswers

Old 03-22-2018, 03:58 PM   #1
thermophile
Senior Member
 
Location: CT

Join Date: Apr 2015
Posts: 234
help with basemount copy script

I have a script that I've cobbled together to copy fastq from each sample within a project into one folder on my computer.

Code:
        for f in ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/*/Files/*.gz;
        do cp $f PROJECTNAME"/fastq/"${f##*Files/};
        done
This works, except when I have to resequence a particular sample. Only the original sample is copied, because basemount puts " (2)" on the sample folder name for the second run. How can I tweak this so it also copies the fastq for the second run? (The fastq are unique because they all get the run info as part of the name.) I think I need to change the second part of the cp command so that it ignores the " (2)", but I can't figure out how. My first thought was to remove the *, but that made it fail for all samples, not just the duplicates.

Rereading the bash documentation isn't helping http://tldp.org/LDP/abs/html/string-manipulation.html
__________________
Microbial ecologist, running a sequencing core. I have lots of strong opinions on how to survey communities, pretty sure some are even correct.
Old 03-22-2018, 10:15 PM   #2
neavemj
Member
 
Location: MA, USA

Join Date: Feb 2014
Posts: 58

Hi thermophile,

Can you give us the complete name of the old and new fastq files?

The first line in that script is where the variable 'f' is assigned to each of the file names, so it might be at this point where the files with (2) are being missed.
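
One way to check whether the glob in that first line is really what misses the " (2)" folders is to print every match on its own line. The sketch below does not touch the real mount: it builds a throwaway directory tree that only imitates the layout from the first post (the sample and file names are invented) and then lists what the pattern matches.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the basemount tree; all names here are made up.
root=$(mktemp -d)
mkdir -p "$root/Samples/SampleA/Files" "$root/Samples/SampleA (2)/Files"
touch "$root/Samples/SampleA/Files/SampleA_S1_L001_R1_001.fastq.gz"
touch "$root/Samples/SampleA (2)/Files/SampleA_S2_L001_R1_001.fastq.gz"

# Loop over the glob and print each match on its own line.
# "$f" is quoted so a path containing a space stays in one piece.
count=0
for f in "$root"/Samples/*/Files/*.gz; do
    printf '%s\n' "$f"
    count=$((count + 1))
done
echo "matched $count files"
```

If the " (2)" paths show up in a listing like this, the glob itself is fine and the problem is more likely in how $f is used afterwards.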

Cheers,

Matt.
Old 03-23-2018, 05:31 AM   #3
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 355

Use rsync?
Old 03-23-2018, 09:48 AM   #4
thermophile

I don't want to just rsync because I need all the fastq in a single folder for downstream processing.

Here I've echoed the cp line and added a comma for readability:

Code:
for f in ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/*/Files/*.gz;
do echo $f "," PROJECTNAME"/fastq/"${f##*Files/};
done

Code:
../basemountpoint/basespace/Projects/PROJECTNAME/Samples/ADB2017Dec13SI1 (2),/Files/ADB2017Dec13SI1_S150_L001_R1_001.fastq.gz PROJECTNAME/fastq/ADB2017Dec13SI1_S150_L001_R1_001.fastq.gz
../basemountpoint/basespace/Projects/PROJECTNAME/Samples/ADB2017Dec13SI1 (2),/Files/ADB2017Dec13SI1_S150_L001_R2_001.fastq.gz PROJECTNAME/fastq/ADB2017Dec13SI1_S150_L001_R2_001.fastq.gz
../basemountpoint/basespace/Projects/PROJECTNAME/Samples/ADB2017Dec13SI1,/Files/ADB2017Dec13SI1_S70_L001_R1_001.fastq.gz PROJECTNAME/fastq/ADB2017Dec13SI1_S70_L001_R1_001.fastq.gz
../basemountpoint/basespace/Projects/PROJECTNAME/Samples/ADB2017Dec13SI1,/Files/ADB2017Dec13SI1_S70_L001_R2_001.fastq.gz PROJECTNAME/fastq/ADB2017Dec13SI1_S70_L001_R2_001.fastq.gz

Last edited by thermophile; 03-23-2018 at 09:56 AM.
Old 03-25-2018, 03:37 PM   #5
neavemj

Hi thermophile,

I'd say the problem with the new folder names is the extra space and the parentheses. Because $f is not quoted, the shell splits the path at the space, which makes it difficult for the 'cp' command to know what is the file to copy and what is the destination. Also, parentheses need to be 'escaped' (or quoted) if you type them directly in a file name on the command line. Something like the below (note the extra backslashes):

ls ADB2017Dec13SI1\ \(2\)/

This makes the whole thing pretty complicated, but I think if you put some quotes around the file names it will treat them as a whole rather than their parts:

Code:
for f in ../basemountpoint/basespace/Projects/PROJECTNAME/Samples/*/Files/*.gz;
        do cp "$f" PROJECTNAME"/fastq/"${f##*Files/};
        done
The only thing I changed was to put the $f in quotes. I'm not entirely sure if this will work without actually trying it out. There are some other quotes in the 'destination' bit and some other things going on that might mess it up.
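
For what it's worth, a fully quoted version of the loop can be tried against a mock directory tree before pointing it at the real mount. This is only a sketch: the temporary directories and the sample names are invented for illustration, and it assumes the only problem is the shell splitting the path at the space in " (2)".

```shell
#!/usr/bin/env bash
# Mock source and destination; every name here is made up for illustration.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/Samples/SampleA/Files" "$src/Samples/SampleA (2)/Files"
touch "$src/Samples/SampleA/Files/SampleA_S70_L001_R1_001.fastq.gz"
touch "$src/Samples/SampleA (2)/Files/SampleA_S150_L001_R1_001.fastq.gz"

for f in "$src"/Samples/*/Files/*.gz; do
    # ${f##*Files/} strips everything up to and including the last "Files/",
    # leaving only the file name.
    name=${f##*Files/}
    # Quoting both arguments keeps each path as a single word for cp,
    # so the space and parentheses in "SampleA (2)" are passed through intact.
    cp -- "$f" "$dest/$name"
done

ls "$dest"
```

With both arguments quoted, the backslash escapes are not needed inside the script, because the shell does not re-parse the contents of a quoted variable.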

Let me know if it works!

Cheers,

Matt.
Old 03-25-2018, 03:44 PM   #6
neavemj

P.S. I guess the best idea would be to change your workflow so that spaces and parentheses are not introduced into the file names. If you use Linux programs for trimming or other processing, these will probably also fail with these file names.
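
A quick way to see whether any such names have crept in is to search for them before running downstream tools. The sketch below is an assumption-laden example, not part of the original workflow: it uses a made-up mock tree and `find` with a pattern that matches a space or a parenthesis anywhere in a file or folder name.

```shell
#!/usr/bin/env bash
# Mock tree with invented names, standing in for the real project folder.
root=$(mktemp -d)
mkdir -p "$root/SampleA/Files" "$root/SampleA (2)/Files"

# -name tests only the last path component; the bracket expression
# matches a space, "(" or ")" anywhere in that name.
awkward=$(find "$root" -name '*[ ()]*')

if [ -n "$awkward" ]; then
    echo "Names that may trip up downstream tools:"
    printf '%s\n' "$awkward"
fi
```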

Good luck!

Matt.
All times are GMT -8.