SEQanswers

Old 08-01-2017, 09:29 AM   #1
j.cappellazzi
Member
 
Location: Oregon

Join Date: Aug 2017
Posts: 12
Fastx-Toolkit - Analyze multiple files in directory (Linux/Qiime)

I am trying to run the fastq_quality_filter on a directory of .fastq files (~500) where each file has a unique name. I simply cannot get it to work on the Linux Command Line within the Qiime VirtualBox. As an output, I am looking to have a new folder with each quality-filtered .fastq file having the same unique name it previously had. I have successfully run the script on a single .fastq file with the following command…

fastq_quality_filter –Q33 -q 19 -p 89 –i /home/qiime/Desktop/Hilo_New/Mock_community.fastq -o /home/qiime/Desktop/Hilo_New/Mock_community_fqf.fastq

I would like to run this on every file in a directory, but I have not been able to, and doing it by hand for 500 files seems quite daunting. Is there a way to make this happen? Here is the code I have tried…

fastq_quality_filter –Q33 -q 19 -p 89 –i “/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join/*.fastq” -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/

and I receive the following error…

fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =
(10)

The file format is just fine, as it works with an individual file. I just can't seem to figure it out.

Finally, I would like to run the same directory-wide analysis with the FastQ Artifacts Filter.

Any help would be much appreciated. Thanks.
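
A side note on the wildcard attempt above: when a glob such as *.fastq sits inside quotes, the shell passes it along as one literal string instead of expanding it to the matching files, and as far as I know -i accepts only a single input file either way. A minimal sketch of the difference, using a scratch directory and a hypothetical helper called count_args (neither comes from the original post):

```shell
# Scratch directory with two empty placeholder files (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.fastq" "$demo_dir/b.fastq"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

quoted=$(count_args "$demo_dir/*.fastq")    # quotes keep the * literal: one argument
unquoted=$(count_args "$demo_dir"/*.fastq)  # glob expands: one argument per matching file

echo "quoted glob: $quoted argument(s); unquoted glob: $unquoted argument(s)"
rm -rf "$demo_dir"
```

So a per-file loop is the usual route for a tool that takes one input and one output at a time.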
Old 08-01-2017, 09:43 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054
Default

Try this:
Code:
for i in `ls -1 /home/qiime/Desktop/Hilo_New/*.fastq | sed 's/.fastq//'`; do fastq_quality_filter –Q33 -q 19 -p 89 –i /home/qiime/Desktop/Hilo_New/$i.fastq -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/$i\_fqf.fastq; done
Make sure /home/qiime/Desktop/Hilo_New/fastX-filterFastq is pre-created before you run this code.

Edit: This does not work with Fastx_toolkit. A more modern program like BBMap handles this fine. Example in post #12 below.

Last edited by GenoMax; 08-02-2017 at 04:02 AM.
Old 08-01-2017, 10:06 AM   #3
j.cappellazzi
Member
 
Location: Oregon

Join Date: Aug 2017
Posts: 12
Default

Thanks. Quick follow-up: should I paste the entire line exactly as written? The part I don't understand is the "for i in `ls -1 ..." section at the start.

Is this all one command?

Thanks.
Old 08-01-2017, 11:27 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054
Default

This is a small bash script that uses a for loop.

"ls -1 /home/qiime/Desktop/Hilo_New/*.fastq | sed 's/.fastq//'" - Lists the files in your source directory (one per line), removes the .fastq from the end of each file name using the stream editor "sed" (for the reason mentioned below), and then assigns what is left of each name, one at a time, to a "variable" called i.

Variable i is then used to construct the fastx command line (one file at a time). For the -i option we add ".fastq" back onto variable i so the original file name is recreated. For -o (the output file name) we append "_fqf.fastq" (which is what you had in your example) to make a new file name that keeps the sample name; the new file is saved in the new output directory.

This process will iterate until all files in source directory are processed.
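
The loop described above can be sketched as a dry run that only prints the commands it would execute, which is a cheap way to check the file names before processing hundreds of files. This is a sketch under assumptions: the scratch directory and sample names are made up, and it strips the extension with basename rather than with ls and sed.

```shell
# Scratch input/output directories with placeholder files (hypothetical names).
in_dir=$(mktemp -d)
out_dir="$in_dir/fastX-filterFastq"
mkdir -p "$out_dir"
touch "$in_dir/sampleA.fastq" "$in_dir/sampleB.fastq"

# Dry run: echo each command instead of running it.
planned=$(
  for path in "$in_dir"/*.fastq; do
    sample=$(basename "$path" .fastq)   # file name without directory or .fastq
    echo fastq_quality_filter -Q33 -q 19 -p 89 -i "$path" -o "$out_dir/${sample}_fqf.fastq"
  done
)
printf '%s\n' "$planned"
rm -rf "$in_dir"
```

Once the printed commands look right, removing the echo runs them for real.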
Old 08-01-2017, 01:56 PM   #5
j.cappellazzi
Member
 
Location: Oregon

Join Date: Aug 2017
Posts: 12
Default

Okay, thanks for the clarification. Now I'm getting the following when I put in the two commands...

fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =
(10)

I'm not sure why this is happening since I can push a single file through the filter...
Old 08-01-2017, 02:17 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054
Default

Which two commands are you referring to? I only have a single command line up there. I have not used the fastx toolkit recently, so I am assuming your command line is correct. Have you made sure your fastq files are in the correct format?
Old 08-01-2017, 05:06 PM   #7
neavemj
Member
 
Location: MA, USA

Join Date: Feb 2014
Posts: 58
Default

Nice command GenoMax - think I'm learning something about loops! But couldn't you just do something like this:

Move into the Hilo_New folder, create a new folder called "fastX-filterFastq", then do:

for i in *.fastq; do fastq_quality_filter –Q33 -q 19 -p 89 –i $i -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

I was also under the impression that these bash loops didn't iterate until the current job was complete, so didn't need a 'sleep' call? Could be wrong there..
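
On the sequencing question: as far as I understand it, a plain for loop waits for each command to finish before starting the next iteration, so no sleep is needed between jobs; only a trailing & would send a job to the background. A small sketch, with a hypothetical slow_job function standing in for the filter:

```shell
log_file=$(mktemp)

# Hypothetical stand-in for a slow filtering job.
slow_job() {
  echo "start $1" >> "$log_file"
  sleep 1
  echo "end $1" >> "$log_file"
}

# No '&' after the command, so each job finishes before the next one starts.
for name in A B C; do
  slow_job "$name"
done

# Join the log lines with commas to show the order in which things happened.
run_order=$(tr '\n' ',' < "$log_file")
echo "$run_order"
rm -f "$log_file"
```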
Old 08-01-2017, 05:48 PM   #8
j.cappellazzi
Member
 
Location: Oregon

Join Date: Aug 2017
Posts: 12
Default

I just loaded the fastQValidator and pulled a subsample out of my original folder. I checked 43 .fastq files with fastQValidator and they all passed with the following result...

qiime@qiime-190-virtual-box:~/fastQValidator$ /home/qiime/fastQValidator/bin/profile/fastQValidator --file /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq

Finished processing /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq with 58084 lines containing 14521 sequences.
There were a total of 0 errors.
Returning: 0 : FASTQ_SUCCESS

I also checked the top of my files and they all seem to look good:
Code:
qiime@qiime-190-virtual-box:~$ head /home/qiime/Desktop/Hilo2_fastqjoin.join.fastq
@M01498:340:000000000-B86MB:1:1101:22408:1708 1:N:0:2
GTGAATCATCAAATTTTTGAACGCACCTTGCGCTCTCTGGTATTCCGGAGAGCACGTCTGTCTGAGTGTCGCTTTACTCTCAACGACCGAGTTTTTGTTAACTCGGGAGTTGGATCTTGAGCGCTGCCGGGTTCCTTGGGATCGTTGGCTCGCTTTAAAAGCTCGGATTGTGTCTTCGAGGTCGTTAATCCTAGTCGACGTGTAATTAGATTTATCGTTGGCGTTACGGAGGCCTCTTAACGGACCTTTCTCCCCTATCGTGCTCTTTAGGAGTGCAACTTTTGAACTTTTGACCTCAGATCAGTCGGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGA
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGEFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC
@M01498:340:000000000-B86MB:1:1101:9873:2268 1:N:0:2
GTGAATCATCAAATCTTTGAACGCACCTTGCGCTCTCTGGTATTCCGGAGAGCACGTCTGTCTGAGTGTCGCTTTACTCT
Then I made a folder to analyze only these 43 files with the command you suggested and it just hangs and does nothing for a while. When I press enter again the same error comes out...

fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =
(10)

I was getting frustrated so I just tried a single file again to make sure it worked and it doesn't. I am getting the same error as above... I guess I don't know what's going on at all.
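
One possible reading of that error, offered as a guess: "input file (-)" suggests the tool was reading standard input rather than a named file, and 10 is the character code of a newline, which is what arrives when Enter is pressed at a waiting prompt. A quick check of that character code:

```shell
# Print the numeric byte value of a single newline character.
newline_code=$(printf '\n' | od -An -tu1 | tr -d ' \n')
echo "newline byte value: $newline_code"
```

If that reading is right, the file contents were never inspected at all, which would explain why a validated FASTQ file still triggers the message.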

Last edited by GenoMax; 08-02-2017 at 03:48 AM.
Old 08-01-2017, 05:53 PM   #9
j.cappellazzi
Member
 
Location: Oregon

Join Date: Aug 2017
Posts: 12
Default

neavemj - same story with the code you posted. Thanks.
Old 08-01-2017, 05:58 PM   #10
j.cappellazzi
Member
 
Location: Oregon

Join Date: Aug 2017
Posts: 12
Default

If you can see something wrong, here is the .fastq file. I couldn't figure out how to post it, so just delete .pdf from the end (I think that will work).
Attached Files
File Type: pdf MockComm_fastqjoin.join.fastq.pdf (10.74 MB, 5 views)
Old 08-01-2017, 09:59 PM   #11
neavemj
Member
 
Location: MA, USA

Join Date: Feb 2014
Posts: 58
Default

Huh, not sure. It seems like it's ignoring the -i flag and waiting for something from the STDIN. You could try giving it an opened file instead of the input flag, like so:

Code:
for i in *.fastq; do cat "$i" | fastq_quality_filter -Q33 -q 19 -p 89 -o ./fastX-filterFastq/"$(basename "$i" .fastq)"_fqf.fastq; done

Something like that might work. I haven't tested it though.

Cheers,

Matt.

Last edited by GenoMax; 08-02-2017 at 04:14 AM.
Old 08-02-2017, 04:01 AM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054
Default

It looks like fastx_toolkit does not want to behave like a normal unix program.

@j.cappellazzi: You can use @neavemj's suggestion above if you want to stick with fastx_toolkit. Otherwise, I suggest that you use bbduk.sh from the BBMap suite (a more current program) for this.

Code:
for i in `ls -1 *.fq | sed 's/.fq//'`; do bbduk.sh qin=33 qtrim=r trimq=19 in=$i.fq out=$i\_fqf.fq; done
@neavemj: Multiple ways to skin the cat. You are right that we didn't need the sleep option.

Last edited by GenoMax; 08-02-2017 at 04:16 AM.
Old 08-02-2017, 11:25 AM   #13
j.cappellazzi
Member
 
Location: Oregon

Join Date: Aug 2017
Posts: 12
Default

@Genomax @neavemj

Well, that was frustrating and silly. It wasn't recognizing the -i because it was a long "-" not a short one. Must have been from copying from Word into the command line. WOW!

Now it works just fine with individual files again (I must have mucked that up as well while copying last night). However, there is still a problem with @GenoMax's loop script. I entered...

for i in `ls -1 /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/*.fastq | sed 's/.fastq//'`; do fastq_quality_filter -Q33 -q 19 -p 89 -i /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/$i.fastq -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/$i\_fqf.fastq; done

There is an output folder created and empty, patiently awaiting 43 new files, however, this is the response I receive...

fastq_quality_filter: failed to open input file '/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled//home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends-join_labeled/MockComm_fastqjoin.join.fastq.fastq': No such file or directory

It does this for every single file in the directory. The issue I can see is that "MockComm_fastqjoin.join.fastq.fastq" is not a file, as the input file will have only one ".fastq" in the name. I tried playing around with the script but honestly don't understand all the details and other errors kept popping up.

Any further help on this would be greatly appreciated. Thanks.
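
A likely cause, judging from the error text: the ls output already contains the full path, so putting the directory in front of $i again duplicates it, and the unanchored pattern in sed 's/.fastq//' removes the first place where any character is followed by "fastq". Here that is the "_fastq" inside the directory name rather than the extension, so the original ".fastq" survives and a second one gets appended. A sketch comparing that pattern with an anchored one and with basename (the variable names are my own):

```shell
path="/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq"

# Unanchored: '.' matches any character, and only the first match is removed.
loose=$(printf '%s\n' "$path" | sed 's/.fastq//')

# Escaped dot plus '$' anchor: only a literal ".fastq" at the very end is removed.
anchored=$(printf '%s\n' "$path" | sed 's/\.fastq$//')

# basename drops both the directory and the suffix.
sample=$(basename "$path" .fastq)

printf 'loose:    %s\nanchored: %s\nsample:   %s\n' "$loose" "$anchored" "$sample"
```

Using the basename form for the sample name, and the untouched path for -i, avoids both problems.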
Old 08-02-2017, 11:39 AM   #14
j.cappellazzi
Member
 
Location: Oregon

Join Date: Aug 2017
Posts: 12
Default

@neavemj

I also tried the code you suggested after getting into the proper directory at the command line and fixing the "-i" issue...

qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled$ for i in *.fastq; do fastq_quality_filter -Q33 -q 19 -p 89 -i $i -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

I made a folder in the following directory titled "FastX-filterFastq"

qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled

It seemed to begin working but for each of my 43 .fastq files in that folder it gave me the following error message...

fastq_quality_filter: Failed to create output file (./fastX-filterFastq/MockComm_fastqjoin.join_fqf.fastq): No such file or directory

Thanks for any further help on this.
Old 08-02-2017, 12:51 PM   #15
j.cappellazzi
Member
 
Location: Oregon

Join Date: Aug 2017
Posts: 12
It worked!

Well, now I understand my coding limitations. I didn't know the "./" meant I needed to provide the path to the output folder. I messed around with the code and did...

qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled$ for i in *.fastq; do fastq_quality_filter -Q33 -q 19 -p 89 -i $i -o /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/FastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

It worked like a charm. I frequently run into problems like this, when it's just my lack of coding knowledge that makes even the simplest tasks frustrating. Thank you so much for creating that code. I truly appreciate it.
Old 08-02-2017, 01:04 PM   #16
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054
Default

Quote:
It wasn't recognizing the -i because it was a long "-" not a short one. Must have been from copying from Word into the command line. WOW!
This is a pretty common problem on both PCs and Macs. You should turn off "smart" hyphens (and other "smart" substitutions) in keyboard preferences.
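
If a pasted command misbehaves, one way to spot such characters is to search for any byte outside printable ASCII. A sketch, assuming the command has been saved to a text file; the temporary file, the shortened command line, and the in.fastq/out.fastq names are placeholders for the demo:

```shell
cmd_file=$(mktemp)
# A command line as it might arrive from a word processor, with an en dash before "i".
printf '%s\n' 'fastq_quality_filter -Q33 -q 19 -p 89 –i in.fastq -o out.fastq' > "$cmd_file"

# In the C locale, [^ -~] matches any byte that is not printable ASCII.
if LC_ALL=C grep -q '[^ -~]' "$cmd_file"; then
  has_non_ascii=yes
else
  has_non_ascii=no
fi
echo "non-ASCII characters present: $has_non_ascii"
rm -f "$cmd_file"
```

Dropping the -q and adding -n prints the offending line with its line number. Note that the pattern also flags tab characters.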
Old 08-02-2017, 03:07 PM   #17
neavemj
Member
 
Location: MA, USA

Join Date: Feb 2014
Posts: 58
Default

Nice work j.cappellazzi!

I hadn't heard of that hyphen issue - bit of a trap!

The "./" in my command refers to a 'relative' path, i.e., it will look for a folder inside your current directory. This is in contrast to absolute paths, e.g., "/home/qiime/Desktop/Hilo_New/", which work no matter where you run the command from.

I think the problem with my original command is that the folder name and the name in the command were slightly different. You created a folder called "FastX-filterFastq", but the command is trying to create output here "./fastX-filterFastq/". Note the lower-case 'f' in fastX.
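
To make those two points concrete: a relative path such as ./fastX-filterFastq is resolved against the current directory, and on Linux the name has to match the existing folder's capitalization exactly. One way to sidestep a mismatch is to keep the folder name in a variable and let mkdir -p create it before the loop. A sketch in a scratch directory (all names are placeholders):

```shell
start_dir=$(pwd)
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

out_dir="./fastX-filterFastq"   # relative: resolved against the current directory
mkdir -p "$out_dir"             # creates it if missing, no error if it already exists

# The same folder expressed as an absolute path.
abs_out_dir="$(pwd)/fastX-filterFastq"

if [ -d "$abs_out_dir" ]; then
  out_dir_ready=yes
else
  out_dir_ready=no
fi
echo "output folder ready: $out_dir_ready"

# Return to where we started and clean up the scratch directory.
cd "$start_dir" && rm -rf "$work_dir"
```

Reusing "$out_dir" in the -o argument then guarantees the loop writes to the folder that was just created.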

Anyway, glad you got it working!

Cheers,

Matt.
Tags
batch, fastq_quality_filter, fastx toolkit, illumina, qiime
