
  • Fastx-Toolkit - Analyze multiple files in directory (Linux/Qiime)

    I am trying to run the fastq_quality_filter on a directory of .fastq files (~500) where each file has a unique name. I simply cannot get it to work on the Linux Command Line within the Qiime VirtualBox. As an output, I am looking to have a new folder with each quality-filtered .fastq file having the same unique name it previously had. I have successfully run the script on a single .fastq file with the following command…

    fastq_quality_filter –Q33 -q 19 -p 89 –i /home/qiime/Desktop/Hilo_New/Mock_community.fastq -o /home/qiime/Desktop/Hilo_New/Mock_community_fqf.fastq

    I would like to use this for all the files in a directory, but I cannot get it to work. Also, running it one file at a time for 500 files seems quite daunting. Is there a way to make this happen? Here is the code I have tried…

    fastq_quality_filter –Q33 -q 19 -p 89 –i “/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join/*.fastq” -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/

    and I receive the following error…

    fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =
    (10)

    The file format is just fine, as it works with an individual file. I just can't seem to figure it out.

    Finally, I would like to do the same directory-wide analysis with the FastQ Artifacts Filter.

    Any help would be much appreciated. Thanks.
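A side note on the directory attempt above: a wildcard inside quotes is not expanded by the shell, so the program receives the literal text `*.fastq` rather than a list of files (and, as far as I know, fastq_quality_filter takes one input file per run anyway). The sketch below only illustrates the quoting behaviour. It assumes a throwaway directory with placeholder file names, and `count_args` is a made-up helper, not part of any toolkit.

```shell
#!/usr/bin/env bash
# Quoted wildcards are passed through literally; unquoted ones are expanded by the shell.
# The file names here are placeholders in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/one.fastq" "$demo/two.fastq"

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

count_args "$demo"/*.fastq     # wildcard outside quotes: one argument per matching file
count_args "$demo/*.fastq"     # wildcard inside quotes: a single literal string
```

This is why a loop (one file per invocation) is the usual way to handle a whole directory.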

  • #2
    Try this:
    Code:
    for i in `ls -1 /home/qiime/Desktop/Hilo_New/*.fastq | sed 's/.fastq//'`; do fastq_quality_filter –Q33 -q 19 -p 89 –i /home/qiime/Desktop/Hilo_New/$i.fastq -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/$i\_fqf.fastq; done
    Make sure /home/qiime/Desktop/Hilo_New/fastX-filterFastq is pre-created before you run this code.

    Edit: This does not work with Fastx_toolkit. A more modern program like BBMap handles this fine. Example in post #12 below.
    Last edited by GenoMax; 08-02-2017, 04:02 AM.



    • #3
      Thanks. Quick follow-up: should I paste the code in its entirety, exactly as written? The part I don't understand is the "for i in 'ls -1" etc.

      Is this all one command?

      Thanks.



      • #4
        This is a small bash script that uses a for loop.

        "ls -1 /home/qiime/Desktop/Hilo_New/*.fastq | sed 's/.fastq//'" lists the files in your source directory (one per line) and removes the ".fastq" from the end of each file name using the stream editor "sed" (for the reason mentioned below). The loop then assigns each resulting name, one at a time, to a "variable" called i.

        Variable i is then used to construct the fastx command line (one file at a time). For the -i option we add ".fastq" back onto the variable i so the original file name is recreated. For -o (the output file name) we append "_fqf.fastq" (which is what you had in your example) to make up a new file name that retains the sample name and is saved in the new output directory.

        This process will iterate until all files in source directory are processed.
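One way to see what such a loop will do before running it for real is a dry run that only prints each command. The sketch below is not the code from this thread: it uses shell parameter expansion instead of `ls | sed` to strip the extension, and the directory and sample names (`sample_A.fastq`, `sample_B.fastq`) are placeholders created in a temporary folder. `dry_run` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Dry run: print the command that would be run for each file, without running it.
# The directory and file names below are placeholders for illustration only.
workdir=$(mktemp -d)
mkdir -p "$workdir/in" "$workdir/out"
touch "$workdir/in/sample_A.fastq" "$workdir/in/sample_B.fastq"

dry_run() {
    local indir=$1 outdir=$2 f name
    for f in "$indir"/*.fastq; do
        name=${f##*/}          # drop the directory part: sample_A.fastq
        name=${name%.fastq}    # drop the trailing extension: sample_A
        echo "fastq_quality_filter -Q33 -q 19 -p 89 -i $f -o $outdir/${name}_fqf.fastq"
    done
}

dry_run "$workdir/in" "$workdir/out"
```

If the printed lines look right, removing the `echo` (and the surrounding quotes) turns the dry run into the real thing.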



        • #5
          Okay, thanks for the clarification. Now I'm getting the following when I put in the two commands...

          fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =
          (10)

          I'm not sure why this is happening since I can push a single file through the filter...



          • #6
            What two commands are you referring to? I only have a single command line up there. I have not used fastx toolkit recently, and I assume your command line is correct? Have you made sure your fastq files are in the correct format?



            • #7
              Nice command GenoMax - think I'm learning something about loops! But couldn't you just do something like this:

              Move into the Hilo_New folder, create a new folder called "fastX-filterFastq", then do:

              for i in *.fastq; do fastq_quality_filter –Q33 -q 19 -p 89 –i $i -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

              I was also under the impression that these bash loops didn't iterate until the current job was complete, so didn't need a 'sleep' call? Could be wrong there..
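On the point about iteration: a command run in the foreground inside a `for` loop has to finish before the next iteration starts, so no `sleep` is needed. The sketch below illustrates that ordering, along with what `basename` does to a path. `slow_step` is a made-up stand-in for a real tool, and the `/data/...` paths are placeholders that never have to exist.

```shell
#!/usr/bin/env bash
# Each foreground command in a for loop finishes before the next iteration starts.
# 'slow_step' is a made-up stand-in for a real, slow tool.
slow_step() {
    sleep 1
    echo "done: $1"
}

run_all() {
    local f
    for f in "$@"; do
        echo "start: $f"
        slow_step "$(basename "$f" .fastq)"   # basename strips the directory and the suffix
    done
}

run_all /data/a.fastq /data/b.fastq
```

The "start" line for the second file only appears after the "done" line for the first.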



              • #8
                I just loaded the fastQValidator and pulled a subsample out of my original folder. I checked 43 .fastq files with fastQValidator and they all passed with the following result...

                qiime@qiime-190-virtual-box:~/fastQValidator$ /home/qiime/fastQValidator/bin/profile/fastQValidator --file /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq

                Finished processing /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq with 58084 lines containing 14521 sequences.
                There were a total of 0 errors.
                Returning: 0 : FASTQ_SUCCESS

                I also checked the header of my files and they all seem to look good
                Code:
                qiime@qiime-190-virtual-box:~$ head /home/qiime/Desktop/Hilo2_fastqjoin.join.fastq
                @M01498:340:000000000-B86MB:1:1101:22408:1708 1:N:0:2
                GTGAATCATCAAATTTTTGAACGCACCTTGCGCTCTCTGGTATTCCGGAGAGCACGTCTGTCTGAGTGTCGCTTTACTCTCAACGACCGAGTTTTTGTTAACTCGGGAGTTGGATCTTGAGCGCTGCCGGGTTCCTTGGGATCGTTGGCTCGCTTTAAAAGCTCGGATTGTGTCTTCGAGGTCGTTAATCCTAGTCGACGTGTAATTAGATTTATCGTTGGCGTTACGGAGGCCTCTTAACGGACCTTTCTCCCCTATCGTGCTCTTTAGGAGTGCAACTTTTGAACTTTTGACCTCAGATCAGTCGGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGA
                +
                CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGEFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC
                @M01498:340:000000000-B86MB:1:1101:9873:2268 1:N:0:2
                GTGAATCATCAAATCTTTGAACGCACCTTGCGCTCTCTGGTATTCCGGAGAGCACGTCTGTCTGAGTGTCGCTTTACTCT
                Then I made a folder to analyze only these 43 files with the command you suggested and it just hangs and does nothing for a while. When I press enter again the same error comes out...

                fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =
                (10)

                I was getting frustrated so I just tried a single file again to make sure it worked and it doesn't. I am getting the same error as above... I guess I don't know what's going on at all.
                Last edited by GenoMax; 08-02-2017, 03:48 AM.



                • #9
                  neavemj - same story with the code you posted. Thanks.



                  • #10
                    If you can see something wrong, here is the .fastq file. I couldn't figure out how to post it, so just delete .pdf from the end (I think that will work).
                    Attached Files



                    • #11
                      Huh, not sure. It seems like it's ignoring the -i flag and waiting for something from STDIN. You could try piping the file in instead of using the input flag, like so:

                      Code:
                      for i in *.fastq; do cat $i | fastq_quality_filter –Q33 -q 19 -p 89 -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

                      Something like that might work. I haven't tested it though.

                      Cheers,

                      Matt.
                      Last edited by GenoMax; 08-02-2017, 04:14 AM.



                      • #12
                        It looks like fastx_toolkit does not want to behave like a normal unix program.

                        @j.cappellazzi: You can use @neavemj's suggestion above if you want to stick with fastx_toolkit. Otherwise, I suggest that you use bbduk.sh from the BBMap suite (a more current program) for this.

                        Code:
                        for i in `ls -1 *.fq | sed 's/\.fq$//'`; do bbduk.sh qin=33 qtrim=r trimq=19 in=$i.fq out=$i\_fqf.fq; done
                        @neavemj: Multiple ways to skin the cat. You are right that we didn't need the sleep option.
                        Last edited by GenoMax; 08-02-2017, 04:16 AM.



                        • #13
                          @Genomax @neavemj

                          Well, that was frustrating and silly. It wasn't recognizing the -i because it was a long dash ("–"), not a short one ("-"). It must have come from copying from Word into the command line. WOW!
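For anyone hitting the same thing: a word processor's long dash is a different character from the ASCII hyphen, so an option parser does not treat "–i" as "-i". A quick check for stray non-ASCII characters in a command can be sketched as below. The two command strings are examples modelled on this thread with shortened placeholder file names, and `has_non_ascii` and `fix_dashes` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# An en dash (U+2013) looks like a hyphen but is a different character, so a
# program's option parser does not see "–i" as the option "-i".
bad='fastq_quality_filter –Q33 -q 19 -p 89 –i in.fastq -o out.fastq'
good='fastq_quality_filter -Q33 -q 19 -p 89 -i in.fastq -o out.fastq'

# Report whether a string contains any byte outside printable ASCII.
has_non_ascii() {
    if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
        echo yes
    else
        echo no
    fi
}

# Replace en dashes with plain hyphens.
fix_dashes() {
    printf '%s\n' "$1" | sed 's/–/-/g'
}

has_non_ascii "$bad"     # yes
has_non_ascii "$good"    # no
fix_dashes "$bad"        # prints the command with plain hyphens
```

Typing commands directly in the terminal, or pasting from a plain-text editor, avoids the automatic dash substitution in the first place.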

                          Now it works just fine with individual files again (I must have mucked that up as well while copying last night). However, there is still a problem with @Genomax's loop script. I entered...

                          for i in `ls -1 /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/*.fastq | sed 's/.fastq//'`; do fastq_quality_filter -Q33 -q 19 -p 89 -i /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/$i.fastq -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/$i\_fqf.fastq; done

                          There is an output folder created and empty, patiently awaiting 43 new files, however, this is the response I receive...

                          fastq_quality_filter: failed to open input file '/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled//home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends-join_labeled/MockComm_fastqjoin.join.fastq.fastq': No such file or directory

                          It does this for every single file in the directory. The issue I can see is that "MockComm_fastqjoin.join.fastq.fastq" is not a file, as the input file will have only one ".fastq" in the name. I tried playing around with the script but honestly don't understand all the details and other errors kept popping up.

                          Any further help on this would be greatly appreciated. Thanks.
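The error message above can be reproduced without running the filter at all. Two things appear to be going on: `ls` was given a full path, so each value of i already contains the directory (and the loop then prepends the directory a second time), and in `sed 's/.fastq//'` the unescaped dot matches any character while only the first match is replaced, so the "_fastq" inside the directory name is removed instead of the extension. The sketch below assumes nothing beyond the path quoted in the error; the variable names are mine.

```shell
#!/usr/bin/env bash
# Reproduce what the backtick expression hands to the loop for one full path.
path='/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq'

# Original pattern: '.' matches any character and only the first match is
# replaced, so "_fastq" in the directory name is removed, not the extension.
loose=$(printf '%s\n' "$path" | sed 's/.fastq//')

# Escaping the dot and anchoring with '$' strips only the trailing extension.
strict=$(printf '%s\n' "$path" | sed 's/\.fastq$//')

# Parameter expansion gives just the sample name, with no directory part.
name=${path##*/}
name=${name%.fastq}

echo "$loose"
echo "$strict"
echo "$name"
```

The first line printed still ends in ".fastq" and has lost "_fastq" from the folder name, which matches the second half of the path in the error. Using only the bare sample name (the third line) when building the -i and -o paths avoids both problems.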



                          • #14
                            @neavemj

                            I also tried the code you suggested after getting into the proper directory at the command line and fixing the "-i" issue...

                            qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled$ for i in *.fastq; do fastq_quality_filter -Q33 -q 19 -p 89 -i $i -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

                            I made a folder in the following directory titled "FastX-filterFastq"

                            qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled

                            It seemed to begin working but for each of my 43 .fastq files in that folder it gave me the following error message...

                            fastq_quality_filter: Failed to create output file (./fastX-filterFastq/MockComm_fastqjoin.join_fqf.fastq): No such file or directory

                            Thanks for any further help on this.



                            • #15
                              It worked!

                              Well, now I understand my coding limitations. I didn't know the "./" meant I needed to provide the path to the output folder. I messed around with the code and did...

                              qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled$ for i in *.fastq; do fastq_quality_filter -Q33 -q 19 -p 89 -i $i -o /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/FastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

                              It worked like a charm. I frequently run into problems like this, when it's just my lack of coding knowledge that makes even the simplest tasks frustrating. Thank you so much for creating that code. I truly appreciate it.
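A note for later readers: "./" simply means "the current directory", so the relative path itself was probably not the issue. Post #14 describes the folder as "FastX-filterFastq" (capital F) while the command wrote to "./fastX-filterFastq" (lowercase f), and on a case-sensitive filesystem, which is the norm on Linux, those are two different names. That is only my reading of the posts, not something confirmed in the thread. The sketch below shows the check in a throwaway sandbox; `check_dir` is a hypothetical helper and the "filtered" folder name is a placeholder.

```shell
#!/usr/bin/env bash
# Check whether an output path exists with exactly this spelling.
# Folder names mirror the thread; the temp dir is a throwaway sandbox.
sandbox=$(mktemp -d)
mkdir "$sandbox/FastX-filterFastq"        # capital F, as the folder was created

check_dir() {
    if [ -d "$1" ]; then echo "exists"; else echo "missing"; fi
}

check_dir "$sandbox/FastX-filterFastq"    # same spelling as the folder
check_dir "$sandbox/fastX-filterFastq"    # lowercase f, as typed in the loop

# Creating the directory from the same variable the loop uses avoids the mismatch.
outdir="$sandbox/filtered"
mkdir -p "$outdir"
check_dir "$outdir"
```

Putting `mkdir -p "$outdir"` right before the loop, and reusing `$outdir` in the -o path, means the name is only typed once.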
