  • fastq_quality_trimmer script

    Hi guys,

    I've got a load of RNA-Seq .txt files that I need to trim and filter. The files are very large (>12GB) and I have a lot of different controls, etc. I was wondering whether someone could help me with a simple script that performs the same action on multiple files and gives each output a unique name.

    I imagine a for loop saying 'for every file ending in .txt run fastq_quality_trimmer and save as a .fastq file' would do this, but I'm not too sure how to go about writing it.

    Any help would be appreciated,

    Thanks a lot,

    Nick

  • #2
    Hi Nick,

    Below is a small Perl script that should do what you want. You can change the fastq_quality_trimmer parameters to whichever ones you need.

    Hope this is what you were looking for.

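    # Ask the shell for every .txt file in the current directory
    open (FILES, "ls *.txt |") or die "Cannot list files: $!";
    while (<FILES>) {
        # Take the first whitespace-separated token of the line as the file name
        @split = split(/\s+/, $_);
        $file = $split[0];
        print "\n\nParsing $file...";

        # Trim with quality threshold 15 and write the result to <file>.fastq
        system ("fastq_quality_trimmer -t 15 -i $file -o $file.fastq");
    }
    close (FILES);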
    Last edited by cormicp; 07-23-2012, 05:32 AM.



    • #3
      Awesome - thanks.

      So I just have to replace the $file with the -i/-o locations?

      Would it be possible for you to give me a brief explanation of the script (i.e. what each line does)?

      Thanks a lot,

      Nick



      • #4
        If you're running Linux/OSX, you can also do the whole thing at the command line like this:

        Code:
        $ for i in `ls *.txt`; do fastq_quality_trimmer -i $i -o $i.fastq; done
        This may seem very confusing but it's actually quite simple. Here it is broken down:

        Code:
        for i in `ls *.txt`;
        This bit runs `ls *.txt` and stores a list of the filenames. It then loops once for each file. In each loop, the name of the file is put in the variable $i. Note that `ls *.txt` is in backticks (top left corner of your keyboard if you're in the US/UK), not single quotes.

        Code:
        do fastq_quality_trimmer -i $i -o $i.fastq;
        This command will run once every loop, with a different value of $i each time. So if you only have two files, A.txt and B.txt, what will happen is the following commands will run:

        fastq_quality_trimmer -i A.txt -o A.txt.fastq
        fastq_quality_trimmer -i B.txt -o B.txt.fastq

        As you can see, this is a simple but powerful way to run the same command on multiple files at once. If you wish to run fastq_quality_trimmer with other command-line options, just edit the line as required to insert those options.
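To make the "edit the line to insert options" step concrete, here is a minimal sketch that forwards whatever options you pass on to every call. The function name trim_with_options is made up for illustration, and the values in the example call (-t 20 for the quality threshold, -l 30 for the minimum length) are arbitrary placeholders, not recommendations.

```shell
# Hypothetical helper: run fastq_quality_trimmer on every .txt file in the
# current directory, passing along any options given to the function.
trim_with_options() {
    for i in *.txt; do
        # If no .txt file exists the pattern stays literal, so skip it
        [ -e "$i" ] || continue
        # "$@" expands to the options supplied by the caller
        fastq_quality_trimmer "$@" -i "$i" -o "$i.fastq"
    done
}

# Example call (values are placeholders):
#   trim_with_options -t 20 -l 30
```

Keeping the options out of the loop body means you only change them in one place when you rerun with different settings.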

        Code:
        done
        This indicates that you're not putting any more commands inside the loop. The loop will now run and with any luck you'll get your .txt.fastq output files.

        Word of warning: If you make a mistake while doing this, it's possible to specify both the input and output to be $i, which may cause the output to overwrite the input file, leading to much chaos. Having a backup of your files before you try this for the first time is recommended, but once you're comfortable with it you'll find it's really handy to have.
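One way to lower the risk of clobbering a file is to build the output name separately and refuse to write over anything that already exists. The sketch below assumes the inputs end in .txt and that you want A.txt to become A.fastq; output_name and trim_all are hypothetical helper names, not part of the FASTX-Toolkit.

```shell
# Hypothetical helper: turn A.txt into A.fastq
output_name() {
    printf '%s\n' "${1%.txt}.fastq"
}

# Hypothetical helper: trim every .txt file, never overwriting an existing file
trim_all() {
    for i in *.txt; do
        # If no .txt file exists the pattern stays literal, so skip it
        [ -e "$i" ] || continue
        out=$(output_name "$i")
        # Leave existing files alone and report the skip on stderr
        if [ -e "$out" ]; then
            echo "Skipping $i: $out already exists" >&2
            continue
        fi
        fastq_quality_trimmer -i "$i" -o "$out"
    done
}
```

Because the output always ends in .fastq and the input always ends in .txt, the two names can never be the same file. Run trim_all from the directory that holds the .txt files.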



        • #5
          Thanks guys, really helpful solutions!

          Cheers,

          Nick



          • #6
            Originally posted by Rocketknight

            Code:
            $ for i in `ls *.txt`;
            Just for the record you don't need the ls part of this command.

            Code:
            $ for i in *.txt; ...
            ..works just fine.
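Dropping ls is also safer when a file name contains a space. The sketch below counts how many times each style of loop runs; count_with_ls and count_with_glob are made-up names used only for this comparison.

```shell
# Hypothetical helper: count loop iterations when the list comes from ls.
# The shell splits the ls output on whitespace, so "my sample.txt"
# becomes two separate words.
count_with_ls() {
    n=0
    for i in `ls *.txt`; do
        n=$((n + 1))
    done
    echo "$n"
}

# Hypothetical helper: count loop iterations when the shell expands the
# pattern itself. Each matching file name stays a single word.
count_with_glob() {
    n=0
    for i in *.txt; do
        # If no .txt file exists the pattern stays literal, so skip it
        [ -e "$i" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}
```

In a directory holding only A.txt and "my sample.txt", count_with_ls reports 3 while count_with_glob reports 2. When you use the file name inside the loop, write it as "$i" in double quotes so it is passed on as a single argument.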
