  • nested for loop to concatenate fastq files

    My pooled PE RNA-Seq data was demultiplexed by the sequencing facility, so the data I receive is a directory with one sub-directory per sample, each containing the R1 and R2 fastq files for each lane (i.e., the main directory "FISH_RNA_SEQ" has 96 folders, each labeled by sample, like "SpA.Treatment1.Rep1", etc.). If I am in the sub-directory for a particular sample, I can concatenate across lanes and write to a file in a new directory, so that I have two files for each sample (R1/R2), using this for loop:

    for SUFFIX in R1_001.fastq R2_001.fastq
    do
    cat *L001_$SUFFIX *L002_$SUFFIX *L003_$SUFFIX > ../test.cat.DDIG/samplename_cat_$SUFFIX
    done

    However, this requires me to run it manually for each of the 96 samples, going into each sub-directory and typing in the desired output name. Since I will have to repeat this in the future, does anyone have suggestions about how to use a nested for loop (or some other way) to do this automatically from the main directory for all sub-directories, naming the output files by the sub-directory (i.e., sample name)?

    Working from the main directory, I was testing something like:
    for dir in *; do
    (for SUFFIX in R1_001.fastq R2_001.fastq
    do
    cat *L001_$SUFFIX *L002_$SUFFIX *L003_$SUFFIX > ../test.cat.DDIG/test_cat_$SUFFIX
    done)

    But this doesn't seem to work, and it doesn't solve the problem of naming the output files according to the sample names. Any suggestions appreciated!

  • #2
    Do you just want to use the directory name (e.g., SpA.Treatment1.Rep1) as the prefix, or some variant of whatever the file names are?

    • #3
      Yes, ideally the directory name would be the file name prefix, so for example a 'Sample1.treatment1.Rep1' directory would produce two output files like: 'Sample1.treatment1.Rep1_cat_R1.001.fastq' and 'Sample1.treatment1.Rep1_cat_R2.001.fastq'

      And then the same for all the other directories/samples...
      Thank you!

      • #4
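        Code:
        # Loop over the immediate sub-directories (one per sample); -printf is GNU find.
        # Note: the backtick expansion assumes directory names contain no whitespace.
        for dir in `find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n"`
        do
            cd "$dir"
            # ${dir} needs braces: $dir_R1 would be read as a variable named dir_R1.
            cat *_L???_R1_*.fastq > "${dir}_R1.fastq"
            cat *_L???_R2_*.fastq > "${dir}_R2.fastq"
            cd ..
        done
        or something like that.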
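
For what it's worth, here is a rough sketch of a variant that stays in the main directory and writes everything to a separate output directory, which is closer to the original per-sample loop. It assumes the lane files match `*_L???_R1_*.fastq` and `*_L???_R2_*.fastq`, and that the output directory sits outside the main directory (otherwise it would be picked up as a sample). The function name `concat_lanes` and the example paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Concatenate per-lane FASTQ files for every sample sub-directory.
# Usage: concat_lanes MAIN_DIR OUT_DIR   (hypothetical helper, not from the thread)
concat_lanes() {
    local main_dir=$1 out_dir=$2 sample_dir sample read
    mkdir -p "$out_dir"
    for sample_dir in "$main_dir"/*/; do
        [ -d "$sample_dir" ] || continue      # skip if the glob matched nothing
        sample=$(basename "$sample_dir")      # e.g. SpA.Treatment1.Rep1
        for read in R1 R2; do
            # The glob expands in sorted name order, so lanes join as L001, L002, ...
            cat "$sample_dir"*_L???_"${read}"_*.fastq \
                > "$out_dir/${sample}_cat_${read}_001.fastq"
        done
    done
}

# Example call (hypothetical paths):
# concat_lanes FISH_RNA_SEQ FISH_RNA_SEQ_cat
```

Quoting the variables keeps this working if a sample name ever contains spaces, and using a glob instead of `find` avoids the GNU-only `-printf` option.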

        • #5
          Thank you! I think I see what each part does, except the "%f\n". My apologies if it's obvious; I'm still fairly new.

          • #6
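            That's just the format the results are printed in. With GNU find, %f prints each result's name with the leading directories (here the "./") stripped, and \n ends each name with a newline, so the loop gets bare directory names like SpA.Treatment1.Rep1.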
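
A small sketch to see the effect for yourself, assuming GNU find (`-printf` is a GNU extension and is not available in every find). The function name `list_sample_dirs` is hypothetical; it just wraps the same find call and sorts the output.

```shell
# Print the immediate sub-directories of a directory, base names only.
# "%f" is the name with leading directories removed; "\n" adds a newline.
list_sample_dirs() {
    find "$1" -maxdepth 1 -mindepth 1 -type d -printf "%f\n" | sort
}

# Example call from the main directory:
# list_sample_dirs .
```

Without the `-printf` part, find would print each path with its "./" prefix (for example `./SpA.Treatment1.Rep1`), which is less convenient as an output file prefix.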

            • #7
              great, thank you so much for the help!
