
  • nested for loop to concatenate fastq files

    My pooled PE RNA-Seq data was demultiplexed by the sequencing facility, so the data I receive is a directory with subdirectories for each sample that contain the R1 and R2 fastq files for each lane (i.e., the main directory "FISH_RNA_SEQ" has 96 folders, each labeled by sample, like "SpA.Treatment1.Rep1", etc.). If I am in a subdirectory for a particular sample, I can concatenate across lanes and write to a file in a new directory, so that I have two files for each sample (R1/R2), using this for loop:

    for SUFFIX in R1_001.fastq R2_001.fastq
    do
    cat *L001_$SUFFIX *L002_$SUFFIX *L003_$SUFFIX > ../test.cat.DDIG/samplename_cat_$SUFFIX
    done

    However, this requires me to run it manually for each of the 96 samples, going into each subdirectory and typing in the desired output name. Since I will have to repeat this in the future, does anyone have suggestions about how to use a nested for loop (or some other way) to do this automatically from the main directory for all subdirectories, naming the output files by the subdirectory (i.e., the sample name)?

    Working from the main directory, I was testing something like:
    for dir in *; do
    (for SUFFIX in R1_001.fastq R2_001.fastq
    do
    cat *L001_$SUFFIX *L002_$SUFFIX *L003_$SUFFIX > ../test.cat.DDIG/test_cat_$SUFFIX
    done)

    But this doesn't seem to work, and it doesn't solve the problem of naming the output files according to the sample names. Any suggestions appreciated!

  • #2
    Do you just want to use the directory name (e.g., SpA.Treatment1.Rep1) as the prefix, or some variant of whatever the file names are?



    • #3
      Yes, ideally the directory name would be the file name prefix, so for example a 'Sample1.treatment1.Rep1' directory would produce two output files like: 'Sample1.treatment1.Rep1_cat_R1.001.fastq' and 'Sample1.treatment1.Rep1_cat_R2.001.fastq'

      And then the same for all the other directories/samples...
      Thank you!



      • #4
        Code:
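        for dir in `find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n"`
        do
            # Skip this directory if we cannot enter it
            cd "$dir" || continue
            # Braces are required: $dir_R1 would be read as a variable named dir_R1
            cat *_L???_R1_*.fastq > "${dir}_R1.fastq"
            cat *_L???_R2_*.fastq > "${dir}_R2.fastq"
            cd ..
        done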
        or something like that.
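
The loop above writes each merged file inside its own sample directory. A variant that collects everything in one output directory and uses the `_cat_` naming asked for earlier in the thread might look like the sketch below. The function name `concat_lanes` and its two arguments are hypothetical, and it assumes the lane files end in `_L???_R1_001.fastq` / `_L???_R2_001.fastq` as in the original post.

```shell
# Sketch only: concat_lanes, main_dir and out_dir are illustrative names,
# not part of any existing tool.
concat_lanes() {
    local main_dir="$1"
    local out_dir="$2"
    local sample_dir sample read
    mkdir -p "$out_dir"
    for sample_dir in "$main_dir"/*/; do
        [ -d "$sample_dir" ] || continue
        sample=$(basename "$sample_dir")
        for read in R1 R2; do
            # Expand the lane files for this read; globs sort by name,
            # so lanes are joined as L001, L002, ... for both R1 and R2.
            set -- "$sample_dir"*_L???_"${read}"_001.fastq
            # Skip directories without lane files (e.g. the output directory)
            [ -e "$1" ] || continue
            cat "$@" > "$out_dir/${sample}_cat_${read}_001.fastq"
        done
    done
}
```

Run from the main directory, a call such as `concat_lanes . ./test.cat.DDIG` would produce one R1 and one R2 file per sample, each prefixed with the sample directory name. It is worth checking line counts on one sample before trusting the output for all 96.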



        • #5
          Thank you! I think I see what each part does, except the "%f\n". My apologies if it's obvious; I'm still fairly new.



          • #6
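            That's just the format that find should print its results in: %f prints each result's name with the leading directory path (here "./") removed, and \n ends each name with a newline.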
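
To see the effect of the format string, the two forms of the command can be compared on a throwaway directory. This is a small sketch that assumes GNU find (the `-printf` option is not available in every find implementation); the directory names are made up for the demonstration.

```shell
# Sketch: compare find's default output with -printf "%f\n" (GNU find).
demo=$(mktemp -d)
mkdir "$demo/SpA.Treatment1.Rep1" "$demo/SpA.Treatment1.Rep2"

# Default output keeps the leading "./" on every result
default_out=$(cd "$demo" && find . -maxdepth 1 -mindepth 1 -type d | sort)

# %f prints only the last path component; \n puts each name on its own line
names_out=$(cd "$demo" && find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n" | sort)

printf 'default:\n%s\n' "$default_out"
# default:
# ./SpA.Treatment1.Rep1
# ./SpA.Treatment1.Rep2

printf 'with %%f:\n%s\n' "$names_out"
# with %f:
# SpA.Treatment1.Rep1
# SpA.Treatment1.Rep2
```

The bare names are what make it possible to reuse the loop variable as an output file prefix.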



            • #7
              great, thank you so much for the help!
