SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
For loop in bash terminal to concatenate fastq files rcook34 Bioinformatics 6 02-04-2016 07:41 PM
nested for loop to concatenate fastq files lkomo Bioinformatics 6 04-03-2015 07:18 PM
Concatenate / merge IPS XML files cement_head Bioinformatics 0 12-08-2014 05:54 PM
Concatenate Paired End Reads Files In Order lala2013 Bioinformatics 3 10-15-2013 11:52 AM
Concatenate GFF Files mgaldos Bioinformatics 11 05-29-2013 08:55 AM

Reply
 
Thread Tools
Old 04-10-2018, 12:09 PM   #1
Desu
Junior Member
 
Location: Finland

Join Date: Apr 2018
Posts: 1
Default How to concatenate RNA-seq files generated in differnt lanes

I have very large RNA-seq files generated in different lanes. I extracted few of the file names as shown below.

MC9_FNEN_638A_S19_L008_R1_001.fastq.gz
MC9_FNEN_638A_S19_L008_R2_001.fastq.gz
MC9_FNEN_638A_S9_L001_R1_001.fastq.gz
MC9_FNEN_638A_S9_L001_R2_001.fastq.gz
MC9_FNEN_638A_S9_L002_R1_001.fastq.gz
MC9_FREN_638A_S9_L002_R2_001.fastq.gz
MC9_FREN_638A_S9_L006_R1_001.fastq.gz
MC9_FREN_638A_S9_L006_R2_001.fastq.gz
MC9_FREN_638A_S9_L008_R1_001.fastq.gz
MC9_FREN_638A_S9_L008_R2_001.fastq.gz
MC9_637A_S74_L001_R1_001.fastq.gz
MC9_ZH_637A_S74_L001_R2_001.fastq.gz
MC9_ZH_637A_S74_L003_R1_001.fastq.gz
MC9_ZH_637A_S74_L003_R2_001.fastq.gz
MC9_ZH_637A_S74_L007_R1_001.fastq.gz
MC9_ZH_637A_S74_L007_R2_001.fastq.gz
MC9_ZH_637A_S74_L008_R1_001.fastq.gz
MC9_ZH_637A_S74_L008_R2_001.fastq.gz
MC9_ZH_637A_S84_L008_R1_001.fastq.gz
MC9_ZH_637A_S84_L008_R2_001.fastq.gz
DR14_DCRP_479C_S50_L001_R1_001.fastq.gz
DR14_DCRP_479C_S50_L001_R2_001.fastq.gz
DR14_DCRP_479C_S50_L002_R1_001.fastq.gz
DR14_DCRP_479C_S50_L002_R2_001.fastq.gz
DR14_DCRP_479C_S50_L006_R1_001.fastq.gz
DR14_DCRP_479C_S50_L006_R2_001.fastq.gz
DR14_DCRP_479C_S50_L007_R1_001.fastq.gz
DR14_DCRP_479C_S50_L007_R2_001.fastq.gz
DR14_DCRP_479C_S50_L008_R1_001.fastq.gz
DR14_DCRP_479C_S50_L008_R2_001.fastq.gz

I want to concatenate all the sequence generated in different lanes for the forward and reverse read. For example the first 10 lines are sequence file from the same animal and specific tissue (`MC9_FREN`). I want to concatenate all the forward read `XXXXX_R1_001.fastq.gz` that are generated in different lanes and put in the file name `MC9_FREN_R1.fastq.gz` and all reverse reads `XXXX_R2_001.fastq.gz` to `MC9_FREN_R2.fastq.gz`

cat MC9_FREN_638A_S19_L008_R1_001.fastq.gz MC9_FREN_638A_S9_L001_R1_001.fastq.gz MC9_FREN_638A_S9_L002_R1_001.fastq.gz MC9_FREN_638A_S9_L007_R1_001.fastq.gz MC9_FREN_638A_S9_L008_R1_001.fastq.gz > MC9_FREN_R1.fastq.gz
cat MC9_FREN_638A_S19_L008_R2_001.fastq.gz MC9_FREN_638A_S9_L001_R2_001.fastq.gz MC9_FREN_638A_S9_L002_R2_001.fastq.gz MC9_FREN_638A_S9_L007_R2_001.fastq.gz MC9_FREN_638A_S9_L008_R2_001.fastq.gz > MC9_FREN_R2.fastq.gz
cat MC9_ZH_637A_S74_L001_R1_001.fastq.gz MC9_ZH_637A_S74_L003_R1_001.fastq.gz MC9_ZH_637A_S74_L007_R1_001.fastq.gz MC9_ZH_637A_S74_L008_R1_001.fastq.gz MC9_ZH_637A_S84_L008_R1_001.fastq.gz > MC9_ZH_R1.gz
cat MC9_ZH_637A_S74_L001_R2_001.fastq.gz MC9_ZH_637A_S74_L003_R2_001.fastq.gz MC9_ZH_637A_S74_L007_R2_001.fastq.gz MC9_ZH_637A_S74_L008_R2_001.fastq.gz MC9_ZH_637A_S84_L008_R2_001.fastq.gz > MC9_ZH_R2.gz
cat DR14_DCRP_479C_S50_L001_R1_001.fastq.gz DR14_DCRP_479C_S50_L002_R1_001.fastq.gz DR14_DCRP_479C_S50_L006_R1_001.fastq.gz DR14_DCRP_479C_S50_L007_R1_001.fastq.gz DR14_DCRP_479C_S50_L008_R1_001.fastq.gz > DR14_DCRP_R1.gz
cat DR14_DCRP_479C_S50_L001_R2_001.fastq.gz DR14_DCRP_479C_S50_L002_R2_001.fastq.gz DR14_DCRP_479C_S50_L006_R2_001.fastq.gz DR14_DCRP_479C_S50_L007_R2_001.fastq.gz DR14_DCRP_479C_S50_L008_R2_001.fastq.gz > DR14_DCRP_R1.gz

Last edited by Desu; 04-12-2018 at 07:29 AM.
Desu is offline   Reply With Quote
Old 04-10-2018, 01:53 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,780
Default

It could be best to leave them in pieces while you scan/trim/align them to allow brute-force parallelization. You can then merge the BAM files afterwards to generate a single one per sample.

Cross-posted (and answered) at Biostars: https://www.biostars.org/p/308606/
GenoMax is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:35 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO