
**dariober** · 12-04-2013, 12:54 AM

Originally posted by akjones:

Where the bolded "1" or "2" indicate which member of the pair the read is, so I am trying to get all 1's into a separate file and all 2's into a separate file.

Hi Ana,

This is not Perl, but it should work, assuming you are on a Linux/Unix machine with bash (the >( ... ) process substitution used below is a bash feature):

Code:

paste - - - - < test.fq \
| tee >(awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 1:N")) print $1,$2,$3,$4}' > test.r1.fq ) \
| awk 'BEGIN{FS="\t"; OFS="\n"} {if (match($1, " 2:N")) print $1,$2,$3,$4}' > test.r2.fq

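test.fq is your merged input file; test.r1.fq and test.r2.fq are the split files. paste - - - - joins each 4-line fastq record into one tab-separated line, so awk can test the read name in $1 and print the record back out over four lines. The key point is to make the matching pattern specific enough to correctly separate read 1 from read 2 given the read name. Here I set the patterns to " 1:N" and " 2:N" for read 1 and read 2 respectively.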
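
For comparison, here is a rough sketch of the same split done in a single awk pass, without paste or tee. It is only an illustration: split_mates is a hypothetical helper name, and it assumes every record is exactly four lines and that the headers contain " 1:N" or " 2:N" as in the patterns above. Records with any other header are skipped.

```shell
# Hypothetical helper: split_mates INPUT READ1_OUT READ2_OUT
# Assumes strict 4-line FASTQ records and " 1:N" / " 2:N" in the header line.
split_mates() {
    awk -v r1="$2" -v r2="$3" '
        # The header is the first line of every 4-line record.
        NR % 4 == 1 {
            if ($0 ~ / 1:N/)      out = r1
            else if ($0 ~ / 2:N/) out = r2
            else                  out = ""   # unrecognised header: skip this record
        }
        # Write all four lines of the record to the file chosen at the header.
        out != "" { print > out }
    ' "$1"
}
```

It would be called as split_mates test.fq test.r1.fq test.r2.fq. Note that an output file is only created if at least one matching read is found.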

Loosely related to this question, take care that most programs expect the paired fastq files to have read 1 and read 2 in the same order (e.g. most aligners) and your pipeline above seems to break this requirement.

All the best
Dario

**akjones** · 12-04-2013, 07:56 AM

Hi Dario,

Thanks very much, your code works! What do you mean by

Loosely related to this question, take care that most programs expect the paired fastq files to have read 1 and read 2 in the same order (e.g. most aligners) and your pipeline above seems to break this requirement.

Are you referring to the direction of the reads (5'-3' or 3'-5') or to the order of the reads themselves, as in read 1 followed by read 2 of a pair? Sorry if that is a silly question; I just want to make sure I understand you correctly.

Thanks again,
~Ana

**dariober** · 12-04-2013, 09:17 AM

Originally posted by akjones:

Hi Dario,

Thanks very much, your code works! What do you mean by "take care that most programs expect the paired fastq files to have read 1 and read 2 in the same order"? Are you referring to the direction of the reads (5'-3' or 3'-5') or to the order of the reads themselves, as in read 1 followed by read 2 of a pair? Sorry if that is a silly question; I just want to make sure I understand you correctly.

Thanks again,
~Ana

Hi, the fastq file for mate 1 should have the same number of reads, in the same order, as the fastq file for mate 2. So if a read is missing from one file, its mate should be removed from the other file as well to keep the pairing correct.

E.g., say this is your file for mate 1 (read_1.fq):

Code:

@read1 1:N
ACTG
+
IIII
@read2 1:N
ACTG
+
IIII
@read3 1:N
ACTG
+
IIII

This file for mate 2 would be ok:

Code:

@read1 2:N
AAAA
+
IIII
@read2 2:N
TTTT
+
IIII
@read3 2:N
CCCC
+
IIII

This file for mate 2 would be "wrong" because one read (read2) is missing, so read3 would end up paired with read2 from the mate 1 file:

Code:

@read1 2:N
ACTG
+
IIII
@read3 2:N
ACTG
+
IIII
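
One way to test for this kind of mismatch is to compare the read names of the two files record by record. The following is a minimal sketch, not a tool from any package: check_pairing is a hypothetical helper name, it assumes 4-line records, and it takes the read name to be the header text before the first space (e.g. @read1). It keeps all names of the first file in memory, so it is meant for illustration rather than for very large files.

```shell
# Hypothetical helper: check_pairing MATE1.fq MATE2.fq
# Exit status 0 if both files list the same read names in the same order, 1 otherwise.
check_pairing() {
    awk '
        BEGIN { bad = 0 }
        # First file: store the read name ($1 of each header line) by record number.
        NR == FNR { if (FNR % 4 == 1) name[++n1] = $1; next }
        # Second file: compare each read name with the one stored at the same position.
        FNR % 4 == 1 {
            n2++
            if ($1 != name[n2]) {
                printf "mismatch at record %d: %s vs %s\n", n2, name[n2], $1
                bad = 1
                exit
            }
        }
        END {
            if (!bad && n1 != n2) {
                printf "record counts differ: %d vs %d\n", n1, n2
                bad = 1
            }
            exit bad
        }
    ' "$1" "$2"
}
```

With the example files above, the "ok" mate 2 file passes, while the file with read2 missing is reported as a mismatch at record 2 (@read2 vs @read3).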

Code to split paired end MiSeq data?
