**dschika** · 03-15-2015, 06:11 AM

This command should do the job:

Code:

awk '{if ($1 ~ /SRR/) {split($0, T, "\\.1 "); print T[1], T[2]} else print $0}' YOURINPUT.fastq

Since you mentioned sed and awk, I assume you know that $1 = first field, $2 = second field (fields are separated by whitespace unless FS defines otherwise) and $0 = whole line.

The command checks whether SRR is present in the first field ("if ($1 ~ /SRR/)"). If so, it splits the whole line ($0) at ".1 " and stores the pieces in the array T. In this case ".1 " occurs only once in each line with SRR, so T holds exactly two pieces. Arrays filled by split start at index 1, so printing T[1] and T[2] (joined by a single space, because of the comma in print) gives the line with ".1 " replaced by " ".

If the line does not contain SRR (i.e., we are at a line with sequence or quality values), the whole line is printed unchanged.
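To see the effect without touching a real file, here is a minimal sketch on a made-up header and sequence line. The read name and description are hypothetical, and the dot in the split separator is escaped so it matches a literal ".":

```shell
# Hypothetical header and sequence lines, invented for illustration only.
input='@SRR000001.1.1 EXAMPLE:1:1:100:200 length=8
ACGTACGT'

# Lines whose first field contains SRR are split at ".1 " and the two pieces
# are printed with a single space between them; other lines pass through.
result=$(printf '%s\n' "$input" |
  awk '{if ($1 ~ /SRR/) {split($0, T, "\\.1 "); print T[1], T[2]} else print $0}')

printf '%s\n' "$result"
```

With this input, the header should come out as "@SRR000001.1 EXAMPLE:1:1:100:200 length=8" and the sequence line should be unchanged.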

**yao_licr** · 03-15-2015, 07:36 AM

Thanks for the detailed explanation. It works well.

**dschika** · 03-15-2015, 08:02 AM

No problem. I just realized that I was thinking way too complicated:

sed 's/\.1 / /g' YOURINPUT.fastq > out.fastq

Just replace ".1 " with " "; the "." is escaped with a backslash so that it matches a literal dot rather than any character.
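As a quick check, here is a sketch that runs the same substitution on one made-up four-line record. The read name and the quality string are hypothetical. The quality line deliberately contains ".1" without a following space, to show that it is left alone:

```shell
# Hypothetical FASTQ record, invented for illustration only.
record='@SRR000001.1.1 EXAMPLE:1:1:100:200 length=8
ACGTACGT
+
II.1II.1'

# Replace every ".1 " (literal dot, 1, space) with a single space.
fixed=$(printf '%s\n' "$record" | sed 's/\.1 / /g')

printf '%s\n' "$fixed"
```

Only the header line should change here, since it is the only line where ".1" is followed by a space.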

**yao_licr** · 03-15-2015, 08:11 AM

Sorry, ".1" is the pattern to be replaced by " ", so why do you put a space after the 1? I mean the space in "/\.1 /".

Thanks!

**yao_licr** · 03-15-2015, 08:17 AM

I got it now: I have .1.1 in the first read, so without the space after "\.1" both occurrences would be replaced. In your script the pattern is ".1 " (with the trailing space), not ".1".

Thanks!

**dschika** · 03-15-2015, 08:20 AM

Correct, you're welcome!
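For anyone reading later, the difference discussed above can be checked with a small sketch. The header is made up for illustration and assumes a read name ending in ".1.1":

```shell
# Hypothetical header with ".1.1" in the read name, invented for illustration.
header='@SRR000001.1.1 EXAMPLE:1:1:100:200 length=8'

# Pattern with the trailing space: only the ".1" directly before the space matches.
with_space=$(printf '%s\n' "$header" | sed 's/\.1 / /g')

# Pattern without the trailing space: both ".1" occurrences are replaced.
without_space=$(printf '%s\n' "$header" | sed 's/\.1/ /g')

printf '%s\n' "$with_space" "$without_space"
```

The first result keeps "@SRR000001.1", while the second turns both ".1" into spaces and leaves "@SRR000001" followed by three spaces.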

substitution using sed or awk