**kmcarr** · 05-02-2011, 07:09 AM

These FASTQ files were probably not produced directly by the Illumina software; perhaps they were output by srf2fastq, which records the read segment number after the slash. If the protocol used was a standard Illumina index recipe, then read 1 is segment 1, the index read is segment 2, and read 2 is segment 3. Alternatively, the processing pipeline may be configured to discard the last base(s) of the reads; in this case the saved part of read 1 is segment 1, the discarded part of read 1 is segment 2, and the saved part of read 2 is segment 3. As you can see, in both of these situations /3 will be added to what you are calling read 2.
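Before renaming anything, it may help to confirm which segment numbers actually occur in a file. The following is only a sketch: `count_read_suffixes` is a made-up helper name, the sample records are invented, and it assumes plain four-line FASTQ records whose header lines all end in a slash followed by the segment number.

```shell
# Hypothetical helper: tally the segment suffix (the text after the last "/")
# on the header line of every 4-line FASTQ record read from standard input.
count_read_suffixes() {
    awk 'NR % 4 == 1 {
             n = split($0, parts, "/")   # parts[n] is the text after the last "/"
             count[parts[n]]++
         }
         END { for (s in count) print "/" s, count[s] }' | sort
}

# Made-up sample: two records whose names end in /3.
printf '@SequenceIdentifier#0/3\nACGT\n+SequenceIdentifier#0/3\nIIII\n@SequenceIdentifier#1/3\nTTGA\n+SequenceIdentifier#1/3\nIIII\n' | count_read_suffixes
```

On a real file this would be run as `count_read_suffixes < filename.fastq`. A second-read file that reports only /3 is consistent with either scenario described above.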

You can fix these names with sed:

Code:

sed -i.bak '/^[@+]<unique part of your sequence identifier>/ s/\/3$/\/2/' filename.fastq

This will replace the 3 with a 2 at the end of your sequence identifier lines and save a copy of the original file as filename.fastq.bak.
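To try the substitution without touching a file, the same sed expression can be run on a pipe. This is a sketch only: `rename_segment3` is a hypothetical wrapper name, and "SequenceIdentifier" stands in for the unique part of the identifier, borrowed from the sample records quoted in this thread.

```shell
# Hypothetical wrapper around the same substitution, reading standard input
# instead of editing a file in place. "SequenceIdentifier" is a stand-in for
# the unique part of your own sequence identifiers.
rename_segment3() {
    # Only lines starting with @ or + followed by the identifier are changed,
    # so base-call and quality lines are left alone.
    sed '/^[@+]SequenceIdentifier/ s/\/3$/\/2/'
}

# Made-up sample record ending in /3.
printf '@SequenceIdentifier#0/3\nACGT\n+SequenceIdentifier#0/3\nIIII\n' | rename_segment3
```

Restricting the substitution to lines that begin with the identifier matters because a quality string can itself end in the characters /3.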

**trickytank** · 05-02-2011, 08:56 PM

Hey, thanks for that, and for the explanation.

I wrote something myself earlier to do a similar thing, except I was also checking things were as I expected:

Code:

#!/usr/bin/perl

use strict;
use warnings;

=pod

This script fixes the names of the second read of each pair in FASTQ files,
changing the trailing /3 to /2.

Usage:
cat input-fastq.txt | perl correct-readpairs.pl > output.fq

=cut

my $row = 1;
while (<STDIN>) {
        if ($row % 2) { # Odd rows are the '@' and '+' identifier lines.
                # Stop if an identifier line does not end in /3 as expected.
                if (! s/\/3$/\/2/) {
                        die "We have not considered the possibility of the following line:\n$_";
                }
        }
        print $_;
        $row++;
}

**sklages** · 05-03-2011, 02:46 AM

Originally posted by trickytank:

I've had these reads in Fastq format from an Illumina machine but noticed the paired end names are not ending in the usual /1 and /2, but instead /1 and /3. Picard doesn't handle these properly. I want to check this doesn't have another meaning I should know about.

Thanks.

First end:

Code:

@SequenceIdentifier#0/1
[base calls]
+SequenceIdentifier#0/1
[quality scores]

Second end:

Code:

@SequenceIdentifier#0/3
[base calls]
+SequenceIdentifier#0/3
[quality scores]

This was probably a run with an index read; in this case the index read gets the number "2", the pairs are "1" and "3".

hth,
Sven

Illumina paired-end names
