**m_elena_bioinfo** · 05-20-2010, 06:42 AM

Here is another example of my txt file:

HWUSI-EAS68R 1 3 1 1097 20058 0 1 AGACCTCATTATTATCTGTGTGTCTGCATTTTCTAATCCTTTTTGCCCCAG ]^aaa]][]]E^^^`]]]^]aaaaaaa_a\aa_``_a_aa[a]]`YZ[WSY 1
HWUSI-EAS68R 1 3 1 1097 17901 0 1 TGCTGATGAGATTTATGACTGCAAGGTGGAGCACTGGGGCCTGGACCAGCC bbbbb]^`Y`Kbbbbaaa_`^^b^b]\]_`bbbbbbb^b_b^bbb_b\bbb 1
HWUSI-EAS68R 1 3 1 1097 17710 0 1 TGGCGCACCCTAAGGCTCAGTCAGTAACCCGTACACAAACTCGTCCCTGCA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0

**rmdavies** · 05-20-2010, 08:24 AM

This looks like an Illumina .qseq.txt file, although I'm a bit puzzled as to why the last base of the sequence is in its own column. Possibly this is an artifact from when you copied the data into the forum?

Assuming that this is a .qseq.txt file, you can convert it to fastq format with the following perl script (note that this script converts the quality values to phred+33 format):

Code:

#!/usr/bin/perl

use strict;
use warnings;

while (<>) {
    chomp;
    my ($instr, $run_id, $lane, $tile, $x, $y, $index, $read,
	$bases, $q_line, $filter) = split /\t/, $_;

    # Turn dots into Ns in the base calls
    $bases =~ tr/./N/;

    # Convert Illumina's quality values to true Phred scale
    $q_line =~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;

    if ($index) {
	print "\@${instr}_$run_id:$lane:$tile:$x:$y\#$index/$read\n";
    } else {
	print "\@${instr}_$run_id:$lane:$tile:$x:$y/$read\n";
    }
    print "$bases\n+\n$q_line\n";
}

Once you have your fastq file, you should be able to use it as input to bwa in order to get your alignment.
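
For readers who prefer Python, the same per-line conversion can be sketched roughly as follows. This is only an illustration of what the Perl script above does, not a replacement for it: the function names `illumina64_to_phred33` and `qseq_line_to_fastq` are hypothetical, and the sketch assumes each qseq line has exactly 11 tab-separated columns with phred+64 qualities. As in the Perl script, the filter column is read but not used.

```python
def illumina64_to_phred33(qual: str) -> str:
    """Re-encode a phred+64 quality string as phred+33."""
    # Characters below '@' (0x40) have no valid phred+64 score; like the
    # Perl tr/// above, clamp them to the lowest phred+33 character '!'.
    return "".join(chr(max(ord(ch) - 64, 0) + 33) for ch in qual)


def qseq_line_to_fastq(line: str) -> str:
    """Turn one 11-column qseq line into a 4-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    instr, run_id, lane, tile, x, y, index, read, bases, quals, _filter = fields
    bases = bases.replace(".", "N")  # qseq marks uncalled bases with '.'
    name = f"@{instr}_{run_id}:{lane}:{tile}:{x}:{y}"
    if index not in ("", "0"):  # same truth test as Perl's `if ($index)`
        name += f"#{index}"
    return f"{name}/{read}\n{bases}\n+\n{illumina64_to_phred33(quals)}\n"
```

Calling `qseq_line_to_fastq` on every line of a .qseq.txt file and writing the results to one output file should give the same kind of FASTQ that the Perl script prints.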

**m_elena_bioinfo** · 05-20-2010, 09:53 AM

Thank you very much rmdavies! You are right, my file is a .qseq one! After converting the file, I can now get the alignment with bwa. I'm a beginner with Illumina data, and I did not know what this format was.
Thanks again for the precious help!!!
M.Elena

**Asifullah** · 08-12-2010, 02:44 AM

Originally posted by m_elena_bioinfo

Thank you very much rmdavies! You are right, my file is a .qseq one! After converting the file, I can now get the alignment with bwa. I'm a beginner with Illumina data, and I did not know what this format was.
Thanks again for the precious help!!!
M.Elena

I am a new user of Illumina data and am facing the same read format that you pointed out. I want to convert it into simple FASTQ format. Could you please give the command line for running this Perl script? I can't work out the command line needed to run it successfully. I would be much obliged to hear from you.
Regards,
asifullah

**epigen** · 03-01-2011, 07:47 AM

convert Illumina scores into Phred scores in a BAM file

Originally posted by rmdavies

Assuming that this is a .qseq.txt file, you can convert it to fastq format with the following perl script (note that this script converts the quality values to phred+33 format):

Although this thread is quite old, I found it extremely useful. Thanks for providing the efficient way to convert Illumina scores into Phred scores, rmdavies!
I used it to transform the scores in a BAM file:

samtools view -h Illumina_score.bam | perl -lane '$"="\t"; if (/^@/) {print;} else {$F[10]=~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;print "@F"}' | samtools view -Sbh - > Phred_score.bam

Saved us a lot of fastq file transformations and we did not have to run all the BWA alignments again.
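
For anyone who finds the one-liner hard to read, the per-line logic can be sketched in Python roughly like this. It is an illustration under stated assumptions, not epigen's exact command: the names `shift_quality` and `convert_sam_line` are hypothetical, and the sketch assumes tab-separated SAM text (as produced by `samtools view -h`) with the quality string in the 11th column. Unlike the one-liner, it leaves a QUAL of `*` (quality not stored) untouched, which is an assumption about what is wanted.

```python
def shift_quality(qual: str) -> str:
    """Shift phred+64 characters to phred+33, clamping at '!' (ASCII 33)."""
    # ord(ch) - 64 + 33 == ord(ch) - 31
    return "".join(chr(max(ord(ch) - 31, 33)) for ch in qual)


def convert_sam_line(line: str) -> str:
    """Return a SAM line with its QUAL column re-encoded as phred+33."""
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    # Column 11 (index 10) is QUAL; '*' means no qualities are stored.
    if len(fields) >= 11 and fields[10] != "*":
        fields[10] = shift_quality(fields[10])
    return "\t".join(fields) + "\n"
```

Applied to each line of the SAM text stream, this would sit in the same place as the Perl step between the two samtools calls.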

**jamminbeh** · 06-15-2011, 09:24 AM

Do you know how to run the perl script in unix and make it utilize the fastq file? thanks.

**sklages** · 06-16-2011, 02:57 AM

Originally posted by jamminbeh

Do you know how to run the perl script in unix and make it utilize the fastq file? thanks.

To run: perl myScript.pl qseq_file > newfile.fq

Note that the script is not intended to work with fastq files, only with qseq files.

**Kaas** · 10-03-2013, 02:20 AM

I saved the above-mentioned script as Sanger.pl and tested it on the first 12 lines of an Illumina 1.5 fastq file:

perl Sanger.pl SRR329951_2.fastq.fa > newfile.fq

but I get the following errors for every line:
Use of uninitialized value in transliteration (tr///) at Sanger.pl line 23, <> line 1.
Use of uninitialized value in transliteration (tr///) at Sanger.pl line 29, <> line 1.
Use of uninitialized value $run_id in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $lane in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $tile in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $x in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $y in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $read in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $bases in concatenation (.) or string at Sanger.pl line 43, <> line 1.
Use of uninitialized value $q_line in concatenation (.) or string at Sanger.pl line 43, <> line 1.

Any idea what I am doing wrong?

cat SRR823966_2.fastq
@SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90/2
GGAGACTGTAGTTGGGTAGAGGGTCAGGTGTCGGGGTACTCGTGAGTTGTGTTGGCGGTTGTGTAGTTTAGTATATGTGTGATTGTTTGT
+SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90/2
AAACATGTAACTTATTTATTTTTACCATTGTTGGGCTGGCGTGGTGGTTTGTGAGTGGGCCTTTGAGTTTGATGTCAGTCTGGTCTGTGT
+SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90
PPJSSS\aQQKR]JQbfQ]biiiJJRJR[RSbfhiHYbgHHO^eaeheGW[bfHW]bgBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

head -12 newfile.fq
@@SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90/2_::::/

+

@GGAGACTGTAGTTGGGTAGAGGGTCAGGTGTCGGGGTACTCGTGAGTTGTGTTGGCGGTTGTGTAGTTTAGTATATGTGTGATTGTTTGT_::::/

+

@+SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90_::::/

+

@BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

+

@@SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90/2_::::/

+

@AAACATGTAACTTATTTATTTTTACCATTGTTGGGCTGGCGTGGTGGTTTGTGAGTGGGCCTTTGAGTTTGATGTCAGTCTGGTCTGTGT_::::/

+

@+SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90_::::/

+

@PPJSSS\aQQKR]JQbfQ]biiiJJRJR[RSbfhiHYbgHHO^eaeheGW[bfHW]bgBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

+

@@SRR823966.19590460 FCC076MACXX:3:2308:15143:200791 length=90/2_::::/

+

@CTCGAGCAGGAGAGGGGCCCTGGCTGCTGAGGGGTCCCTGTCCAATAACCCCCACACCGATCATGTCCCTCACAGTTTCCATCTCAGACG_::::/

+

@+SRR823966.19590460 FCC076MACXX:3:2308:15143:200791 length=90_::::/

+

@BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

+

**mastal** · 10-03-2013, 03:28 AM

Hi Kaas,

What are you trying to achieve by using this script?
Because your starting file, SRR823966_2.fastq, looks like it's already in fastq format.

The script converts Illumina's old qseq.txt format to fastq.
Since your file is not in qseq format, the script is not extracting the right information into the variables,
which is why you are getting all those 'Use of uninitialized value' errors.
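
A quick check on the first few lines of a file can catch this kind of mix-up before running a converter. The sketch below is a rough heuristic only, and `guess_read_format` is a hypothetical helper name: it assumes a qseq line has exactly 11 tab-separated fields and that a FASTQ record starts with `@` and has a `+` line two lines later.

```python
from typing import Sequence


def guess_read_format(first_lines: Sequence[str]) -> str:
    """Guess 'qseq', 'fastq' or 'unknown' from the first lines of a file."""
    if not first_lines:
        return "unknown"
    first = first_lines[0].rstrip("\n")
    # qseq: one read per line, 11 tab-separated columns
    if len(first.split("\t")) == 11:
        return "qseq"
    # FASTQ: '@name' line, sequence, '+' separator, qualities
    if (first.startswith("@") and len(first_lines) >= 3
            and first_lines[2].startswith("+")):
        return "fastq"
    return "unknown"
```

On the first record of the SRR823966_2.fastq excerpt shown above, this returns "fastq", which is the case the qseq script cannot split into its 11 variables.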

**Kaas** · 10-03-2013, 03:32 AM

Thanks mastal
I arrived at this thread from http://seqanswers.com/forums/showthread.php?t=5210 because I needed to convert from Illumina 1.5 (Phred64) to Illumina 1.9 (Phred33). I misread and used the Perl script from this thread.

**HESmith** · 10-03-2013, 08:15 AM

BFAST contains a Perl script (ill2fastq) for this conversion.
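
For a file that is already FASTQ, only every fourth line (the quality line) needs to change. The following is a minimal Python sketch of that idea, not the BFAST script mentioned above: `fastq_phred64_to_phred33` is a hypothetical name, and the sketch assumes strict 4-line records (no wrapped sequence or quality lines) with phred+64 qualities.

```python
from typing import Iterable, Iterator


def fastq_phred64_to_phred33(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with each quality line shifted from +64 to +33."""
    for i, line in enumerate(lines):
        if i % 4 == 3:  # 4th line of each record holds the qualities
            qual = line.rstrip("\n")
            # ord(ch) - 64 + 33 == ord(ch) - 31; clamp at '!' (ASCII 33)
            yield "".join(chr(max(ord(ch) - 31, 33)) for ch in qual) + "\n"
        else:
            yield line  # header, sequence and '+' lines are unchanged
```

Passing an open input file to the function and writing each yielded line to an output file would convert the whole file in one pass.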

strange Illumina txt format