SEQanswers

Old 05-20-2010, 06:40 AM   #1
m_elena_bioinfo
Member
 
Location: Ospedali Riuniti di Bergamo, ITALY

Join Date: Oct 2009
Posts: 99
Default strange Illumina txt format

Dear NGS-users,
I have a problem analysing reads from a single-read 36bp SureSelect run (Illumina).

My reads file is in .txt format.

I usually use BWA for the alignment and then SAMTOOLS for the pileup.
At the alignment step, BWA returns a 64 kB sai file and a 27 kB sam file. These files are probably incorrect and incomplete. The next step, converting sam to bam, crashes with this message:

[samopen] SAM header is present: 657 sequences.
[sam_read1] reference 'SN:hg18_knownGene_uc002qho.2 LN:16765

' is recognized as '*'.
[main_samview] truncated file.


I think that the problem is the strange format of the initial txt file (here is an example):
HWUSI-EAS68R 1 3 1 995 11343 0 1 .CTTG........T.....GGGG............................ BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBBBBBB 0
HWUSI-EAS68R 1 3 1 995 18576 0 1 .ACAG........C.....GTTG............................ BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBBBBBB 0

Why is my txt file so strange?
What could this depend on?
Thank you very much
M.Elena
Old 05-20-2010, 06:42 AM   #2
m_elena_bioinfo
Member
 
Location: Ospedali Riuniti di Bergamo, ITALY

Join Date: Oct 2009
Posts: 99
Default

Here is another example of my txt file:

HWUSI-EAS68R 1 3 1 1097 20058 0 1 AGACCTCATTATTATCTGTGTGTCTGCATTTTCTAATCCTTTTTGCCCCAG ]^aaa]][]]E^^^`]]]^]aaaaaaa_a
\aa_``_a_aa[a]]`YZ[WSY 1
HWUSI-EAS68R 1 3 1 1097 17901 0 1 TGCTGATGAGATTTATGACTGCAAGGTGGAGCACTGGGGCCTGGACCAGCC bbbbb]^`Y`Kbbbbaaa_`^^b^b]\]_
`bbbbbbb^b_b^bbb_b\bbb 1
HWUSI-EAS68R 1 3 1 1097 17710 0 1 TGGCGCACCCTAAGGCTCAGTCAGTAACCCGTACACAAACTCGTCCCTGCA BBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBBBBBB 0
Old 05-20-2010, 08:24 AM   #3
rmdavies
Member
 
Location: Great Britain

Join Date: Dec 2009
Posts: 13
Default

This looks like an Illumina .qseq.txt file, although I'm a bit puzzled as to why the last base of the sequence is in its own column. Possibly this is an artifact from when you copied the data into the forum?

Assuming that this is a .qseq.txt file, you can convert it to fastq format with the following perl script (note that this script converts the quality values to phred+33 format):

Code:
#!/usr/bin/perl

use strict;
use warnings;

while (<>) {
    chomp;
    my ($instr, $run_id, $lane, $tile, $x, $y, $index, $read,
        $bases, $q_line, $filter) = split /\t/, $_;

    # Turn dots into Ns in the base calls
    $bases =~ tr/./N/;

    # Convert Illumina's phred+64 quality characters to phred+33
    # (anything below '@' is mapped to '!', i.e. Phred 0)
    $q_line =~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;

    if ($index) {
        print "\@${instr}_$run_id:$lane:$tile:$x:$y\#$index/$read\n";
    } else {
        print "\@${instr}_$run_id:$lane:$tile:$x:$y/$read\n";
    }
    print "$bases\n+\n$q_line\n";
}
Once you have your fastq file, you should be able to use it as input to bwa in order to get your alignment.
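
For readers who prefer Python, the same per-line conversion can be sketched roughly as follows. This is only an illustration of what the Perl script above does, not a replacement for it; the function names `phred64_to_phred33` and `qseq_to_fastq` are made up for this example, and it assumes each qseq line has exactly 11 tab-separated fields.

```python
def phred64_to_phred33(qual: str) -> str:
    """Shift phred+64 quality characters to phred+33.

    Characters below '@' (ASCII 64) are clamped to '!' (Phred 0),
    mirroring the tr/// in the Perl script.
    """
    return "".join(chr(max(ord(c) - 64, 0) + 33) for c in qual)


def qseq_to_fastq(line: str) -> str:
    """Turn one qseq.txt line into a four-line FASTQ record."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    instr, run_id, lane, tile, x, y, index, read, bases, qual, _filter = fields

    # qseq uses '.' for an uncalled base; FASTQ tools expect 'N'
    bases = bases.replace(".", "N")

    name = f"{instr}_{run_id}:{lane}:{tile}:{x}:{y}"
    # Like the Perl 'if ($index)', skip the index when it is empty or "0"
    if index not in ("", "0"):
        name += f"#{index}"
    return f"@{name}/{read}\n{bases}\n+\n{phred64_to_phred33(qual)}\n"
```

Calling `qseq_to_fastq` on each line of the input and writing the results to a file should give the same records as the Perl version. Like the Perl script, this sketch ignores the filter column, so reads that failed the chastity filter are still written out.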
Old 05-20-2010, 09:53 AM   #4
m_elena_bioinfo
Member
 
Location: Ospedali Riuniti di Bergamo, ITALY

Join Date: Oct 2009
Posts: 99
Default

Thank you very much, rmdavies! You are right: my file is a .qseq file! After converting it, I can now get the alignment with bwa. I'm a beginner with Illumina data and did not know this format.
Thanks again for the precious help!!!
M.Elena
Old 08-12-2010, 02:44 AM   #5
Asifullah
Junior Member
 
Location: Pakistan

Join Date: Aug 2010
Posts: 5
Default

Quote:
Originally Posted by m_elena_bioinfo
Thank you very much, rmdavies! You are right: my file is a .qseq file! After converting it, I can now get the alignment with bwa. I'm a beginner with Illumina data and did not know this format.
Thanks again for the precious help!!!
M.Elena
I am a new user of Illumina data and am facing the same read format that you describe. I want to convert it to plain FASTQ format. Could you please give the command line for running this Perl script? I can't work out how to invoke it successfully. My email address is ([email protected]). I would be much obliged to hear from you.
Regards,
asifullah
Old 03-01-2011, 06:47 AM   #6
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101
Default convert Illumina scores into Phred scores in a BAM file

Quote:
Originally Posted by rmdavies
Assuming that this is a .qseq.txt file, you can convert it to fastq format with the following perl script (note that this script converts the quality values to phred+33 format):
Although this thread is quite old, I found it extremely useful. Thanks for providing the efficient way to convert Illumina scores into Phred scores, rmdavies!
I used it to transform the scores in a BAM file:

samtools view -h Illumina_score.bam | perl -lane '$"="\t"; if (/^@/) {print;} else {$F[10]=~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;print "@F"}' | samtools view -Sbh - > Phred_score.bam

Saved us a lot of fastq file transformations and we did not have to run all the BWA alignments again.
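
As a rough illustration of what the one-liner does to each SAM line, here is a Python sketch. The function name `convert_sam_line` is hypothetical, and one behaviour differs on purpose: a QUAL field of `*` (meaning no qualities stored) is left alone here, whereas the `tr` in the one-liner would turn it into `!`. That choice is an assumption of this sketch, not something from the post above.

```python
def phred64_to_phred33(qual: str) -> str:
    """Shift phred+64 quality characters to phred+33, clamping at '!'."""
    return "".join(chr(max(ord(c) - 64, 0) + 33) for c in qual)


def convert_sam_line(line: str) -> str:
    """Rewrite the QUAL column (11th field) of one SAM text line."""
    if line.startswith("@"):
        # Header lines (@HD, @SQ, ...) pass through unchanged
        return line
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        raise ValueError("SAM alignment lines need at least 11 fields")
    if fields[10] != "*":
        # Assumption of this sketch: '*' means qualities are absent, so skip it
        fields[10] = phred64_to_phred33(fields[10])
    return "\t".join(fields) + "\n"
```

In a pipeline this would sit between `samtools view -h` and `samtools view -Sb -`, reading SAM text on standard input and writing the converted lines to standard output.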
Old 06-15-2011, 09:24 AM   #7
jamminbeh
Member
 
Location: Los Angeles

Join Date: Aug 2009
Posts: 11
Default

Do you know how to run the perl script in unix and make it utilize the fastq file? thanks.
Old 06-16-2011, 02:57 AM   #8
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 620
Default

Quote:
Originally Posted by jamminbeh
Do you know how to run the perl script in unix and make it utilize the fastq file? thanks.
To run: perl myScript.pl FASTQ_file > newfile.fq

Note, however, that the script is not intended to work with fastq files but with qseq files.
Old 10-03-2013, 02:20 AM   #9
Kaas
Member
 
Location: Copenhagen, Denmark

Join Date: Dec 2012
Posts: 17
Default

I saved the above-mentioned script as Sanger.pl and tested it on the first 12 lines of an Illumina 1.5 fastq file:

perl Sanger.pl SRR329951_2.fastq.fa > newfile.fq

but I get the following errors for all lines:
Use of uninitialized value in transliteration (tr///) at Sanger.pl line 23, <> line 1.
Use of uninitialized value in transliteration (tr///) at Sanger.pl line 29, <> line 1.
Use of uninitialized value $run_id in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $lane in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $tile in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $x in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $y in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $read in concatenation (.) or string at Sanger.pl line 39, <> line 1.
Use of uninitialized value $bases in concatenation (.) or string at Sanger.pl line 43, <> line 1.
Use of uninitialized value $q_line in concatenation (.) or string at Sanger.pl line 43, <> line 1.

Any idea what I am doing wrong?

cat SRR823966_2.fastq
@SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90/2
GGAGACTGTAGTTGGGTAGAGGGTCAGGTGTCGGGGTACTCGTGAGTTGTGTTGGCGGTTGTGTAGTTTAGTATATGTGTGATTGTTTGT
+SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90/2
AAACATGTAACTTATTTATTTTTACCATTGTTGGGCTGGCGTGGTGGTTTGTGAGTGGGCCTTTGAGTTTGATGTCAGTCTGGTCTGTGT
+SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90
PPJSSS\aQQKR]JQbfQ]biiiJJRJR[RSbfhiHYbgHHO^eaeheGW[bfHW]bgBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

head -12 newfile.fq
@@SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90/2_::::/

+

@GGAGACTGTAGTTGGGTAGAGGGTCAGGTGTCGGGGTACTCGTGAGTTGTGTTGGCGGTTGTGTAGTTTAGTATATGTGTGATTGTTTGT_::::/

+

@+SRR823966.19590458 FCC076MACXX:3:2308:13081:200822 length=90_::::/

+

@BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

+

@@SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90/2_::::/

+

@AAACATGTAACTTATTTATTTTTACCATTGTTGGGCTGGCGTGGTGGTTTGTGAGTGGGCCTTTGAGTTTGATGTCAGTCTGGTCTGTGT_::::/

+

@+SRR823966.19590459 FCC076MACXX:3:2308:14378:200834 length=90_::::/

+

@PPJSSS\aQQKR]JQbfQ]biiiJJRJR[RSbfhiHYbgHHO^eaeheGW[bfHW]bgBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

+

@@SRR823966.19590460 FCC076MACXX:3:2308:15143:200791 length=90/2_::::/

+

@CTCGAGCAGGAGAGGGGCCCTGGCTGCTGAGGGGTCCCTGTCCAATAACCCCCACACCGATCATGTCCCTCACAGTTTCCATCTCAGACG_::::/

+

@+SRR823966.19590460 FCC076MACXX:3:2308:15143:200791 length=90_::::/

+

@BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB_::::/

+
Old 10-03-2013, 03:28 AM   #10
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

Hi Kaas,

What are you trying to achieve by using this script?
Because your starting file, SRR823966_2.fastq, looks like it's already in fastq format.

The script converts Illumina's old qseq.txt format to fastq.
Since your file is not in qseq format, the script is not extracting the right information into the variables,
which is why you are getting all those 'Use of uninitialized value' errors.
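
To make the mismatch concrete: the script splits each line on tabs, and a FASTQ line contains none, so the whole line ends up in the first variable and the other ten stay undefined. A small check along these lines could be run on the first line of a file before converting it. This is a sketch in Python; `guess_read_format` is a hypothetical helper name and the test is only a heuristic.

```python
def guess_read_format(first_line: str) -> str:
    """Guess whether a reads file is FASTQ or qseq from its first line.

    Heuristic only: FASTQ records start with '@', while qseq lines carry
    11 tab-separated fields and begin with the instrument name.
    """
    line = first_line.rstrip("\r\n")
    if line.startswith("@"):
        return "fastq"
    if len(line.split("\t")) == 11:
        return "qseq"
    return "unknown"
```

For the file shown in the previous post, the first line starts with `@SRR823966`, so this reports `fastq`, and the qseq converter should not be applied to it.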
Old 10-03-2013, 03:32 AM   #11
Kaas
Member
 
Location: Copenhagen, Denmark

Join Date: Dec 2012
Posts: 17
Default

Thanks mastal
I arrived at this thread from http://seqanswers.com/forums/showthread.php?t=5210 because I needed to convert from Illumina 1.5 (Phred64) to Illumina 1.9 (Phred33). I misread and used the Perl script from this thread.
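
For that Phred64 to Phred33 case, where the input is already FASTQ, only the quality line of each record needs to change. A minimal Python sketch, assuming strict four-line records (no wrapped sequence or quality lines); the function name `fastq_phred64_to_phred33` is made up for this example:

```python
from typing import Iterable, Iterator


def fastq_phred64_to_phred33(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with every 4th line shifted from phred+64 to phred+33."""
    for i, line in enumerate(lines):
        if i % 4 != 3:
            # Header, sequence and '+' lines are passed through untouched
            yield line
            continue
        qual = line.rstrip("\r\n")
        if any(ord(c) < 64 for c in qual):
            # Characters below '@' cannot be phred+64; the file is probably
            # phred+33 already, so refuse rather than convert it twice
            raise ValueError(f"record {i // 4 + 1}: quality line is not phred+64")
        yield "".join(chr(ord(c) - 31) for c in qual) + "\n"
```

Raising an error on characters below `@` is a design choice of this sketch, intended to catch files that have already been converted; the Perl `tr` earlier in the thread clamps such characters to `!` instead.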
Old 10-03-2013, 08:15 AM   #12
HESmith
Senior Member
 
Location: Bethesda MD

Join Date: Oct 2009
Posts: 498
Default

BFAST contains a Perl script (ill2fastq) for this conversion.

All times are GMT -8.