SEQanswers

Old 04-13-2010, 08:53 PM   #21
Taz
Junior Member
 
Location: USA

Join Date: Apr 2010
Posts: 4
Default

Hiya,

So the command I type in is:

Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence.txt | ./qseq2fastq.pl>s_1_sequence.fastq

I'm pretty new to the whole programming world and I can't actually open the text file as it's too large. From the HTML, though, the first couple of lines look like this:

@GA-I_0001:1:1:1036:19043#0/1
AGCTTATCAGACTGATGTTGACCTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAA
+GA-I_0001:1:1:1036:19043#0/1
\aaaaaaaaaQ^a]XY[[X`aa]\^YQWUOONNN[[Y[YYZYR^VWPWUVVVVZaaY\aBBBBBBBBBBBBBBB
@GA-I_0001:1:1:1036:14097#0/1
TGCAAATCCATGCAAAACTGCTGTAGGCACCCTCAATGATAGGAAGAGCTCGTATGCCGTCTTCTGTTCGAAAA
+GA-I_0001:1:1:1036:14097#0/1
]__VYPR]YWL[]U][FWT`WWU[R[RYX]HRRPQ[S[VNHRIPOYV[YHW[TP`\__BBBBBBBBBBBBBBBB
@GA-I_0001:1:1:1037:13636#0/1
GAGATGGGCGCCGCGAGGCGTCCAGTCTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCT
+GA-I_0001:1:1:1037:13636#0/1
\b_bbb_^^abbb[]P]]aYXL]O]]Y]]aa`__VZ`^^TaaaaT``[aa[aQYQUVYQSZ`X]MOONM^`VM^
@GA-I_0001:1:1:1037:10039#0/1

Basically I know that this is FASTQ format. I'm trying to run the file using the Hannon FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_tool..._trimmer_usage) to analyse my data but it's not recognising the input. I assumed it was because the file was txt and not fq or fa, but I'm not sure why it's not recognising it. I was trying to run the FASTQ/A Clipper.

thanks for the help!

Taz
Old 04-13-2010, 09:13 PM   #22
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

You already have FASTQ data. FASTQ is a plain-text format, so you can simply rename the .txt file to .fastq (or .fq) and then go ahead with the downstream processing.
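
When a file is too large to open in an editor, its first record can still be inspected by reading only a few lines. What follows is a minimal Python sketch, not part of FASTX-Toolkit or of the Perl script in this thread. The function names `first_record` and `has_fastq_markers` are made up for illustration, the demo record is shortened, and unwrapped four-line records are assumed.

```python
from itertools import islice


def first_record(lines):
    """Return the first four lines of an iterable of lines, newlines stripped."""
    return [line.rstrip("\r\n") for line in islice(lines, 4)]


def has_fastq_markers(record):
    """True if the record has 4 lines, an '@' header and a '+' separator line."""
    return (
        len(record) == 4
        and record[0].startswith("@")
        and record[2].startswith("+")
    )


# Demo on a shortened in-memory record; with a real file, pass the open handle.
demo = [
    "@GA-I_0001:1:1:1036:19043#0/1\n",
    "AGCT\n",
    "+GA-I_0001:1:1:1036:19043#0/1\n",
    "\\aaa\n",
]
print(has_fastq_markers(first_record(demo)))
```

With a real file, `first_record(open(path))` reads only the first four lines, so the size of the file does not matter.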
Old 04-13-2010, 09:47 PM   #23
Taz
Junior Member
 
Location: USA

Join Date: Apr 2010
Posts: 4
Default

I tried changing .txt to either .fq or .fastq. I put the following commands in:
fastx_clipper: input file (-) has unknown file format (not FASTA or FASTQ), first character = \n (10)
tasleem-samjis-macbook:~ tasleemsamji$ /Users/tasleemsamji/Documents/bin/fastx_clipper [-a CTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG] [-D] [-d 0] [-M 15] [-l 15] [-c] [-v] [-i /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence.fq] [-o /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence_trimmed.fq]

fastx_clipper: input file (-) has unknown file format (not FASTA or FASTQ), first character = \n (10)
tasleem-samjis-macbook:~ tasleemsamji$ /Users/tasleemsamji/Documents/bin/fastx_clipper [-a CTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG] [-D] [-d 0] [-M 15] [-l 15] [-c] [-v] [-i /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence.fastq] [-o /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence_trimmed.fastq]

fastx_clipper: input file (-) has unknown file format (not FASTA or FASTQ), first character = \n (10)

and this is what I got. Do you know what this means?

Taz
Old 04-13-2010, 10:02 PM   #24
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Are you running this on a Linux or a Windows machine? It could be that carriage returns or newlines were translated between the two OSes.
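
One way to check this is to look at the raw bytes at the start of the file. The Python sketch below is only an illustration: the helper names `line_ending_style` and `first_byte_report` are hypothetical, and files with mixed line endings are simply reported by the first style found.

```python
def line_ending_style(chunk: bytes) -> str:
    """Classify the line endings seen in a chunk of raw bytes."""
    if b"\r\n" in chunk:
        return "CRLF"   # typical for Windows
    if b"\r" in chunk:
        return "CR"     # classic Mac OS
    if b"\n" in chunk:
        return "LF"     # Unix, Linux, macOS
    return "none"


def first_byte_report(chunk: bytes) -> str:
    """Show the first byte and its numeric value, in the style of the error message."""
    if not chunk:
        return "empty"
    return f"{chunk[:1]!r} ({chunk[0]})"


# Demo on in-memory bytes; a FASTQ file should start with '@' (byte 64).
sample = b"@r1\r\nACGT\r\n+\r\nabcd\r\n"
print(line_ending_style(sample), first_byte_report(sample))
```

For a real file, something like `open(path, "rb").read(4096)` gives a chunk to pass to both helpers.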
Old 04-14-2010, 05:34 PM   #25
Taz
Junior Member
 
Location: USA

Join Date: Apr 2010
Posts: 4
Default

Thanks for all the help. I figured out what I was doing wrong. I had brackets around all my variables!
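
For anyone who hits the same error: square brackets in a tool's usage text usually mark optional arguments and are not meant to be typed. With the brackets left in, the program would receive `[-i` rather than `-i`, never see an input file option, and fall back to standard input, which would be consistent with the `input file (-)` in the error message. Below is a minimal Python sketch of assembling the call without brackets. The helper name `clipper_command` and the file names are illustrative, the flags are a subset of the ones used in the posts above, and `shlex.join` needs Python 3.8 or later. The sketch only prints the command; it does not run fastx_clipper.

```python
import shlex


def clipper_command(adapter, in_path, out_path, min_len=15):
    """Build the argument list for fastx_clipper, one list item per argument."""
    return [
        "fastx_clipper",
        "-a", adapter,        # adapter sequence to clip
        "-l", str(min_len),   # minimum read length to keep
        "-c",                 # keep only reads that contained the adapter
        "-v",                 # verbose report
        "-i", in_path,
        "-o", out_path,
    ]


cmd = clipper_command(
    "CTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG",
    "old data/s_1_sequence.fq",           # hypothetical path containing a space
    "old data/s_1_sequence_trimmed.fq",
)
# shlex.join quotes arguments that need it, so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list itself to `subprocess.run(cmd, check=True)` avoids shell quoting entirely, since each list item reaches the program as one argument.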
Old 07-19-2010, 12:08 PM   #26
jeongrih
Junior Member
 
Location: Boston

Join Date: Jun 2010
Posts: 1
Thanks guys. Just added a line of code..

Thanks for the perl code for converting!

Anyway, to handle a qseq file that contains "." instead of "N" in the sequence field, I added one line of code that replaces "." with "N". I changed

print "@","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
print "$parts[8]\n";

to end up with

print "@","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
$parts[8] =~ s/\./N/g;
print "$parts[8]\n";

jeOng
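
The same idea can be sketched in Python for readers who do not use Perl. This is an illustration that mirrors the field indices of the Perl script in this thread; the function name `qseq_line_to_fastq` is made up, and the input line in the demo is invented. The replacement is applied to the sequence field only, because "." could in principle be a legitimate character in a quality string.

```python
def qseq_line_to_fastq(line):
    """Convert one tab-separated qseq line to a four-line FASTQ record."""
    parts = line.rstrip("\r\n").split("\t")
    # Read name as in the Perl script: machine:lane:tile:x:y#index/read
    name = f"{parts[0]}:{parts[2]}:{parts[3]}:{parts[4]}:{parts[5]}#{parts[6]}/{parts[7]}"
    sequence = parts[8].replace(".", "N")   # qseq writes '.' for an uncalled base
    return f"@{name}\n{sequence}\n+{name}\n{parts[9]}\n"


# Invented example line with one uncalled base.
example = "GA-I_0001\t1\t1\t1\t1036\t19043\t0\t1\tAC.GT\tabcde\t1\n"
print(qseq_line_to_fastq(example), end="")
```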
Old 09-15-2010, 05:39 AM   #27
sdarko
Member
 
Location: Bethesda, MD

Join Date: Apr 2009
Posts: 51
Default

Quote:
Originally Posted by kmcarr View Post
No, they are not the same format!

QSEQ is a format created by Illumina that uses a single line of tab-separated fields to hold the read ID information, sequence and quality. The fields in a QSEQ file are
Code:
MachineID     run#     lane#     tile#     x-coord     y-coord     index     read#     sequence     q-scores     p/f flag
The majority of these fields are specific to Illumina Genome Analyzers and thus the QSEQ format is not appropriate for sequence from other platforms.

The FASTQ format was originally defined by the Sanger Center and an excellent description of it can be found here. This link also describes how the fields from the QSEQ file are aggregated into the read name for the FASTQ file as well as describing the variations to quality score encoding introduced by Solexa/Illumina.
kmcarr,

In the qseq format, what is the p/f flag and what does it stand for?

Thanks,
Sam
Old 09-15-2010, 06:11 AM   #28
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,169
Default

Quote:
Originally Posted by sdarko View Post
kmcarr,

In the qseq format, what is the p/f flag and what does it stand for?

Thanks,
Sam
p/f == pass/fail; it signifies whether the read has passed or failed the Illumina filter. Passed reads will have a '1' in this column, failed reads a '0'.

Be aware that the Illumina read passing filter only considers the signal to noise ratio across the first 25 cycles of a read. It is not a measure of overall read quality.
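
To make the column layout concrete, here is a small Python sketch that gives the eleven qseq fields names and exposes the pass/fail flag as a boolean. The class name `QseqRecord`, its attribute names and the function `parse_qseq_line` are illustrative choices, not an Illumina API, and the sketch assumes a well-formed line with exactly eleven tab-separated fields.

```python
from typing import NamedTuple


class QseqRecord(NamedTuple):
    machine: str
    run: str
    lane: str
    tile: str
    x: str
    y: str
    index_seq: str
    read_number: str
    sequence: str
    quality: str
    filter_flag: str

    @property
    def passed(self) -> bool:
        """True when the last column is '1', i.e. the read passed the filter."""
        return self.filter_flag == "1"


def parse_qseq_line(line):
    """Split one qseq line into named fields (raises TypeError if the count is wrong)."""
    return QseqRecord(*line.rstrip("\r\n").split("\t"))


# Invented example lines: one passing read and one failing read.
good = parse_qseq_line("M1\t1\t3\t5\t10\t20\t0\t1\tACGT\tabcd\t1\n")
bad = parse_qseq_line("M1\t1\t3\t5\t11\t21\t0\t1\tTTTT\tBBBB\t0\n")
print(good.passed, bad.passed)
```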
Old 09-15-2010, 06:22 AM   #29
sdarko
Member
 
Location: Bethesda, MD

Join Date: Apr 2009
Posts: 51
Default

Quote:
Originally Posted by kmcarr View Post
p/f == pass/fail; it signifies whether the read has passed or failed the Illumina filter. Passed reads will have a '1' in this column, failed reads a '0'.

Be aware that the Illumina read passing filter only considers the signal to noise ratio across the first 25 cycles of a read. It is not a measure of overall read quality.
So, when constructing fastq files from qseq files, are reads that don't pass typically separated from reads that do pass?
Old 09-15-2010, 06:43 AM   #30
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,169
Default

Quote:
Originally Posted by sdarko View Post
So, when constructing fastq files from qseq files, are reads that don't pass typically separated from reads that do pass?
Typically yes, at least that is the default behavior of the Illumina pipeline when it constructs its s_n_sequence.txt files.

Looking back now at the bare bones script I provided way back at the beginning of this thread I see that it includes all reads, passed or failed, in the fastq output. I'll leave it as an exercise for the class to modify the script to only output passed reads (bonus points if you make this optional via command line argument).
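
One possible take on that exercise, sketched in Python rather than Perl: the converter below skips failed reads when a `--pass-only` option is given. The option name, the function names `convert` and `main`, and the demo lines are all invented for illustration; the read-name layout follows the Perl script in this thread.

```python
import argparse
import io
import sys


def convert(lines, out, pass_only=False):
    """Write FASTQ records for qseq lines; return the number of reads written."""
    written = 0
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if pass_only and parts[10] != "1":
            continue  # last column is the pass/fail flag: '1' passed, '0' failed
        name = f"{parts[0]}:{parts[2]}:{parts[3]}:{parts[4]}:{parts[5]}#{parts[6]}/{parts[7]}"
        out.write(f"@{name}\n{parts[8]}\n+{name}\n{parts[9]}\n")
        written += 1
    return written


def main(argv, lines, out):
    parser = argparse.ArgumentParser(description="qseq to FASTQ (sketch)")
    parser.add_argument("--pass-only", action="store_true",
                        help="write only reads whose filter flag is 1")
    args = parser.parse_args(argv)
    return convert(lines, out, pass_only=args.pass_only)


# Demo with two invented reads: the first passed the filter, the second failed.
sample = [
    "M1\t1\t1\t1\t10\t20\t0\t1\tACGT\tabcd\t1\n",
    "M1\t1\t1\t1\t11\t21\t0\t1\tTTTT\tBBBB\t0\n",
]
main(["--pass-only"], sample, sys.stdout)
```

In a real script the last line would become `main(sys.argv[1:], sys.stdin, sys.stdout)`, so that the default behavior (all reads) matches the original script and filtering stays optional.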
Old 05-18-2011, 08:26 AM   #31
labrat73
Member
 
Location: Louisiana

Join Date: Nov 2010
Posts: 12
Default

Quote:
Originally Posted by Xi Wang View Post
You can use the script below (name it qseq2fastq.pl and replace the former one):

Code:
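#!/usr/bin/perl
# Convert Illumina qseq (one tab-separated line per read) to FASTQ.

use warnings;
use strict;

while (<>) {
	chomp;
	# qseq fields: machine, run, lane, tile, x, y, index, read, sequence, quality, filter flag
	my @parts = split /\t/;
	# Read name: machine:lane:tile:x:y#index/read (the run number, $parts[1], is not used)
	print "@","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
	print "$parts[8]\n";	# sequence
	print "+","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
	print "$parts[9]\n";	# quality string
}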
Greetings Xi Wang,

I have tried to use this script to convert from minimal fastq format to one in which the read name is listed before the base qualities. Here is my command line:

$ perl qseq2fastq.pl sequence.fastq > test.fastq

However at each attempt, I get an empty output file and the "use of uninitialized value in concatenation (.) or string" message in the terminal. Please excuse my ignorance as I have only very limited knowledge of perl scripts. I would appreciate it very much if you could explain what I am doing wrong and give me step-by-step instructions on how to run this script.

Many thanks!
Old 05-18-2011, 11:41 AM   #32
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
Originally Posted by labrat73 View Post
Greetings Xi Wang,

I have tried to use this script to convert from minimal fastq format to one in which the read name is listed before the base qualities. Here is my command line:

$ perl qseq2fastq.pl sequence.fastq > test.fastq

However at each attempt, I get an empty output file and the "use of uninitialized value in concatenation (.) or string" message in the terminal. Please excuse my ignorance as I have only very limited knowledge of perl scripts. I would appreciate it very much if you could explain what I am doing wrong and give me step-by-step instructions on how to run this script.

Many thanks!
You are trying to convert FASTQ to FASTQ; that is not what the script is for. The script above converts qseq format to FASTQ.
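
The "uninitialized value" warnings follow from the input format: a FASTQ line contains no tabs, so splitting it on tabs yields a single field and every other field the script refers to is undefined. A small Python sketch of a guard that reports this up front is shown below; the function name `qseq_problem` is hypothetical.

```python
def qseq_problem(line):
    """Return None for a line with 11 tab-separated fields, else a short message."""
    count = len(line.rstrip("\r\n").split("\t"))
    if count == 11:
        return None
    return f"expected 11 tab-separated fields, found {count}"


# A FASTQ header line has no tabs, so it is reported as a single field.
print(qseq_problem("@SRR101483.1 SCS_0014:6:1:1063:16736/1\n"))
```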
Old 05-18-2011, 02:00 PM   #33
labrat73
Member
 
Location: Louisiana

Join Date: Nov 2010
Posts: 12
Default

Quote:
Originally Posted by sklages View Post
You try to convert fastq to fastq; that's not the intention of the script. The above script converts qseq format to fastq.
sklages-

thanks so much for your reply. i'm a bit confused because my file has the fastq extension and it looks like this:

@SRR101483.1 SCS_0014:6:1:1063:16736/1
GCGTAGGCTCTATCCCTAGAATGCAAAGGTGGTTCAACATACACAGATCAATAAATGTGATTCAC
+
DDDBDCC=D-5AA<B--CAAC5?A5@CC-=AA>>5CC:5=?:A5AC:C?D:C:>5?==@A@

when i try to run it, though, i keep getting an error. i compared it to other files that i've run and that's when i noticed that in other files, the title name appears again after the "+", immediately before the base qualities. i'm trying to convert or edit this file so that it looks like this:

@SRR101483.1 SCS_0014:6:1:1063:16736/1
GCGTAGGCTCTATCCCTAGAATGCAAAGGTGGTTCAACATACACAGATCAATAAATGTGATTCAC
+SRR101483.1 SCS_0014:6:1:1063:16736/1
DDDBDCC=D-5AA<B--CAAC5?A5@CC-=AA>>5CC:5=?:A5AC:C?D:C:>5?==@A@

i hope this makes sense and appreciate any advice you could offer.

best-

labrat73
Old 05-18-2011, 02:57 PM   #34
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542
Default

Use [ code ] and [ /code ] tags to prevent the forum messing up the display of examples.

Your file is already in FASTQ format, just without the redundant, optional repeated identifier on the plus lines. You don't need to make that change.

As sklages said earlier, the script this thread is about converts from the Illumina qseq format into FASTQ.
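
Should a downstream tool nonetheless insist on the repeated identifier, it can be copied from the header line. The Python sketch below is only an illustration under the assumption of unwrapped four-line records; the function name `repeat_id_on_plus` is made up, and the demo record is shortened.

```python
def repeat_id_on_plus(lines):
    """Yield FASTQ lines, filling bare '+' lines with the record's identifier."""
    header = ""
    for i, line in enumerate(lines):
        text = line.rstrip("\r\n")
        position = i % 4          # 0 header, 1 sequence, 2 plus line, 3 quality
        if position == 0:
            header = text[1:]     # identifier without the leading '@'
        elif position == 2 and text == "+":
            text = "+" + header
        yield text


demo = ["@SRR101483.1 SCS_0014:6:1:1063:16736/1\n", "GCGT\n", "+\n", "DDDB\n"]
print("\n".join(repeat_id_on_plus(demo)))
```

Plus lines that already carry an identifier are passed through unchanged.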
Old 06-25-2014, 04:51 PM   #35
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Default fastq validator

has anyone tried using this to test?

I have a very similar problem here where my .txt is in this format
where there is no line break after the '+'... however this is still in fastq format because the '+' line is optional... however some people here were still getting errors in the format i have posted below

has anyone used http://genome.sph.umich.edu/wiki/FastQValidator ?


@HWI-ST604_0134:4:1101:1391:1882#0/1
NATAGTGCTTTAGCATCATATCTAAGGCTGTTCGTCCTACATTGTTGAGGAAACAACTATGACCTCCCTTGGGTCGGTTGCTATGCAA AGCAATGCTAACA
+HWI-ST604_0134:4:1101:1391:1882#0/1
BUXRMZ[Z[[cccccccccccccccccccccccccccccc\cccccccccc_cccUYcccccccaccUYccccc_ccc__a\cac\_V __^X^^^\^^[^\
@HWI-ST604_0134:4:1101:1493:1886#0/1
NTAGATAATGATGCCACTGTTACAACTCTGTGCTTTGGGGTACCTAACAAGTCTCCCTCAGTGCCTCTCTGATTTGTAGCTAGTCAAT AGAATGAATAAAG
+HWI-ST604_0134:4:1101:1493:1886#0/1
BUXYX[[Z[[cccccc_cccccccc_ccccccccccc\ccZ____ccc_ccccccccccc[____ccccc_[cc_c_ccc_c_c_cc_ \_BBBBBBBBBBB
Old 06-25-2014, 09:51 PM   #36
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

I don't get it. There is a "linebreak" (newline) after your '+' line. So this is normal fastq format.

Btw, the '+' line is *not* optional, its content is! There must always be at least the '+' sign as header for the quality line. But it is optional to write any information after that (in the same line).
Old 06-25-2014, 10:28 PM   #37
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

The problem I see is that the bases and qualities both have spaces in them, but otherwise it looks fine.
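
A check along these lines can be sketched in a few lines of Python. This is not FastQValidator and does not reproduce its rules; the function name `record_problems` and the messages are invented, and the sketch assumes one unwrapped four-line record with newlines already stripped.

```python
def record_problems(record):
    """Return a list of problems found in one four-line FASTQ record."""
    header, seq, plus, qual = record
    problems = []
    if not header.startswith("@"):
        problems.append("header line does not start with '@'")
    if not plus.startswith("+"):
        problems.append("third line does not start with '+'")
    if any(ch.isspace() for ch in seq):
        problems.append("whitespace in sequence")
    if any(ch.isspace() for ch in qual):
        problems.append("whitespace in quality")
    if len(seq) != len(qual):
        problems.append(
            f"sequence and quality lengths differ ({len(seq)} vs {len(qual)})"
        )
    return problems


# Shortened, invented record with a space in both the bases and the qualities.
print(record_problems(["@r1", "ACG T", "+r1", "abc d"]))
```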
Old 06-25-2014, 10:30 PM   #38
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
Originally Posted by Brian Bushnell View Post
The problem I see is that bases and qualities both have a spaces in them, but otherwise it looks fine.
You're right, maybe a copy&paste issue ..?