SEQanswers

Old 10-19-2010, 05:39 AM   #1
feng
Member
 
Location: US

Join Date: Oct 2010
Posts: 50
How to convert general FASTQ to FASTQ int format?

Is anyone using FASTX? It needs FASTQ int format. Do you know how to convert general FASTQ into FASTQ int? Many thanks for any suggestions.
Old 10-19-2010, 06:45 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

You're not talking about Bill Pearson's tool FASTX, see Pearson et al. (1997), Comparison of DNA sequences with protein sequences.
http://www.ncbi.nlm.nih.gov/pubmed/9403055

You probably aren't talking about the FASTX-Toolkit either, since that supports multiple FASTQ variants.
http://hannonlab.cshl.edu/fastx_toolkit/

What are you talking about?
Old 10-19-2010, 09:04 AM   #3
feng
Member
 
Location: US

Join Date: Oct 2010
Posts: 50

I mean the second one. I tried to use FASTX to trim reads in a FASTQ file. It seems this program needs FASTQ (int)? Is there another parameter for this?

Thanks.
Old 10-19-2010, 09:34 AM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

The FASTX-Toolkit can read standard FASTQ files with ASCII qualities.

Could you tell us the command line you are trying to use that fails, and show us the first few reads of your FASTQ file (using the [ code ] and [ /code ] tags in the forum, or the # button on the advanced editor).
Old 10-19-2010, 08:49 PM   #5
feng
Member
 
Location: US

Join Date: Oct 2010
Posts: 50

Hi, I ran:

$ ./fastq_quality_trimmer -t 20 -l 30 -i sra_data.fastq -o sra_data.fastq.quality.trimmed

fastq_quality_trimmer: Invalid quality score value (char '*' ord 42 quality value -22) on line 12

The first 12 lines of reads:

@SRR001030.1.1 Hela.tar.gz:8:1:328:133.1 length=27
TCGAGATTTCTACAGTCCTTCGATAAC
+SRR001030.1.1 Hela.tar.gz:8:1:328:133.1 length=27
IIIIIIIIIII8IIIIIIIIIIIII4I
@SRR001030.2.1 Hela.tar.gz:8:1:96:66.1 length=27
ATGTACGGTAAATGGAAAAAAAAAAAA
+SRR001030.2.1 Hela.tar.gz:8:1:96:66.1 length=27
IIIIIIIIIIIIIIIIIIIIIIIIIII
@SRR001030.3.1 Hela.tar.gz:8:1:400:280.1 length=27
TCGGATGCCTACTTCTGCTTGAAAACA
+SRR001030.3.1 Hela.tar.gz:8:1:400:280.1 length=27
IIIIIIIIIIIIIII*&IIIIII/III


Any suggestions?
Old 10-19-2010, 11:47 PM   #6
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

Are you using an older version of the toolkit? The files you showed are valid FASTQ and use the Sanger encoding. The FASTX download page shows that automatic encoding detection was introduced in v0.0.13, so if you're using an older version it might be assuming an Illumina encoding.
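For anyone wondering how an encoding guess can work at all, here is a minimal sketch. It assumes a simple rule of thumb based only on the lowest quality character seen; it is not necessarily the rule FASTX-Toolkit v0.0.13 uses, and the function name `guess_offset` is hypothetical.

```python
def guess_offset(quality_lines):
    """Guess whether FASTQ quality strings use an ASCII offset of 33 or 64.

    Hypothetical helper: it looks only at the lowest character seen, which is
    a rough rule of thumb rather than FASTX-Toolkit's own detection logic.
    Raises ValueError if no quality characters are supplied.
    """
    lowest = min(ord(c) for line in quality_lines for c in line)
    if lowest < 59:  # below ';' cannot occur in Solexa+64 or Phred+64 files
        return 33
    return 64  # a guess only: uniformly high-quality Phred+33 reads also end up here


# Quality lines from post #5 contain '8', '4', '*', '&' and '/', all below ';'
reads = ["IIIIIIIIIII8IIIIIIIIIIIII4I",
         "IIIIIIIIIIIIIIIIIIIIIIIIIII",
         "IIIIIIIIIIIIIII*&IIIIII/III"]
print(guess_offset(reads))  # 33
```

Passing only the second read (all 'I') would return 64, which is why a guess like this should look at many reads rather than just the first one.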
Old 10-20-2010, 01:49 AM   #7
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

I think, as Simon suggests, your (old) version of the FASTX-Toolkit is probably assuming you have Solexa/Illumina FASTQ (which has a narrower range of allowed characters), but you have Sanger FASTQ. Try the FASTX command-line option -Q 33 here.
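The error in post #5 follows directly from the offset arithmetic: a quality score is the character's ASCII code minus an offset, so '*' (ASCII 42) becomes 42 - 64 = -22 when Illumina's offset of 64 is assumed, but a plausible 42 - 33 = 9 under Sanger's offset of 33, which is what -Q 33 selects. A minimal sketch of that decoding (the function `decode_qualities` and its default of 64 are illustrative assumptions, not FASTX code):

```python
def decode_qualities(quality_line, offset=64):
    """Turn an ASCII quality string into integer scores: score = ord(char) - offset."""
    return [ord(c) - offset for c in quality_line]


# '*', '&' and '/' come from the third read's quality line in post #5
print(decode_qualities("*&/", offset=64))  # [-22, -26, -17]  impossible Phred scores
print(decode_qualities("*&/", offset=33))  # [9, 5, 14]        ordinary low qualities
```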
Old 10-20-2010, 07:33 AM   #8
feng
Member
 
Location: US

Join Date: Oct 2010
Posts: 50
Thanks

I used -Q 33. It works. Many thanks again.
Old 10-21-2010, 06:58 PM   #9
golharam
Member
 
Location: Philadelphia, PA

Join Date: Dec 2009
Posts: 55

I'm running into the same problem. I just got reads off an Illumina GAIIx. Here are the first few lines:

@GEN-SEQ-ANA_0012:2:1:1562:1167#0/1
GAATACGTTCGCGTCACACAGTATCAACGGAAGCGGGTAAATGAAGGCGACACAGGGGATAAGCAGGGTTTCATGAAGTATCTTGGGCACGTGCCAGCGAG
+
A;-A4=B:?0?2?########################################################################################
@GEN-SEQ-ANA_0012:2:1:1626:1169#0/1
GAGGAAGGCGGTTTTGAAGGAGAGGGGAGGCTTTCGGACCAAGGGAAGGAAGGGAGGGTAAGAAAAGGAAAAAGAATTTGTGAGGGAGAAGGGTTTTTATC
+
D@EB:?DD;BF=EEEE>@BB4A;BAA;';;/??88AA################################################################
@GEN-SEQ-ANA_0012:2:1:1959:1166#0/1
GTGAGGGGATGTTCACTAGCTTGCCTACTTCGTCGAAGATCAGCTTGGCCTGGGTATTCGCGGTCCCTGCTGTTTTAAAGTTGGCGCCTGCTGCGTCCGCT
+
@6@)@B@<(?EBDBEBGBDB?B8BE?8,B0:A#####################################################################


When I try to run:

gunzip -dc s_2_reads_passed_filter.fastq.gz | fastx_quality_stats

I get the error:

fastx_quality_stats: Invalid quality score value (char '-' ord 45 quality value -19) on line 4

I'm running the latest version:


[golharam@vail input]$ fastx_quality_stats -h
usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.13 by A. Gordon (gordon@cshl.edu)
Old 10-22-2010, 12:05 AM   #10
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

Have you tried the solution discussed above? This tells FASTX to treat the qualities as Sanger FASTQ...

gunzip -dc s_2_reads_passed_filter.fastq.gz | fastx_quality_stats -Q 33

Are you sure your FASTQ files are in the original form from your Illumina GAIIx? Do you know what version of the pipeline it was?
Old 10-22-2010, 05:06 AM   #11
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,156

The long strings of # are a giveaway that this FASTQ is encoded in the standard Sanger format (Phred + 33). '#' is ASCII 35, i.e. Q2 in Phred + 33; if this were still Illumina format (Phred + 64), the same Q2 would be written as 'B' (ASCII 66), which is the telltale Illumina Quality Control Indicator.
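Since a Phred + 64 character and its Phred + 33 counterpart encode the same score, converting between the two is just a shift of 64 - 33 = 31 in the ASCII code, which is exactly why Illumina's 'B' becomes '#' here. A minimal sketch, assuming true Phred + 64 input; the name `phred64_to_phred33` is hypothetical.

```python
def phred64_to_phred33(quality_line):
    """Re-encode a Phred+64 quality string as Phred+33.

    The score itself is unchanged; only the ASCII offset moves from 64 to 33,
    so every character shifts down by 31. Not valid for old Solexa+64 files,
    whose scores use a different (log-odds) scale.
    """
    return "".join(chr(ord(c) - 31) for c in quality_line)


print(phred64_to_phred33("BBBB"))  # ####  (Q2 in both encodings)
print(phred64_to_phred33("hhB"))   # II#   ('h' = Q40, 'B' = Q2)
```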

Do as maubp and others have suggested and add the -Q33 option to your fastx command.

In feng and golharam's defense the -Q parameter to the fastx commands is not documented and does not appear in the help message. It is only discoverable by reading the source code (or the very helpful replies on SeqAnswers). The authors of the fastx toolkit could really help by documenting this option.
Old 10-22-2010, 05:09 AM   #12
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

Quote:
Originally Posted by kmcarr View Post
In feng and golharam's defense the -Q parameter to the fastx commands is not documented and does not appear in the help message. It is only discoverable by reading the source code (or the very helpful replies on SeqAnswers). The authors of the fastx toolkit could really help by documenting this option.
The -Q option is on http://hannonlab.cshl.edu/fastx_toolkit/ as part of a release announcement, but otherwise I agree with you.
Old 10-22-2010, 07:30 AM   #13
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,156

Quote:
Originally Posted by maubp View Post
The -Q option is on http://hannonlab.cshl.edu/fastx_toolkit/ as part of a release announcement, but otherwise I agree with you.
Thanks for the pointer, I had missed that.
Old 10-28-2010, 10:08 AM   #14
MQ-BCBB
Member
 
Location: Maryland

Join Date: May 2009
Posts: 25

Thanks, thanks for the -Q 33 tip. So helpful!
Old 10-28-2010, 12:19 PM   #15
golharam
Member
 
Location: Philadelphia, PA

Join Date: Dec 2009
Posts: 55

Agreed. The toolkit should really add that as a documented parameter to come up when running from the shell.
Old 03-08-2011, 09:24 PM   #16
bhakti
Junior Member
 
Location: India

Join Date: Dec 2010
Posts: 3

Thank you for "-Q 33" tip. Very useful !
Old 08-21-2011, 11:44 PM   #17
zhouzf
Junior Member
 
Location: china

Join Date: Jun 2011
Posts: 1

I got another problem besides "#":

fastx_quality_stats: Invalid quality score value (char 'J' ord 74 quality value 41) on line 8.

line8: @CCFFFFFDDFHHIJIIJJGIIGIGJGEEHIIIGG>FHGJIGHIGGIIJGIH9DHHGGCCHE7?B?3;(;=AB9@@@CDCA;3?<A##############
Old 08-22-2011, 07:08 AM   #18
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,156

Quote:
Originally Posted by zhouzf View Post
I got another problem besides "#":

fastx_quality_stats: Invalid quality score value (char 'J' ord 74 quality value 41) on line 8.

line8: @CCFFFFFDDFHHIJIIJJGIIGIGJGEEHIIIGG>FHGJIGHIGGIIJGIH9DHHGGCCHE7?B?3;(;=AB9@@@CDCA;3?<A##############
You need to upgrade to the latest version of FASTX Toolkit (version 0.0.13). FASTX Toolkit has several functions that check the validity of the data as they process it; one of these checks is whether the quality values fall within a reasonable range. This range used to be defined as -15 to 40. The latest version of the Toolkit expands it to -15 to 93.

Older versions of FASTX Toolkit do not work with the latest version of the Illumina basecalling software, which allows a maximum Q-score of 41 ("J"). If you are using an older version of FASTX Toolkit, it will report an error at the first "J" it encounters. Upgrade to FASTX Toolkit 0.0.13 and your problem will be solved.
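The range check described above can be mimicked in a few lines to see why 'J' trips older releases. This is a sketch that mirrors the ranges kmcarr quotes (-15 to 40 before, -15 to 93 in v0.0.13); it is not taken from the FASTX source, and the names `first_invalid`, `OLD_RANGE` and `NEW_RANGE` are hypothetical. The test string is the start of zhouzf's line 8, decoded with offset 33.

```python
OLD_RANGE = (-15, 40)  # range quoted above for older FASTX-Toolkit releases
NEW_RANGE = (-15, 93)  # range quoted above for v0.0.13


def first_invalid(quality_line, offset, score_range):
    """Return (char, ord, score) for the first out-of-range quality, or None."""
    low, high = score_range
    for c in quality_line:
        score = ord(c) - offset
        if not low <= score <= high:
            return c, ord(c), score
    return None


line8_start = "@CCFFFFFDDFHHIJ"  # first characters of line 8 from post #17
print(first_invalid(line8_start, 33, OLD_RANGE))  # ('J', 74, 41)
print(first_invalid(line8_start, 33, NEW_RANGE))  # None
```

The first result matches the numbers in zhouzf's error message, and the wider range accepts the same line.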
Old 09-26-2012, 07:28 AM   #19
seb.lees
Member
 
Location: France, Poitiers

Join Date: Sep 2012
Posts: 12

Quote:
Originally Posted by kmcarr View Post
Older versions of FASTX Toolkit do not work with the latest version of the Illumina basecalling software, which allows a maximum Q-score of 41 ("J"). If you are using an older version of FASTX Toolkit, it will report an error at the first "J" it encounters. Upgrade to FASTX Toolkit 0.0.13 and your problem will be solved.
That was exactly what I was facing. Thanks!
Old 08-13-2013, 09:25 AM   #20
Will Nelson
Member
 
Location: Arizona

Join Date: Nov 2010
Posts: 16
Still not documented

3 years later, the -Q 33 option is still needed and still not documented.
All times are GMT -8.

