SEQanswers

Old 07-16-2009, 07:51 AM   #1
asafle
Junior Member
 
Location: Israel

Join Date: Jun 2009
Posts: 1
Converting Solexa new format to FASTQ

Hi,
1. I received sequences from Illumina in the following single-line format:
HWI-EAS306:1:1:16:678#0/1:GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN:a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Any idea how to convert this to FASTQ for use with MAQ (without losing the quality scores)? As far as I can see, the fq_all2std.pl script cannot handle this format.

2. Many of the sequences have the form <seq tag><3' adaptor><AAA...>. I therefore think MAQ fails to remove the 3' adaptor (because it is not at the 3' end of the sequence). Any idea how to overcome this in MAQ or other programs?

Thanks
Asaf
Old 07-22-2009, 04:27 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

You have several options to convert Illumina 1.3+ FASTQ to Sanger FASTQ. All you really need to do is shift the ASCII values of the quality string as they both use PHRED scores.
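
To make the shift concrete: Illumina 1.3+ FASTQ stores each PHRED score as ASCII 64 + score, whereas Sanger FASTQ uses ASCII 33 + score, so converting amounts to subtracting 31 from every quality character. A minimal Python sketch, assuming the input really is Illumina 1.3+ encoded (the function name is only for illustration and is not part of MAQ, Biopython, BioPerl or EMBOSS):

```python
# Sketch only: assumes Illumina 1.3+ quality strings (PHRED + 64)
# and re-encodes them as Sanger quality strings (PHRED + 33).
ILLUMINA_OFFSET = 64
SANGER_OFFSET = 33


def illumina13_to_sanger(quality):
    """Return the Sanger-encoded equivalent of an Illumina 1.3+ quality string."""
    converted = []
    for char in quality:
        phred = ord(char) - ILLUMINA_OFFSET
        if phred < 0:
            # A character below '@' cannot be a valid Illumina 1.3+ quality.
            raise ValueError("not an Illumina 1.3+ quality character: %r" % char)
        converted.append(chr(phred + SANGER_OFFSET))
    return "".join(converted)


# First few quality characters from the read in this thread.
print(illumina13_to_sanger("a_\\XNW`"))  # prints B@=9/8A
```

For real files, one of the maintained tools listed below is the safer choice, since they also validate the records.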

Option One - Use an updated MAQ fq_all2std.pl script, there is a patch for Illumina to Sanger, but it isn't included in MAQ yet, see e.g.

http://sourceforge.net/mailarchive/f..._name=maq-help

Option Two - Use Biopython 1.51b (or later)

Option Three - Use the latest BioPerl (not sure if this code is in a public release yet)

Option Four - Use the latest EMBOSS seqret (but there are a couple of minor issues in version 6.1.0 to watch out for).
Old 07-24-2009, 07:42 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

Quote:
Originally Posted by asafle
Hi,
1. I received sequences from Illumina in the following single-line format:
HWI-EAS306:1:1:16:678#0/1:GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN:a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Any idea how to convert this to FASTQ for use with MAQ (without losing the quality scores)? As far as I can see, the fq_all2std.pl script cannot handle this format.

I didn't read your message carefully enough. That looks like a 50 bp read, a kind of FASTQ entry forced onto one line. Are there any tabs in there? What was the filename? The extension might be of interest.

My guess is that, converted to an Illumina 1.3+ FASTQ file, it would look like this:

Code:
@HWI-EAS306:1:1:16:678#0/1
GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN
+
a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Or, as a Sanger standard FASTQ file,

Code:
@HWI-EAS306:1:1:16:678#0/1
GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN
+
B@=9/8A/,2>9>6####################################

Converted to a PHRED QUAL file,

Code:
>HWI-EAS306:1:1:16:678#0/1
33 31 28 24 14 23 32 14 11 17 29 24 29 21 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

If you have other files that came with this one, you can probably confirm whether this interpretation is correct.
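
The interpretation above can be sketched in Python. This is a guess at the layout, not a documented format: it assumes the last two colon-separated fields are the sequence and the Illumina 1.3+ quality string (PHRED + 64), and that everything before them is the read name. Splitting from the right should be safe under that assumption, because ':' (ASCII 58) lies below the Illumina 1.3+ quality range. The function names are hypothetical.

```python
# Sketch only: assumes "<name>:<sequence>:<quality>" on one line, with the
# quality string in Illumina 1.3+ encoding (PHRED + 64).

def parse_single_line_read(line):
    """Split one record into (name, sequence, quality)."""
    # The read name itself contains colons, so split from the right.
    name, sequence, quality = line.rstrip("\r\n").rsplit(":", 2)
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ for %s" % name)
    return name, sequence, quality


def phred_scores(quality):
    """Decode an Illumina 1.3+ quality string into integer PHRED scores."""
    return [ord(char) - 64 for char in quality]


def to_sanger_fastq(line):
    """Return the record as a four-line Sanger FASTQ entry (PHRED + 33)."""
    name, sequence, quality = parse_single_line_read(line)
    sanger = "".join(chr(score + 33) for score in phred_scores(quality))
    return "@%s\n%s\n+\n%s\n" % (name, sequence, sanger)


# The read from this thread, rebuilt piecewise to keep the line short.
example = ("HWI-EAS306:1:1:16:678#0/1:"
           + "GGGGCTGTAGCTCAGNTGGTCGTATG" + "N" * 24
           + ":" + "a_\\XNW`NKQ]X]U" + "B" * 36)
print(to_sanger_fastq(example), end="")
```

If the real file turns out to be tab-delimited, or uses the older Solexa score encoding, the splitting and the offset would both need to change.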

Peter
Old 08-01-2009, 10:07 AM   #4
polivares
Member
 
Location: Manchester, UK

Join Date: Jan 2009
Posts: 29

Quote:
Originally Posted by maubp
You have several options to convert Illumina 1.3+ FASTQ to Sanger FASTQ. All you really need to do is shift the ASCII values of the quality string as they both use PHRED scores.

Option One - Use an updated MAQ fq_all2std.pl script, there is a patch for Illumina to Sanger, but it isn't included in MAQ yet, see e.g.

http://sourceforge.net/mailarchive/f..._name=maq-help

Option Two - Use Biopython 1.51b (or later)

Option Three - Use the latest BioPerl (not sure if this code is in a public release yet)

Option Four - Use the latest EMBOSS seqret (but there are a couple of minor issues in version 6.1.0 to watch out for).

To add another option, I'd recommend patching MAQ with this patch.
Hope it's helpful.