#1
Member
Location: Newcastle upon Tyne
Join Date: Aug 2009
Posts: 18
Hello everyone,
I am new to next-gen sequencing and to this forum, and I hope someone can help me out here. To practice with and test alignment tools, I downloaded a short-read dataset from a yeast genome and tried to convert the Sanger-format FASTQ data to MAQ's BFQ format (I didn't know that the SRA provides Sanger-format FASTQ and that MAQ prefers the other FASTQ format). The command line I used was:

```
maq fastq2bfq SRR002051.fastq SRR002051.bfq
```

and I got the following output:

```
[seq_read_fastq] Inconsistent sequence name: ;E)$$$%%%%$%$""&"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: 32-)"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: *IDI*II%A;1+3&"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: $$,$"#&&%4&+$("""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: 6&%*I)''%11#"+-"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: 43&"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: (I#$,)B:E/(&"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: I5.=;&#!"-"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: """""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: (%%+%$/"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: $&/#2#&%!%"!"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: /%%!$#%*#"&"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: +6+/&%+&%$"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: +F)'$5*&+9%""+%"""""". Continue anyway.
[seq_read_fastq] Inconsistent sequence name: %%'"!"""""". Continue anyway.
```

I am wondering whether anyone has already written a Sanger FASTQ to Illumina FASTQ converter; it would be really helpful to me. Thanks.
#2
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,543
You have already got Sanger-style FASTQ files from the NCBI SRA, and MAQ likes standard Sanger FASTQ files. You would only need to convert if you had started with Solexa- or Illumina-encoded FASTQ files.
Maybe the problem is something else. Could you post the first 20 lines or so of the FASTQ file in the forum? Use the [code]...[/code] tags to make it display nicely.
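Whether a file needs converting at all can usually be guessed from the range of quality characters it uses: Sanger uses ASCII 33 to 126, Solexa 59 to 126, and Illumina 1.3+ 64 to 126. The following is a rough heuristic sketch (a hypothetical helper, not part of MAQ or Biopython), assuming unwrapped four-line records. A file whose qualities are all high stays ambiguous, because such characters are valid in every variant.

```python
def guess_encoding(lines) -> str:
    """Guess a FASTQ quality encoding from the lowest quality character seen.

    Assumes unwrapped four-line records; works on a list or an open file.
    ASCII ranges: Sanger 33-126, Solexa 59-126, Illumina 1.3+ 64-126.
    """
    lowest = None
    for i, line in enumerate(lines):
        if i % 4 == 3:  # the quality line of each record
            qual = line.rstrip("\n")
            if qual:
                low = min(ord(c) for c in qual)
                lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return "unknown"
    if lowest < 59:
        return "sanger"            # ASCII 33-58 only occurs with the Phred+33 offset
    if lowest < 64:
        return "solexa or sanger"  # ASCII 59-63: low Solexa scores, or Sanger
    return "ambiguous"             # 64 and up is valid in all three encodings
```

Usage would be along the lines of `with open("SRR002051.fastq") as handle: print(guess_encoding(handle))`; the first record of the SRA file posted later in this thread contains `$` (ASCII 36), which only Sanger encoding allows.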
#3
Member
Location: California
Join Date: Sep 2008
Posts: 45
It is not the '@' symbol that is disallowed; it is an '@' that follows an illegal space in the description. Unfortunately, many FASTQ files are not properly formatted and contain spaces in the sequence name, which causes MAQ to mess up. Clean up the sequence names and the tool will work.
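One way to clean up the names is to keep only the text before the first whitespace on each record's "@" title line and "+" separator line. Below is a minimal sketch in plain Python, assuming unwrapped four-line records; the function names are hypothetical. Sequence and quality lines are passed through untouched, since quality characters (ASCII 33 to 126) never include a space.

```python
def clean_names(lines):
    """Yield FASTQ lines with everything after the first whitespace removed
    from each record's '@' title line and '+' separator line.

    Assumes unwrapped four-line records (title, sequence, '+', qualities).
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (0, 2):              # title line or '+' line
            parts = line.split(None, 1)  # split once, at the first whitespace run
            if parts:
                line = parts[0]
        yield line


def clean_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: write a cleaned copy of a FASTQ file."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in clean_names(src):
            dst.write(line + "\n")
```

Applied to an SRA title such as `@SRR002051.1 :8:1:325:773 length=33`, this keeps only `@SRR002051.1`, and the matching `+` line becomes `+SRR002051.1`.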
#4
Member
Location: Newcastle upon Tyne
Join Date: Aug 2009
Posts: 18
Here are the first 20 lines of the FASTQ file I downloaded from the SRA:

```
@SRR002051.1 :8:1:325:773 length=33
AAAGAACATTAAAGCTATATTATAAGCAAAGAT
+SRR002051.1 :8:1:325:773 length=33
IIIIIIIIIIIIIIIIIIIIIIIII'II@I$)-
@SRR002051.2 :8:1:409:432 length=33
AAGTTATGAAATTGTAATTCCAATATCGTAAGC
+SRR002051.2 :8:1:409:432 length=33
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII07
@SRR002051.3 :8:1:488:490 length=33
AATTTCTTACCATATTAGACAAGGCACTATCTT
+SRR002051.3 :8:1:488:490 length=33
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII&I
@SRR002051.4 :8:1:899:554 length=33
AGATTTCTAATATGGTTAAGAAGCGAACTTTTT
+SRR002051.4 :8:1:899:554 length=33
IIIIIIIIIIIIIIIIIII?IIIIII<IIIIII
@SRR002051.5 :8:1:464:463 length=33
AAAGCAGCAGCACGTAGTTCTTCATCCTTCTTC
+SRR002051.5 :8:1:464:463 length=33
IIIIIIIIIIIIIIIIIIIIIIIFIIIIII%.I
```

```
@SRR002051.1
AAAGAACATTAAAGCTATATTATAAGCAAAGAT
+
:8:1:325:773``````=33IIIIIIIIIIII
@I$)-
NNNNNNNGTNAAGTTATGAAATTGTAATTCCAATATCGTAAGC
+
!!!!!!!5:!73``````=33IIIIIIIIIIII""""""""""
@SRR002051.3
AATTTCTTACCATATTAGACAAGGCACTATCTT
+
:8:1:488:490``````=33IIIIIIIIIIII
@SRR002051.4
AGATTTCTAATATGGTTAAGAAGCGAACTTTTT
+
:8:1:899:554``````=33IIIIIIIIIIII
@SRR002051.5
AAAGCAGCAGCACGTAGTTCTTCATCCTTCTTC
+
:8:1:464:463``````=33IIIIIIIIIIII
```

Last edited by byb121; 12-22-2009 at 03:38 AM.
#5
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,543
Looking at that, I think aaronh is right: MAQ doesn't like the descriptions after the identifiers. I would file a bug report on MAQ.
In the short term, you could remove the descriptions using another tool, e.g. Biopython 1.51 or later via the SeqIO interface:

```python
from Bio import SeqIO


def remove_descr(records):
    """Iterate over SeqRecord objects clearing their description."""
    for rec in records:
        rec.description = ""
        yield rec


with open("byb121_sra.fastq") as in_handle, open("byb121_maq.fastq", "w") as out_handle:
    records = remove_descr(SeqIO.parse(in_handle, "fastq"))
    # SeqIO.write returns the number of records written
    count = SeqIO.write(records, out_handle, "fastq")
print("Converted %i records" % count)
```
#6
Member
Location: Newcastle upon Tyne
Join Date: Aug 2009
Posts: 18
Thanks a lot. Since this is only short-term practice anyway, I will just get rid of those spaces, or perhaps everything after the space. It is always good to know that I didn't do anything wrong.
If MAQ can fix the problem, that would be really great.
#7
Member
Location: China
Join Date: Sep 2010
Posts: 12
If you leave the "+" line (the third line of each record) empty, MAQ will parse the sequences.
Before:

```
$ cat test.fastq
@SRR228083.sra.1HWI-EAS158_0001:5:1:1089:19990length=36
CACTTTGCGTAACGTACACTGGGNTCGCTGAANTAG
+SRR228083.sra.1 HWI-EAS158_0001:5:1:1089:19990 length=36
BBABB@B@<4:7:>:>2;3>;>?#@###########
@SRR228083.sra.2HWI-EAS158_0001:5:1:1089:13103length=36
GCGCGGTGGTCCCACCTGACCCCNTGCCGAACNCAG
+SRR228083.sra.2 HWI-EAS158_0001:5:1:1089:13103 length=36
CCCCCC@CA@C@CCC=BCAB>7@#@>?-@#######
$ maq fastq2bfq test.fastq test.bfq
[seq_read_fastq] Inconsistent sequence name: B@<4:7:>:>2;3>;>?#@###########. Continue anyway.
[seq_read_fastq] Inconsistent sequence name: CA@C@CCC=BCAB>7@#@>?-@#######. Continue anyway.
-- finish writing file 'test.bfq'
-- 2 sequences were loaded.
```

After:

```
$ cat test.new.fastq
@SRR228083.sra.1HWI-EAS158_0001:5:1:1089:19990length=36
CACTTTGCGTAACGTACACTGGGNTCGCTGAANTAG
+
BBABB@B@<4:7:>:>2;3>;>?#@###########
@SRR228083.sra.2HWI-EAS158_0001:5:1:1089:13103length=36
GCGCGGTGGTCCCACCTGACCCCNTGCCGAACNCAG
+
CCCCCC@CA@C@CCC=BCAB>7@#@>?-@#######
$ maq fastq2bfq test.new.fastq test.new.bfq
-- finish writing file 'test.bfq'
-- 2 sequences were loaded.
```
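The edit described in this post, going from test.fastq to test.new.fastq, can be scripted. Below is a minimal sketch with a hypothetical function name, assuming unwrapped four-line records. It leaves the "@" title lines as they are and reduces each record's third line to a bare "+". It also refuses to continue if that line is not a "+" line, which would mean the file is not in the assumed layout.

```python
def blank_plus_lines(lines):
    """Yield FASTQ lines with each record's third line reduced to a bare '+'.

    Assumes unwrapped four-line records; raises if a third line is not a '+' line.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 2:
            if not line.startswith("+"):
                raise ValueError("line %d is not a '+' line: %r" % (i + 1, line))
            line = "+"
        yield line
```

To rewrite a whole file, iterate over an open handle for test.fastq and write each yielded line, followed by a newline, to test.new.fastq.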
Tags: converter, fastq, maq