SEQanswers

Old 01-05-2011, 05:48 PM   #1
ewilbanks
Member
 
Location: Davis, CA

Join Date: Mar 2009
Posts: 82
Question Split fastq to fasta and qual file?

Hi all,

Does anyone have or know of a good script to split a Sanger-format FASTQ file into the corresponding FASTA and QUAL files? I have a dataset that I'd like to quality trim with LUCY, but I can't figure out how to split it apart! I've tried the tool on the Galaxy page, but it's producing weird errors that I don't understand. Any help much appreciated!!

-Lizzy
Old 01-05-2011, 11:14 PM   #2
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153
Default

This can be done with Biopieces (www.biopieces.org):

Code:
read_fastq -i test.fq | write_454 -o test.fna -q test.fna.qual -x

Cheers,


Martin
Old 01-06-2011, 01:41 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

In Biopython the simplest way to do it is like this:

Code:
from Bio import SeqIO
SeqIO.convert("example.fastq", "fastq", "example.fasta", "fasta")
SeqIO.convert("example.fastq", "fastq", "example.qual", "qual")
You can be more cunning if you want to avoid making two passes through the FASTQ, but the above should be pretty fast anyway.

See also http://dx.doi.org/10.1093/nar/gkp1137 - I'd have suggested using EMBOSS seqret which can do FASTQ to FASTA, but I don't think it supports the QUAL format.
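
For anyone curious what a single-pass split could look like, here is a minimal sketch in plain Python rather than Biopython. It assumes strict four-line FASTQ records (no wrapped sequence or quality lines) and Sanger encoding (ASCII code minus 33); the function name `split_fastq` is made up for illustration and is not part of any library.

```python
import io

SANGER_OFFSET = 33  # Sanger FASTQ stores each Phred score as chr(score + 33)


def split_fastq(fastq_handle, fasta_handle, qual_handle):
    """Write FASTA and QUAL records from one pass over a FASTQ handle.

    Assumes strict four-line records. Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline().rstrip("\r\n")
        if not header:
            break  # end of input (or a blank line, which this sketch treats as the end)
        seq = fastq_handle.readline().rstrip("\r\n")
        fastq_handle.readline()  # the "+" separator line is skipped
        qual = fastq_handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: %r" % header)
        # QUAL files hold space-separated decimal scores, not ASCII characters
        scores = " ".join(str(ord(ch) - SANGER_OFFSET) for ch in qual)
        fasta_handle.write(">%s\n%s\n" % (header[1:], seq))
        qual_handle.write(">%s\n%s\n" % (header[1:], scores))
        count += 1
    return count


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII5!\n")
    fasta_out, qual_out = io.StringIO(), io.StringIO()
    split_fastq(demo, fasta_out, qual_out)
    print(fasta_out.getvalue(), end="")
    print(qual_out.getvalue(), end="")
```

With real files you would pass handles from `open(...)` instead of the `io.StringIO` objects used in the demo. For wrapped or otherwise unusual FASTQ files a proper parser such as Bio.SeqIO is the safer choice.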
Old 01-06-2011, 08:59 AM   #4
ewilbanks
Member
 
Location: Davis, CA

Join Date: Mar 2009
Posts: 82
Default

Thank you!! The Biopython script did the trick, even for a Python newbie!
Old 01-06-2011, 09:19 AM   #5
ewilbanks
Member
 
Location: Davis, CA

Join Date: Mar 2009
Posts: 82
Default

Do you know how to use Biopython to do the reverse? FASTA + QUAL = FASTQ?
Old 01-06-2011, 09:33 AM   #6
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153
Default

Well, Biopieces can do that as well:

Code:
read_454 -i test.fna -q test.qual | write_fastq -o test.fq -x

In fact, Biopieces can also trim sequences based on quality scores by using trim_seq:


Code:
read_454 -i test.fna -q test.qual | trim_seq | write_fastq -o test.fq -x


Martin
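
To give a rough idea of what quality-based end trimming means, here is a small Python sketch. It is not the trim_seq algorithm (the thread does not describe how trim_seq decides where to cut); it simply strips bases from both ends while their score is below a cutoff. The function name `trim_ends` and the default cutoff of 20 are assumptions chosen for illustration.

```python
def trim_ends(seq, scores, min_score=20):
    """Strip leading and trailing bases whose score is below min_score.

    seq is the sequence string and scores the matching list of decimal
    quality scores. Low-quality bases in the interior are left in place.
    """
    if len(seq) != len(scores):
        raise ValueError("sequence and score lengths differ")
    start, end = 0, len(scores)
    # Walk in from the left until a base meets the cutoff
    while start < end and scores[start] < min_score:
        start += 1
    # Walk in from the right until a base meets the cutoff
    while end > start and scores[end - 1] < min_score:
        end -= 1
    return seq[start:end], scores[start:end]


if __name__ == "__main__":
    print(trim_ends("ACGTAC", [5, 30, 10, 35, 40, 2]))
```

In the demo call the first and last bases are dropped, while the interior base with score 10 is kept because only the ends are examined.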

Last edited by maasha; 01-06-2011 at 09:36 AM.
Old 01-06-2011, 01:23 PM   #7
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by ewilbanks View Post
Do you know how to use Biopython to do the reverse? FASTA + QUAL = FASTQ?
Since you asked, yes, most easily done with the PairedFastaQualIterator function in the Bio.SeqIO.QualityIO module:

Code:
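from Bio import SeqIO
from Bio.SeqIO.QualityIO import PairedFastaQualIterator

# Read the FASTA and QUAL files in step and write the merged records as FASTQ.
with open("Quality/example.fasta") as fasta_handle:
    with open("Quality/example.qual") as qual_handle:
        rec_iter = PairedFastaQualIterator(fasta_handle, qual_handle)
        SeqIO.write(rec_iter, "Quality/temp.fastq", "fastq")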
This isn't quite as easy as the reverse, since we need to take two input files and read over them in sync, and the high-level functions in Bio.SeqIO are all intended for just one file. This example is based on the one in the documentation here:

http://www.biopython.org/DIST/docs/a...taQualIterator
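
To show what "reading over them in sync" involves, here is a plain-Python sketch of the same idea. It is not how PairedFastaQualIterator is implemented; the helper names `read_records`, `first_word` and `merge_to_fastq` are made up for illustration, and it assumes Sanger-style output (score + 33) with records in the same order in both files.

```python
import io


def read_records(handle):
    """Yield (title, data_lines) for each '>' record in a FASTA-like handle."""
    title, lines = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, lines
            title, lines = line[1:], []
        elif line:
            lines.append(line)
    if title is not None:
        yield title, lines


def first_word(title):
    """Return the record identifier (first word of the title), or ''."""
    parts = title.split()
    return parts[0] if parts else ""


def merge_to_fastq(fasta_handle, qual_handle, fastq_handle, offset=33):
    """Pair FASTA and QUAL records in order and write FASTQ records.

    zip() stops at the shorter input, so leftover records in the longer
    file are silently ignored in this sketch.
    """
    count = 0
    pairs = zip(read_records(fasta_handle), read_records(qual_handle))
    for (f_title, seq_lines), (q_title, qual_lines) in pairs:
        if first_word(f_title) != first_word(q_title):
            raise ValueError("out of sync: %r vs %r" % (f_title, q_title))
        seq = "".join(seq_lines)
        scores = [int(s) for s in " ".join(qual_lines).split()]
        if len(seq) != len(scores):
            raise ValueError("length mismatch in record %r" % f_title)
        if not all(0 <= s <= 93 for s in scores):
            raise ValueError("score outside 0..93 in record %r" % f_title)
        quality = "".join(chr(s + offset) for s in scores)
        fastq_handle.write("@%s\n%s\n+\n%s\n" % (f_title, seq, quality))
        count += 1
    return count


if __name__ == "__main__":
    fasta = io.StringIO(">read1\nACGT\n")
    qual = io.StringIO(">read1\n40 40 20 0\n")
    out = io.StringIO()
    merge_to_fastq(fasta, qual, out)
    print(out.getvalue(), end="")
```

The identifier check is the important part: if the two files drift out of step, every later read would silently get the wrong qualities.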
Old 01-06-2011, 01:30 PM   #8
ewilbanks
Member
 
Location: Davis, CA

Join Date: Mar 2009
Posts: 82
Default

Thanks everyone!

@maasha, I'll have to check it out! Does trim_seq accept Sanger-format qualities or only Solexa?
Old 01-07-2011, 02:02 AM   #9
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153
Default

trim_seq works on Illumina type qualities.

read_fastq and read_454 convert to Illumina type qualities by default. Phred scores are automagically detected and converted. If you have Solexa scores there is a switch.

write_fastq outputs Illumina type qualities.

write_454 automagically converts to decimal scores.



Cheers,


Martin
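
For readers unsure how these encodings relate, here is a small Python sketch of the conversions involved. It assumes that "Illumina type" means the Phred+64 encoding of Illumina pipeline 1.3 and later, which the thread does not confirm, and it says nothing about how Biopieces does this internally. The function names are made up for illustration; the Solexa-to-Phred formula is the standard one from the FASTQ format paper linked earlier in the thread.

```python
import math


def decode_quality(quality, offset=33):
    """Turn a FASTQ quality string into decimal scores.

    offset=33 is the Sanger encoding; offset=64 is assumed here for
    "Illumina type" strings.
    """
    return [ord(ch) - offset for ch in quality]


def encode_quality(scores, offset=33):
    """Turn decimal scores back into a FASTQ quality string."""
    return "".join(chr(score + offset) for score in scores)


def solexa_to_phred(q_solexa):
    """Convert one Solexa score to the equivalent (unrounded) Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


if __name__ == "__main__":
    scores = decode_quality("II5!")            # Sanger string to decimal
    print(scores)
    print(encode_quality(scores, offset=64))   # same scores, Phred+64 string
    print(round(solexa_to_phred(0), 2))
```

Solexa and Phred scores nearly coincide at high quality and only diverge noticeably for low scores, which is why the difference mostly matters for poor-quality bases.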
All times are GMT -8.