  • Split fastq to fasta and qual file?

    Hi all,

    Does anyone have or know of good scripts to split a Sanger-format FASTQ file into the corresponding FASTA and QUAL files? I have a dataset that I'd like to quality-trim with LUCY, but I can't figure out how to split it! I've tried using the tool on the Galaxy page, but it's producing weird errors that I don't understand. Any help much appreciated!!

    -Lizzy

  • #2
    This can be done with Biopieces (www.biopieces.org):

    Code:
    read_fastq -i test.fq | write_454 -o test.fna -q test.fna.qual -x

    Cheers,


    Martin



    • #3
      In Biopython the simplest way to do it is like this:

      Code:
      from Bio import SeqIO
      SeqIO.convert("example.fastq", "fastq", "example.fasta", "fasta")
      SeqIO.convert("example.fastq", "fastq", "example.qual", "qual")
      You can be more cunning if you want to avoid making two passes through the FASTQ, but the above should be pretty fast anyway.

      See also http://dx.doi.org/10.1093/nar/gkp1137. I'd have suggested using EMBOSS seqret, which can do FASTQ to FASTA, but I don't think it supports the QUAL format.
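Post #3 notes that you can avoid making two passes through the FASTQ. As a rough illustration (not Biopython, and not the poster's code), here is a stdlib-only Python sketch that writes the FASTA and QUAL files in a single pass. It assumes simple unwrapped 4-line FASTQ records with Sanger (Phred+33) qualities; the helper name `split_fastq` is hypothetical.

```python
import io

def split_fastq(fastq_handle, fasta_handle, qual_handle, offset=33):
    """Split FASTQ into FASTA + QUAL in one pass (hypothetical helper).

    Assumes simple 4-line records (no wrapped lines) and Sanger
    Phred+33 quality encoding unless another offset is given.
    """
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = fastq_handle.readline().rstrip("\r\n")
        plus = fastq_handle.readline()
        qual = fastq_handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("Unexpected FASTQ layout at: " + header.strip())
        if len(seq) != len(qual):
            raise ValueError("Sequence/quality length mismatch at: " + header.strip())
        title = header[1:].rstrip("\r\n")
        scores = [ord(ch) - offset for ch in qual]  # ASCII character -> integer Phred score
        fasta_handle.write(f">{title}\n{seq}\n")
        qual_handle.write(f">{title}\n{' '.join(map(str, scores))}\n")

if __name__ == "__main__":
    fq = io.StringIO("@r1 desc\nACGT\n+\nII#5\n")
    fa, ql = io.StringIO(), io.StringIO()
    split_fastq(fq, fa, ql)
    print(fa.getvalue(), ql.getvalue(), sep="")
```

For real files, open `example.fastq` for reading and `example.fasta` / `example.qual` for writing in one `with` statement and pass the three handles in. Unlike Biopython's writers, this sketch does not wrap long sequence or quality lines.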



      • #4
        Thank you!! The Biopython script did the trick, even for a Python newbie!



        • #5
          Do you know how to use Biopython to do the reverse? Fasta +qual = fastq?



          • #6
            Well, Biopieces can do that as well:

            Code:
            read_454 -i test.fna -q test.qual | write_fastq -o test.fq -x

            In fact, Biopieces can also trim sequences based on quality scores by using trim_seq:


            Code:
            read_454 -i test.fna -q test.qual | trim_seq | write_fastq -o test.fq -x


            Martin
            Last edited by maasha; 01-06-2011, 10:36 AM.
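To give a feel for what quality trimming involves, here is a generic end-trimming rule in a few lines of Python. This is a minimal sketch and not necessarily the algorithm trim_seq uses; the function name `trim_ends` and the `min_qual=20` threshold are assumptions for illustration.

```python
def trim_ends(seq, scores, min_qual=20):
    """Trim low-quality bases from both ends of a read (hypothetical helper).

    Keeps the span from the first to the last base whose Phred score is
    >= min_qual; interior low-quality bases are left in place.
    Returns ("", []) when no base reaches min_qual.
    """
    if len(seq) != len(scores):
        raise ValueError("seq and scores must have the same length")
    passing = [i for i, q in enumerate(scores) if q >= min_qual]
    if not passing:
        return "", []
    start, end = passing[0], passing[-1] + 1
    return seq[start:end], scores[start:end]

if __name__ == "__main__":
    print(trim_ends("ACGTAC", [5, 30, 10, 25, 3, 2]))
```

The scores here are integer Phred values, e.g. decoded from a Sanger FASTQ quality string with `ord(ch) - 33`. Real trimmers often use sliding windows or running sums instead, so that a single good base near the end does not rescue a poor tail.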



            • #7
              Originally posted by ewilbanks
              Do you know how to use Biopython to do the reverse? Fasta +qual = fastq?
              Since you asked, yes, most easily done with the PairedFastaQualIterator function in the Bio.SeqIO.QualityIO module:

              Code:
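              from Bio import SeqIO
              from Bio.SeqIO.QualityIO import PairedFastaQualIterator
              # Read the FASTA and QUAL files in sync, then write the merged records as FASTQ
              with open("Quality/example.fasta") as fasta_handle, open("Quality/example.qual") as qual_handle:
                  rec_iter = PairedFastaQualIterator(fasta_handle, qual_handle)
                  SeqIO.write(rec_iter, "Quality/temp.fastq", "fastq")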
              This isn't quite as easy as the reverse, since we need to read two input files in sync, and the high-level functions in Bio.SeqIO are all intended for a single file. This example is based on the one in the documentation here:
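To show roughly what "reading two files in sync" involves, here is a stdlib-only sketch that does not use Biopython. The function names are hypothetical; it assumes whitespace-separated integer scores in the QUAL file and writes Sanger (Phred+33) FASTQ. Clamping scores into 0..93 is a choice made for this sketch, since that is the range Sanger FASTQ's printable characters can hold.

```python
import io
from itertools import zip_longest

def read_records(handle):
    """Yield (title, body_lines) for each '>' record; a body may span lines."""
    title, body = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if title is not None:
                yield title, body
            title, body = line[1:], []
        elif line.strip() and title is not None:
            body.append(line)
    if title is not None:
        yield title, body

def fasta_qual_to_fastq(fasta_handle, qual_handle, out_handle):
    """Merge FASTA + QUAL into Sanger (Phred+33) FASTQ, reading both in sync."""
    pairs = zip_longest(read_records(fasta_handle), read_records(qual_handle))
    for fa_rec, ql_rec in pairs:
        if fa_rec is None or ql_rec is None:
            raise ValueError("FASTA and QUAL files have different record counts")
        (fa_title, seq_lines), (ql_title, qual_lines) = fa_rec, ql_rec
        # Compare record IDs (first word of each title line)
        if fa_title.split()[:1] != ql_title.split()[:1]:
            raise ValueError(f"Record mismatch: {fa_title!r} vs {ql_title!r}")
        seq = "".join(seq_lines)
        scores = [int(tok) for ln in qual_lines for tok in ln.split()]
        if len(seq) != len(scores):
            raise ValueError(f"Length mismatch for {fa_title!r}")
        # Sanger FASTQ holds Phred 0..93 (ASCII 33..126), so clamp into that range
        qual = "".join(chr(min(max(s, 0), 93) + 33) for s in scores)
        out_handle.write(f"@{fa_title}\n{seq}\n+\n{qual}\n")
```

Using `zip_longest` rather than `zip` means a missing record in either file raises an error instead of being silently dropped.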



              • #8
                Thanks everyone!

                @maasha, I'll have to check it out! Does trim_seq accept Sanger-format qualities or only Solexa?



                • #9
                  trim_seq works on Illumina-type qualities.

                  read_fastq and read_454 convert to Illumina-type qualities by default. Phred scores are automagically detected and converted. If you have Solexa scores, there is a switch.

                  write_fastq outputs Illumina-type qualities.

                  write_454 automagically converts to decimal scores.



                  Cheers,


                  Martin
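For readers juggling these encodings, here is a minimal Python sketch of the conversions involved. It assumes "Illumina-type qualities" means Illumina 1.3+ style Phred+64 ASCII encoding, which the post above does not spell out; the helper names are hypothetical. The Solexa-to-Phred conversion uses the standard relation Q_phred = 10 * log10(10^(Q_solexa / 10) + 1), and "decimal scores" are shown as the space-separated integers found in QUAL files.

```python
import math

SANGER_OFFSET = 33      # Sanger / Phred+33 FASTQ
ILLUMINA13_OFFSET = 64  # Illumina 1.3+ / Phred+64 FASTQ (assumed meaning of "Illumina-type")

def decode(qual_string, offset):
    """ASCII-encoded quality string -> list of integer scores."""
    return [ord(ch) - offset for ch in qual_string]

def encode(scores, offset):
    """List of integer scores -> ASCII-encoded quality string."""
    return "".join(chr(s + offset) for s in scores)

def sanger_to_illumina13(qual_string):
    """Re-encode Phred scores from offset 33 to offset 64 (the scores themselves are unchanged)."""
    return encode(decode(qual_string, SANGER_OFFSET), ILLUMINA13_OFFSET)

def solexa_to_phred(q_solexa):
    """Solexa score -> nearest integer Phred score (the scales diverge at low values)."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))

def to_decimal_qual(scores):
    """Format scores as the space-separated integers used in QUAL files."""
    return " ".join(str(s) for s in scores)

if __name__ == "__main__":
    print(decode("II#5", SANGER_OFFSET), sanger_to_illumina13("II#5"), solexa_to_phred(-5))
```

Re-encoding between Phred+33 and Phred+64 only shifts the ASCII offset, while Solexa scores need the logarithmic conversion, which is why a separate switch for Solexa input makes sense.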
