
  • fastq to csfasta and .qual

    Is there a way to convert a fastq file back to the original csfasta and qual files?

  • #2
    Originally posted by samt
    Is there a way to convert a fastq file back to the original csfasta and qual files?
    Here's my overly complicated Perl script. Note that I assume the FASTQ qualities are in Sanger format and that the sequence is in color space (i.e., adaptor + color calls).

    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my $csfastq = shift;
    die unless defined($csfastq);
    my $csfasta = $csfastq; $csfasta =~ s/csfastq$/csfasta/;
    die unless !($csfastq eq $csfasta);
    my $qual = $csfastq; $qual =~ s/\.csfastq$/_QV.qual/;
    die unless !($csfastq eq $qual);
    
    open(FHcsfastq, "$csfastq") || die;
    open(FHcsfasta, ">$csfasta") || die;
    open(FHqual, ">$qual") || die;
    my $state = 0;
    my ($n, $r, $q) = ("", "", "");
    while(defined(my $line = <FHcsfastq>)) {
        chomp($line);
        if(0 == $state) {
            &print_out(\*FHcsfasta, \*FHqual, $n, $r, $q);
            $n = $line;
            $n =~ s/^\@/>/;
        }
        elsif(1 == $state) {
            $r = $line;
        }
        elsif(3 == $state) {
            $q = $line;
            # convert back from SANGER phred
            my $tmp_q = "";
            for(my $i=0;$i<length($q);$i++) {
                my $Q = ord(substr($q, $i, 1)) - 33;
                die unless (0 < $Q);
                if(0 < $i) {
                    $tmp_q .= " ";
                }
                $tmp_q .= "$Q";
            }
            $q = $tmp_q;
        }
        $state = ($state+1)%4;
    }
    &print_out(\*FHcsfasta, \*FHqual, $n, $r, $q);
    close(FHcsfasta);
    close(FHcsfastq);
    close(FHqual);
    
    sub print_out {
        my ($FHcsfasta, $FHqual, $n, $r, $q) = @_;
    
        if(0 < length($n)) {
            print $FHcsfasta "$n\n$r\n";
            print $FHqual "$n\n$q\n";
        }
    }


    • #3
      Thanks Nils, what are the arguments this script takes?


      • #4
        Originally posted by samt
        Thanks Nils, what are the arguments this script takes?
        Forgot to mention that. It takes the *fastq filename as its only argument and automatically creates the output *csfasta and *_QV.qual files.

        Nils


        • #5
          I thought so from reading the code, but when I executed it I got the error:

          Died at fastqtocs.pl line 9.


          I ran the command:
          perl fastqtocs.pl SRR015251.fastq


          • #6
            Originally posted by samt
            I thought so from reading the code, but when I executed it I got the error:

            Died at fastqtocs.pl line 9.


            ran the command:
            perl fastqtocs.pl SRR015251.fastq
            Rename your file to *csfastq.
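
            The rename matters because the script builds its output names by substituting the "csfastq" suffix and dies (line 9) when the substitution changes nothing. A minimal sketch of that naming step, where output_names is a hypothetical helper and not part of the script above:

            ```perl
            #!/usr/bin/perl
            use strict;
            use warnings;

            # Hypothetical helper (not in the original script): derive the two
            # output names from the input name. Returns an empty list when the
            # input does not end in "csfastq", which is the case where the
            # original script dies.
            sub output_names {
                my ($in) = @_;
                (my $csfasta = $in) =~ s/csfastq$/csfasta/;
                (my $qual    = $in) =~ s/\.csfastq$/_QV.qual/;
                return () if $csfasta eq $in || $qual eq $in;
                return ($csfasta, $qual);
            }

            for my $name ("SRR015251.fastq", "SRR015251.csfastq") {
                my @out = output_names($name);
                my $msg = @out ? join(", ", @out) : "no match, the script would die";
                print "$name -> $msg\n";
            }
            ```

            With a name ending in ".fastq" neither substitution matches, so nothing is written; with ".csfastq" the outputs become SRR015251.csfasta and SRR015251_QV.qual.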

            Nils


            • #7
              Sorry to keep asking, I do appreciate your help. It crashed at:
              Died at fastqtocs.pl line 34, <FHcsfastq> line 4.
              for(my $i=0;$i<length($q);$i++) {
              my $Q = ord(substr($q, $i, 1)) - 33;
              --> die unless (0 < $Q);

              From another post I read, is this a problem of negative qualities?


              • #8
                Originally posted by samt
                Sorry to keep asking, I do appreciate your help. It crashed at:
                Died at fastqtocs.pl line 34, <FHcsfastq> line 4.
                for(my $i=0;$i<length($q);$i++) {
                my $Q = ord(substr($q, $i, 1)) - 33;
                --> die unless (0 < $Q);

                From another post I read, is this a problem of negative qualities?
                It is. You could just replace the "die unless (0 < $Q);" with "if($Q < 0) { $Q = -1; }".

                I don't allow negative qualities, though I guess they could mean "missing".
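
                A minimal sketch of the relaxed decoding, assuming Sanger (Phred+33) qualities. decode_sanger is a hypothetical helper name used only for illustration. Note that the original check "0 < $Q" also rejects a score of 0 (the "!" character); this version keeps 0 and only maps values below 0 to -1:

                ```perl
                #!/usr/bin/perl
                use strict;
                use warnings;

                # Hypothetical helper: decode a Sanger (Phred+33) quality string into
                # space-separated integers, as a .qual file expects. A score of 0 ("!")
                # is kept; anything below 0 is written as -1 instead of aborting.
                sub decode_sanger {
                    my ($q) = @_;
                    my @scores;
                    for my $c (split //, $q) {
                        my $Q = ord($c) - 33;
                        $Q = -1 if $Q < 0;
                        push @scores, $Q;
                    }
                    return join(" ", @scores);
                }

                print decode_sanger('!51,'), "\n";   # prints "0 20 16 11"
                ```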


                • #9
                  what about doing it backwards?

                  Is there any way to go from .qual and .csfasta to .fastq? I want to use my SOLiD data in NGS-Cell. .csfasta to .fasta is acceptable as well.


                  • #10
                    BFAST, BWA, and MAQ all have solid2fastq scripts/programs.

                    Nils


                    • #11
                      phew

                      I thought I was going to have to write one myself.

                      Thanks,
                      Austin.


                      • #12
                        A question about the BFAST solid2fastq script:
                        The SOLiD reads I have use "." instead of "4" for "N" basecalls. These bases have a qual score of -1.
                        After running the script on my reads, all the "." remained "." instead of becoming "4", and the "-1" values were converted to " (ASCII 34). Should I manually convert the "." in the sequences to "4", and convert the " qualities to ! (ASCII 33, quality 0)?

                        Example:
                        @226_3_65
                        T...11..3.2..1020.2.13.0.1....0...332..322.1233..1 3
                        +
                        """51"","*""4405",")'"2")"""")"""'5$""0),"2(*5 ""%+

                        came from
                        >226_3_65_F3
                        T...11..3.2..1020.2.13.0.1....0...332..322.1233..1 3
                        and
                        >226_3_65_F3
                        -1 -1 -1 20 16 -1 -1 11 -1 9 -1 -1 19 19 15 20 -1 11 -1 8 6 -1 17 -1 8 -1 -1 -1 -1 8 -1 -1 -1 6 20 3 -1 -1 15 8 11 -1 17 7 9 20 -1 -1 4 10
                        Last edited by juan; 10-28-2009, 09:04 AM.


                        • #13
                          Should the result be this?
                          @226_3_65
                          T44411443424410204241340414444044433244322412334413
                          +
                          !!!51!!,!*!!4405!,!)'!2!)!!!!)!!!'5$!!0),!2(*5!!%+
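
                          A minimal sketch of that manual cleanup, assuming one quality character per color (the adaptor base has none) and that only positions holding a "." should be reset. fix_missing_calls is a hypothetical helper, not part of the BFAST script:

                          ```perl
                          #!/usr/bin/perl
                          use strict;
                          use warnings;

                          # Hypothetical helper: given a color-space read (adaptor base plus
                          # colors) and its Sanger quality string, turn every "." color into
                          # "4" and set the quality at that position to "!" (quality 0).
                          # Qualities at other positions are left untouched.
                          sub fix_missing_calls {
                              my ($read, $qual) = @_;
                              my $adaptor = substr($read, 0, 1);
                              my @colors  = split //, substr($read, 1);
                              my @quals   = split //, $qual;
                              die "length mismatch\n" unless @colors == @quals;
                              for my $i (0 .. $#colors) {
                                  next unless $colors[$i] eq ".";
                                  $colors[$i] = "4";
                                  $quals[$i]  = "!";
                              }
                              return ($adaptor . join("", @colors), join("", @quals));
                          }

                          # First eight colors of the read above
                          my ($r, $q) = fix_missing_calls('T...11..3', '"""51"",');
                          print "$r\n$q\n";   # prints "T44411443" then "!!!51!!,"
                          ```

                          Resetting only the "." positions avoids touching a " quality that belongs to a real color call with quality 1.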
                          Last edited by juan; 10-29-2009, 08:10 AM.


                          • #14
                            Originally posted by juan

                            T4441144342441020424134041444404443324432241233441 3
                            Note the space between the last and second last color (a no-no).


                            • #15
                              That's strange. The space between the 1 and the 3 at the end of the line is a bug in the forum code! When I tried to remove the space by clicking "edit", the space did not appear in the editor. It is inserted when the post is submitted. See the example below:

                              4441144342441020424134041444404443324432241233441222222
