  • fastq to csfasta and .qual

    Is there a way to convert a fastq file back to the original csfasta and qual files?

  • #2
    Originally posted by samt
    Is there a way to convert a fastq file back to the original csfasta and qual files?
    Here's my overly complicated Perl script. Note that I assume the FASTQ qualities are in Sanger format (Phred+33) and that the sequence is in color space (i.e., adaptor base + color calls).

    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my $csfastq = shift;
    die unless defined($csfastq);
    my $csfasta = $csfastq; $csfasta =~ s/csfastq$/csfasta/;
    die unless !($csfastq eq $csfasta);
    my $qual = $csfastq; $qual =~ s/\.csfastq$/_QV.qual/;
    die unless !($csfastq eq $qual);
    
    open(FHcsfastq, "<", $csfastq) || die "Cannot open $csfastq: $!";
    open(FHcsfasta, ">", $csfasta) || die "Cannot open $csfasta: $!";
    open(FHqual, ">", $qual) || die "Cannot open $qual: $!";
    my $state = 0;
    my ($n, $r, $q) = ("", "", "");
    while(defined(my $line = <FHcsfastq>)) {
        chomp($line);
        if(0 == $state) {
            &print_out(\*FHcsfasta, \*FHqual, $n, $r, $q);
            $n = $line;
            $n =~ s/^\@/>/;
        }
        elsif(1 == $state) {
            $r = $line;
        }
        elsif(3 == $state) {
            $q = $line;
            # convert back from SANGER phred
            my $tmp_q = "";
            for(my $i=0;$i<length($q);$i++) {
                my $Q = ord(substr($q, $i, 1)) - 33;
                die unless (0 < $Q);
                if(0 < $i) {
                    $tmp_q .= " ";
                }
                $tmp_q .= "$Q";
            }
            $q = $tmp_q;
        }
        $state = ($state+1)%4;
    }
    &print_out(\*FHcsfasta, \*FHqual, $n, $r, $q);
    close(FHcsfasta);
    close(FHcsfastq);
    close(FHqual);
    
    sub print_out {
        my ($FHcsfasta, $FHqual, $n, $r, $q) = @_;
    
        if(0 < length($n)) {
            print $FHcsfasta "$n\n$r\n";
            print $FHqual "$n\n$q\n";
        }
    }


    • #3
      Thanks Nils, what are the arguments this script takes?


      • #4
        Originally posted by samt
        Thanks Nils, what are the arguments this script takes?
        Forgot to mention that. It takes the *fastq filename as its only argument and automatically creates the output *csfasta and *_QV.qual files.

        Nils


        • #5
          I thought so from reading the code, but I executed it and got the error:

          Died at fastqtocs.pl line 9.


          ran the command:
          perl fastqtocs.pl SRR015251.fastq


          • #6
            Originally posted by samt
            I thought so from reading the code, but I executed it and got the error:

            Died at fastqtocs.pl line 9.


            ran the command:
            perl fastqtocs.pl SRR015251.fastq
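            Rename your file so that its name ends in "csfastq" (e.g. SRR015251.csfastq). The script derives the output names from that suffix and dies at line 9 if the substitution leaves the name unchanged.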

            Nils
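If renaming is inconvenient, the name handling could be relaxed instead. This is only a minimal sketch, assuming you want to accept both ".fastq" and ".csfastq" inputs; the helper name output_names is hypothetical and not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the .csfasta and _QV.qual names from an input
# name ending in ".fastq" or ".csfastq". Returns an empty list otherwise.
sub output_names {
    my ($in) = @_;
    return unless $in =~ /^(.*)\.(?:cs)?fastq$/;
    my $base = $1;
    return ("$base.csfasta", "${base}_QV.qual");
}

my ($csfasta, $qual) = output_names("SRR015251.fastq");
print "$csfasta $qual\n";   # SRR015251.csfasta SRR015251_QV.qual
```

Returning an empty list for an unrecognised suffix lets the caller decide whether to die, much as the original checks do.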


            • #7
              Sorry to keep asking, I do appreciate your help. It crashed at:
              Died at fastqtocs.pl line 34, <FHcsfastq> line 4.
              for(my $i=0;$i<length($q);$i++) {
              my $Q = ord(substr($q, $i, 1)) - 33;
              --> die unless (0 < $Q);

              Based on another post I read, is this a problem with negative qualities?


              • #8
                Originally posted by samt
                Sorry to keep asking, I do appreciate your help. It crashed at:
                Died at fastqtocs.pl line 34, <FHcsfastq> line 4.
                for(my $i=0;$i<length($q);$i++) {
                my $Q = ord(substr($q, $i, 1)) - 33;
                --> die unless (0 < $Q);

                Based on another post I read, is this a problem with negative qualities?
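                It is. You could just replace the "die unless (0 < $Q);" with "if($Q < 0) { $Q = -1; }". Note that the original check also dies on a quality of exactly 0 (the "!" character), which the replacement accepts.

                I don't allow negative qualities, though I guess they could be treated as "missing".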
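The relaxed decoding could look roughly like the sketch below, assuming Sanger (Phred+33) encoding. The sub name decode_sanger is hypothetical; it only illustrates the suggested change and is not a drop-in replacement for the loop in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a Sanger (Phred+33) quality string into the
# space-separated integers used in a SOLiD .qual file. Anything that would
# decode below zero is reported as -1 ("missing") instead of aborting.
sub decode_sanger {
    my ($q) = @_;
    my @vals = map { ord($_) - 33 } split(//, $q);
    return join(" ", map { $_ < 0 ? -1 : $_ } @vals);
}

print decode_sanger("!5+"), "\n";   # 0 20 10
```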


                • #9
                  what about doing it backwards?

                  Is there any way to go from .qual and .csfasta to .fastq? I want to use my SOLiD data in NGS-Cell. Converting .csfasta to .fasta would be acceptable as well.


                  • #10
                    BFAST, BWA, and MAQ all have solid2fastq scripts/programs.

                    Nils
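Those tools are the practical route. Purely to illustrate what the conversion involves, here is a minimal sketch for a single read, assuming Sanger (Phred+33) output. It is not the BFAST, BWA, or MAQ implementation, and to_csfastq_record is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: build one FASTQ record from a csfasta entry and the
# matching .qual entry. Assumes Sanger (Phred+33) output; negative
# qualities (e.g. -1 for "." calls) are clamped to 0.
sub to_csfastq_record {
    my ($name, $colors, $quals) = @_;
    $name =~ s/^>/\@/;                       # ">id" becomes "@id"
    my $qstr = join("",
        map { chr(($_ < 0 ? 0 : $_) + 33) } split(" ", $quals));
    return "$name\n$colors\n+\n$qstr\n";
}

print to_csfastq_record(">226_3_65_F3", "T.11", "-1 20 16");
```

The read keeps its adaptor base, so there is one more character in the sequence line than in the quality line.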


                    • #11
                      phew

                      I thought I was going to have to write one myself.

                      Thanks,
                      Austin.


                      • #12
                        A question about the BFAST solid2fastq script:
                        The SOLiD reads I have use "." instead of "4" for "N" basecalls. These bases have a qual score of -1.
                        After running the script on my reads, all the "." calls remain "." rather than becoming "4", and the "-1" values were converted to " (ASCII 34). Should I manually convert the "." in the sequences to "4", and convert the " qualities to ! (ASCII 33, quality 0)?

                        Example:
                        @226_3_65
                        T...11..3.2..1020.2.13.0.1....0...332..322.1233..1 3
                        +
                        """51"","*""4405",")'"2")"""")"""'5$""0),"2(*5 ""%+

                        came from
                        >226_3_65_F3
                        T...11..3.2..1020.2.13.0.1....0...332..322.1233..1 3
                        and
                        >226_3_65_F3
                        -1 -1 -1 20 16 -1 -1 11 -1 9 -1 -1 19 19 15 20 -1 11 -1 8 6 -1 17 -1 8 -1 -1 -1 -1 8 -1 -1 -1 6 20 3 -1 -1 15 8 11 -1 17 7 9 20 -1 -1 4 10
                        Last edited by juan; 10-28-2009, 09:04 AM.


                        • #13
                          should be?
                          @226_3_65
                          T44411443424410204241340414444044433244322412334413
                          +
                          !!!51!!,!*!!4405!,!)'!2!)!!!!)!!!'5$!!0),!2(*5!!%+
                          Last edited by juan; 10-29-2009, 08:10 AM.
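If the record does need fixing by hand, the two substitutions could be sketched as below. This assumes Sanger-encoded qualities and that the downstream tool wants "4" for unknown colors with quality "!" (Phred 0). The name normalise_record is hypothetical, and whether this conversion is actually required is the open question above.

```perl
use strict;
use warnings;

# Hypothetical helper: replace "." color calls with "4" and force the
# quality at those positions to "!" (Phred 0). The first character of the
# read is the adaptor base, so color i pairs with quality i-1.
sub normalise_record {
    my ($read, $qual) = @_;
    for my $i (1 .. length($read) - 1) {
        next unless substr($read, $i, 1) eq ".";
        substr($read, $i, 1) = "4";
        substr($qual, $i - 1, 1) = "!";
    }
    return ($read, $qual);
}

my ($r, $q) = normalise_record("T...11..3", '"""51"",');
print "$r\n$q\n";
```

Only positions whose color call is "." are touched, so genuine low-quality calls keep their original scores.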


                          • #14
                            Originally posted by juan

                            T4441144342441020424134041444404443324432241233441 3
                            Note the space between the last and second-to-last color (a no-no).


                            • #15
                              That's strange. The space between the 1 and the 3 at the end of the line is a bug in the forum code! When I try to remove the space by clicking "edit", the space does not appear in the editor; it is inserted when the post is submitted. See the example below:

                              4441144342441020424134041444404443324432241233441222222
