
  • #16
    Originally posted by drio View Post
    Most aligners come with scripts to perform those conversions. bfast has (scripts/ill2fastq.pl).
    And if not, you can try any recent release of EMBOSS seqret, BioPerl, Biopython, BioRuby or BioJava.

    [EDIT: I am referring to how to convert between the FASTQ variants, see kmcarr's post below - QSEQ is not FASTQ]
    Last edited by maubp; 03-24-2010, 09:24 AM.
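For the FASTQ variants mentioned above, fastq-illumina and fastq-sanger differ only in the ASCII offset of the quality characters (64 versus 33). A minimal Python sketch of that re-encoding, assuming Phred-scaled Illumina 1.3+ qualities. The function name is made up for illustration, and the older Solexa-scaled variant needs a different conversion that this does not handle:

```python
def illumina64_to_sanger33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (assumed Illumina 1.3+ input)."""
    converted = []
    for char in quality:
        phred = ord(char) - 64  # remove the Illumina offset
        if phred < 0:
            raise ValueError(f"{char!r} is below the Phred+64 range")
        converted.append(chr(phred + 33))  # apply the Sanger offset
    return "".join(converted)
```

In practice the tools listed above (EMBOSS seqret, BioPerl, Biopython and so on) are the safer route, since they also cover the Solexa-scaled case.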



    • #17
      Originally posted by Thomas Doktor View Post
      Originally posted by extrajos View Post
      Hi All,

      I am wondering if qseq format is the same thing as fastq-illumina (differing from fastq-sanger only in the calculation of the quality score) or if qseq and fastq-illumina are different formats.

      Thanks!
      They are the same format, the qseq files are simply tile-divided sequence files.
      No, they are not the same format!

      QSEQ is a format created by Illumina; it uses a single line of tab-separated fields to hold the read ID information, sequence and quality. The fields in a QSEQ file are
      Code:
      MachineID     run#     lane#     tile#     x-coord     y-coord     index     read#     sequence     q-scores    p/f flag
      The majority of these fields are specific to Illumina Genome Analyzers and thus the QSEQ format is not appropriate for sequence from other platforms.

      The FASTQ format was originally defined by the Sanger Center and an excellent description of it can be found here. This link also describes how the fields from the QSEQ file are aggregated into the read name for the FASTQ file as well as describing the variations to quality score encoding introduced by Solexa/Illumina.
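To make the field layout concrete, here is a rough Python sketch that splits one QSEQ line into the eleven fields listed above and builds the read name the same way the Perl script later in this thread does (machine:lane:tile:x:y#index/read). The class and function names are invented for illustration and are not part of any Illumina tool:

```python
from typing import NamedTuple


class QseqRecord(NamedTuple):
    machine: str
    run: str
    lane: str
    tile: str
    x: str
    y: str
    index: str
    read: str
    sequence: str
    quality: str
    passed_filter: str


def parse_qseq_line(line: str) -> QseqRecord:
    """Split one tab-separated QSEQ line into its eleven fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, found {len(fields)}")
    return QseqRecord(*fields)


def fastq_read_name(rec: QseqRecord) -> str:
    """Aggregate the ID fields into a FASTQ read name (the run number is left out)."""
    return f"{rec.machine}:{rec.lane}:{rec.tile}:{rec.x}:{rec.y}#{rec.index}/{rec.read}"
```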



      • #18
        My mistake, you are correct.



        • #19
          In need of help!

          Originally posted by Xi Wang View Post
          You can use the script below (name it qseq2fastq.pl and replace the former one):

          Code:
          #!/usr/bin/perl
          
          use warnings;
          use strict;
          
          while (<>) {
          	chomp;
          	my @parts = split /\t/;
          	print "@","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
          	print "$parts[8]\n";
          	print "+","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
          	print "$parts[9]\n";
          }
          I downloaded my txt file from GERALD and I'm trying to convert it to FASTQ format. I've run the above script but I keep getting an empty FASTQ file. What should I do to correct this? Help!



          • #20
            Originally posted by Taz View Post
            I downloaded my txt file from GERALD and I'm trying to convert it to FASTQ format. I've run the above script but I keep getting an empty FASTQ file. What should I do to correct this? Help!
            Could you show me a few lines of the file you downloaded, and also the command you typed to run the script? Thanks
            Xi Wang



            • #21
              Hiya,

              So the script I type in is:

              Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence.txt | ./qseq2fastq.pl>s_1_sequence.fastq

              I'm pretty new to the whole programming world and I can't actually open the text file as it's too large. From the HTML, though, the first couple of lines look like this:

              @GA-I_0001:1:1:1036:19043#0/1
              AGCTTATCAGACTGATGTTGACCTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAA
              +GA-I_0001:1:1:1036:19043#0/1
              \aaaaaaaaaQ^a]XY[[X`aa]\^YQWUOONNN[[Y[YYZYR^VWPWUVVVVZaaY\aBBBBBBBBBBBBBBB
              @GA-I_0001:1:1:1036:14097#0/1
              TGCAAATCCATGCAAAACTGCTGTAGGCACCCTCAATGATAGGAAGAGCTCGTATGCCGTCTTCTGTTCGAAAA
              +GA-I_0001:1:1:1036:14097#0/1
              ]__VYPR]YWL[]U][FWT`WWU[R[RYX]HRRPQ[S[VNHRIPOYV[YHW[TP`\__BBBBBBBBBBBBBBBB
              @GA-I_0001:1:1:1037:13636#0/1
              GAGATGGGCGCCGCGAGGCGTCCAGTCTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCT
              +GA-I_0001:1:1:1037:13636#0/1
              \b_bbb_^^abbb[]P]]aYXL]O]]Y]]aa`__VZ`^^TaaaaT``[aa[aQYQUVYQSZ`X]MOONM^`VM^
              @GA-I_0001:1:1:1037:10039#0/1

              Basically I know that this is FASTQ format. I'm trying to run the file using the Hannon FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_tool..._trimmer_usage) to analyse my data but it's not recognising the input. I assumed it was because the file was txt and not fq or fa, but I'm not sure why it's not recognising it. I was trying to run the FASTQ/A Clipper.

              thanks for the help!

              Taz



              • #22
                Originally posted by Taz View Post
                Hiya,

                So the script I type in is:

                Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence.txt | ./qseq2fastq.pl>s_1_sequence.fastq

                I'm pretty new to the whole programming world and I can't actually open the text file as it's too large. From the HTML, though, the first couple of lines look like this:

                @GA-I_0001:1:1:1036:19043#0/1
                AGCTTATCAGACTGATGTTGACCTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAA
                +GA-I_0001:1:1:1036:19043#0/1
                \aaaaaaaaaQ^a]XY[[X`aa]\^YQWUOONNN[[Y[YYZYR^VWPWUVVVVZaaY\aBBBBBBBBBBBBBBB
                @GA-I_0001:1:1:1036:14097#0/1
                TGCAAATCCATGCAAAACTGCTGTAGGCACCCTCAATGATAGGAAGAGCTCGTATGCCGTCTTCTGTTCGAAAA
                +GA-I_0001:1:1:1036:14097#0/1
                ]__VYPR]YWL[]U][FWT`WWU[R[RYX]HRRPQ[S[VNHRIPOYV[YHW[TP`\__BBBBBBBBBBBBBBBB
                @GA-I_0001:1:1:1037:13636#0/1
                GAGATGGGCGCCGCGAGGCGTCCAGTCTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCT
                +GA-I_0001:1:1:1037:13636#0/1
                \b_bbb_^^abbb[]P]]aYXL]O]]Y]]aa`__VZ`^^TaaaaT``[aa[aQYQUVYQSZ`X]MOONM^`VM^
                @GA-I_0001:1:1:1037:10039#0/1

                Basically I know that this is FASTQ format. I'm trying to run the file using the Hannon FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_tool..._trimmer_usage) to analyse my data but it's not recognising the input. I assumed it was because the file was txt and not fq or fa, but I'm not sure why it's not recognising it. I was trying to run the FASTQ/A Clipper.

                thanks for the help!

                Taz
                You already have FASTQ data. FASTQ is a plain-text format, so you can simply rename the .txt file to .fastq (or .fq) and go ahead with the downstream processing.
                Xi Wang
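When a file is too large to open in an editor, a quick check of the first record can tell you whether it is already FASTQ. A minimal Python sketch, assuming the common four-line record layout shown in the excerpt above (the function name is hypothetical, and it only inspects the first record):

```python
from itertools import islice


def looks_like_fastq(lines) -> bool:
    """Return True if the first four lines form a plausible FASTQ record."""
    record = [line.rstrip("\r\n") for line in islice(lines, 4)]
    if len(record) < 4:
        return False
    header, sequence, separator, quality = record
    return (
        header.startswith("@")
        and separator.startswith("+")
        and len(sequence) > 0
        and len(sequence) == len(quality)
    )
```

It accepts any iterable of lines, so an open file handle works: `with open(path) as handle: print(looks_like_fastq(handle))`, where `path` is whatever your file is called.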



                • #23
                  I tried changing .txt to either .fq or .fastq. I put the following script in:
                  fastx_clipper: input file (-) has unknown file format (not FASTA or FASTQ), first character = \n (10)
                  tasleem-samjis-macbook:~ tasleemsamji$ /Users/tasleemsamji/Documents/bin/fastx_clipper [-a CTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG] [-D] [-d 0] [-M 15] [-l 15] [-c] [-v] [-i /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence.fq] [-o /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence_trimmed.fq]

                  fastx_clipper: input file (-) has unknown file format (not FASTA or FASTQ), first character = \n (10)
                  tasleem-samjis-macbook:~ tasleemsamji$ /Users/tasleemsamji/Documents/bin/fastx_clipper [-a CTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG] [-D] [-d 0] [-M 15] [-l 15] [-c] [-v] [-i /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence.fastq] [-o /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence_trimmed.fastq]

                  fastx_clipper: input file (-) has unknown file format (not FASTA or FASTQ), first character = \n (10)

                  and this is what I got. Do you know what this means?

                  Taz



                  • #24
                    Originally posted by Taz View Post
                    I tried changing .txt to either .fq or .fastq. I put the following script in:
                    fastx_clipper: input file (-) has unknown file format (not FASTA or FASTQ), first character = \n (10)
                    tasleem-samjis-macbook:~ tasleemsamji$ /Users/tasleemsamji/Documents/bin/fastx_clipper [-a CTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG] [-D] [-d 0] [-M 15] [-l 15] [-c] [-v] [-i /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence.fq] [-o /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence_trimmed.fq]

                    fastx_clipper: input file (-) has unknown file format (not FASTA or FASTQ), first character = \n (10)
                    tasleem-samjis-macbook:~ tasleemsamji$ /Users/tasleemsamji/Documents/bin/fastx_clipper [-a CTGTAGGCACCATCAATGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG] [-D] [-d 0] [-M 15] [-l 15] [-c] [-v] [-i /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence.fastq] [-o /Users/tasleemsamji/Documents/Joan\ Steitz\'s\ and\ Bob\ Means\'\ Lab/Data/1670\ Deep\ Sequencing/Data/old\ data/s_1_sequence_trimmed.fastq]

                    fastx_clipper: input file (-) has unknown file format (not FASTA or FASTQ), first character = \n (10)

                    and this is what I got. Do you know what this means?

                    Taz
                    Are you running this on a Linux or a Windows machine? It could be a difference in how the two OSes handle carriage returns and newlines.
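One way to test the line-ending idea without opening the whole file is to count the different line terminators in the first chunk of bytes. A small Python sketch (the function name is an illustrative assumption):

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), Unix (LF) and bare CR line endings in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # newlines not preceded by a carriage return
        "CR": data.count(b"\r") - crlf,  # carriage returns not followed by a newline
    }
```

For a large file, reading only the first megabyte in binary mode is enough, e.g. `count_line_endings(open(path, "rb").read(1 << 20))`. A file written on Linux or Mac OS X should report only LF.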



                    • #25
                      Thanks for all the help. I figured out what I was doing wrong. I had brackets around all my variables!
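For anyone who hits the same error: the square brackets in a usage message only mark arguments as optional and are not meant to be typed. With the brackets in place the program probably never saw a proper -i option, which would explain why it reported reading from "(-)", i.e. standard input. Below is a rough Python sketch of assembling the same command without brackets. It uses only flags that appear in the command above, and the helper names and file names are made up:

```python
import shlex


def clipper_command(adapter, infile, outfile, min_length=15):
    """Build a fastx_clipper argument list with no literal brackets."""
    return [
        "fastx_clipper",
        "-a", adapter,          # adapter sequence to clip
        "-l", str(min_length),  # value taken from the -l 15 in the post above
        "-c",
        "-v",
        "-i", infile,
        "-o", outfile,
    ]


def as_shell_line(argv):
    """Render the argument list as a shell command, quoting spaces and apostrophes."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Passing the list straight to `subprocess.run(argv, check=True)` avoids shell quoting altogether, which helps with directory names that contain spaces or apostrophes.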



                      • #26
                        Thanks guys. just added a line of code..

                        Thanks for the perl code for converting!

                        To handle a qseq file that contains "." instead of "N" in the sequence field, I added one line that replaces "." with "N", changing

                        print "@","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
                        print "$parts[8]\n";

                        to end up with

                        print "@","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
                        $parts[8] =~ s/\./N/g;
                        print "$parts[8]\n";

                        jeOng
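The same substitution in Python, for anyone not using the Perl script. This is just a sketch of the one-line fix above, with an invented function name:

```python
def mask_no_calls(sequence: str) -> str:
    """Replace the '.' no-call characters found in QSEQ sequences with 'N'."""
    return sequence.replace(".", "N")
```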



                        • #27
                          Originally posted by kmcarr View Post
                          No, they are not the same format!

                          QSEQ is a format created by Illumina; it uses a single line of tab-separated fields to hold the read ID information, sequence and quality. The fields in a QSEQ file are
                          Code:
                          MachineID     run#     lane#     tile#     x-coord     y-coord     index     read#     sequence     q-scores    p/f flag
                          The majority of these fields are specific to Illumina Genome Analyzers and thus the QSEQ format is not appropriate for sequence from other platforms.

                          The FASTQ format was originally defined by the Sanger Center and an excellent description of it can be found here. This link also describes how the fields from the QSEQ file are aggregated into the read name for the FASTQ file as well as describing the variations to quality score encoding introduced by Solexa/Illumina.
                          kmcarr,

                          In the qseq format, what is the p/f flag and what does it stand for?

                          Thanks,
                          Sam



                          • #28
                            Originally posted by sdarko View Post
                            kmcarr,

                            In the qseq format, what is the p/f flag and what does it stand for?

                            Thanks,
                            Sam
                            p/f == pass/fail; it signifies whether the read has passed or failed the Illumina filter. Passed reads will have a '1' in this column, failed reads a '0'.

                            Be aware that the Illumina read passing filter only considers the signal to noise ratio across the first 25 cycles of a read. It is not a measure of overall read quality.



                            • #29
                              Originally posted by kmcarr View Post
                              p/f == pass/fail; it signifies whether the read has passed or failed the Illumina filter. Passed reads will have a '1' in this column, failed reads a '0'.

                              Be aware that the Illumina read passing filter only considers the signal to noise ratio across the first 25 cycles of a read. It is not a measure of overall read quality.
                              So, when constructing fastq files from qseq files, are reads that don't pass typically separated from reads that do pass?



                              • #30
                                Originally posted by sdarko View Post
                                So, when constructing fastq files from qseq files, are reads that don't pass typically separated from reads that do pass?
                                Typically yes, at least that is the default behavior of the Illumina pipeline when it constructs its s_n_sequence.txt files.

                                Looking back now at the bare bones script I provided way back at the beginning of this thread I see that it includes all reads, passed or failed, in the fastq output. I'll leave it as an exercise for the class to modify the script to only output passed reads (bonus points if you make this optional via command line argument).
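One possible answer to that exercise, sketched in Python rather than Perl. It assumes the eleven-column QSEQ layout described earlier in the thread, with the pass/fail flag in the last column. The function names and the `--passed-only` option are invented for this example:

```python
import argparse
import sys


def qseq_to_fastq(lines, passed_only=False):
    """Yield one FASTQ record per QSEQ line, optionally skipping failed reads."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 11:
            continue  # skip blank or malformed lines rather than guessing
        if passed_only and fields[10] != "1":
            continue  # '1' means the read passed the Illumina filter
        name = (f"{fields[0]}:{fields[2]}:{fields[3]}:{fields[4]}:{fields[5]}"
                f"#{fields[6]}/{fields[7]}")
        yield f"@{name}\n{fields[8]}\n+{name}\n{fields[9]}\n"


def main(argv=None):
    """Read QSEQ from standard input and write FASTQ to standard output."""
    parser = argparse.ArgumentParser(description="Convert QSEQ to FASTQ")
    parser.add_argument("--passed-only", action="store_true",
                        help="output only reads that passed the filter")
    args = parser.parse_args(argv)
    sys.stdout.writelines(qseq_to_fastq(sys.stdin, args.passed_only))
```

To use it as a script, call `main()` at the bottom of the file and pipe the QSEQ data in on standard input. Leaving out `--passed-only` keeps the original behaviour of writing every read.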

