  • Illumina Raw output

    Hey guys,

    This may be a stupid question, but in my group there is some discussion about what output format the Illumina HiSeq 2000 produces. Am I right that it is a FASTQ file (or several)? If so, there is no need to convert them for use with Bowtie and similar tools?

    Thanks in advance!


    Best,

    Philip

  • #2
    The sequence produced at the end of analysis by the Illumina pipeline is a FASTQ-format sequence file (if you chose not to do any alignments with ELAND).
    In the past (pipeline v1.7 and earlier), the quality values in the sequence files were in the "Illumina" format (and so would presumably need conversion to Sanger quality values, depending on your needs).
    With the "current" version of the pipeline (v1.8), the default quality values have changed to Sanger format.
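If you do need to convert older files, the Illumina-to-Sanger step amounts to shifting each quality character by the difference between the two ASCII offsets. The sketch below assumes Phred-scaled scores with offset 64 (pipeline versions 1.3 to 1.7); files from pipelines before 1.3 used Solexa-scaled scores, which need a different, non-linear conversion and are rejected here. The function name is hypothetical.

```python
def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger).

    Assumes Phred-scaled scores with ASCII offset 64 (pipeline 1.3-1.7).
    Solexa-scaled scores from pipelines before 1.3 need a different,
    non-linear conversion and are not handled here.
    """
    out = []
    for ch in qual:
        q = ord(ch) - 64  # recover the numeric Phred score
        if q < 0:
            raise ValueError(f"{ch!r} is outside the Phred+64 range")
        out.append(chr(q + 33))  # re-encode with the Sanger offset
    return "".join(out)


# 'h' is Q40 and 'B' is Q2 in Phred+64; in Phred+33 they become 'I' and '#'.
print(phred64_to_phred33("hhhB"))  # III#
```

Only the quality line of each FASTQ record changes; the header, sequence and '+' lines are copied as-is.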



    • #3
      More precisely (and note that I am still a beginner in terms of CASAVA 1.8 and the HiSeq), I believe that the output from the machine is 'qseq' format and that the first step in CASAVA processing converts qseq to FASTQ.

      Of course most people will want to have, and perhaps only be given, the latter.
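For runs from older pipelines that did write qseq files, the qseq-to-FASTQ step is essentially a per-line reshuffle. This is a sketch under stated assumptions: it assumes the common 11-field, tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), that '.' marks a no-call, and that a filter flag of "1" means the cluster passed. The FASTQ header layout it builds is a hypothetical choice modelled on pre-1.8 read names, not necessarily what CASAVA itself emitted.

```python
from typing import Optional


def qseq_to_fastq(line: str, pass_filter_only: bool = True) -> Optional[str]:
    """Turn one qseq line into a 4-line FASTQ record.

    Assumed qseq layout (11 tab-separated fields): machine, run, lane, tile,
    x, y, index, read number, sequence, quality, filter flag.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read, seq, qual, filt = fields
    if pass_filter_only and filt != "1":
        return None  # cluster did not pass the filter
    # Hypothetical header layout modelled on pre-1.8 Illumina read names.
    header = f"@{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read}"
    seq = seq.replace(".", "N")  # qseq uses '.' for no-calls
    # Quality is copied unchanged, so it stays in the Phred+64 encoding.
    return f"{header}\n{seq}\n+\n{qual}\n"
```

Keeping the quality string untouched means the output still needs a Phred+64 to Phred+33 shift before tools that expect Sanger qualities.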



      • #4
        Originally posted by westerman
        More precisely (and note that I am still a beginner in terms of CASAVA 1.8 and the hiSeq) I believe that the output from the machine is 'qseq' format and that the first step in CASAVA processing converts qseq to fastq.

        Of course most people will want to have, and perhaps only be given, the latter.
        But this too has changed recently.

        During the run, the Real Time Analysis (RTA) software on the instrument control computer (that Dell T7500 sitting next to it) processes the images to determine cycle-by-cycle intensities for each cluster and then performs base calling based on those intensities. RTA stores the base call data in a series of so-called BCL files. There is one BCL (suffix .bcl) file for each lane-tile-cycle (960 per cycle, or 192,000 for a 2x100 PE run, plus 6,720 more for the index read if included). BCL is a compact binary format, so you can't open these files to "look at them". This is the final output from the instrument and its RTA software.
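Although you can't read a BCL file in a text editor, its content is simple to decode in principle. The sketch below assumes the layout described in Illumina's bcl2fastq documentation as I understand it: a little-endian 32-bit cluster count, then one byte per cluster with the base in the low 2 bits (A, C, G, T = 0 to 3) and the quality in the upper 6 bits, where a byte of 0 means no-call. Some RTA versions write gzip-compressed .bcl.gz files, which the hypothetical `read_bcl` helper also accepts. Treat the layout as an assumption to check against the docs for your RTA version.

```python
import gzip
import struct
from typing import List, Tuple

BASES = "ACGT"  # 2-bit base codes 0..3


def decode_bcl(data: bytes) -> List[Tuple[str, int]]:
    """Decode the raw bytes of one BCL file into (base, quality) pairs.

    Assumed layout: a little-endian uint32 cluster count, then one byte per
    cluster with the base in bits 0-1 and the quality in bits 2-7; a byte of
    0 marks a no-call.
    """
    if len(data) < 4:
        raise ValueError("too short for a BCL header")
    (n_clusters,) = struct.unpack_from("<I", data, 0)
    body = data[4:4 + n_clusters]
    if len(body) != n_clusters:
        raise ValueError(f"header says {n_clusters} clusters, found {len(body)}")
    calls = []
    for byte in body:
        if byte == 0:
            calls.append(("N", 0))  # no-call
        else:
            calls.append((BASES[byte & 0b11], byte >> 2))
    return calls


def read_bcl(path: str) -> List[Tuple[str, int]]:
    """Read a .bcl or gzip-compressed .bcl.gz file from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        return decode_bcl(fh.read())
```

Because each file covers a single cycle of a single tile, it holds exactly one base call per cluster; building a read means collecting the call at the same cluster position from every cycle's file for that tile, which is the bookkeeping CASAVA does for you.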

        Offline, this data can be further processed with CASAVA, currently at v1.8. With the introduction of 1.8, QSEQ files are gone (you can still produce them, but they aren't used any more). CASAVA 1.8 includes a utility to produce compressed (gzip) FASTQ files directly from the BCL files. This utility includes demultiplexing if the run was multiplexed. They also changed the file naming convention for every run (no more s_1_sequence.txt). The format of the Read ID line has also changed somewhat, as has the encoding of the Q-scores, as GenoMax mentioned. They now produce FASTQ files adhering to the Sanger definition of ASCII(Phred+33).
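To make the new Read ID line and the Phred+33 encoding concrete, here is a small reader for CASAVA 1.8 gzip FASTQ output. It assumes the 1.8 header layout `@<instrument>:<run>:<flowcell ID>:<lane>:<tile>:<x>:<y> <read>:<is filtered>:<control number>:<index sequence>`, where "Y" in the filter field means the read was filtered out. The class and function names are hypothetical, and the parser is strict, so older-style headers will raise an error rather than be guessed at.

```python
import gzip
from typing import Iterator, List, NamedTuple, Tuple


class ReadId(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    filtered: bool  # 'Y' in the header means the read was filtered out
    control: int
    index: str


def parse_read_id(header: str) -> ReadId:
    """Parse a CASAVA 1.8-style read ID line."""
    if not header.startswith("@") or " " not in header:
        raise ValueError(f"not a CASAVA 1.8 read ID: {header!r}")
    left, right = header[1:].split(" ", 1)
    location = left.split(":")
    extra = right.split(":", 3)
    if len(location) != 7 or len(extra) != 4:
        raise ValueError(f"unexpected field count in {header!r}")
    instrument, run, flowcell, lane, tile, x, y = location
    read, filtered, control, index = extra
    return ReadId(instrument, int(run), flowcell, int(lane), int(tile),
                  int(x), int(y), int(read), filtered == "Y", int(control), index)


def phred33(qual: str) -> List[int]:
    """Decode Sanger ASCII(Phred+33) characters into numeric scores."""
    return [ord(ch) - 33 for ch in qual]


def read_fastq_gz(path: str) -> Iterator[Tuple[ReadId, str, List[int]]]:
    """Yield (read ID, sequence, qualities) from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as fh:
        while True:
            header = fh.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = fh.readline().rstrip("\n")
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip("\n")
            yield parse_read_id(header), seq, phred33(qual)
```

For example, `parse_read_id("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")` gives lane 2, tile 2104, read 1, `filtered=True` and index `ATCACG`. Since the files are gzip-compressed, tools that cannot read .gz input directly will need them decompressed first.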



        • #5
          kmcarr is, of course, correct. 'qseq' is no longer. 'bcl' is how the Illumina instrument stores its data. I should have double-checked my memory before posting earlier this morning. Too many changes so quickly! That, and not having enough coffee. :-)



          • #6
            So you get the *.bcl files from a sequencing run, not the FASTQ files. Is running CASAVA therefore required to get the FASTQ files?



            • #7
              Yes, if you want to generate qseq files you need to run the conversion script setupBclToQseq.py. If you want to generate FastQ files as well, you can specify --GERALD and request FastQ files (and/or alignments with ELAND) in the GERALD configuration options. More information on this can be found in the OLB 1.9 User Guide.



              • #8
                thanks guys, you helped me a lot!



                • #9
                  Originally posted by fkrueger
                  Yes, if you want to generate qseq files you need to run the conversion script setupBclToQseq.py. If you want to generate FastQ files as well, you can specify --GERALD and request FastQ files (and/or alignments with ELAND) in the GERALD configuration options. More information on this can be found in the OLB 1.9 User Guide.
                  Note that these instructions apply if you are using OLB v1.9 and CASAVA 1.7. The procedure is different now with CASAVA 1.8. Version 1.8 has a script, configureBclToFastq.pl, which coordinates the conversion of .bcl files directly to compressed fastq files, with demultiplexing if needed. GERALD is no longer included in CASAVA (there is a different script to manage alignments). Also, OLB is no longer required for any part of the normal post instrument analysis.




                    Latest Articles

                    Collapse

                    • seqadmin
                      Current Approaches to Protein Sequencing
                      by seqadmin


                      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                      04-04-2024, 04:25 PM
                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 04-11-2024, 12:08 PM
                    0 responses
                    30 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 10:19 PM
                    0 responses
                    32 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 09:21 AM
                    0 responses
                    28 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-04-2024, 09:00 AM
                    0 responses
                    53 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X