Illumina Raw output

**GenoMax** · 08-02-2011, 07:28 AM

The sequence data produced at the end of analysis by the Illumina pipeline is a FASTQ-format file (if you chose not to do any alignments with ELAND).
In the past (pipeline v1.7 and earlier) the quality values in the sequence files were in the "Illumina" format (ASCII offset 64), so they would presumably need conversion to Sanger quality values, depending on your needs.
With the current version of the pipeline (v1.8), the default quality values have changed to the Sanger format.
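As a minimal sketch of what that conversion involves, assuming the older files use Phred+64 (the encoding of pipeline v1.3 to v1.7; the pre-1.3 Solexa scale is not handled), re-encoding to Sanger Phred+33 is a shift of 31 in each character's ASCII code. The function names are hypothetical, not part of any Illumina tool.

```python
PHRED64_OFFSET = 64  # "Illumina" encoding (assumed: pipeline v1.3 to v1.7)
PHRED33_OFFSET = 33  # Sanger encoding, the CASAVA 1.8 default


def phred_scores(qual: str, offset: int) -> list[int]:
    """Decode a quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger)."""
    out = []
    for q in phred_scores(qual, PHRED64_OFFSET):
        if q < 0:
            # Characters below '@' are not valid Phred+64: the string is
            # probably already Sanger-encoded, or uses the older Solexa scale.
            raise ValueError("input is not Phred+64 encoded")
        out.append(chr(q + PHRED33_OFFSET))
    return "".join(out)


print(phred64_to_phred33("hhhB"))  # III#
```

Raising on out-of-range characters, rather than silently shifting them, guards against converting a file that was already Sanger-encoded.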

**westerman** · 08-02-2011, 08:43 AM

More precisely (and note that I am still a beginner in terms of CASAVA 1.8 and the HiSeq), I believe that the output from the machine is 'qseq' format and that the first step in CASAVA processing converts qseq to FASTQ.

Of course most people will want to have, and perhaps only be given, the latter.

**kmcarr** · 08-02-2011, 12:46 PM

> Originally posted by westerman
>
> More precisely (and note that I am still a beginner in terms of CASAVA 1.8 and the HiSeq), I believe that the output from the machine is 'qseq' format and that the first step in CASAVA processing converts qseq to FASTQ.
>
> Of course most people will want to have, and perhaps only be given, the latter.

But this too has changed recently.

During the run, the Real Time Analysis (RTA) software on the instrument control computer (that Dell T7500 sitting next to it) processes the images to determine cycle-by-cycle intensities for each cluster and then performs base calling based on those intensities. RTA stores the base call data in a series of so-called BCL files. There is one BCL (suffix .bcl) file for each lane-tile-cycle (960 per cycle, or 192,000 for a 2x100 PE run plus 6,720 more for the index read, if included). BCL is a compact binary format, so you can't open these files to "look at them". This is the final output from the instrument and its RTA software.
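To make the file-count arithmetic and the "compact binary" point concrete, here is a rough sketch. The split of 960 files per cycle into 8 lanes x 120 tiles and the 7-cycle index read are inferred from the numbers above. The byte layout in `decode_bcl` (a little-endian uint32 cluster count, then one byte per cluster with the base in the low 2 bits and the quality in the upper 6 bits, a zero byte meaning no call) is the layout commonly described for RTA-era BCL files; treat it as an assumption and check it against your RTA version. On some RTA versions the files are gzip-compressed and would need `gzip.decompress` first. All names here are hypothetical.

```python
import struct

BASES = "ACGT"  # 2-bit base codes 0..3 (assumed layout)


def bcl_file_count(lanes: int, tiles_per_lane: int, cycles: int) -> int:
    """One BCL file per lane-tile-cycle."""
    return lanes * tiles_per_lane * cycles


def decode_bcl(data: bytes) -> list[tuple[str, int]]:
    """Decode an uncompressed BCL payload into (base, quality) pairs.

    Assumed layout: little-endian uint32 cluster count, then one byte per
    cluster (base in bits 0-1, quality in bits 2-7). A zero byte is a no-call.
    """
    (n_clusters,) = struct.unpack_from("<I", data, 0)
    body = data[4:4 + n_clusters]
    if len(body) != n_clusters:
        raise ValueError("truncated BCL data")
    calls = []
    for byte in body:
        if byte == 0:
            calls.append(("N", 0))
        else:
            calls.append((BASES[byte & 0b11], byte >> 2))
    return calls


# Figures from the post, assuming 8 lanes x 120 tiles = 960 files per cycle.
per_cycle = bcl_file_count(lanes=8, tiles_per_lane=120, cycles=1)  # 960
paired_2x100 = bcl_file_count(8, 120, 200)                         # 192000
index_read = bcl_file_count(8, 120, 7)                             # 6720

# A synthetic three-cluster payload: no-call, A with Q30, T with Q40.
example = struct.pack("<I", 3) + bytes([0, (30 << 2) | 0, (40 << 2) | 3])
print(decode_bcl(example))  # [('N', 0), ('A', 30), ('T', 40)]
```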

Offline, this data can be further processed through CASAVA, currently at v1.8. With the introduction of 1.8, QSEQ files are gone (you can still produce them, but they are no longer used). CASAVA 1.8 includes a utility to produce compressed (gzip) FASTQ files directly from the BCL files, demultiplexing them if the run was multiplexed. They also changed the file naming convention for every run (no more s_1_sequence.txt). The format of the read ID line has also changed somewhat, as has the encoding of the Q-scores, as GenoMax mentioned: the FASTQ files now follow the Sanger definition of ASCII(Phred+33).
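For the new read ID line and file names, a small parser like the one below may help. It assumes the CASAVA 1.8 header layout `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index` (a `Y` in the filtered field means the read failed the filter, and the index field may be empty) and file names of the form `Sample_Barcode_L001_R1_001.fastq.gz`, with `NoIndex` or `Undetermined` in place of a barcode. `ReadId`, `parse_casava18_header` and `FASTQ_NAME` are hypothetical names.

```python
import re
from typing import NamedTuple


class ReadId(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    is_filtered: bool  # True means the read FAILED the filter
    control_number: int
    index_sequence: str


def parse_casava18_header(line: str) -> ReadId:
    """Parse '@instr:run:flowcell:lane:tile:x:y read:filtered:control:index'."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    left, right = line[1:].rstrip("\n").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = left.split(":")
    read, filtered, control, index = right.split(":")
    return ReadId(instrument, int(run), flowcell, int(lane), int(tile),
                  int(x), int(y), int(read), filtered == "Y",
                  int(control), index)


# Assumed CASAVA 1.8 file name pattern: Sample_Barcode_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[ACGTN]+|NoIndex|Undetermined)"
    r"_L(?P<lane>\d{3})_R(?P<read>\d)_(?P<chunk>\d{3})\.fastq\.gz$"
)

rid = parse_casava18_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
print(rid.lane, rid.tile, rid.is_filtered)  # 2 2104 True
m = FASTQ_NAME.match("Sample1_ATCACG_L002_R1_001.fastq.gz")
print(m["sample"], m["lane"], m["read"])  # Sample1 002 1
```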

**westerman** · 08-02-2011, 12:56 PM

kmcarr is, of course, correct. 'qseq' is no more; 'bcl' is how the Illumina instrument stores its data. I should have double-checked my memory before posting earlier this morning. Too many changes so quickly! That, and not having enough coffee. :-)

**sphil** · 08-03-2011, 01:51 AM

So you get the *.bcl files from a sequencing run, not the FASTQ files. Is CASAVA therefore required to produce them?

**fkrueger** · 08-03-2011, 02:35 AM

Yes, if you want to generate qseq files you need to run the conversion script setupBclToQseq.py. If you want to generate FastQ files as well, you can specify --GERALD and request FastQ files (and/or alignments with ELAND) in the GERALD configuration options. More information on this can be found in the OLB 1.9 User Guide.
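If you do go the qseq route, each qseq line can be turned into a FASTQ record with something like this sketch. It assumes the usual 11 tab-separated qseq fields (machine, run, lane, tile, x, y, index, read number, sequence, Phred+64 quality, pass-filter flag). The header layout it writes is only one pre-1.8 style convention, chosen for illustration, and `qseq_to_fastq` is a hypothetical helper, not part of OLB.

```python
from typing import Optional

QSEQ_FIELDS = 11  # assumed standard qseq layout (tab-separated)


def qseq_to_fastq(line: str, pass_filter_only: bool = True) -> Optional[str]:
    """Convert one qseq line into a 4-line FASTQ record (old-style header).

    Fields (assumed): machine, run, lane, tile, x, y, index, read,
    sequence, quality (Phred+64), filter (1 = passed).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != QSEQ_FIELDS:
        raise ValueError(f"expected {QSEQ_FIELDS} fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read, seq, qual, passed = fields
    if pass_filter_only and passed != "1":
        return None  # drop reads that failed the filter
    header = f"@{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read}"
    # qseq uses '.' for no-calls; FASTQ conventionally uses 'N'.
    seq = seq.replace(".", "N")
    # Qualities are left in Phred+64, as in pre-1.8 FASTQ output.
    return f"{header}\n{seq}\n+\n{qual}\n"
```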

**sphil** · 08-03-2011, 03:42 AM

Thanks guys, you helped me a lot!

**kmcarr** · 08-03-2011, 04:42 AM

> Originally posted by fkrueger
>
> Yes, if you want to generate qseq files you need to run the conversion script setupBclToQseq.py. If you want to generate FastQ files as well, you can specify --GERALD and request FastQ files (and/or alignments with ELAND) in the GERALD configuration options. More information on this can be found in the OLB 1.9 User Guide.

Note that these instructions apply if you are using OLB v1.9 and CASAVA 1.7. The procedure is different now with CASAVA 1.8. Version 1.8 has a script, configureBclToFastq.pl, which coordinates the conversion of .bcl files directly to compressed fastq files, with demultiplexing if needed. GERALD is no longer included in CASAVA (there is a different script to manage alignments). Also, OLB is no longer required for any part of the normal post instrument analysis.
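The demultiplexing step can be pictured as matching each index read against the sample sheet barcodes. This is only a conceptual sketch, not CASAVA's implementation: it assumes 6-base barcodes read with a 7-cycle index read (the extra base is ignored), allows up to one mismatch by default, and sends ambiguous or unmatched reads to an "undetermined" bin (returned as `None`). `hamming`, `assign_sample` and `SAMPLES` are hypothetical.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, samples: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode matches index_read, else None.

    The index read is trimmed to the barcode length before comparison.
    A read is assigned only if exactly one barcode is within max_mismatches;
    ambiguous or unmatched reads are left undetermined (None).
    """
    hits = [name for name, barcode in samples.items()
            if hamming(index_read[:len(barcode)], barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


SAMPLES = {"Sample1": "ATCACG", "Sample2": "CGATGT"}  # hypothetical sample sheet
print(assign_sample("ATCACTA", SAMPLES))  # Sample1
```

Requiring a unique hit keeps a read from being assigned to two samples when barcodes are close together.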

