SEQanswers

Old 08-02-2011, 05:32 AM   #1
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192
Illumina Raw output

Hey guys,

this may be a stupid question, but in my group there is some discussion about what output format the Illumina HiSeq 2000 produces. Am I right that it is one (or several) fastq files? If so, is there no need to convert them before using bowtie and similar tools?

Thanks in advance!


Best,

Philip
Old 08-02-2011, 07:28 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,121

The sequence produced at the end of analysis by the Illumina pipeline is a fastq-format sequence file (if you chose not to do any alignments with ELAND).
In the past (pipeline v1.7 and earlier) the quality values in the sequence files were in the "illumina" format (and so would presumably need conversion to sanger quality values, depending on your needs).
With the "current" version of the pipeline (v1.8) the default quality values have changed to sanger format.
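
The conversion mentioned above can be sketched in Python as follows. This is a minimal sketch, assuming the older "illumina" format means Phred scores stored with an ASCII offset of 64 (as in pipeline 1.3 to 1.7); the even older Solexa-scaled scores need a different formula and are not handled here. The function name is hypothetical.

```python
def illumina64_to_sanger33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (Sanger).

    Assumption: each character encodes a Phred score with ASCII offset 64.
    """
    converted = []
    for ch in qual:
        score = ord(ch) - 64          # recover the Phred score
        if score < 0:
            # A negative score means the input was not Phred+64 after all.
            raise ValueError(f"character {ch!r} is not valid Phred+64")
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)


if __name__ == "__main__":
    # 'h' is Q40 in Phred+64; Q40 in Phred+33 is 'I'.
    print(illumina64_to_sanger33("hhhh"))
```

Files from pipeline v1.8 are already Phred+33, so running this on them would raise an error or silently corrupt the scores; check the encoding before converting.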
Old 08-02-2011, 08:43 AM   #3
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

More precisely (and note that I am still a beginner in terms of CASAVA 1.8 and the HiSeq), I believe that the output from the machine is 'qseq' format and that the first step in CASAVA processing converts qseq to fastq.

Of course most people will want to have, and perhaps only be given, the latter.
Old 08-02-2011, 12:46 PM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Quote:
Originally Posted by westerman View Post
More precisely (and note that I am still a beginner in terms of CASAVA 1.8 and the HiSeq), I believe that the output from the machine is 'qseq' format and that the first step in CASAVA processing converts qseq to fastq.

Of course most people will want to have, and perhaps only be given, the latter.
But this too has changed recently.

During the run, the Real Time Analysis (RTA) software on the instrument control computer (that Dell T7500 sitting next to it) processes the images to determine cycle-by-cycle intensities for each cluster and then performs base calling based on those intensities. RTA stores the base call data in a series of so-called BCL files. There is one BCL file (suffix .bcl) for each lane-tile-cycle (960 per cycle, or 192,000 for a 2x100 PE run, plus 6,720 more for the index read if included). BCL is a compact binary format, so you can't open these files to "look at them". This is the final output from the instrument and its RTA software.
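
The file counts quoted above follow from simple multiplication, sketched here in Python. The 960 lane-tiles per cycle is the figure from the post; the 7-cycle index read is inferred from 6,720 / 960 and is an assumption, as is the function name.

```python
from typing import Sequence


def bcl_file_count(lane_tiles: int, read_cycles: Sequence[int]) -> int:
    """One BCL file per lane-tile per cycle, summed over all reads."""
    return lane_tiles * sum(read_cycles)


LANE_TILES = 960  # lane-tile combinations per cycle, as quoted in the post

paired_end = bcl_file_count(LANE_TILES, [100, 100])  # 2x100 PE run
index_read = bcl_file_count(LANE_TILES, [7])         # assumed 7-cycle index read

print(paired_end)               # 192000
print(index_read)               # 6720
print(paired_end + index_read)  # 198720
```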

Offline, this data can be further processed through CASAVA, currently at v1.8. With the introduction of 1.8, QSEQ files are gone (you can still produce them but they aren't used any more). CASAVA 1.8 includes a utility to directly produce compressed (gzip) FASTQ files from the BCL files. This utility includes demultiplexing if the run was multiplexed. The file naming convention has also changed (no more s_1_sequence.txt for every single run). The format of the Read ID line has changed somewhat as well, as has the encoding of the Q-scores, as GenoMax mentioned. CASAVA now produces FASTQ files adhering to the Sanger definition of ASCII(Phred+33).
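
Reading such a gzip-compressed, Phred+33 FASTQ file needs nothing beyond the Python standard library. This is a minimal sketch, assuming plain four-line records (header, sequence, '+' line, qualities); the function name and any file path passed to it are hypothetical.

```python
import gzip
from typing import Iterator, List, Tuple


def read_fastq_gz(path: str) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from a gzipped FASTQ file.

    Assumption: every record is exactly four lines and qualities are Phred+33.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # skip the '+' separator line
            quality = handle.readline().rstrip("\n")
            # Sanger encoding: Phred score = ASCII code - 33
            scores = [ord(ch) - 33 for ch in quality]
            yield header[1:], sequence, scores  # drop the leading '@'
```

Because the scores are already in Sanger encoding, aligners that expect Phred+33 input can take these files without a quality conversion step.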
Old 08-02-2011, 12:56 PM   #5
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

kmcarr is, of course, correct. 'qseq' is no more; 'bcl' is how the Illumina instrument stores its data. I should have double-checked my memory before posting earlier this morning. Too many changes so quickly! That, and not having enough coffee. :-)
Old 08-03-2011, 01:51 AM   #6
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192

So you get the *.bcl files from a sequencing run and not the fastq files, which means running CASAVA is essential to get those?
Old 08-03-2011, 02:35 AM   #7
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 625

Yes, if you want to generate qseq files you need to run the conversion script setupBclToQseq.py. If you want to generate FastQ files as well, you can specify --GERALD and request FastQ files (and/or alignments with ELAND) in the GERALD configuration options. More information on this can be found in the OLB 1.9 User Guide.
Old 08-03-2011, 03:42 AM   #8
sphil
Senior Member
 
Location: Stuttgart, Germany

Join Date: Apr 2010
Posts: 192

thanks guys, you helped me a lot!
Old 08-03-2011, 04:42 AM   #9
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Quote:
Originally Posted by fkrueger View Post
Yes, if you want to generate qseq files you need to run the conversion script setupBclToQseq.py. If you want to generate FastQ files as well, you can specify --GERALD and request FastQ files (and/or alignments with ELAND) in the GERALD configuration options. More information on this can be found in the OLB 1.9 User Guide.
Note that these instructions apply if you are using OLB v1.9 and CASAVA 1.7. The procedure is different now with CASAVA 1.8. Version 1.8 has a script, configureBclToFastq.pl, which coordinates the conversion of .bcl files directly to compressed fastq files, with demultiplexing if needed. GERALD is no longer included in CASAVA (there is a different script to manage alignments). Also, OLB is no longer required for any part of the normal post-instrument analysis.
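
For anyone scripting this step, the call to configureBclToFastq.pl can be assembled as an argument list before handing it to a job runner. This is only a sketch: the option names (--input-dir, --output-dir, --sample-sheet) are recalled from the CASAVA 1.8 documentation and should be checked against the script's own help output, and the function name and all paths are hypothetical. The snippet builds the command and prints it; it does not run anything.

```python
from typing import List, Optional


def bcl_to_fastq_command(
    basecalls_dir: str,
    output_dir: str,
    sample_sheet: Optional[str] = None,
) -> List[str]:
    """Build an argument list for configureBclToFastq.pl (CASAVA 1.8).

    Assumption: option names are as recalled from the CASAVA 1.8 user guide.
    """
    command = [
        "configureBclToFastq.pl",
        "--input-dir", basecalls_dir,   # directory holding the .bcl files
        "--output-dir", output_dir,     # where the fastq.gz files will go
    ]
    if sample_sheet is not None:
        # A sample sheet is what drives demultiplexing of indexed runs.
        command += ["--sample-sheet", sample_sheet]
    return command


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = bcl_to_fastq_command(
        "Data/Intensities/BaseCalls", "Unaligned", "SampleSheet.csv"
    )
    print(" ".join(cmd))
```

As far as I recall, the configure script only sets up the output directory; the conversion itself is then started by running make inside that directory.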