09-28-2009, 09:33 AM   #188
kmcarr

Kal,

Bustard and GERALD are not file formats in the sense you are asking about. They are pipelines for processing Illumina short-read data, and they generate many different output files in many different formats.

The Bustard pipeline performs base calling, starting from signal intensity information. Its primary output is qseq files. These use an Illumina-specific format that stores the read ID, base calls and quality scores for each read on a single line, as a set of tab-separated values. Bustard may also output other files (e.g. qval, prb), depending on the options supplied when the pipeline is launched.
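To make the "one read per line, tab-separated" layout concrete, here is a minimal parsing sketch. It assumes the commonly described 11-column qseq layout (machine name, run number, lane, tile, x, y, index, read number, sequence, quality, filter flag); check this against the documentation for your pipeline version. The `QseqRecord` and `read_id` names, and the `machine:lane:tile:x:y#index/read` ID style, are illustrative choices, not part of the Illumina software.

```python
from typing import NamedTuple


class QseqRecord(NamedTuple):
    # Assumed 11-column qseq layout; verify against your pipeline docs.
    machine: str
    run: str
    lane: str
    tile: str
    x: str
    y: str
    index: str
    read_number: str
    sequence: str       # base calls; '.' marks a no-call
    quality: str        # ASCII-encoded quality scores
    passed_filter: bool  # last column: "1" = passed, "0" = failed


def parse_qseq_line(line: str) -> QseqRecord:
    """Split one qseq line into its tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    *head, filter_flag = fields
    return QseqRecord(*head, passed_filter=(filter_flag == "1"))


def read_id(rec: QseqRecord) -> str:
    """Build a read ID from the coordinate columns (one hypothetical convention)."""
    return f"{rec.machine}:{rec.lane}:{rec.tile}:{rec.x}:{rec.y}#{rec.index}/{rec.read_number}"
```

For example, an illustrative (made-up) line such as `HWUSI-EAS100R\t0001\t6\t73\t941\t1973\t0\t1\tNTGGACGTAC\tBBBBBBBBBB\t1` parses into a record whose `read_id` is `HWUSI-EAS100R:6:73:941:1973#0/1`.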

GERALD is the pipeline that performs alignments, using one of two aligners supplied with the Pipeline software. The first, PhageAlign, is only useful for very small genomes and data sets and is almost never used, so I will forgo any further mention of it. The primary aligner supplied with the Illumina pipeline is Eland: GERALD calls Eland and passes it a set of configuration parameters.

Eland outputs a number of files, all with similar (but slightly different) formats. Examples include s_N_eland_extended.txt and s_N_eland_multi.txt (where N is the lane number from the Illumina run). These files list each read, its sequence and quality scores, where it matches the reference sequence, and what mismatches exist between the read and the reference. Which files Eland generates, and the details of their formats, depend on the arguments used when invoking Eland. GERALD can also output sequence files in FASTQ format.
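If you only need FASTQ and have qseq files in hand, the conversion is mostly a matter of rearranging columns. Below is a rough sketch, not GERALD's own FASTQ writer. It assumes the same 11-column qseq layout, that qualities are Phred+64 encoded (optionally shifted to Phred+33), and that '.' no-calls should become 'N'. The function names and the header style are hypothetical.

```python
from typing import Iterable, Iterator


def qseq_to_fastq(line: str, to_phred33: bool = False) -> str:
    """Turn one qseq line (assumed 11 columns) into a 4-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    machine, _run, lane, tile, x, y, index, read_no, seq, qual, _filt = fields
    seq = seq.replace(".", "N")  # qseq marks no-calls with '.'
    if to_phred33:
        # Assumes Phred+64 input: subtract 64, add 33, i.e. shift by 31.
        qual = "".join(chr(ord(c) - 31) for c in qual)
    header = f"@{machine}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    return f"{header}\n{seq}\n+\n{qual}\n"


def convert_qseq(lines: Iterable[str], to_phred33: bool = False,
                 skip_failed: bool = True) -> Iterator[str]:
    """Yield FASTQ records, optionally dropping reads that failed the filter."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if skip_failed and line.rstrip("\n").split("\t")[-1] != "1":
            continue
        yield qseq_to_fastq(line, to_phred33)
```

Usage would look like `with open("s_1_1_0001_qseq.txt") as fh: out.writelines(convert_qseq(fh))`, where the file name is only a placeholder. Skipping filter-failed reads by default is a design choice here; drop the flag if you want every read.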