Hi,
I am trying to figure out exactly where a *.csfasta file resulting from an ABI SOLiD run comes from. In particular, I am wondering whether the csfasta files are generated from image files alone.
As best I can tell, the raw image files from the SOLiD machine are combined and clustering is used to generate per-base data for each bead on the flow cell. This data is then run through some simple filters and all reads that are full length and have a color call for each position are output to the csfasta file. Is this accurate?
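To make the filtering step I am describing concrete, here is a minimal sketch of what I imagine it does. It is only my guess, not the actual SOLiD pipeline code. It assumes each csfasta read is one primer base followed by color calls 0-3, with a "." marking a position that has no color call; the function names and the `read_length` parameter are hypothetical.

```python
import re


def is_complete_read(seq, read_length):
    """True if seq is a primer base followed by exactly read_length color calls (0-3)."""
    # A "." (missing color call) or a short read fails the match.
    return re.fullmatch(r"[ACGT][0-3]{%d}" % read_length, seq) is not None


def filter_csfasta(lines, read_length):
    """Yield (header, seq) pairs for full-length reads with a call at every position."""
    header = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            header = line
        elif header is not None:
            if is_complete_read(line, read_length):
                yield header, line
            header = None
```

For example, with `read_length=4`, a read `T0123` would be kept, while `T01.3` (missing call) and `T012` (too short) would be dropped.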
One reason I am curious about this is that I know that the *_sequence.txt files generated by the Illumina GA do NOT come from image files alone --- as I understand it, the standard analysis pipeline includes calibration to a reference genome. I am wondering whether this might also be the case for the standard SOLiD pipeline.