09-13-2010, 12:47 AM   #6
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

Originally Posted by sandhya
I understand the sentences separately but when I read them together I find them contradictory. Please let me know about any reading material to familiarise with these concepts.
The Wikipedia article on the FASTQ format summarises the different versions pretty well.
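The core difference between the versions is the ASCII offset each one subtracts from a quality character, plus the fact that old Solexa files use a different score scale. The sketch below decodes a quality string under each variant. The variant names and helper functions are hypothetical, chosen for illustration; the offsets and the Solexa-to-Phred formula are the standard ones.

```python
import math

# ASCII offset subtracted from each quality character, per FASTQ variant.
OFFSETS = {
    "sanger": 33,       # Phred scores, characters '!' (33) upwards
    "solexa": 64,       # Solexa scores, characters ';' (59) upwards
    "illumina1.3": 64,  # Phred scores, characters '@' (64) upwards
}


def decode(qual: str, variant: str) -> list[int]:
    """Turn a quality string into integer scores using the variant's offset."""
    offset = OFFSETS[variant]
    return [ord(c) - offset for c in qual]


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa-scale score to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


print(decode("II", "sanger"))       # 'I' is ASCII 73, so 73 - 33 = 40
print(decode("hh", "illumina1.3"))  # 'h' is ASCII 104, so 104 - 64 = 40
```

The same score of 40 is written as `I` in Sanger files and as `h` in Illumina 1.3+ files. The Solexa and Phred scales differ noticeably only at low quality.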

Originally Posted by sandhya
Again what does 'main sequence repositories' mean?
Places like the NCBI Short Read Archive or the European Nucleotide Archive. They keep their data in a single quality encoding (Sanger, i.e. Phred+33) to avoid this kind of confusion, so Illumina data submitted to them (Phred+64 in Illumina 1.3+ files) will have its quality encoding converted.
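Converting Illumina 1.3+ qualities to Sanger encoding amounts to decoding each character with offset 64 and re-encoding it with offset 33. Here is a minimal sketch of that conversion. It assumes the input really is Phred+64, and the function name is hypothetical. It is not the archives' own pipeline, which may also handle Solexa-scale data.

```python
def illumina13_to_sanger(qual: str) -> str:
    """Re-encode a Phred+64 (Illumina 1.3+) quality string as Phred+33 (Sanger)."""
    out = []
    for c in qual:
        q = ord(c) - 64  # decode the Phred score
        if q < 0:
            # Characters below '@' cannot occur in Phred+64 data.
            raise ValueError(f"{c!r} is outside the Phred+64 range")
        out.append(chr(q + 33))  # re-encode with the Sanger offset
    return "".join(out)


print(illumina13_to_sanger("hhhB"))  # Q40, Q40, Q40, Q2 written in Sanger encoding
```

Because every score keeps its value and only the offset changes, the conversion is lossless. Each character simply moves down by 31 ASCII positions.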

Originally Posted by sandhya
Nevertheless, I was able to read in the datasets using R with the 'fastq' format. So guess I can continue with the programming
It's worth checking that you used the correct quality-encoding options. It's possible to read quality values using the wrong encoding and get no errors, yet end up with incorrect qualities (though probably not by much in most cases).
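Reading with the wrong encoding rarely raises an error because the variants' character ranges overlap. As a sanity check, you can look at the lowest quality character across a batch of reads. The sketch below is a rough heuristic with hypothetical labels, not what any R package does internally. It assumes the file contains at least some low-quality bases. A Sanger file whose worst base is Q26 or better would be misclassified.

```python
def guess_encoding(quals) -> str:
    """Rough guess of FASTQ quality encoding from the lowest character seen."""
    lowest = min(ord(c) for q in quals for c in q)
    if lowest < 59:
        # Characters below ';' exist only in Phred+33 (Sanger) data.
        return "phred33"
    if lowest < 64:
        # ';' to '?' occur in Solexa+64, or in Sanger data with no base below Q26.
        return "probably_solexa64"
    # '@' and above: Illumina 1.3+ Phred+64, or Sanger data with no base below Q31.
    return "probably_phred64"


print(guess_encoding(["IIII#", "II5I"]))  # '#' (35) is only valid in Phred+33
```

Run it on a few thousand quality lines rather than one read, since a single high-quality read can look like any of the encodings.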