01-17-2011, 12:47 PM #11
skruglyak
Member
 
Location: San Diego

Join Date: Sep 2010
Posts: 44

Quote:
Originally Posted by simonandrews
From my point of view there are good and bad things in the list:

1,2,3 All good!

4 Not really an issue for us so no feeling either way

5,6 Probably bad (for us at least). Having a folder hierarchy that you can only predict by looking up the sample sheet doesn't make my life any easier. I realise that there are problems with just using technical names for results, but I can see this causing more grief. I'd be interested to see what sort of names you get if you use a blank sample sheet, as I guess that's what we'd do if we wanted to manage samples and projects from outside the Illumina software.

I take it that in the example posted the run folder at the top of the tree would equate to the current Gerald folder so that it would still be simple to do multiple analysis runs of the same data and get easily separated output?

Also, in the example tree, why are the BAM files under 'Build' and not 'Aligned'?

7 Good and Bad. I agree that the world seems to have settled on BAM as its file format of choice. The compact size will certainly be welcome, and if we can get SRA/ENA to accept BAM files as submissions then lots of people will be happier. In the meantime, though, there are a bunch of processing steps that were pretty easy with the old eland output but will be much harder from a BAM file. Just writing a simple filter to extract some entries from a BAM file and write them out to a new one is really non-trivial if done from scratch, whereas it might just be a grep on an eland file.

Thank you very much for the feedback! Regarding 5 and 6, if there is no sample sheet, we will use simple default names for the project and the sample. I can send you more detail if you would like.

The motivation behind the change is that we are thinking about increased throughput that will lead to many samples on a single flow cell. The ability to organize the results by project and sample will hopefully be useful. Also, the demultiplexing output is well suited for such a structure.

Running repeated analysis on the same data and getting easily separated output folders will continue to be supported.

The BAM files are under BUILD because they are the output of the post-alignment process (where sorting is done) and because multiple alignment events (flow cells) can be combined into a single build of CASAVA. The ALIGNED folder will contain the zipped export files.
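
Combining several flow cells into one coordinate-sorted build amounts, in general terms, to merging inputs that are each already sorted. The sketch below illustrates that general k-way merge technique in Python; it is not CASAVA's implementation. The `Record` tuple (reference index, position, read name) and the `merge_sorted_runs` helper are hypothetical stand-ins for real BAM records. For actual BAM files, `samtools merge` performs this kind of merge on coordinate-sorted inputs.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical minimal alignment record: (reference index, 1-based position, read name).
Record = Tuple[int, int, str]


def merge_sorted_runs(runs: Iterable[Iterable[Record]]) -> Iterator[Record]:
    """Lazily merge several coordinate-sorted record streams into one sorted stream."""
    # heapq.merge assumes every input is already sorted by the key and holds only
    # one pending record per input, so each flow cell's output can be streamed.
    return heapq.merge(*runs, key=lambda rec: (rec[0], rec[1]))
```

For example, merging `[(0, 100, "a"), (0, 300, "b")]` with `[(0, 200, "d")]` yields the records in the order `a`, `d`, `b`.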

If you need to parse information out of the BAM file, converting it to SAM would seem to give you the text file that you need.
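
To make the SAM route concrete: `samtools view -h input.bam` prints the alignments as SAM text with the header included, and a filter over that text is close to the grep-style workflow described for eland files. Below is a minimal Python sketch, assuming SAM text as input. `filter_sam` and `on_reference` are hypothetical helper names, and the reference-name/MAPQ test is only an example predicate.

```python
from typing import Callable, Iterable, Iterator, List

Predicate = Callable[[List[str]], bool]


def filter_sam(lines: Iterable[str], keep: Predicate) -> Iterator[str]:
    """Pass SAM header lines through unchanged; yield only records accepted by `keep`."""
    for line in lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @RG, @PG, @CO) are kept so the output is
            # still valid SAM and can be converted back to BAM.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Skip blank or malformed lines: a SAM record has 11 mandatory columns.
            continue
        if keep(fields):
            yield line


def on_reference(name: str, min_mapq: int = 0) -> Predicate:
    """Example predicate: keep records on reference `name` with MAPQ >= min_mapq."""
    def keep(fields: List[str]) -> bool:
        # Column 3 (RNAME) is the reference name; column 5 (MAPQ) is mapping quality.
        return fields[2] == name and int(fields[4]) >= min_mapq
    return keep
```

Wrapped in a small script that reads standard input and writes standard output, this can sit in a pipeline such as `samtools view -h input.bam | python your_filter.py | samtools view -b -o filtered.bam -` (file and script names hypothetical), which writes the selected records back out as BAM.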

Thanks again,

Semyon