If we are going to do this, then the possibility of a very interesting project arises. We have put a lot of effort at CRI into experimental design to reduce the impact of confounding factors; see http://1.usa.gov/jh7U1h "The cost of reducing starting RNA quantity for Illumina BeadArrays: a bead-level dilution experiment."
As a group of people running well over 1000 instruments (from the Google map), we could put together a definitive MAQC-style paper as a group collaboration.
This would also allow comparison of just about every technology. As a group we have access to everything!
Design might look like:
- Genome - pick a relatively simple one, such as C. elegans
- Prep - ideally several groups would make libraries from a defined strain from http://www.cbs.umn.edu/CGC/index.html
- Insert size - we could vary this by collecting 200, 300 and 400bp insert libraries
- Barcode - use Illumina TruSeq barcodes on libraries so a pool could be made for that platform. Similarly for other platforms.
- Instrument - use any instrument. Perhaps a different person could co-ordinate each technology separately; I'd be happy to organise Illumina.
- Run Type - SE and PE of different lengths. If barcoded then this could be spiked into any run pretty easily.
- Institution - run the pool across as many institutes as possible
- Other - there are so many factors we could vary with the right design and co-ordination. Feel free to suggest others.
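
To make the design above a bit more concrete, here is a rough sketch of how the run sheet could be enumerated as a full factorial crossing. The insert sizes, SE/PE run types and platform names come from this thread; the read lengths and site names are placeholders I made up for illustration, and in practice not every combination would make sense on every platform.

```python
from itertools import product

# Factors taken from the post
INSERT_SIZES_BP = [200, 300, 400]
RUN_TYPES = ["SE", "PE"]
PLATFORMS = ["Illumina", "SOLiD", "Ion", "PacBio"]

# Hypothetical placeholders, not agreed values
READ_LENGTHS = [50, 100]
SITES = ["site_A", "site_B", "site_C"]


def build_design(sites, platforms, run_types, read_lengths, insert_sizes):
    """Return one row per (run, library) pair.

    A "run" is one site/platform/run type/read length combination.
    The barcoded pool holds one library per insert size, so every
    run contributes one row for each insert size.
    """
    rows = []
    runs = product(sites, platforms, run_types, read_lengths)
    for run_id, (site, platform, run_type, read_len) in enumerate(runs, start=1):
        for insert_bp in insert_sizes:
            rows.append({
                "run_id": run_id,
                "site": site,
                "platform": platform,
                "run_type": run_type,
                "read_length": read_len,
                "insert_bp": insert_bp,
            })
    return rows


if __name__ == "__main__":
    design = build_design(SITES, PLATFORMS, RUN_TYPES, READ_LENGTHS, INSERT_SIZES_BP)
    n_runs = len({row["run_id"] for row in design})
    print(f"{n_runs} runs, {len(design)} run/library rows")
```

Because the pool is spiked into every run, insert size is crossed with all the other factors, which is what would let us separate library effects from instrument and site effects in the analysis.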
Bringing all this together with the bioinformatics community on SEQanswers doing the analysis would be great. The run requirements would be small so anyone with an instrument could get involved. Similarly for some of the basic analysis efforts. It would be a real community challenge. And we could make the whole thing public so anyone could get on board.
We'd end up with a rich data set that showed how similar the technologies were on a genome sequencing study. We'd be able to say a lot about how insert size and run type affect genome sequencing or analysis. And we would have a lot of instrument variation data: is SOLiD more variable than Illumina, Ion or PacBio?
Lastly, if you wanted to make it a personalised genome experiment, we could offer our own genomes up for sequencing. I think I'd be happy to put mine forward. Anyone else?