#1
Member
Location: northern hemisphere Join Date: Mar 2008
Posts: 50
Right now that primary data is processed with closed-source proprietary tools provided by the manufacturer. That's really unfortunate, because the data is being used to draw scientific conclusions. It's difficult to trust your data and understand the artifacts in it if the data analysis algorithms are not open to peer review. Not only that, but it means you can't easily change things and try out new methods.

Until recently I was working at the Sanger Institute, and to address this we have been developing a primary data analysis package for next-gen sequence data. At the moment our tools are aimed at Illumina data, but it should be possible to adapt them for processing SOLiD images as well. I've recently left Sanger to pursue a career in next-next-gen sequencing at Oxford Nanopore Technologies. I'm going to continue developing Swift, as will my colleagues at Sanger (particularly Tom Skelly, who's put a lot of work into Swift).

While Swift is fully functional, it could do with more validation and testing. However, we've decided that we'd like to make it available to the wider community in the hope of gaining support and ideally attracting more developers. Right now, the post-image-analysis corrections (basecalling) in Swift work well; it generally produces lower error rates than the Illumina pipeline. It's probably ready for production usage, so feel free to try it out and let us know what you find. The native image analysis works but is more of a work in progress; we'd like people to try it out too and tell us what happens.

Swift is available under LGPL3 at: http://swiftng.sourceforge.net
You'll need to check it out of the Subversion repository to run it, but it should be reasonably straightforward. Please email me if you have any trouble.

I'm very interested in getting any feedback, positive or negative. You can either post here or contact me directly: new at sgenomics dot org.

#2
Member
Location: Cambridge Join Date: May 2008
Posts: 50
I wonder if it can be put onto a boot DVD and run on the iPar computers, with data mirrored in real time using the Sanger mirroring scripts?

#3
Member
Location: northern hemisphere Join Date: Mar 2008
Posts: 50
Yes, this absolutely should be possible and is something we'd like to look into. Users interested in doing this are encouraged to get in contact.

#4
Member
Location: london, uk Join Date: Jul 2008
Posts: 35
Could you share some stats on how Swift performs vs the current version of Bustard?
E.g. amount of data/reads mapped and error rate for the same lane analysed both ways.
Thanks,
David

#5
Member
Location: northern hemisphere Join Date: Mar 2008
Posts: 50
Quote:
In terms of runtime, GA1 single end takes around 10 minutes end to end. GA2 37-cycle paired end takes around an hour end to end.

#6
Member
Location: northern hemisphere Join Date: Mar 2008
Posts: 50
In terms of memory usage, we're trying to stay within a 2 GB limit. A 37-cycle paired end peaks at around 1 GB.

#7
Member
Location: Atlanta, Georgia Join Date: Oct 2008
Posts: 14
BTW, the link http://swiftng.sourceforge.net appears to be broken.
The connection seems to be a problem only from my desktop at work (which is behind a US government firewall). From other locations I can get through OK.
Last edited by timread; 11-18-2008 at 12:44 PM. Reason: clarification of connection problem

#8
Member
Location: Cambridge Join Date: May 2008
Posts: 50
works for me

#9
Junior Member
Location: Denmark Join Date: Nov 2008
Posts: 1
Is it normal to see different output when running the same binary version of Swift multiple times on the same computer, and when running it on different computers? I observed both. It looks like most of the differences in the FASTQ output are in the quality scores.

#10
Member
Location: northern hemisphere Join Date: Mar 2008
Posts: 50
Quote:
Differences on the same computer are a little odd. How different are the results? If it's a small difference, this could be down to the FFTW implementation we are using, which sometimes employs a non-deterministic algorithm.
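
To put a number on "how different", one option is to compare the two FASTQ files record by record. The following is a minimal Python sketch and not part of Swift: `read_fastq` and `compare_fastq` are hypothetical helper names, and it assumes both files use plain four-line records and list the reads in the same order.

```python
from itertools import zip_longest


def read_fastq(path):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ file."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:  # end of file (or a blank line) stops the reader
                return
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            yield header[1:], seq, qual  # drop the leading '@'


def compare_fastq(path_a, path_b):
    """Count records that differ in bases, in qualities only, or cannot be paired."""
    counts = {"records": 0, "base_diffs": 0, "quality_only_diffs": 0, "unpaired": 0}
    for rec_a, rec_b in zip_longest(read_fastq(path_a), read_fastq(path_b)):
        counts["records"] += 1
        if rec_a is None or rec_b is None or rec_a[0] != rec_b[0]:
            # One file is shorter, or the read IDs do not line up.
            counts["unpaired"] += 1
        elif rec_a[1] != rec_b[1]:
            counts["base_diffs"] += 1
        elif rec_a[2] != rec_b[2]:
            counts["quality_only_diffs"] += 1
    return counts
```

Calling `compare_fastq("run1.fastq", "run2.fastq")` (placeholder file names) returns the four counts. A high `quality_only_diffs` with zero `base_diffs` would match the observation that mostly the quality scores change between runs.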

#11
Member
Location: northern hemisphere Join Date: Mar 2008
Posts: 50
Quote:

#12
Member
Location: NJ Join Date: Nov 2008
Posts: 28
I'm quite interested in using open-source software for scientific work. We have recently acquired an Illumina GAII machine, and are trying to come up with data management solutions. Right now we are planning to throw away the images after the primary analysis (base-calling) is completed. We are saving the intensity and noise files, but not the images, which seems to be fairly common. However, it seems that this software requires the original images, which makes sense, but would limit our ability to use it on past experiments.
Would it be feasible to use Swift on the Firecrest output (intensity and noise)? Do many labs actually save the image files? It seems like an ideal initial setup would be to process the images with both the Illumina pipeline and Swift. Has anyone set this up yet?

#13
Member
Location: Oxford Join Date: Jul 2008
Posts: 24
Sanger have it set up; talk to Tom Skelly.
Images are still very diagnostic of any issue with your sample or sequencer (or run). Looking at images allowed Sanger to optimise their pipeline. For example: your flowcell quality goes down; an operator gets oil on the flowcell; your focusing is off and you suddenly get lots of strange new 'contaminants' in your output file as a result; your base qualities all drop halfway through your project; your data goes bad and, when you look, your clusters look weird because of an issue with your cluster station; there's stuff growing in your reagents that appears as blobs on the images (but isn't visible to the naked eye); or your flowcell surface isn't there, etc. You should keep them for QC, then throw them away. Generally (but not in all cases), higher-throughput labs with big projects indulge in some image retention for some period.

#14
Member
Location: northern hemisphere Join Date: Mar 2008
Posts: 50
Quote:
Quote:
Quote:
If you're interested in trying out Swift, drop me an email at new at sgenomics dot org. It's in "active development" at the moment and I'm happy to work with people on any issues that come up.

#15
Senior Member
Location: USA Join Date: Jan 2008
Posts: 482
Are there any updates on Swift? Data sizes, number of files generated, comparison with Illumina pipeline results?

#16
Member
Location: Baltimore, MD Join Date: Jun 2009
Posts: 65
Hello,
I know that this is an old thread, but I'm curious to know how Swift compares vs the most recent SolexaPipeline versions.
Thank you!
Leonardo
__________________
L. Collado Torres, Ph.D. student in Biostatistics.