SEQanswers

Old 09-19-2008, 03:25 PM   #1
new300
Member
 
Location: northern hemisphere

Join Date: Mar 2008
Posts: 50
Swift: Open source primary data analysis for next-gen sequencers

Right now, primary data from next-gen sequencers is processed with closed-source, proprietary tools provided by the manufacturer. That's really unfortunate because the data is being used to draw scientific conclusions. It's difficult to trust your data and understand the artifacts in it if the data analysis algorithms are not open to peer review. Not only that, but it means you can't easily change things and try out new methods.

Until recently I was working at the Sanger Institute and in order to address this we have been developing a primary data analysis package for next-gen sequence data. At the moment our tools are aimed at Illumina data, but it should be possible to adapt them for processing SOLiD images as well.

I've recently left Sanger to pursue a career in next-next-gen sequencing at Oxford Nanopore Technologies. I'm going to continue developing Swift, as will my colleagues at Sanger (particularly Tom Skelly, who's put a lot of work into Swift).

While Swift is fully functional, it could do with more validation and testing. However, we've decided that we'd like to make it available to the wider community in the hope of gaining support and ideally attracting more developers.

Right now, the post-image-analysis corrections (basecalling) in Swift work well and generally produce error rates lower than the Illumina pipeline. It's probably ready for production use, so feel free to try it out and let us know what you find.

The native image analysis works but is more of a work in progress. We'd like people to try it out too and tell us what happens.

Swift is available under LGPL3 at: http://swiftng.sourceforge.net

You'll need to check it out of the Subversion repository to run it, but it should be reasonably straightforward. Please email me if you have any trouble.

I'm very interested in getting any feedback, positive or negative. You can either post here or contact me directly: new at sgenomics dot org.
Old 09-20-2008, 03:22 AM   #2
cgb
Member
 
Location: Cambridge

Join Date: May 2008
Posts: 50
cool

I wonder if it can be put onto a boot DVD and run on the iPar computers, with data mirrored in real time using the Sanger mirroring scripts?
Old 09-20-2008, 03:14 PM   #3
new300
Member
 
Location: northern hemisphere

Join Date: Mar 2008
Posts: 50

Quote:
Originally Posted by cgb View Post
I wonder if it can be put onto a boot DVD and run on the iPar computers, with data mirrored in real time using the Sanger mirroring scripts?
Yes, this absolutely should be possible and is something we'd like to look into. Users interested in doing this are encouraged to get in contact.
Old 09-23-2008, 04:08 AM   #4
dvh
Member
 
Location: london, uk

Join Date: Jul 2008
Posts: 35

Could you maybe share some stats as to how Swift performs vs the current version of Bustard?
E.g. amount of data/reads mapped, error rate for the same lane analysed both ways.
thanks
david
Old 11-15-2008, 01:53 PM   #5
new300
Member
 
Location: northern hemisphere

Join Date: Mar 2008
Posts: 50

Quote:
Originally Posted by dvh View Post
Could you maybe share some stats as to how Swift performs vs the current version of Bustard?
E.g. amount of data/reads mapped, error rate for the same lane analysed both ways.
thanks
david
I'm still in the process of validating it on non-phiX data. For the phiX data I've looked at, against the 1.0 pipeline I've seen 20% more PF reads at a similar error rate.

In terms of runtime, a GA1 single-end run takes around 10 minutes end to end. A GA2 37-cycle paired-end run takes around an hour end to end.
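
For anyone wanting to run the same kind of comparison on their own lane (read count and error rate against a known reference such as phiX), here is a minimal Python sketch. It is not how Swift or the Illumina pipeline compute their statistics: the function names `mismatch_rate` and `relative_gain` are hypothetical, the reads are assumed to be already aligned without gaps, and the data is a toy example rather than real results.

```python
def mismatch_rate(aligned_reads, reference):
    """Fraction of aligned bases that differ from the reference.

    aligned_reads: iterable of (start, sequence) pairs. This sketch assumes
    ungapped, forward-strand alignments with 0-based start positions.
    """
    mismatches = 0
    total = 0
    for start, seq in aligned_reads:
        ref_slice = reference[start:start + len(seq)]
        # zip stops at the shorter string, so a read hanging off the end of
        # the reference only contributes its overlapping bases.
        for read_base, ref_base in zip(seq, ref_slice):
            total += 1
            if read_base != ref_base:
                mismatches += 1
    return mismatches / total if total else 0.0


def relative_gain(baseline_count, new_count):
    """Relative change in read count, e.g. 0.2 means 20% more reads."""
    return (new_count - baseline_count) / baseline_count


# Toy data only, not real phiX results.
reference = "ACGTACGTAC"
pipeline_reads = [(0, "ACGT"), (4, "ACGA")]
swift_reads = [(0, "ACGT"), (4, "ACGA"), (2, "GTAC")]

print(mismatch_rate(pipeline_reads, reference))
print(mismatch_rate(swift_reads, reference))
print(relative_gain(len(pipeline_reads), len(swift_reads)))
```

Reporting both numbers together matters: more reads passing filter is only an improvement if the error rate on those reads stays similar.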
Old 11-15-2008, 01:57 PM   #6
new300
Member
 
Location: northern hemisphere

Join Date: Mar 2008
Posts: 50

In terms of memory usage, we're trying to stay within a 2Gb limit. A 37Gb paired-end run peaks at around 1Gb.
Old 11-17-2008, 11:20 AM   #7
timread
Member
 
Location: Atlanta, Georgia

Join Date: Oct 2008
Posts: 14

BTW - the link: http://swiftng.sourceforge.net appears to be broken.

The connection seems to be a problem only from my desktop at work (which is behind a US government firewall). From other locations I can get through OK.

Last edited by timread; 11-18-2008 at 12:44 PM. Reason: clarification of connection problem
Old 11-18-2008, 12:54 AM   #8
cgb
Member
 
Location: Cambridge

Join Date: May 2008
Posts: 50

works for me
Old 11-20-2008, 03:41 AM   #9
iris42
Junior Member
 
Location: Denmark

Join Date: Nov 2008
Posts: 1

Is it normal to see different output when running the same binary version of Swift multiple times on the same computer, and when running it on different computers? I observed both. It looks like most of the differences in the fastq output are in the quality scores.
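
One way to put numbers on "different output" is to count how many records differ in sequence and how many differ only in quality. The following is a rough Python sketch, not part of Swift: `read_fastq` and `compare_fastq` are hypothetical helpers, and the sketch assumes plain four-line FASTQ records written in the same order in both files.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            # End of file (or a blank line, which this sketch treats as the end).
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def compare_fastq(handle_a, handle_b):
    """Count records that are identical, differ in quality only, or differ in sequence."""
    counts = {"identical": 0, "quality_only": 0, "sequence": 0, "unpaired": 0}
    for rec_a, rec_b in zip_longest(read_fastq(handle_a), read_fastq(handle_b)):
        if rec_a is None or rec_b is None or rec_a[0] != rec_b[0]:
            counts["unpaired"] += 1      # missing record or mismatched read names
        elif rec_a[1] != rec_b[1]:
            counts["sequence"] += 1      # base calls differ
        elif rec_a[2] != rec_b[2]:
            counts["quality_only"] += 1  # same bases, different quality string
        else:
            counts["identical"] += 1
    return counts


# Toy records standing in for the output of two runs.
run1 = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
run2 = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIHI\n@r3\nTTAC\n+\nIIII\n"

result = compare_fastq(io.StringIO(run1), io.StringIO(run2))
print(result)
```

With real files, the two `io.StringIO` objects would be replaced by `open(path)` handles for the two fastq outputs.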
Old 11-20-2008, 07:50 AM   #10
new300
Member
 
Location: northern hemisphere

Join Date: Mar 2008
Posts: 50

Quote:
Originally Posted by iris42 View Post
Is it normal to see different output when running the same binary version of Swift multiple times on the same computer, and when running it on different computers? I observed both. It looks like most of the differences in the fastq output are in the quality scores.
Running on different computers it's quite likely that the output will vary slightly as they are likely to have different floating point implementations.

On the same computer that is a little odd. How different are the results? If it's a small difference then this could be down to the FFTW implementation we are using, which sometimes employs a non-deterministic algorithm.
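
As a small illustration of why this can happen (a general floating-point effect, not Swift or FFTW code): if the same numbers are combined in a different order, the rounding can differ in the last bits. The Python sketch below assumes the variation comes from operations being grouped differently between runs or machines.

```python
# Floating-point addition is not associative: the grouping changes the rounding.
values = [0.1, 0.2, 0.3]

left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)      # False
print(abs(left_to_right - right_to_left))  # tiny, on the order of 1e-16
```

A difference this small rarely changes a base call, but it could occasionally tip a value across a rounding boundary when a quality score is converted to an integer, which would be consistent with differences showing up mostly in the quality scores.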
Old 11-20-2008, 07:51 AM   #11
new300
Member
 
Location: northern hemisphere

Join Date: Mar 2008
Posts: 50

Quote:
Originally Posted by timread View Post
BTW - the link: http://swiftng.sourceforge.net appears to be broken.

The connection seems to be a problem only from my desktop at work (which is behind a US government firewall). From other locations I can get through OK.
Odd, you can try: http://sgenomics.org/swift/ which should also work.
Old 11-24-2008, 07:17 AM   #12
lparsons
Member
 
Location: NJ

Join Date: Nov 2008
Posts: 28

I'm quite interested in using open-source software for scientific work. We have recently acquired an Illumina GAII machine, and are trying to come up with data management solutions. Right now we are planning to throw away the images after the primary analysis (base-calling) is completed. We are saving the intensity and noise files, but not the images, which seems to be fairly common. However, it seems that this software requires the original images, which makes sense, but would limit our ability to use it on past experiments.

Would it be feasible to use Swift on the Firecrest output (intensity and noise)?

Do many labs actually save the image files?

It seems like an ideal initial setup would be to process the images with both the Illumina pipeline and Swift. Has anyone yet set this up?
Old 11-24-2008, 08:21 AM   #13
clivey
Member
 
Location: Oxford

Join Date: Jul 2008
Posts: 24

Sanger have it set up. Talk to Tom Skelly.

Images are still very diagnostic of any issue with your sample or sequencer (or run). Looking at images allowed Sanger to optimise their pipeline. For example: your flowcell quality goes down; an operator gets oil on the flowcell; your focusing is off and you suddenly get lots of strange new 'contaminants' in your output file as a result; your base qualities all drop halfway through your project; your data goes bad and, when you look, your clusters look weird because of an issue with your cluster station; there's stuff growing in your reagents that appears as blobs on the images (but is not visible to the naked eye); or your flowcell surface isn't there. You should keep them for QC, then throw them away. Generally (but not in all cases), higher-throughput labs with big projects retain images for some period.
Old 11-25-2008, 01:33 AM   #14
new300
Member
 
Location: northern hemisphere

Join Date: Mar 2008
Posts: 50

Quote:
Originally Posted by lparsons View Post
I'm quite interested in using open-source software for scientific work. We have recently acquired an Illumina GAII machine, and are trying to come up with data management solutions. Right now we are planning to throw away the images after the primary analysis (base-calling) is completed. We are saving the intensity and noise files, but not the images, which seems to be fairly common. However, it seems that this software requires the original images, which makes sense, but would limit our ability to use it on past experiments.
Are you using iPar to process the images and then mirroring off the intensity files? Swift will process from intensity files (as produced by the UNIX pipeline). I've heard the iPar intensity format is different from that used by the UNIX pipeline. If someone wants to send me a sample file, I'll write a parser for it.

Quote:
Originally Posted by lparsons View Post
Would it be feasible to use Swift on the Firecrest output (intensity and noise)?
Yes, it's feasible. I would hope the results would be comparable with the Illumina pipeline.

Quote:
Originally Posted by lparsons View Post

Do many labs actually save the image files?

It seems like an ideal initial setup would be to process the images with both the Illumina pipeline and Swift. Has anyone yet set this up?
As mentioned, Sanger save the images while they do QC. The images are mirrored off as the run progresses and processed using the UNIX pipeline on a separate cluster.

If you're interested in trying out Swift, drop me an email at new at sgenomics dot org. It's in "active development" at the moment and I'm happy to work with people on any issues that come up.
Old 02-27-2009, 08:17 AM   #15
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Are there any updates on Swift? Data sizes, number of files generated, comparison with Illumina pipeline results?
Old 02-19-2010, 08:15 AM   #16
lcollado
Member
 
Location: Baltimore, MD

Join Date: Jun 2009
Posts: 65

Hello,

I know that this is an old thread, but I'm curious to know how Swift compares with the most recent SolexaPipeline versions.

Thank you!
Leonardo
__________________
L. Collado Torres, Ph.D. student in Biostatistics.