
  • Would you use the SolexaPipeline or an external base caller?

    Hello,

    A few of my lab co-workers and I are interested in working with some GAIIx data from a bacterial genome. We want to start from the bottom up, so right now we want to evaluate whether to use a base caller other than the one in the SolexaPipeline (I think it was version 1.4).

    I've browsed the forums a bit and posted on some old threads:
    Alta-Cyclic discussion
    Alta-Cyclic newsbot (well, ECO)
    Bayes-Call discussion
    Swift discussion

    Anyhow, we are just starting to look for information on base-callers and any tips, recommendations and the like are more than welcome ^_^. After all, we want to get as much as we can from the data we received.

    Thank you!
    Leonardo
    L. Collado Torres, Ph.D. student in Biostatistics.

  • #2
    Originally posted by lcollado
    A few of my lab co-workers and I are interested in working with some GAIIx data from a bacterial genome. We want to start from the bottom up, so right now we want to evaluate whether to use a base caller other than the one in the SolexaPipeline (I think it was version 1.4).
    My advice is that you would be better off spending your time on the downstream analysis rather than fiddling at the edges of the base-calling stage. Alternative base callers are unlikely to provide significant improvements over the GAPipeline: perhaps small increases in quality, and yield gains around the 5% mark. You are analysing bacterial genomes, so the genome size is typically under 10 Mbp, and it appears that going beyond 100x coverage gains little. Chances are you'll have more than enough good reads, and improving them slightly won't affect the downstream results.
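
To make the coverage argument above concrete, here is a minimal sketch of the usual fold-coverage estimate (number of reads × read length / genome size). The read count, read length and genome size below are hypothetical values chosen for illustration, not figures from this thread; only the 100x threshold and the roughly 5% yield gain come from the post.

```python
import math


def fold_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads that reaches the target fold coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Hypothetical run (assumed values, not taken from the thread).
GENOME_SIZE_BP = 5_000_000   # assumed bacterial genome, below the 10 Mbp mentioned above
READ_LENGTH_BP = 75          # assumed read length
BASELINE_READS = 10_000_000  # assumed yield from the default base caller
IMPROVED_READS = 10_500_000  # the same run with a 5% higher yield

baseline_cov = fold_coverage(BASELINE_READS, READ_LENGTH_BP, GENOME_SIZE_BP)
improved_cov = fold_coverage(IMPROVED_READS, READ_LENGTH_BP, GENOME_SIZE_BP)
min_reads_100x = reads_needed(100, READ_LENGTH_BP, GENOME_SIZE_BP)

print(f"baseline: {baseline_cov:.1f}x, with 5% more reads: {improved_cov:.1f}x")
print(f"reads needed for 100x: {min_reads_100x}")
```

Under these assumed numbers both runs are already well past 100x, so the extra 5% of reads would change little downstream. With a larger genome or fewer reads the picture could differ.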



    • #3
      I agree with Torst.

      It is also worth pointing out that Illumina made major improvements to the base caller in the most recent release of the SolexaPipeline.

      Before that, Bustard (the base caller) was in fact considered a weak point. Somebody looked through its code (can't find the reference at the moment) and found that it uses quite naive algorithms, which had a lot of room for improvement. Hence, it was comparatively easy to develop something better. Since the recent major overhaul of Bustard, it might be much harder to compete with it, and I imagine that it now has the lead again over third-party tools.

      For us, this update made quite a dramatic difference: shortly before the release of the new pipeline version, we made some yeast RNA-Seq runs on our GAIIx and got (if I remember correctly) c. 13 million reads per lane passing the chastity filter. As we still had the images, we ran the analysis again after installing the new version of the pipeline and now have more than 19 million reads.

      Simon



      • #4
        Originally posted by Simon Anders
        I agree with Torst.
        Before that, Bustard (the base caller) was in fact considered a weak point. Somebody looked through its code (can't find the reference at the moment) and found that it uses quite naive algorithms, which had a lot of room for improvement. Hence, it was comparatively easy to develop something better. Since the recent major overhaul of Bustard, it might be much harder to compete with it, and I imagine that it now has the lead again over third-party tools.
        Simon
        Any chance you can find that reference? Where did you read it?
        -drd



        • #5
          Actually, yes.

          It is this one:

          Nava Whiteford: The Solexa Pipeline


          Note that it covers an old version of the pipeline (the report is dated Dec 2008!). As I said, a lot has changed recently.

          Simon



          • #6
            Thanks for the replies and advice ^_^


            The general, hmm, dilemma (if the word fits) is that our boss wants us to focus on a second project (de novo assembly) rather than on this project, which is about TSS/operons (similar to this recent paper). But I feel uncomfortable working on, say, the "3rd lvl" without knowing that the 1st (base calling) and 2nd lvl (mapping) are solid. Knowing that they are solid doesn't mean that we'll redo that work, but that we at least know more about those lvls and understand them better. Also, I feel that de novo assembly is far more thoroughly explored in bacteria than TSS/operons, so TSS/operons are the more relevant topic. If we understand more about all the "lvls" we might find something interesting, or at least learn more in the end; judging from the above paper, that possibility is still open.

            For discussion's sake, our boss argues that even if you could get 50% (yes, just for discussion) more data from the first 2 lvls on the TSS/operons project, it isn't really worth it as we are working with a bacterium. The way I see it, if we could get that much more data, most of the extra should be biological (real) data rather than noise. Say, 40% more real data and 10% extra noise, or 30% and 20% in a "bad" case. With more overall data, we could separate "real" data from noise a bit more easily in the following lvls.

            I guess that in the end what I need is a good enough guide for the 1st lvl (the 2nd one is already well covered), so that I can feel comfortable with the Pipeline without having to spend the time to get into the kinks of base calling.

            From your posts, I infer that you agree with our boss.

            Thank you!
            Leonardo
            Last edited by lcollado; 02-24-2010, 10:13 AM.
            L. Collado Torres, Ph.D. student in Biostatistics.
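
The 40%/10% and 30%/20% splits in the post above can be worked through with a few lines of arithmetic. This is only a sketch: the baseline read count and the baseline noise fraction are assumptions made up for illustration (the thread gives neither), and reading the percentages as shares of the baseline read count is one interpretation of the post.

```python
def after_extra_reads(base_reads, base_noise_frac, extra_real_pct, extra_noise_pct):
    """Return (real_reads, real_fraction) after adding extra reads.

    extra_real_pct and extra_noise_pct are percentages of base_reads,
    e.g. 40 and 10 for "40% more real data and 10% extra noise".
    """
    base_real = base_reads * (1 - base_noise_frac)
    real = base_real + base_reads * extra_real_pct / 100
    total = base_reads * (100 + extra_real_pct + extra_noise_pct) / 100
    return real, real / total


# Assumed baseline (hypothetical values, not from the thread).
BASE_READS = 10_000_000
BASE_NOISE_FRAC = 0.2  # assume 20% of baseline reads are noise

good_real, good_frac = after_extra_reads(BASE_READS, BASE_NOISE_FRAC, 40, 10)
bad_real, bad_frac = after_extra_reads(BASE_READS, BASE_NOISE_FRAC, 30, 20)

print(f"40/10 case: {good_real:,.0f} real reads, real fraction {good_frac:.2f}")
print(f"30/20 case: {bad_real:,.0f} real reads, real fraction {bad_frac:.2f}")
```

With the assumed 20% baseline noise, the 40/10 case raises the number of real reads by 50% while the real fraction stays at 0.80, and the 30/20 case raises real reads by 37.5% while the real fraction drops to about 0.73. Under these assumptions the gain is in the absolute number of real reads; whether the real-to-noise ratio improves depends on how clean the extra reads are compared with the baseline.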



            • #7
              Leonardo

              Originally posted by lcollado
              We want to start from the bottom up, so right now we want to evaluate whether to use a base caller other than the one in the SolexaPipeline (I think it was version 1.4).
              I just re-read your post and realised you were using GAPipeline 1.4. That is quite an old version, and in fact the jump from 1.3 to 1.4 was a BIG improvement in the algorithm. Most people are using 1.5 now, and migrating to 1.6 (which only works with the latest chemistry).

              This only reinforces my advice to focus on downstream analysis given the data you have. If you were really serious about your strategy of going to the lowest levels first, you would work with biochemists to improve the whole sequencing-by-synthesis procedure! :-)



              • #8
                Thanks for your feedback ^_^

                And well, the bottom up for me starts at base calling because I'm not into chemistry. I do get your point, though.
                L. Collado Torres, Ph.D. student in Biostatistics.



                • #9
                  Originally posted by Simon Anders
                  Actually, yes.

                  It is this one:

                  Nava Whiteford: The Solexa Pipeline


                  Note that it covers an old version of the pipeline (the report is dated Dec 2008!). As I said, a lot has changed recently.

                  Simon
                  Almost. In fact it is quite hard to make huge improvements, as Nava showed in SWIFT. The whole platform has been revised in parallel with the pipeline. This includes changes to the optics, affecting pixels per cluster and signal-to-noise (as well as the number of clusters per image). There have also been tweaks to chemistry reagents, protocols and dyes, improving signal-to-noise and reducing phasing signals. All of these have a big impact on raw data quality and subsequent base calls. I have wondered what a 1.6 Pipeline would make of GA1 data and what a 1.3 pipeline would make of GAIIx data. I doubt the differences can all be ascribed to software.
