  • lcollado
    Member
    • Jun 2009
    • 65

    Would you use the SolexaPipeline or an external base caller?

    Hello,

    A few of my lab co-workers and I are interested in working with some GAIIx data from a bacterial genome. We want to start from the bottom up, which is why we are currently evaluating whether to use a base caller other than the SolexaPipeline (I think it was version 1.4).

    I've browsed the forums a bit and posted on some old threads:
    Alta-Cyclic discussion
    Alta-Cyclic newsbot (well, ECO )
    Bayes-Call discussion
    Swift discussion

    Anyhow, we are just starting to look for information on base-callers and any tips, recommendations and the like are more than welcome ^_^. After all, we want to get as much as we can from the data we received.

    Thank you!
    Leonardo
    L. Collado Torres, Ph.D. student in Biostatistics.
  • Torst
    Senior Member
    • Apr 2008
    • 275

    #2
    Originally posted by lcollado
    A few of my lab co-workers and I are interested in working with some GAIIx data from a bacterial genome. We want to start from the bottom up, which is why we are currently evaluating whether to use a base caller other than the SolexaPipeline (I think it was version 1.4).
    My advice is that you would be better off spending your time on the downstream analysis rather than fiddling at the edges of the base-calling stage. Alternative base callers are unlikely to provide significant improvements over the GAPipeline: perhaps small increases in quality, and yield gains around the 5% mark. You are analysing bacterial genomes, so the genome size is typically under 10 Mbp, and it appears that going beyond 100x coverage gains little. Chances are you'll have more than enough good reads, and improving them slightly won't affect the downstream results.
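
As a rough illustration of the coverage argument above, the sketch below computes average coverage as (number of reads × read length) / genome size. The read count, read length, and genome size are hypothetical values chosen for illustration and are not taken from the thread; the function names are likewise made up for this sketch.

```python
import math


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage, assuming reads are spread evenly over the genome."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Hypothetical example: 10 million reads of 50 bp on a 5 Mbp bacterial genome.
baseline = mean_coverage(10_000_000, 50, 5_000_000)

# The same run with 5% more reads, e.g. from a slightly better base caller.
improved = mean_coverage(10_500_000, 50, 5_000_000)

print(f"baseline coverage: {baseline:.0f}x")
print(f"with 5% more reads: {improved:.0f}x")
print(f"reads needed for 100x: {reads_needed(100, 50, 5_000_000):,}")
```

Under these assumed numbers the run already sits at the 100x mark, so a 5% gain in yield only moves the average depth from about 100x to about 105x.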

    • Simon Anders
      Senior Member
      • Feb 2010
      • 995

      #3
      I agree with Torst.

      It is also worth pointing out that Illumina made major improvements to the base caller in the most recent release of the SolexaPipeline.

      Before that, Bustard (the base caller) was in fact considered a weak part. Somebody looked through its code (can't find the reference at the moment) and found that it used quite naive algorithms, which left a lot of room for improvement. Hence, it was comparatively easy to develop something better. Since the recent major overhaul of Bustard, it might be much harder to compete with it, and I imagine that it now has the lead again over third-party tools.

      For us, this update made quite a dramatic difference: shortly before the release of the new pipeline version, we made some yeast RNA-Seq runs on our GAIIx and got (if I remember correctly) c. 13 million reads per lane passing the chastity filter. As we still had the images, we ran the analysis again after installing the new version of the pipeline and now have more than 19 million reads.
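
To put the yield figures above in relative terms, here is a minimal sketch of the arithmetic. It uses the approximate per-lane counts quoted in the post (about 13 million before, more than 19 million after), so the result should be read as a rough lower bound; the function name is made up for this sketch.

```python
def relative_gain(before, after):
    """Fractional increase in read count going from `before` to `after`."""
    return (after - before) / before


# Approximate reads per lane passing the chastity filter, as quoted in the post.
gain = relative_gain(13_000_000, 19_000_000)
print(f"relative increase in passing reads: {gain:.1%}")  # prints 46.2%
```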

      Simon

      • drio
        Senior Member
        • Oct 2008
        • 323

        #4
        Originally posted by Simon Anders
        I agree with Torst.
        Before that, Bustard (the base caller) was in fact considered a weak part. Somebody looked through its code (can't find the reference at the moment) and found that it used quite naive algorithms, which left a lot of room for improvement. Hence, it was comparatively easy to develop something better. Since the recent major overhaul of Bustard, it might be much harder to compete with it, and I imagine that it now has the lead again over third-party tools.
        Simon
        Any chance you can find that reference? Where did you read it?
        -drd

        • Simon Anders
          Senior Member
          • Feb 2010
          • 995

          #5
          Actually, yes.

          It is this one:

          Nava Whiteford: The Solexa Pipeline


          Note that it covers an old version of the pipeline (the report is dated Dec 2008!). As I said, a lot has changed recently.

          Simon

          • lcollado
            Member
            • Jun 2009
            • 65

            #6
            Thanks for the replies and advice ^_^


            The general, hmm, dilemma (if the word fits) is that our boss wants us to focus on a 2nd project (de novo assembly) rather than on this project, which is about TSS/operons (similar to this recent paper). But well, I feel uncomfortable working on, say, the "3rd lvl" without knowing that the 1st (base calling) and 2nd lvl (mapping) are solid. Knowing that they are solid doesn't mean that we'll re-do the work, but we would at least know more about them and understand them better. Also, de novo assembly is much more thoroughly explored than TSS/operons in bacteria (or so I feel), and therefore TSS/operons are more relevant. If we understand more about all the "lvls" we might find something interesting or at least learn more in the end, which is something I feel is still open after taking a look at the above paper.

            For discussion's sake, our boss argues that even if you could get 50% (yes, just for discussion) more data from the first 2 lvls on the TSS/operons, it isn't really worth it as we are working with a bacterium. The way I see it, if we could get that much more data, we should get more biological data (real data) than noise, as the ratio between them would favor this. Say, 40% more data and 10% extra noise, or 30% and 20% in a "bad" case. With more overall data, we could separate "real" data from noise a bit more easily in the following lvls.

            I guess that in the end what I need is a good enough guide for the 1st lvl (2nd one is quite popular) that would be enough to feel comfortable with the Pipeline without having to spend the time to get into the kinks of base calling.

            From your posts, I infer that you agree with our boss.

            Thank you!
            Leonardo
            Last edited by lcollado; 02-24-2010, 10:13 AM.
            L. Collado Torres, Ph.D. student in Biostatistics.

            • Torst
              Senior Member
              • Apr 2008
              • 275

              #7
              Leonardo

              Originally posted by lcollado
              we want to start from the bottom up hence why right now we want to evaluate whether to use a base caller different to the SolexaPipeline (I think it was 1.4).
              I just re-read your post and realised you were using GAPipeline 1.4. That is quite an old version, although the jump from 1.3 to 1.4 was in fact a BIG improvement in the algorithm. Most people are using 1.5 now, and migrating to 1.6 (which only works with the latest chemistry).

              This only reinforces my advice to focus on downstream analysis given the data you have. If you were really serious about your strategy of going to the lowest levels first, you would work with biochemists to improve the whole sequencing-by-synthesis procedure! :-)

              • lcollado
                Member
                • Jun 2009
                • 65

                #8
                Thanks for your feedback ^_^

                And well, the bottom-up approach starts at base calling for me because I'm not into chemistry. I do get your point, though.
                L. Collado Torres, Ph.D. student in Biostatistics.

                • cgb
                  Member
                  • May 2008
                  • 50

                  #9
                  Originally posted by Simon Anders
                  Actually, yes.

                  It is this one:

                  Nava Whiteford: The Solexa Pipeline


                  Note that it treats an old version of the pipeline (the report is dated Dec 2008!). As I said, a lot was changed recently.

                  Simon
                  Almost. In fact, it is quite hard to make huge improvements, as Nava showed with SWIFT. The whole platform has been revised in parallel with the pipeline. This includes changes to the optics, affecting pixels per cluster and signal-to-noise (as well as the number of clusters per image), and tweaks to chemistry reagents, protocols and dyes, improving signal-to-noise and reducing phasing signals. All of these have a big impact on raw data quality and subsequent base calls. I have wondered what a 1.6 Pipeline would make of GA1 data and what a 1.3 pipeline would make of GAIIx data. I doubt the differences can all be ascribed to software.
