
  • For sequence quality assessment, do the raw image files provide any added value?

    I just got done with an analysis based on a Sanger sequencing assay, where having the raw trace files was absolutely invaluable. They saved me from publishing some very "interesting" results which were actually experimental artifacts because they showed up in the trace files as sequencing anomalies which weren't caught by the SNP detection software we used.

    Now I'm moving on to a project based on SOLiD sequencing, and I've learned that my collaborators are throwing away the raw fluorescence image files. The fact that I will not be able to go back to the raw data to check for anomalies makes me nervous, and I am trying to decide whether to push back on this policy. I am wondering whether people assessing SOLiD sequence calls ever get any added value from examining a sample of pertinent raw image files.

  • #2
    Me too

    I would love to see this too. I have the same problem. Are there even any samples out there? I could imagine even going a step further back and looking at calibration files if they exist.


    • #3
      Originally posted by throwaway
      I just got done with an analysis based on a Sanger sequencing assay, where having the raw trace files was absolutely invaluable. They saved me from publishing some very "interesting" results which were actually experimental artifacts because they showed up in the trace files as sequencing anomalies which weren't caught by the SNP detection software we used.

      Now I'm moving on to a project based on SOLiD sequencing, and I've learned that my collaborators are throwing away the raw fluorescence image files. The fact that I will not be able to go back to the raw data to check for anomalies makes me nervous, and I am trying to decide whether to push back on this policy. I am wondering whether people assessing SOLiD sequence calls ever get any added value from examining a sample of pertinent raw image files.
      Saving a sampling of images is great for debugging after the fact, but not for re-use in downstream analysis. Saving every 1000th image or something would seem a good data reduction. Anyone else have some thoughts?
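A minimal sketch of the "keep every 1000th image" idea, assuming the images are ordinary files whose sorted names follow acquisition order. The function name, the `offset` parameter, and the file names in the test are hypothetical and not part of any SOLiD or Illumina tooling:

```python
from typing import Iterable, List


def sample_every_nth(paths: Iterable[str], n: int = 1000, offset: int = 0) -> List[str]:
    """Return every n-th path, in sorted order, starting at index `offset`.

    Assumption: sorting the names reproduces acquisition order
    (for example, zero-padded cycle and tile numbers).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    ordered = sorted(paths)
    # Slice with a stride of n so roughly 1/n of the images are kept.
    return ordered[offset::n]
```

Anything not in the returned list could then be deleted or archived. Sampling at a fixed stride is only one choice; keeping all images for a few fixed tiles per run would preserve per-cycle continuity instead.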


      • #4
        ..........
        Last edited by Alex Coventry; 04-09-2010, 02:35 PM.


        • #5
          So, what kind of debugging do you use the images for? I have heard of people using them to diagnose primer failures, but not much beyond that.


          • #6
            Originally posted by throwaway
            So, what kind of debugging do you use the images for? I have heard of people using them to diagnose primer failures, but not much beyond that.
            Pretty much exactly that.


            • #7
              The cost to store images often outweighs the value. I have never seen a reason to go back to the raw image files. If your primary metrics are fine, the images will not help.


              • #8
                Originally posted by snetmcom
                The cost to store images often outweighs the value. I have never seen a reason to go back to the raw image files. If your primary metrics are fine, the images will not help.
                It also depends on the company, but you might need a subset of images to provide to the company to prove that there is a problem with the sequencer or reagents. Just a thought.
                Last edited by nilshomer; 04-11-2010, 09:16 PM. Reason: speak-and-spell


                • #9
                  Image QC

                  Was very useful when setting up and debugging machines, chemistry and protocols at Solexa and Sanger.

                  Often systemic issues with the instrument showed up as easily recognisable problems in the images. It is easy to know what an ideal image should look like, and any deviation from this can be quickly recognized by eye (the human brain is very good at this sort of pattern recognition). For example, contaminants in the reagents would show as bright blobs that would then get falsely called as clusters, leading to pseudo-sequences. Blobs adjacent to real clusters would spill signal over and skew base calls (and perhaps your SNP calls). Badly set up optics would lead to uneven illumination across the tile, giving poor or artefactual base calls. Flaws in the focusing software, occurring sporadically, would do the same. Manufacturing faults in the flowcells would lead to flowcell walls being imaged, altering focusing and giving rise to pseudo-sequences. Other flaws in the flowcell coatings would show up in the images but not in the metrics, as 'black holes' in the surfaces. Primer problems would give rise to distinct sub-populations of 'speck' clusters. Optical duplicates could be spotted using the X,Y coordinates of similar sequences (I think based on alignment starts and stops), and backtracking to the images demonstrated that some of this was due to edge effects and stage movements (and others), and so on.
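The optical-duplicate check mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method used at Solexa or Sanger: the `Read` record, the function name, and the `max_dist` threshold are hypothetical, and exact sequence identity on the same tile stands in for "similar sequences" (the post suggests the real comparison was based on alignment starts and stops).

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from math import hypot
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical read record: name, tile, cluster X/Y position, sequence."""
    name: str
    tile: int
    x: float
    y: float
    seq: str


def candidate_optical_duplicates(
    reads: Iterable[Read], max_dist: float = 100.0
) -> List[Tuple[str, str]]:
    """Return name pairs of reads that share tile and sequence and lie close together.

    Assumption: `max_dist` is in the same pixel units as x and y; the
    default is an illustrative value, not an instrument specification.
    """
    groups = defaultdict(list)
    for read in reads:
        # Only reads on the same tile with the same sequence are compared.
        groups[(read.tile, read.seq)].append(read)

    pairs = []
    for group in groups.values():
        for a, b in combinations(group, 2):
            if hypot(a.x - b.x, a.y - b.y) <= max_dist:
                pairs.append((a.name, b.name))
    return pairs
```

Pairs flagged this way are only candidates. As the post describes, going back to the images for those coordinates is what showed whether edge effects or stage movements were responsible.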

                  In those days on GAI and IIs you could breeze into the lab and watch the images popping up on a selected number of tiles and hopefully they would be reassuringly normal.

                  Yes indeed, if the manufacturers changed something in the reagents or the instrumentation and it was not beneficial, or introduced a sporadic bug, then the images provided a very powerful way to communicate that to them, especially when those bugs were only affecting some machines or batches and not others. We went back to images a lot in fact, contrary to what one of the posters asserts, although I do not know if this still holds true with more recent systems.

                  The argument was: when the instruments and reagents become completely reliable, black-box, low-variability systems with well-defined behavior and outputs, then of course you don't need to backtrack to images.

                  For sure, these days a lot of QC metrics are provided and it may be possible to spot things more easily from these, and of course far more images are produced now, making it much more difficult to keep and review them. But personally, given the rate of change of these systems and the issues with variability, if I were working on a very large project spanning a long time period, I'd backtrack a bit and ensure that a run or base call on a recent run equates to a base call on one of my earlier data sets, where I may be combining these data or making comparisons.

                  If you are modifying run protocols, cluster creation or other aspects of the system in non-standard ways, then it's probably wise to backtrack to the images.

                  Personally, I like to know what's going on under the hood; then I can head off a breakdown and get more mileage and economy. Some people like to just turn the key and drive.
                  Last edited by clivey; 04-12-2010, 01:09 AM.


                  • #10
                    Thanks for the information, clivey. It's hard to know how many of those issues will arise with SOLiD runs, but it provides a place to start looking.
