  • CHoyt
    Junior Member
    • Dec 2011
    • 1

    Looking for a few NGS-ers willing to share a bad experience about NGS data analysis

    Hi, everyone! I'm looking for a few people who would be willing to share a bad experience regarding NGS data analysis. Any takers?

    Thanks!
    -Carlton
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    I'll bite, one of the less potentially embarrassing ones:

    We've had several successful little projects doing de novo assembly of phage genomes with 454, but in one case all we got was host contaminant and what looked like human mitochondria. Moral: do more QC on the sample before sequencing. Otherwise you can waste your sequencing money & some analysis time.

    Semi-anonymous user names may discourage posts though. I'm sure people here could share horror stories of colleagues coming to them with "We've just done some sequencing, could you assemble it for us please" with no idea of the scale of the problem nor how much analysis time they should have budgeted for. Probably the best warnings would be saved for off the record conversations at the pub/bar at conferences!


    • maubp
      Peter (Biopython etc)
      • Jul 2009
      • 1544

      #3
      Another one for you (not first hand): We updated tool X and repeated the analysis and now all the results have changed almost beyond recognition. I can think of some threads here along those lines discussing differential gene expression from RNA-Seq data.

      e.g. http://seqanswers.com/forums/showthread.php?t=15896

      Edit: To make my point more explicit (thanks Simon), the point is you should be diligent in your record keeping (electronic lab book or whatever works for you) and include the version number of key packages and datasets/databases since this can sometimes make a surprising difference to the results. This goes beyond high throughput sequencing, and applies to Bioinformatics as a whole.
      Last edited by maubp; 12-08-2011, 02:32 AM.
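The record-keeping advice in the post above can be sketched in a few lines of Python. This is only a minimal illustration, not anyone's actual workflow: the function name `snapshot_versions`, the package list, and the dataset label are all hypothetical, and the idea is simply to save the returned record next to each set of results.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def snapshot_versions(packages, datasets=None):
    """Return a dict recording interpreter, package and dataset versions."""
    record = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "packages": {},
        # Dataset/database labels are supplied by the caller (hypothetical here).
        "datasets": dict(datasets or {}),
    }
    for name in packages:
        try:
            record["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly rather than skipping them.
            record["packages"][name] = None
    return record


if __name__ == "__main__":
    # Example package names and dataset label are placeholders.
    rec = snapshot_versions(["biopython", "numpy"], {"reference_db": "example-release"})
    print(json.dumps(rec, indent=2))
```

Writing this JSON into the same directory as the analysis output makes it possible to check later whether a changed result coincides with a changed tool or database version.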


      • adaptivegenome
        Super Moderator
        • Nov 2009
        • 436

        #4
        My favorite one of all time:


        Check out supplementary table 1


        • simonandrews
          Simon Andrews
          • May 2009
          • 870

          #5
          Originally posted by maubp:
          Another one for you (not first hand): We updated tool X and repeated the analysis and now all the results have changed almost beyond recognition. I can think of some threads here along those lines discussing differential gene expression from RNA-Seq data.
          To try to make a wider point - this is why we advocate getting our users to visualise and explore their data. Running a tool, however good it may be, tends to make people too trusting of the results produced. If you can actually view those results in a number of different ways then you get a much better feel for how much confidence you can have in the hits you see.

          For example - you might find that changing an analysis threshold by a small amount can hugely change the number of hits you get, but if you can see a scatterplot of your data with the threshold you're using on the edge of a huge cloud of points then you can see exactly why this happens.
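The threshold effect described above can be shown with a toy calculation. This is a sketch with made-up scores (a dense cloud around 2.0 plus two clear outliers), and the function name `hits_at_thresholds` is hypothetical; a scatterplot of real data would show the same thing visually.

```python
def hits_at_thresholds(scores, thresholds):
    """Count how many scores pass each threshold (score >= threshold)."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


# Made-up scores: most values sit in a tight cloud near 2.0.
scores = [1.90, 1.95, 1.97, 1.99, 2.00, 2.01, 2.03, 2.05, 2.10, 5.0, 6.5]

counts = hits_at_thresholds(scores, [1.9, 2.0, 2.1])
print(counts)  # {1.9: 11, 2.0: 7, 2.1: 3}
```

Moving the cutoff by 0.1 in either direction changes the hit count from 11 to 7 to 3, because the threshold sits on the edge of the cloud; only the two outliers are robust to the choice.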


          • maubp
            Peter (Biopython etc)
            • Jul 2009
            • 1544

            #6
            Originally posted by simonandrews:
            For example - you might find that changing an analysis threshold by a small amount can hugely change the number of hits you get, but if you can see a scatterplot of your data with the threshold you're using on the edge of a huge cloud of points then you can see exactly why this happens.
            Excellent advice. Another related point is to avoid pre-determined e-values as thresholds when they will alter radically based on things like dataset size (e.g. BLAST matches - whereas the bitscore is stable). i.e. A discriminatory e-value for one dataset can be quite inappropriate on another.
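To illustrate why a fixed e-value cutoff does not transfer between datasets, here is a rough sketch using the standard relationship between bit score and expectation, E = m * n * 2^(-S), where m is the query length and n the total database length. It deliberately ignores BLAST's effective-length corrections, and the function name and the example lengths are assumptions for illustration only.

```python
def evalue_from_bitscore(bitscore, query_len, db_len):
    """Approximate e-value for a bit score: E = m * n * 2**(-S).

    Simplification: uses raw lengths rather than BLAST's effective lengths.
    """
    return query_len * db_len * 2.0 ** (-bitscore)


# Same alignment (bit score 50, 300-residue query) against two database sizes.
small_db = evalue_from_bitscore(50.0, 300, 1_000_000)
large_db = evalue_from_bitscore(50.0, 300, 1_000_000_000)

# The bit score is unchanged, but the e-value scales with database size.
print(small_db, large_db, large_db / small_db)
```

Under this approximation the identical hit is 1000 times less "significant" by e-value in the database that is 1000 times larger, so a cutoff tuned on one search can be far too strict or too lenient on another.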


            • polyatail
              Member
              • Dec 2010
              • 25

              #7
              Originally posted by genericforms:
              My favorite one of all time:


              Check out supplementary table 1
              Buccal swab?


              • adaptivegenome
                Super Moderator
                • Nov 2009
                • 436

                #8
                Originally posted by polyatail:
                Buccal swab?
                LOL! Must have been!


                • simonandrews
                  Simon Andrews
                  • May 2009
                  • 870

                  #9
                  Originally posted by maubp:
                  Excellent advice. Another related point is to avoid pre-determined e-values as thresholds when they will alter radically based on things like dataset size (e.g. BLAST matches - whereas the bitscore is stable). i.e. A discriminatory e-value for one dataset can be quite inappropriate on another.
                  As if to prove a point, I saw this tweet this morning.
